[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36083: [SPARK-38581][PYTHON][DOCS] List of supported pandas APIs for pandas-on-Spark docs.

GitBox Wed, 13 Apr 2022 18:15:45 -0700


HyukjinKwon commented on code in PR #36083:
URL: https://github.com/apache/spark/pull/36083#discussion_r850008434



##########
python/docs/source/user_guide/pandas_on_spark/supported_pandas_api.rst:
##########
@@ -0,0 +1,1320 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+=====================
+Supported pandas APIs
+=====================
+
+.. currentmodule:: pyspark.pandas
+
+The following table shows the pandas APIs that are implemented or not implemented in pandas API on
+Spark.
+
+Some pandas APIs do not implement all parameters, so the third column shows the missing parameters for
+each API.
+
+If there is an unimplemented pandas API or parameter you want, you can create an `Apache Spark
+JIRA <https://issues.apache.org/jira/projects/SPARK/summary>`__ to request it or contribute it
+yourself.
+
+The API list is updated based on the `latest pandas official API
+reference <https://pandas.pydata.org/docs/reference/index.html#>`__.
+
+Supported I/O APIs
+------------------
+
+In general, the ``read_*`` APIs can be very expensive if the ``index_col`` parameter is not
+specified since it attaches the default index.
+
+In addition, the index will be lost if the ``index_col`` parameter is not specified for
+``DataFrame.to_*`` APIs.
+
++----------------------+----------+---------------------------------------------------------------+
+| API                  | Imp      | Missing parameters                                            |
+|                      | lemented |                                                               |
++======================+==========+===============================================================+
+| read_pickle          | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_pickle  | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_table           | O        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_csv             | O        | ``converters``, ``true_values``, ``false_values``,            |
+|                      |          | ``skipinitialspace``, ``skiprows`` and more. See the          |
+|                      |          | `pandas.read_csv <https://                                    |
+|                      |          | pandas.pydata.org/docs/reference/api/pandas.read_csv.html>`__ |
+|                      |          | and                                                           |
+|                      |          | `pyspark.pand                                                 |
+|                      |          | as.read_csv <https://spark.apache.org/docs/latest/api/python/ |
+|                      |          | reference/pyspark.pandas/api/pyspark.pandas.read_csv.html>`__ |
+|                      |          | for detail.                                                   |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_csv     | O        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_fwf             | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_clipboard       | O        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| Da                   | O        |                                                               |
+| taFrame.to_clipboard |          |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_excel           | O        | ``skiprows``, ``na_filter``, ``decimal``, ``skipfooter``,     |
+|                      |          | ``storage_options``                                           |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_excel   | O        | ``storage_options``                                           |
++----------------------+----------+---------------------------------------------------------------+
+| read_json            | O        | ``orient``, ``typ``, ``dtype``, ``convert_axes``,             |
+|                      |          | ``convert_dates`` and more. See the                           |
+|                      |          | `pandas.read_json <https://p                                  |
+|                      |          | andas.pydata.org/docs/reference/api/pandas.read_json.html>`__ |
+|                      |          | and                                                           |
+|                      |          | `pyspark.pandas                                               |
+|                      |          | .read_json <https://spark.apache.org/docs/latest/api/python/r |
+|                      |          | eference/pyspark.pandas/api/pyspark.pandas.read_json.html>`__ |
+|                      |          | for detail.                                                   |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_json    | O        | ``date_format``, ``double_precision``, ``force_ascii``,       |
+|                      |          | ``date_unit``, ``default_handler`` and more. See the          |
+|                      |          | `pandas.DataFrame.to_json <https://pandas.py                  |
+|                      |          | data.org/docs/reference/api/pandas.DataFrame.to_json.html>`__ |
+|                      |          | and                                                           |
+|                      |          | `pyspark.pandas.to_js                                         |
+|                      |          | on <https://spark.apache.org/docs/latest/api/python/reference |
+|                      |          | /pyspark.pandas/api/pyspark.pandas.DataFrame.to_json.html>`__ |
+|                      |          | for detail.                                                   |
++----------------------+----------+---------------------------------------------------------------+
+| read_html            | O        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_html    | O        | ``encoding``                                                  |
++----------------------+----------+---------------------------------------------------------------+
+| read_xml             | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_xml     | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_latex   | O        | ``caption``, ``label``, ``position``                          |
++----------------------+----------+---------------------------------------------------------------+
+| read_hdf             | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_feather         | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_feather | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_parquet         | O        | ``engine``, ``storage_options``, ``use_nullable_dtypes``      |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_parquet | O        | ``engine``, ``storage_options``                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_orc             | O        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_sas             | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_spss            | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_sql_table       | O        | ``coerce_float``, ``parse_dates``, ``chunksize``              |
++----------------------+----------+---------------------------------------------------------------+
+| read_sql_query       | O        | ``coerce_float``, ``params``, ``parse_dates``, ``chunksize``, |
+|                      |          | ``dtype``                                                     |
++----------------------+----------+---------------------------------------------------------------+
+| read_sql             | O        | ``coerce_float``, ``params``, ``parse_dates``, ``chunksize``  |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_sql     | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_gbq             | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| read_stata           | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+| DataFrame.to_stata   | X        |                                                               |
++----------------------+----------+---------------------------------------------------------------+
+
+Supported General Function APIs
+-------------------------------
+
+=============== =========== ===============================================================

Review Comment:
   Can we use the same table style here as for the I/O table above?


