HyukjinKwon commented on code in PR #36083: URL: https://github.com/apache/spark/pull/36083#discussion_r850007755
########## python/docs/source/user_guide/pandas_on_spark/supported_pandas_api.rst: ########## @@ -0,0 +1,1320 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + + +===================== +Supported pandas APIs +===================== + +.. currentmodule:: pyspark.pandas + +The following tables show which pandas APIs are implemented or not implemented in pandas API on +Spark. + +Some pandas APIs do not implement all parameters, so the third column shows the missing parameters for +each API. + +If there is a non-implemented pandas API or parameter you want, you can create an `Apache Spark +JIRA <https://issues.apache.org/jira/projects/SPARK/summary>`__ to request it or contribute it +yourself. + +The API list is updated based on the `latest pandas official API +reference <https://pandas.pydata.org/docs/reference/index.html#>`__. + +Supported I/O APIs +------------------ + +In general, the ``read_*`` APIs can be very expensive if the ``index_col`` parameter is not +specified since they then attach the default index. + +In addition, the index will be lost if the ``index_col`` parameter is not specified for +``DataFrame.to_*`` APIs. 
+ ++----------------------+----------+---------------------------------------------------------------+ +| API | Imp | Missing parameters | +| | lemented | | ++======================+==========+===============================================================+ +| read_pickle | X | | ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_pickle | X | | ++----------------------+----------+---------------------------------------------------------------+ +| read_table | O | | ++----------------------+----------+---------------------------------------------------------------+ +| read_csv | O | ``converters``, ``true_values``, ``false_values``, | +| | | ``skipinitialspace``, ``skiprows`` and more. See the | +| | | `pandas.read_csv <https:// | +| | | pandas.pydata.org/docs/reference/api/pandas.read_csv.html>`__ | +| | | and | +| | | `pyspark.pand | +| | | as.read_csv <https://spark.apache.org/docs/latest/api/python/ | +| | | reference/pyspark.pandas/api/pyspark.pandas.read_csv.html>`__ | +| | | for detail. 
| ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_csv | O | | ++----------------------+----------+---------------------------------------------------------------+ +| read_fwf | X | | ++----------------------+----------+---------------------------------------------------------------+ +| read_clipboard | O | | ++----------------------+----------+---------------------------------------------------------------+ +| Da | O | | +| taFrame.to_clipboard | | | ++----------------------+----------+---------------------------------------------------------------+ +| read_excel | O | ``skiprows``, ``na_filter``, ``decimal``, ``skipfooter``, | +| | | ``storage_options`` | ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_excel | O | ``storage_options`` | ++----------------------+----------+---------------------------------------------------------------+ +| read_json | O | ``orient``, ``typ``, ``dtype``, ``convert_axes``, | +| | | ``convert_dates`` and more. See the | +| | | `pandas.read_json <https://p | +| | | andas.pydata.org/docs/reference/api/pandas.read_json.html>`__ | +| | | and | +| | | `pyspark.pandas | +| | | .read_json <https://spark.apache.org/docs/latest/api/python/r | +| | | eference/pyspark.pandas/api/pyspark.pandas.read_json.html>`__ | +| | | for detail. | ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_json | O | ``date_format``, ``double_precision``, ``force_ascii``, | +| | | ``date_unit``, ``default_handler`` and more. See the | +| | | `pandas.DataFrame.to_json <https://pandas.py | +| | | data.org/docs/reference/api/pandas.DataFrame.to_json.html>`__ | +| | | and | +| | | `pyspark.pandas.to_js | +| | | on <https://spark.apache.org/docs/latest/api/python/reference | +| | | /pyspark.pandas/api/pyspark.pandas.DataFrame.to_json.html>`__ | +| | | for detail. 
| ++----------------------+----------+---------------------------------------------------------------+ +| read_html | O | | ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_html | O | ``encoding`` | ++----------------------+----------+---------------------------------------------------------------+ +| read_xml | X | | ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_xml | X | | ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_latex | O | ``caption``, ``label``, ``position`` | ++----------------------+----------+---------------------------------------------------------------+ +| read_hdf | X | | ++----------------------+----------+---------------------------------------------------------------+ +| read_feather | X | | ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_feather | X | | ++----------------------+----------+---------------------------------------------------------------+ +| read_parquet | O | ``engine``, ``storage_options``, ``use_nullable_dtypes`` | ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_parquet | O | ``engine``, ``storage_options`` | ++----------------------+----------+---------------------------------------------------------------+ +| read_orc | O | | ++----------------------+----------+---------------------------------------------------------------+ +| read_sas | X | | ++----------------------+----------+---------------------------------------------------------------+ +| read_spss | X | | ++----------------------+----------+---------------------------------------------------------------+ +| read_sql_table | O | ``coerce_float``, ``parse_dates``, ``chunksize`` | 
++----------------------+----------+---------------------------------------------------------------+ +| read_sql_query | O | ``coerce_float``, ``params``, ``parse_dates``, ``chunksize``, | +| | | ``dtype`` | ++----------------------+----------+---------------------------------------------------------------+ +| read_sql | O | ``coerce_float``, ``params``, ``parse_dates``, ``chunksize`` | ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_sql | X | | ++----------------------+----------+---------------------------------------------------------------+ +| read_gbq | X | | ++----------------------+----------+---------------------------------------------------------------+ +| read_stata | X | | ++----------------------+----------+---------------------------------------------------------------+ +| DataFrame.to_stata | X | | ++----------------------+----------+---------------------------------------------------------------+ + +Supported General Function APIs +------------------------------- + +=============== =========== =============================================================== +API Implemented Missing parameters +=============== =========== =============================================================== +melt O ``col_level``, ``ignore_index`` +pivot X +pivot_table X +crosstab X +cut X +qcut X +merge O ``copy``, ``indicator``, ``validate`` +merge_ordered X +merge_asof O +concat O ``keys``, ``levels``, ``names``, ``verify_integrity``, ``copy`` +get_dummies O +factorize X +unique X +wide_to_long X +isna O +isnull O +notna O +notnull O +to_numeric O ``errors``, ``downcast`` +to_datetime O ``dayfirst``, ``yearfirst``, ``utc``, ``exact`` +date_range O +bdate_range X +period_range X +timedelta_range O +infer_freq X +interval_range X +eval X +=============== =========== =============================================================== + +Supported Series APIs +--------------------- + 
++-----------------------+----------+--------------------------------------------------------------+ +| API | Imp | Missing parameters | +| | lemented | | ++=======================+==========+==============================================================+ +| T | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| abs | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| add | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| add_prefix | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| add_suffix | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| agg | O | ``axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| aggregate | O | ``axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| align | O | ``level``, ``fill_value``, ``method``, ``limit``, | +| | | ``fill_axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| all | O | ``bool_only``, ``skipna``, ``level`` | ++-----------------------+----------+--------------------------------------------------------------+ +| any | O | ``bool_only``, ``skipna``, ``level`` | ++-----------------------+----------+--------------------------------------------------------------+ +| append | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| apply | O | ``convert_dtype`` | ++-----------------------+----------+--------------------------------------------------------------+ +| argmax | O | ``axis``, ``skipna`` | ++-----------------------+----------+--------------------------------------------------------------+ +| 
argmin | O | ``axis``, ``skipna`` | ++-----------------------+----------+--------------------------------------------------------------+ +| argsort | O | ``axis``, ``kind``, ``order`` | ++-----------------------+----------+--------------------------------------------------------------+ +| array | X | | ++-----------------------+----------+--------------------------------------------------------------+ +| asfreq | X | | ++-----------------------+----------+--------------------------------------------------------------+ +| asof | O | ``subset`` | ++-----------------------+----------+--------------------------------------------------------------+ +| astype | O | ``copy``, ``errors`` | ++-----------------------+----------+--------------------------------------------------------------+ +| at | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| at_time | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| attrs | X | | ++-----------------------+----------+--------------------------------------------------------------+ +| autocorr | X | | ++-----------------------+----------+--------------------------------------------------------------+ +| axes | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| backfill | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| between | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| between_time | O | ``inclusive`` | ++-----------------------+----------+--------------------------------------------------------------+ +| bfill | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| bool | O | | ++-----------------------+----------+--------------------------------------------------------------+ 
+| cat | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| clip | O | ``axis``, ``inplace`` | ++-----------------------+----------+--------------------------------------------------------------+ +| combine | X | | ++-----------------------+----------+--------------------------------------------------------------+ +| combine_first | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| compare | O | ``align_axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| convert_dtypes | X | | ++-----------------------+----------+--------------------------------------------------------------+ +| copy | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| corr | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| count | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| cov | O | ``ddof`` | ++-----------------------+----------+--------------------------------------------------------------+ +| cummax | O | ``axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| cummin | O | ``axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| cumprod | O | ``axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| cumsum | O | ``axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| describe | O | ``include``, ``exclude``, ``datetime_is_numeric`` | ++-----------------------+----------+--------------------------------------------------------------+ +| diff | O | | 
++-----------------------+----------+--------------------------------------------------------------+ +| div | O | ``fill_value``, ``level`` | ++-----------------------+----------+--------------------------------------------------------------+ +| divide | O | ``fill_value``, ``level`` | ++-----------------------+----------+--------------------------------------------------------------+ +| divmod | O | ``fill_value``, ``level`` | ++-----------------------+----------+--------------------------------------------------------------+ +| dot | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| drop | O | ``columns``, ``inplace``, ``errors`` | ++-----------------------+----------+--------------------------------------------------------------+ +| drop_duplicates | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| droplevel | O | ``axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| dropna | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| dt | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| dtype | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| dtypes | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| duplicated | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| empty | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| eq | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| equals | O | | 
++-----------------------+----------+--------------------------------------------------------------+ +| ewm | X | | ++-----------------------+----------+--------------------------------------------------------------+ +| expanding | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| explode | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| factorize | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| ffill | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| fillna | O | ``downcast`` | ++-----------------------+----------+--------------------------------------------------------------+ +| filter | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| first | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| first_valid_index | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| flags | X | | ++-----------------------+----------+--------------------------------------------------------------+ +| floordiv | O | ``fill_value``, ``level`` | ++-----------------------+----------+--------------------------------------------------------------+ +| ge | O | ``fill_value``, ``level`` | ++-----------------------+----------+--------------------------------------------------------------+ +| get | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| groupby | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| gt | O | ``fill_value``, ``level`` | ++-----------------------+----------+--------------------------------------------------------------+ +| hasnans | O | | 
++-----------------------+----------+--------------------------------------------------------------+ +| head | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| hist | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| iat | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| idxmax | O | ``axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| idxmin | O | ``axis`` | ++-----------------------+----------+--------------------------------------------------------------+ +| iloc | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| index | O | | ++-----------------------+----------+--------------------------------------------------------------+ +| infer_objects | X | | ++-----------------------+----------+--------------------------------------------------------------+ +| interpolate | X | | Review Comment: Let's keep the PR up to date. e.g.) this is implemented https://github.com/apache/spark/commit/30fc0ba2307dcb71de1ebf785b96b4e5f05d96bd
