beobest2 commented on code in PR #36509:
URL: https://github.com/apache/spark/pull/36509#discussion_r871947591


##########
python/pyspark/pandas/supported_api_gen.py:
##########
@@ -0,0 +1,363 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Generate 'Supported pandas APIs' documentation file
+"""
+import os
+from enum import Enum, unique
+from inspect import getmembers, isclass, isfunction, signature
+from typing import Any, Callable, Dict, List, Set, TextIO, Tuple
+
+import pyspark.pandas as ps
+import pyspark.pandas.groupby as psg
+import pyspark.pandas.window as psw
+from pyspark.find_spark_home import _find_spark_home
+from pyspark.sql.pandas.utils import require_minimum_pandas_version
+
+import pandas as pd
+import pandas.core.groupby as pdg
+import pandas.core.window as pdw
+
+MAX_MISSING_PARAMS_SIZE = 5
+COMMON_PARAMETER_SET = {"kwargs", "args", "cls"}
+MODULE_GROUP_MATCH = [(pd, ps), (pdw, psw), (pdg, psg)]
+
+SPARK_HOME = _find_spark_home()
+TARGET_RST_FILE = os.path.join(
+    SPARK_HOME, "python/docs/source/user_guide/pandas_on_spark/supported_pandas_api.rst"
+)
+RST_HEADER = """
+=====================
+Supported pandas APIs
+=====================
+
+.. currentmodule:: pyspark.pandas
+
+The following table shows the pandas APIs that implemented or non-implemented from pandas API on
+Spark.
+
+Some pandas APIs do not implement full parameters, so the third column shows missing parameters for
+each API.
+
+'Y' in the second column means it's implemented including its whole parameter.
+'N' means it's not implemented yet.
+'P' means it's partially implemented with the missing of some parameters.
+
+If there is non-implemented pandas API or parameter you want, you can create an `Apache Spark
+JIRA <https://issues.apache.org/jira/projects/SPARK/summary>`__ to request or to contribute by your
+own.
+
+The API list is updated based on the `latest pandas official API
+reference <https://pandas.pydata.org/docs/reference/index.html#>`__.
+
+All implemented APIs listed here are distributed except the ones that requires the local
+computation by design. For example, `DataFrame.to_numpy() <https://spark.apache.org
+/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.
+to_numpy.html>`__ requires to collect the data to the driver side.
+
+"""
+
+
+@unique
+class Implemented(Enum):
+    """Status codes marking how fully a pandas API is supported in pandas-on-Spark."""
+
+    # Fully implemented, including its whole parameter set (rendered as 'Y').
+    IMPLEMENTED = "Y"
+    # Not implemented yet (rendered as 'N').
+    NOT_IMPLEMENTED = "N"
+    # Partially implemented, with some parameters missing (rendered as 'P').
+    PARTIALLY_IMPLEMENTED = "P"
+
+
+class SupportedStatus:
+    """
+    SupportedStatus class that defines a supported status for a specific pandas API.
+
+    Parameters
+    ----------
+    implemented : str
+        Implementation status flag; presumably one of the ``Implemented`` enum
+        values ("Y", "N" or "P") — confirm against the callers.
+    missing : str
+        Text describing the missing parameters for this API; empty by default,
+        meaning none are reported missing.
+    """
+
+    def __init__(self, implemented: str, missing: str = ""):
+        # Implementation status flag for the API ("Y"/"N"/"P").
+        self.implemented = implemented
+        # Missing-parameter description; empty string when nothing is missing.
+        self.missing = missing
+
+
+def generate_supported_api() -> None:
+    """
+    Generate supported APIs status dictionary.
+
+    Write supported APIs documentation.
+    """
+    require_minimum_pandas_version()

Review Comment:
   I've filed a child JIRA (https://issues.apache.org/jira/browse/SPARK-39170) 
for the future and fixed it into a warning for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to