beobest2 commented on PR #36729: URL: https://github.com/apache/spark/pull/36729#issuecomment-1141632078
@HyukjinKwon The current "supported API generation" function dynamically compares the `pyspark.pandas` and `pandas` modules to find the differences. Because methods inherited from parent classes are also collected for each class, some links are not generated correctly (such as `CategoricalIndex.all()`): the generated link does not match the pattern of the existing API documents.

ex>
<img width="779" alt="Screen Shot 2022-05-30 at 11 27 55 PM" src="https://user-images.githubusercontent.com/7010554/171086960-0a7c9465-7366-4d0f-a823-a0826e2512ab.png">

```
.../reference/pyspark.pandas/api/pyspark.pandas.CategoricalIndex.add_categories.html >> exists
.../reference/pyspark.pandas/api/pyspark.pandas.CategoricalIndex.all.html >> not exists
```

So, I thought about the options below:

1. Generate the list excluding methods that exist in the parent class.
   - For example, in the list for the `CategoricalIndex` class, the methods that are only available by inheriting from `Index` (methods of the base class) are removed.
2. Include all methods, and create a document link to the parent class by checking whether a document exists at the path corresponding to the parent class.
   - In my opinion, determining whether the corresponding document exists seems difficult, and option 1 seems appropriate because the existing pandas documentation does not document all methods of the parent class either. (ex> https://pandas.pydata.org/docs/reference/api/pandas.CategoricalIndex.categories.html?highlight=category)
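
To make option 1 concrete, here is a minimal sketch of how inherited methods could be filtered out. It is only an illustration, not the code in this PR: `own_public_members` is a hypothetical helper, and the two classes are toy stand-ins for `Index` and `CategoricalIndex` so that the snippet runs without Spark or pandas installed. The idea is that `vars(cls)` only contains names defined in the class body itself, while `inspect.getmembers` also returns everything inherited.

```python
import inspect


def own_public_members(cls):
    """Return the sorted public names defined directly on ``cls``.

    Hypothetical helper: names that are only inherited from a base class
    are excluded, because ``vars(cls)`` holds just the attributes that
    were set in the class body itself.
    """
    return sorted(
        name
        for name, value in vars(cls).items()
        if not name.startswith("_")
        and (inspect.isroutine(value) or isinstance(value, property))
    )


# Toy stand-ins for Index and CategoricalIndex (assumption, for illustration only).
class Index:
    def all(self):
        return True

    def any(self):
        return True


class CategoricalIndex(Index):
    @property
    def categories(self):
        return []

    def add_categories(self, new_categories):
        return self


# Every public member, inherited ones included.
all_members = sorted(
    name
    for name, _ in inspect.getmembers(CategoricalIndex)
    if not name.startswith("_")
)

# Only the members that CategoricalIndex defines itself (option 1).
own_members = own_public_members(CategoricalIndex)

print(all_members)
print(own_members)
```

One thing to keep in mind with this approach: a method that the subclass overrides is still present in `vars(cls)`, so it stays in the list even though the parent class also has it. Whether that is the desired behavior depends on whether a separate document page is generated for the overridden method.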
