[GitHub] [spark] beobest2 commented on pull request #36729: [SPARK-39295][PYTHON][DOCS] Improve documentation of pandas API suppo…

GitBox Mon, 30 May 2022 20:30:57 -0700


beobest2 commented on PR #36729:
URL: https://github.com/apache/spark/pull/36729#issuecomment-1141632078


   @HyukjinKwon The current 'supported API generation' function dynamically 
compares the `pyspark.pandas` and `pandas` modules to find the differences. 
In doing so, it also collects the methods each class inherits from its parent 
class, and the links for those methods (such as `CategoricalIndex.all()`) are 
not generated correctly because no API document matches the expected pattern. 
   
   Example:
   <img width="779" alt="Screen Shot 2022-05-30 at 11 27 55 PM" 
src="https://user-images.githubusercontent.com/7010554/171086960-0a7c9465-7366-4d0f-a823-a0826e2512ab.png">
   
   ```
   .../reference/pyspark.pandas/api/pyspark.pandas.CategoricalIndex.add_categories.html
       >> exists
   .../reference/pyspark.pandas/api/pyspark.pandas.CategoricalIndex.all.html
       >> does not exist
   ```
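
To illustrate why the inherited methods end up in the list, here is a minimal sketch. The `Index` and `CategoricalIndex` classes below are toy stand-ins, not the real `pyspark.pandas` classes, and `public_methods` and `doc_page` are hypothetical helpers rather than the functions used in this PR. Only the page name pattern is taken from the paths above.

```python
import inspect


# Toy stand-ins for the real classes (assumption, for illustration only).
class Index:
    def all(self):
        return True

    def any(self):
        return True


class CategoricalIndex(Index):
    def add_categories(self, new_categories):
        return new_categories


def public_methods(cls):
    """Collect public method names the way a naive module comparison would.

    inspect.getmembers walks the whole class hierarchy, so methods inherited
    from parent classes are returned together with the class's own methods.
    """
    return sorted(
        name
        for name, _ in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    )


def doc_page(cls, name):
    """Build the page name following the pattern of the paths shown above."""
    return f"pyspark.pandas.{cls.__name__}.{name}.html"


methods = public_methods(CategoricalIndex)
pages = [doc_page(CategoricalIndex, name) for name in methods]
```

In this sketch `methods` contains `all` and `any` in addition to `add_categories`, so a page name is built for `CategoricalIndex.all` even though that method is only defined on the parent class.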
   
   So, I thought about the options below:
   
   1. Exclude methods that exist in the parent class.
    
      - For example, the list for the `CategoricalIndex` class would omit the 
methods it inherits from `Index` (the methods of the base class).
   
   2. Include all methods, and link each inherited method to the document of 
the parent class after checking whether a document exists at the path of the 
parent class.
   
       - In my opinion, "determining whether the corresponding document 
exists" seems difficult, and option 1 seems appropriate because the existing 
pandas documentation does not document all methods of the parent class either. 
(Example: 
https://pandas.pydata.org/docs/reference/api/pandas.CategoricalIndex.categories.html?highlight=category)
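
Both options can be sketched roughly as follows. This is only an illustration under stated assumptions: the classes are toy stand-ins for the real ones, and `own_public_methods` and `defining_class` are hypothetical names, not code from this PR. For option 2 the sketch only finds which class defines a method. It does not check whether a document exists for that class, which is the part described above as difficult.

```python
# Toy stand-ins for the real classes (assumption, for illustration only).
class Index:
    def all(self):
        return True

    def any(self):
        return True


class CategoricalIndex(Index):
    def add_categories(self, new_categories):
        return new_categories


def own_public_methods(cls):
    """Option 1: keep only the names defined directly on cls.

    vars(cls) holds just the class's own namespace, so anything inherited
    from a parent class is left out. Methods overridden in cls are kept.
    """
    return sorted(
        name
        for name, value in vars(cls).items()
        if callable(value) and not name.startswith("_")
    )


def defining_class(cls, name):
    """Option 2 helper: find the first class in the MRO that defines name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None


own = own_public_methods(CategoricalIndex)
owner_of_all = defining_class(CategoricalIndex, "all")
```

With option 1, `own` holds only `add_categories`. With option 2, `owner_of_all` is `Index`, so the link for `CategoricalIndex.all` would point at the `Index` document if that document exists.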
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

