GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20089#discussion_r158796151
  
    --- Diff: python/README.md ---
    @@ -29,4 +29,4 @@ The Python packaging for Spark is not intended to replace all of the other use c
     
     ## Python Requirements
     
    -At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages have their own requirements (including numpy and pandas).
    +At its core PySpark depends on Py4J (currently version 0.10.6), but additional sub-packages might have their own requirements declared as "Extras" (including numpy, pandas, and pyarrow). You can install the requirements by specifying their extra names.
    --- End diff ---
    
    Ah, I see. How about simply:
    
    ```
    At its core PySpark depends on Py4J (currently version 0.10.6), but some additional sub-packages have their own extra requirements for certain features (including numpy, pandas, and pyarrow).
    ```
    
    for now? I just noticed we are a bit unclear on this (for example, I have been under the impression so far that NumPy is required for ML/MLlib), but I think this roughly describes it correctly and is good enough.
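    
    For anyone reading along, installing with extras would look roughly like the sketch below. The extra names `ml`, `mllib`, and `sql` reflect my reading of python/setup.py and should be treated as assumptions here, not documented names:
    
    ```
    # Sketch, assuming the extras defined in python/setup.py:
    #   ml/mllib pull in numpy; sql pulls in pandas (and pyarrow with this PR).
    pip install pyspark[sql]        # pandas + pyarrow for Spark SQL features
    pip install pyspark[ml,mllib]   # numpy for ML/MLlib
    ```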
    
    Will maybe try to make a follow-up to describe it fully later. This PR targets PyArrow anyway.

