itholic opened a new pull request #31280:
URL: https://github.com/apache/spark/pull/31280


   ### What changes were proposed in this pull request?
   
   This PR supplements the "Python Package Management" section of the documentation.
   
   When an environment packed with `venv-pack` is used and Python is not installed locally on every node, the job fails as shown below.
   
   ```python
   >>> import pandas as pd
   >>> from pyspark.sql.functions import pandas_udf
   >>> @pandas_udf('double')
   ... def pandas_plus_one(v: pd.Series) -> pd.Series:
   ...     return v + 1
   ...
   >>> spark.range(10).select(pandas_plus_one("id")).show()
   ...
   Cannot run program "./environment/bin/python": error=2, No such file or directory
   ...
   ```
   
   This happens because the Python interpreter in an [environment packed with `venv-pack` is a symbolic link](https://github.com/jcrist/venv-pack/issues/5) to the local Python installation.
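   To illustrate the symlink behavior, here is a minimal sketch. It assumes a POSIX system and uses Python's standard-library `venv` module rather than `venv-pack` itself. It creates a throwaway virtual environment and checks where its `bin/python` resolves. `interpreter_link_info` is a hypothetical helper, not part of Spark or `venv-pack`. On a typical Linux machine, the interpreter turns out to be a symlink whose target lies outside the environment directory. Per the linked issue, this is the kind of link that is carried into the packed archive and then points at nothing on a node without that Python installation.

   ```python
   import os
   import tempfile
   import venv


   def interpreter_link_info(env_dir):
       """Inspect env_dir/bin/python (POSIX layout assumed).

       Returns (is_symlink, resolved_path, inside_env, target_exists).
       """
       python_path = os.path.join(env_dir, "bin", "python")
       # realpath follows the whole symlink chain, e.g. python -> python3.x -> /usr/bin/python3.x
       resolved = os.path.realpath(python_path)
       env_root = os.path.realpath(env_dir)
       inside_env = os.path.commonpath([resolved, env_root]) == env_root
       return os.path.islink(python_path), resolved, inside_env, os.path.exists(resolved)


   if __name__ == "__main__":
       with tempfile.TemporaryDirectory() as tmp:
           env_dir = os.path.join(tmp, "pyspark_venv")
           # symlinks=True matches the POSIX default of `python -m venv`;
           # with_pip=False keeps the example fast and offline.
           venv.create(env_dir, symlinks=True, with_pip=False)
           is_link, target, inside, exists = interpreter_link_info(env_dir)
           print(f"bin/python is a symlink: {is_link}")
           print(f"resolves to: {target}")
           print(f"target is inside the environment: {inside}")
           print(f"target exists on this machine: {exists}")
   ```

   You could run the same check on a worker node against the unpacked `./environment` directory. If `target_exists` is `False` there, the link points at a Python that the node does not have.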
   
   To avoid this confusion, this PR adds an explanation of this behavior to the documentation.
   
   ### Why are the changes needed?
   
   To give users more detailed information so that they are not confused by this failure.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, this PR updates the "Python Package Management" section of the "User Guide" documentation.
   
   ### How was this patch tested?
   
   Manually built the doc.

