HyukjinKwon opened a new pull request, #41278:
URL: https://github.com/apache/spark/pull/41278

   ### What changes were proposed in this pull request?
   
   This PR adds support for pyfiles (`.zip`, `.py`, `.jar`, and `.egg` files) in `SparkSession.addArtifacts`.
   
   ### Why are the changes needed?
   
   The change lets end users add Python dependencies from the Python Spark Connect client.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it adds support for pyfiles (`.zip`, `.py`, `.jar`, and `.egg` files) in `SparkSession.addArtifacts`.
   
   ### How was this patch tested?
   
   Manually tested via `local-cluster`.
   
   ```bash
   ./sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar` --master "local-cluster[2,2,1024]"
   ./bin/pyspark --remote "sc://localhost:15002"
   ```
   
   ```python
   import os
   import shutil
   import tempfile

   from pyspark.sql.functions import udf

   with tempfile.TemporaryDirectory() as d:
       # Build a package directory containing a single __init__.py.
       package_path = os.path.join(d, "my_zipfile")
       os.mkdir(package_path)
       pyfile_path = os.path.join(package_path, "__init__.py")
       with open(pyfile_path, "w") as f:
           _ = f.write("my_func = lambda: 5")
       # Zip the package so that "my_zipfile/" sits at the archive root.
       shutil.make_archive(package_path, 'zip', d, "my_zipfile")
       # The import happens inside the UDF, so it is resolved where the UDF runs.
       @udf("long")
       def func(x):
           import my_zipfile
           return my_zipfile.my_func()
       # Upload the archive as a Python dependency, then run the UDF.
       spark.addArtifacts(f"{package_path}.zip", pyfile=True)
       spark.range(1).select(func("id")).show()
   ```
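The archive layout used in the manual test can be checked locally without a Spark cluster. The sketch below is only an illustration: it assumes that a zip added as a pyfile ends up importable through Python's standard zip import mechanism (an entry on `sys.path`), which this description does not state explicitly. The helper names `build_zip_package` and `call_from_zip` are hypothetical and are not part of the PR.

```python
import importlib
import os
import shutil
import sys
import tempfile


def build_zip_package(base_dir, name="my_zipfile"):
    """Create <base_dir>/<name>/__init__.py and zip it as <base_dir>/<name>.zip."""
    package_path = os.path.join(base_dir, name)
    os.mkdir(package_path)
    with open(os.path.join(package_path, "__init__.py"), "w") as f:
        f.write("my_func = lambda: 5\n")
    # root_dir=base_dir and base_dir=name keep the package folder at the archive root.
    return shutil.make_archive(package_path, "zip", base_dir, name)


def call_from_zip(zip_path, name="my_zipfile"):
    """Import the package with only the archive added to sys.path, then call my_func()."""
    sys.path.insert(0, zip_path)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(name)
        return module.my_func()
    finally:
        # Undo the path change so the check leaves no trace in this interpreter.
        sys.path.remove(zip_path)
        sys.modules.pop(name, None)


with tempfile.TemporaryDirectory() as d:
    archive = build_zip_package(d)
    result = call_from_zip(archive)
    print(result)
```

If the import fails here, the archive layout is the likely cause (for example, `__init__.py` zipped without its parent folder), and the same archive would probably not work when passed to `spark.addArtifacts(..., pyfile=True)` either.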
   
   Also added a couple of unit tests.
   


