Re: Add python library

2020-06-08 Thread Patrick McCarthy
I've found that Anaconda encapsulates modules and their dependencies nicely, and you can deploy all the needed .so files by shipping a whole conda environment. I've used this method with success: https://community.cloudera.com/t5/Community-Articles/Running-PySpark-with-Conda-Env/ta-p/24755
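As a rough sketch of what shipping a whole conda environment can look like on YARN: the helper below only assembles a spark-submit command line. It assumes the environment was already packed into an archive (for example with conda-pack). The archive name, the alias, and the script name are hypothetical placeholders, and the exact options you need may differ by cluster setup.

```python
def build_spark_submit(archive, alias, script):
    """Assemble a spark-submit command that ships a packed conda env.

    archive, alias and script are illustrative placeholders, not values
    taken from the thread.
    """
    # YARN unpacks the archive into a directory named after the alias,
    # so the interpreter inside it is addressed with a relative path.
    python_path = f"./{alias}/bin/python"
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--archives", f"{archive}#{alias}",
        # Point both the application master and the executors at the
        # interpreter shipped inside the environment.
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python_path}",
        "--conf", f"spark.executorEnv.PYSPARK_PYTHON={python_path}",
        script,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_spark_submit("pyspark_env.tar.gz", "environment", "job.py")))
```

Because the environment travels with the job, native extension modules (.so files) installed in it are available on every node without a separate install step.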

Re: Add python library with native code

2020-06-06 Thread Stone Zhong
Great, thank you Masood, will look into it.

Regards,
Stone

On Fri, Jun 5, 2020 at 7:47 PM Masood Krohy wrote:
> Not totally sure it's gonna help your use case, but I'd recommend that you consider these too:
>
> - pex A library and tool for generati

Re: Add python library with native code

2020-06-05 Thread Masood Krohy
Not totally sure it's gonna help your use case, but I'd recommend that you consider these too:

* pex: A library and tool for generating .pex (Python EXecutable) files
* cluster-pack: cluster-pack is a library on t

Re: Add python library with native code

2020-06-05 Thread Stone Zhong
Thanks Dark. I looked at that article. I think it describes approach B. Let me summarize both approach A and approach B:

A) Put the libraries on a network share, mount it on each node, and manually set PYTHONPATH in your code
B) In your code, manually install the necessary package using "pip install
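A minimal sketch of the two approaches summarized above, under stated assumptions: the function names and the share path are hypothetical, and nothing here comes from the linked article. For approach A, the path has to be added inside code that actually runs on the executors (for example inside a function passed to mapPartitions), because changing sys.path on the driver alone does not affect worker processes. For approach B, the helper only builds the pip command; running it is left to the caller.

```python
import os
import sys


def add_shared_libs(path):
    """Approach A: make packages on a mounted share importable.

    Returns True if the path was added, False if it does not exist
    or is already on sys.path.
    """
    if os.path.isdir(path) and path not in sys.path:
        sys.path.insert(0, path)
        return True
    return False


def pip_install_command(package):
    """Approach B: build a pip command for the current interpreter.

    Using sys.executable keeps the install tied to the Python that is
    running the job. Execute it with subprocess.check_call if desired.
    """
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    # "/mnt/shared/pylibs" is a hypothetical mount point.
    print(add_shared_libs("/mnt/shared/pylibs"))
    print(pip_install_command("numpy"))
```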

Re: Add python library with native code

2020-06-05 Thread Dark Crusader
Hi Stone, have you looked into this article? https://medium.com/@SSKahani/pyspark-applications-dependencies-99415e0df987 I haven't tried it with .so files, but I did use the approach he recommends to install my other dependencies. I hope it helps. On Fri, Jun 5, 2020 at 1:12 PM Stone Zhong w