Re: Pyspark and multiprocessing

2022-07-21 Thread Khalid Mammadov
Pool.map requires two arguments: first a function, and second an iterable (e.g. a list or set). Check out the examples in the official docs for how to use it: https://docs.python.org/3/library/multiprocessing.html
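A minimal sketch of the pattern being described (the `square` function and the input list here are illustrative, not from the thread):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # Pool.map takes (1) a function and (2) an iterable,
    # and applies the function to each item in parallel.
    with Pool(processes=4) as pool:
        results = pool.map(square, [1, 2, 3, 4])
    print(results)  # [1, 4, 9, 16]
```

The `if __name__ == "__main__":` guard matters on platforms that spawn rather than fork worker processes.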

Re: Pyspark and multiprocessing

2022-07-21 Thread Bjørn Jørgensen
Thank you. The reason for using Spark local is to test the code, and, as in this case, to find the bottlenecks and fix them before I spin up a K8S cluster. I did test it now with 16 cores and 10 files:

import time
tic = time.perf_counter()
json_to_norm_with_null("/home/jovyan/notebooks/falk/test",
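The timing harness above can be sketched in full like this; `process_file` is a hypothetical stand-in for the author's `json_to_norm_with_null`, whose body is not shown in the thread:

```python
import time

def process_file(path):
    """Hypothetical stand-in for json_to_norm_with_null (not shown in full)."""
    return path

paths = [f"file_{i}.json" for i in range(10)]  # assumed: 10 input files

tic = time.perf_counter()
for p in paths:
    process_file(p)
toc = time.perf_counter()
print(f"Processed {len(paths)} files in {toc - tic:0.4f} seconds")
```

`time.perf_counter()` is a monotonic, high-resolution clock, which makes the `toc - tic` difference a reliable elapsed-time measurement.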

Re: Pyspark and multiprocessing

2022-07-21 Thread Khalid Mammadov
One quick observation is that you allocate all your local CPUs to Spark and then execute that app with 10 threads, i.e. 10 Spark apps, so you would need 160 cores in total, as each will need 16 CPUs, IMHO. Wouldn't that create a CPU bottleneck? Also, on a side note, why do you need Spark if you use that on lo
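The arithmetic behind this observation, plus one possible way to avoid the oversubscription, can be sketched as follows (the core counts are the ones discussed in the thread; the fair-share split is a suggestion, not something the thread prescribes):

```python
total_cores = 16  # cores on the local machine, per the thread
workers = 10      # concurrent threads each launching a Spark app

# If every app runs with master "local[*]", each one wants all 16 cores:
cores_demanded = total_cores * workers  # 160 cores demanded on a 16-core box

# One way to avoid the CPU bottleneck: cap each app at a fair share instead,
# e.g. pass this string to SparkSession.builder.master(...).
per_worker = max(1, total_cores // workers)
master = f"local[{per_worker}]"

print(cores_demanded, master)  # 160 local[1]
```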