Re: PySpark working with Generators

2017-07-05 Thread Saatvik Shah
rdd = file_paths_rdd.map(lambda x: foo(x, "wholeFile")).flatMap(lambda x: x)

I'd like to now do something similar but with a generator, so that I can work with more cores and lower memory. I'm not sure how to tackle this, since generators cannot be pickled and thus I'm not sure how to distribute the work of reading each file_path on the RDD?
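A point worth separating out here: Spark pickles the function passed to flatMap, not a live generator object, so a module-level generator function distributes fine. A minimal sketch of that approach, assuming a streaming variant of the proprietary reader exists (foo_iter and the paths below are hypothetical stand-ins, stubbed so the example runs):

    from pyspark import SparkContext

    sc = SparkContext(appName="generator-flatmap-sketch")

    # Hypothetical stand-in for a streaming variant of the native reader
    # foo(); it just yields lines here so the sketch runs end to end.
    def foo_iter(path):
        with open(path) as f:
            for line in f:
                yield line.rstrip("\n")

    # A module-level generator function pickles fine: each executor calls
    # it once per path and produces records lazily, one at a time, instead
    # of holding a whole file's worth of records in memory.
    def read_records(path):
        for record in foo_iter(path):
            yield record

    file_paths = ["/tmp/part-0.dat", "/tmp/part-1.dat"]  # hypothetical paths
    file_paths_rdd = sc.parallelize(file_paths, len(file_paths))

    # flatMap accepts any function that returns an iterator, so the
    # generator collapses map(...).flatMap(lambda x: x) into one lazy step.
    rdd = file_paths_rdd.flatMap(read_records)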

Re: PySpark working with Generators

2017-06-30 Thread Jörn Franke
Wouldn’t this work if you load the files in HDFS and let the partitions be equal to the amount of parallelism you want? …

Re: PySpark working with Generators

2017-06-30 Thread Saatvik Shah
Hey Ayan, This isn't a typical text file - it's a proprietary data format for which a native …
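Since the files are a proprietary format rather than line-oriented text, one way to hand whole files to a native parser in parallel is binaryFiles(), which produces (path, bytes) pairs. A sketch under that assumption (parse_native and the HDFS path below are hypothetical, stubbed so it runs):

    from pyspark import SparkContext

    sc = SparkContext(appName="binary-format-sketch")

    # Hypothetical stand-in for the native parser; it just splits raw
    # bytes on newlines here so the sketch is runnable.
    def parse_native(raw):
        return raw.split(b"\n")

    def parse(path_and_bytes):
        path, raw = path_and_bytes
        # Yielding keeps only one record at a time in memory per task.
        for record in parse_native(raw):
            yield record

    # binaryFiles() yields (path, bytes) pairs, so the parser sees each
    # file whole instead of Spark line-splitting it as text.
    pairs = sc.binaryFiles("hdfs:///data/proprietary")
    records = pairs.flatMap(parse)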

RE: PySpark working with Generators

2017-06-29 Thread Mahesh Sawaiker
Wouldn’t this work if you load the files in HDFS and let the partitions be equal to the amount of parallelism you want?
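A minimal sketch of this suggestion (the HDFS paths and file count below are hypothetical): set the partition count of the paths RDD to the parallelism you want, or hint a minimum partition count when reading from HDFS directly.

    from pyspark import SparkContext

    sc = SparkContext(appName="partition-parallelism-sketch")

    parallelism = sc.defaultParallelism  # e.g. total executor cores

    # Spread the file paths over exactly as many partitions as the
    # desired parallelism, so each core works through its own subset.
    file_paths = ["hdfs:///data/part-%04d.dat" % i for i in range(100)]
    paths_rdd = sc.parallelize(file_paths, parallelism)

    # Or, when loading the files from HDFS directly, hint a minimum
    # partition count at read time.
    blobs = sc.binaryFiles("hdfs:///data", minPartitions=parallelism)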

Re: PySpark working with Generators

2017-06-29 Thread ayan guha
… I'm not sure how to tackle this, since generators cannot be pickled and thus I'm not sure how to distribute the work of reading each file_path on the RDD?

Re: PySpark working with Generators

2017-06-29 Thread Saatvik Shah
… with a generator, so that I can work with more cores and lower memory. I'm not sure how to tackle this, since generators cannot be pickled and thus I'm not sure how to distribute the work of reading each file_path on the RDD?

Re: PySpark working with Generators

2017-06-29 Thread ayan guha
… with a generator, so that I can work with more cores and lower memory. I'm not sure how to tackle this, since generators cannot be pickled and thus I'm not sure how to distribute the work of reading each file_path on the RDD?

PySpark working with Generators

2017-06-29 Thread saatvikshah1994
… I'm not sure how to distribute the work of reading each file_path on the RDD? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-working-with-Generators-tp28810.html