Wouldn't this work if you load the files in HDFS and let the partitions be
equal to the amount of parallelism you want?
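
For what it's worth, a minimal sketch of that suggestion (the HDFS path and
the partition count of 64 are made up for illustration):

from pyspark import SparkContext

sc = SparkContext(appName="partition-sketch")

# Line-oriented case: minPartitions hints how many partitions (and
# hence how much read parallelism) Spark should aim for.
lines = sc.textFile("hdfs:///data/input/*", minPartitions=64)

# Whole-file case: each element is a (path, content) pair; repartition
# afterwards to spread the files across the desired number of tasks.
files = sc.wholeTextFiles("hdfs:///data/input/").repartition(64)
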
From: Saatvik Shah [mailto:saatvikshah1...@gmail.com]
Sent: Friday, June 30, 2017 8:55 AM
To: ayan guha
Cc: user
Subject: Re: PySpark working with Generators
Hey Ayan,
This isn't a typical text file - it's a proprietary data format for which a
native...
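
If the problem is only that Spark has no built-in reader for the format, one
hedged workaround is to let Spark deliver the raw bytes and do the parsing in
Python; parse_proprietary below is a made-up placeholder, not the actual
parser for this format:

from pyspark import SparkContext

sc = SparkContext(appName="proprietary-format-sketch")

def parse_proprietary(raw_bytes):
    # Hypothetical stand-in for the real parser: yield records one at a
    # time instead of materializing the whole file in memory.
    yield from raw_bytes.splitlines()  # placeholder record framing

# binaryFiles yields one (path, bytes) pair per file, so the custom
# parser runs on the workers without a native Spark reader.
records = (sc.binaryFiles("hdfs:///data/proprietary/")
             .flatMap(lambda kv: parse_proprietary(kv[1])))
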
> rdd = file_paths_rdd.map(lambda x: foo(x, "wholeFile")).flatMap(lambda x: x)
>
> I'd like to now do something similar but with the generator, so that I can
> work with more cores and lower memory. I'm not sure how to tackle this
> since generators cannot be pickled, and thus I'm not sure how to distribute
> the work of reading each file_path on the RDD?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-working-with-Generators-tp28810.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
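
One pattern that sidesteps the pickling issue, sketched under the assumption
that foo from the quoted post is importable on the workers and returns a
generator of records (the example paths are hypothetical): pass a named,
module-level function to flatMap. Spark serializes the function, never a live
generator object, and each task consumes the generator lazily, so memory
stays bounded while every core gets a share of the paths.

from pyspark import SparkContext

sc = SparkContext(appName="generator-sketch")
file_paths_rdd = sc.parallelize(["/data/a.bin", "/data/b.bin"], 2)

def read_records(file_path):
    # foo is the parser from the original post; yielding from the
    # generator it returns keeps reading lazy, one record at a time.
    for record in foo(file_path, "wholeFile"):
        yield record

# The named function is what gets serialized and shipped, not a
# generator object, so the "generators cannot be pickled" problem
# never arises; flatMap flattens the yielded records into one RDD.
rdd = file_paths_rdd.flatMap(read_records)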