Re: override collect_list

2019-12-01 Thread Driesprong, Fokko
Hi Abhinav, this sounds to me like a bad design, since it isn't scalable. Would it be possible to store all the data in a database such as HBase, Bigtable, or Cassandra? This would allow you to write the data from all the workers to the database in parallel. Cheers, Fokko Op wo 27 nov. 2019 om 06:58 schr

override collect_list

2019-11-26 Thread Ranjan, Abhinav
Hi all, I want to collect some rows into a list using Spark's collect_list function. However, the number of rows going into the list is overflowing memory. Is there any way to force the rows to be collected onto disk rather than in memory, or else, instead of collecting it as a list,