How to parallelize zip file processing?

mytramesh Fri, 10 Aug 2018 13:55:35 -0700

I know Spark doesn't support zip files directly, since the format is not splittable.
Are there any techniques to process such a file quickly?


I am trying to process a zip file of around 4 GB. All the data goes to one executor,
and only one task is assigned to process all of it.
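One common workaround is to parallelize over the archive's members instead of its bytes. The central directory lists every entry without decompressing anything, so the entries can be divided up front and each group handed to a separate task. The sketch below is a minimal, stdlib-only illustration under that assumption (the archive holds many members). `split_entries` is a hypothetical helper, and the greedy largest-first balancing is one possible choice, not something Spark does for you.

```python
import zipfile


def split_entries(zip_path, num_groups):
    """Hypothetical helper: assign archive members to num_groups buckets,
    balancing total uncompressed size with a greedy largest-first heuristic.
    Each bucket stands in for one parallel task."""
    with zipfile.ZipFile(zip_path) as zf:
        # Reading the central directory does not decompress any member.
        members = [info for info in zf.infolist() if not info.is_dir()]

    groups = [[] for _ in range(num_groups)]
    sizes = [0] * num_groups
    # Place the largest members first, each into the currently lightest bucket.
    for info in sorted(members, key=lambda i: i.file_size, reverse=True):
        target = sizes.index(min(sizes))
        groups[target].append(info.filename)
        sizes[target] += info.file_size
    return groups
```

In Spark, the resulting groups could be passed to `sc.parallelize(groups, len(groups))` so that each task opens the archive itself and reads only its own members. This assumes the zip sits on storage every executor can read. If the archive contains a single huge member, this does not help; decompressing it once into a splittable form (plain text or bzip2) before loading is the usual route.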

Even when I call repartition, the data gets partitioned, but all the partitions stay on
the same executor.


How can I distribute the data to other executors?
And when the data is partitioned on the same executor, how do I get more
tasks/threads assigned to process it?
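A minimal sketch of the member-level approach outside Spark, using only the Python standard library: the member names are dealt round-robin into one chunk per worker, and each worker opens its own handle on the archive so no file object is shared between workers. `process_zip_in_parallel` and `count_lines_in_members` are hypothetical names, and counting lines is a placeholder for real processing.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor


def count_lines_in_members(zip_path, member_names):
    """Worker: open a private handle on the archive and count the lines in
    each assigned member (placeholder for real per-record processing)."""
    counts = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in member_names:
            with zf.open(name) as fh:
                counts[name] = sum(1 for _ in fh)
    return counts


def process_zip_in_parallel(zip_path, num_workers):
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries, which end with "/".
        names = [n for n in zf.namelist() if not n.endswith("/")]
    # Round-robin assignment: worker i gets every num_workers-th member.
    chunks = [names[i::num_workers] for i in range(num_workers)]
    results = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(count_lines_in_members, [zip_path] * num_workers, chunks):
            results.update(partial)
    return results
```

Threads keep the sketch self-contained. For CPU-heavy parsing, a process pool, or Spark tasks each receiving one chunk, would be the closer analog to spreading the work across executors.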




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

