Re: How to parallelize zip file processing?

2018-08-13 Thread mytramesh
Thanks for your reply. I receive the dataset from a mainframe system, which I don't control. I tried the following to move the data to other executors, without success: 1. I called the repartition method; the data got repartitioned, but on the same executor. Only one core is processing all these

Re: How to parallelize zip file processing?

2018-08-10 Thread Jörn Franke
Does the zip file contain only one file? If so, I fear you can only use one core. By the way, do you mean gzip? In that case you cannot decompress it in parallel. How is the zip file created? Can’t you create several of them?
> On 10. Aug 2018, at 22:54, mytramesh wrote:
>
> I know,
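If the archive does contain several files, each member can be decompressed and processed as an independent unit of work. The sketch below shows that idea with only the Python standard library, as a stand-in for one Spark task per member. The function names and the per-member work (counting lines) are hypothetical. Threads are used to keep it self-contained; on a cluster the same split would map to separate tasks or processes. A single-member archive still yields exactly one task, which is why only one core gets used in that case.

```python
import os
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor


def count_lines_in_member(zip_path, member_name):
    """Hypothetical per-file work: count the lines of one archive member.

    Each call opens its own ZipFile handle so workers never share a file object.
    """
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member_name) as raw:
            return member_name, sum(1 for _ in raw)


def process_members_in_parallel(zip_path, max_workers=4):
    """Treat every regular-file member of the archive as an independent task.

    The number of tasks equals the number of members, so a one-member
    archive cannot be spread over more than one worker.
    """
    with zipfile.ZipFile(zip_path) as zf:
        members = [info.filename for info in zf.infolist() if not info.is_dir()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(count_lines_in_member, zip_path, name) for name in members]
        # Collect results in submission order.
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.zip")
        # Build a small demo archive with three members.
        with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("part-0.txt", "a\nb\nc\n")
            zf.writestr("part-1.txt", "d\ne\n")
            zf.writestr("part-2.txt", "f\n")
        print(process_members_in_parallel(path))
        # prints {'part-0.txt': 3, 'part-1.txt': 2, 'part-2.txt': 1}
```

This is why producing several smaller archives upstream helps: parallelism is bounded by the number of independently readable members or files.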

How to parallelize zip file processing?

2018-08-10 Thread mytramesh
I know Spark doesn't support zip files directly, since they are not splittable. Are there any techniques to process this file quickly? I am trying to process a zip file of around 4 GB. All the data is moving to one executor, and only one task is assigned to process all of it. Even when I run repartition
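When the archive holds one large file, a common workaround is to pay the sequential decompression cost once and rewrite the content as many plain-text chunk files, which a downstream job can then read in parallel (one task per file or split). The sketch below assumes the member is line-oriented text; the function name, the `chunk-NNNNN.txt` naming, and the `lines_per_chunk` parameter are hypothetical. Decompression itself still runs on one core; only the later stages gain parallelism.

```python
import os
import tempfile
import zipfile


def resplit_zip_member(zip_path, member_name, out_dir, lines_per_chunk=1_000_000):
    """Stream one archive member and rewrite it as several plain-text chunk files.

    Lines are never split across chunks. Returns the list of chunk paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    written = 0
    try:
        with zipfile.ZipFile(zip_path) as zf, zf.open(member_name) as src:
            for line in src:  # streaming read; the whole member is never held in memory
                if out is None or written == lines_per_chunk:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"chunk-{len(chunk_paths):05d}.txt")
                    out = open(path, "wb")
                    chunk_paths.append(path)
                    written = 0
                out.write(line)
                written += 1
    finally:
        if out is not None:
            out.close()
    return chunk_paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        zpath = os.path.join(tmp, "big.zip")
        with zipfile.ZipFile(zpath, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("data.txt", "".join(f"row-{i}\n" for i in range(10)))
        chunks = resplit_zip_member(zpath, "data.txt", os.path.join(tmp, "chunks"), lines_per_chunk=4)
        print(len(chunks))  # prints 3
```

The resulting directory of uncompressed files could then be handed to Spark's text reader (for example `spark.read.text(...)` on the output directory), so that the work is no longer pinned to a single task.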