Thanks for your reply. I receive the dataset from a mainframe system that I
don't control.
I tried the following to move the data to other executors, without success:
1. Called the repartition method. The data was repartitioned, but on the same
executor. Only one core is processing all these
Does the zip file contain only one file? If so, I fear you can only use
one core.
By the way, do you mean gzip? In that case you cannot decompress it in
parallel...
How is the zip file created? Can’t you create several of them?
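
If the producing side cannot be changed to emit several files, one possible
workaround is to re-split the archive in a preprocessing step before Spark
reads it. Below is a minimal sketch in Python using only the standard library.
The function name split_zip_entries, the part-file naming, and the
lines_per_part default are illustrative assumptions, and it assumes the
entries are newline-delimited text.

```python
import os
import zipfile


def split_zip_entries(zip_path, out_dir, lines_per_part=1_000_000):
    """Stream each entry of a zip archive into line-aligned part files.

    Hypothetical helper: returns the list of part-file paths written.
    Parts never mix lines from two different archive entries.
    """
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            out, count = None, 0
            try:
                # archive.open() decompresses as it is read, so the whole
                # entry is never held in memory at once.
                with archive.open(info) as entry:
                    for line in entry:
                        # Start a new part file when none is open yet or
                        # the current one has reached its line budget.
                        if out is None or count >= lines_per_part:
                            if out is not None:
                                out.close()
                            path = os.path.join(
                                out_dir, f"part-{len(parts):05d}.txt"
                            )
                            out = open(path, "wb")
                            parts.append(path)
                            count = 0
                        out.write(line)
                        count += 1
            finally:
                if out is not None:
                    out.close()
    return parts
```

The decompression itself still runs on a single core, but the resulting
directory holds several independent files. If it sits on storage that all
executors can reach, pointing Spark at the directory (for example with
spark.read.text) should allow the read to be spread over more than one task.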
> On 10. Aug 2018, at 22:54, mytramesh wrote:
>
> I know,
I know Spark doesn’t support zip files directly, since they are not distributable.
Are there any techniques to process this file quickly?
I am trying to process a zip file of around 4 GB. All the data goes to one executor,
and only one task is assigned to process it.
Even when I run repartition