I know Spark doesn't support zip files directly, since a zip file is not splittable (distributable). Are there any techniques to process this file quickly?
I am trying to process a zip file of around 4 GB. All of the data goes to one executor, and only one task is assigned to process it. Even when I call repartition, the data gets partitioned, but all the partitions stay on the same executor. How can I distribute the data to the other executors? And when the data is partitioned on a single executor, how can I get more tasks/threads assigned to it?
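One workaround often suggested for this situation, sketched here as an assumption rather than a confirmed fix: decompress the archive once, outside Spark, and rewrite its contents as many plain-text part files. Uncompressed text is splittable and each file yields at least one input split, so Spark can schedule many tasks across executors instead of one. The function name `split_zip_entries`, the `part-NNNNN.txt` naming, the assumption that every archive member is UTF-8 line-oriented text, and the default chunk size are all hypothetical choices for illustration.

```python
import io
import os
import zipfile


def split_zip_entries(zip_path, out_dir, lines_per_chunk=1_000_000):
    """Hypothetical helper: stream every member of zip_path and rewrite its
    lines into numbered plain-text chunk files in out_dir.

    Assumes all members are UTF-8 text. Returns the list of chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    count = 0
    try:
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                # zf.open returns a binary stream that decompresses incrementally,
                # so the whole archive is never held in memory at once.
                with zf.open(info) as raw:
                    for line in io.TextIOWrapper(raw, encoding="utf-8"):
                        # Keep records separate if a member lacks a final newline.
                        if not line.endswith("\n"):
                            line += "\n"
                        # Start a new chunk file when the current one is full.
                        if out is None or count >= lines_per_chunk:
                            if out is not None:
                                out.close()
                            path = os.path.join(
                                out_dir, f"part-{len(chunk_paths):05d}.txt"
                            )
                            out = open(path, "w", encoding="utf-8")
                            chunk_paths.append(path)
                            count = 0
                        out.write(line)
                        count += 1
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

The output directory could then be read with something like `spark.read.text(out_dir)`. Note that the decompression pass itself still runs sequentially on one machine; the gain is that everything downstream of it can run in parallel.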