Ok, I've got a much larger file now: 2.5 GB uncompressed, 1.2 GB compressed (46 blocks, 1 MB page size, using Snappy). I've updated Parquet and moved to the distributed Spark 0.9.0 release. It's still not parallelising without an explicit repartition.
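For context, this is roughly the shape of the read plus the repartition workaround I mean (a minimal sketch, not my exact code; the placeholder path, the GenericRecord value type and the local[8] master are assumptions, the rest is the usual parquet-avro + newAPIHadoopFile setup):

    import org.apache.avro.generic.GenericRecord
    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.SparkContext
    import parquet.avro.AvroReadSupport
    import parquet.hadoop.ParquetInputFormat

    val sc = new SparkContext("local[8]", "parquet-avro-read")

    // Tell ParquetInputFormat to materialise rows through the Avro read support
    val job = new Job(sc.hadoopConfiguration)
    ParquetInputFormat.setReadSupportClass(job, classOf[AvroReadSupport[GenericRecord]])

    val records = sc.newAPIHadoopFile(
      "hdfs:///data/events.parquet",                 // placeholder path
      classOf[ParquetInputFormat[GenericRecord]],
      classOf[Void],                                 // Parquet keys are Void
      classOf[GenericRecord],
      job.getConfiguration)

    // Without this the whole 1.2 GB file runs as a single task; with it every
    // cycle pays for a full shuffle just to get the data spread across cores.
    val partitioned = records.repartition(46)
    partitioned.map(_._2).count()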
Partitioning the file before every processing cycle is becoming really annoying, since it takes forever to load the file. Any ideas of what I could try? Is it because of some incompatibility with Parquet or Avro?

Regards,
Hassan
