OK, I've got a much larger file now: 2.5 GB uncompressed, 1.2 GB compressed
(46 blocks, 1 MB page size, using Snappy). I updated Parquet and moved to the
distributed Spark 0.9.0 release. It's still not parallelising unless I repartition
it first.
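
To make that concrete, the load path I'm describing looks roughly like the
sketch below (ParquetInputFormat and AvroReadSupport are the standard
parquet-hadoop / parquet-avro classes; the path and the use of GenericRecord
are just placeholders):

    import org.apache.avro.generic.GenericRecord
    import org.apache.hadoop.mapreduce.Job
    import parquet.avro.AvroReadSupport
    import parquet.hadoop.ParquetInputFormat

    // sc is the SparkContext (e.g. the one provided by spark-shell)
    val job = new Job()
    ParquetInputFormat.setReadSupportClass(job, classOf[AvroReadSupport[GenericRecord]])

    val records = sc.newAPIHadoopFile(
      "hdfs:///tmp/big-file.parquet",             // placeholder path
      classOf[ParquetInputFormat[GenericRecord]],
      classOf[Void],
      classOf[GenericRecord],
      job.getConfiguration)

    // If this prints 1, Spark is treating the whole file as a single
    // input split, so only one task ever runs.
    println("partitions: " + records.partitions.size)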

Repartitioning the file before every processing cycle is becoming really
annoying, as it takes forever to load the file.
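
(Concretely, the per-run workaround is just something like the line below,
where 64 is an arbitrary target partition count:)

    // Force a shuffle right after loading so later stages run on many
    // partitions instead of one; coalesce with shuffle = true has the
    // same effect as repartition, but it means shuffling the whole
    // 2.5 GB on every run.
    val spread = records.coalesce(64, shuffle = true)
    println("partitions after shuffle: " + spread.partitions.size)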

Any ideas on what I could try? Could it be some incompatibility with
Parquet or Avro?

Regards

Hassan



