Hello,

I have managed to speed up the read stage when loading CSV files using the
classic "newAPIHadoopFile" method. The issue is that I would like to use the
spark-csv package, and it seems that it does not take the LZO index file into
account, so the reads are not splittable.

# Using the classic method the read is fully parallelized (splittable reads)
sc.newAPIHadoopFile("/user/sy/data.csv.lzo", .... ).count
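For reference, the fully spelled-out call looks roughly like this (a sketch; it
assumes the hadoop-lzo package's LzoTextInputFormat is on the classpath):

import com.hadoop.mapreduce.LzoTextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}

// Each record is a (byte offset, line) pair; the .lzo.index file next to the
// data is what lets Hadoop split the compressed file across tasks.
val lines = sc.newAPIHadoopFile(
  "/user/sy/data.csv.lzo",
  classOf[LzoTextInputFormat],
  classOf[LongWritable],
  classOf[Text])
lines.count()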

# When spark-csv is used, the file is read on only one node (no splittable reads)
sqlContext.read
  .format("com.databricks.spark.csv")
  .options(Map(
    "path" -> "/user/sy/data.csv.lzo",
    "header" -> "true",
    "inferSchema" -> "false"))
  .load()
  .count()

Does anyone know if this is currently supported?
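
In the meantime, the rough workaround I am considering is to do the splittable
read myself and build the DataFrame by hand (untested sketch; the naive comma
split below ignores quoted fields and keeps every column as a string):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Splittable read of the LZO-compressed CSV, keeping only the line text.
val raw = sc.newAPIHadoopFile(
  "/user/sy/data.csv.lzo",
  classOf[com.hadoop.mapreduce.LzoTextInputFormat],
  classOf[org.apache.hadoop.io.LongWritable],
  classOf[org.apache.hadoop.io.Text]).map(_._2.toString)

// Use the header row as the schema, with all columns typed as strings.
val header = raw.first()
val schema = StructType(header.split(",").map(StructField(_, StringType, nullable = true)))

// Drop the header line and turn every remaining line into a Row.
val rows = raw.filter(_ != header).map(line => Row.fromSeq(line.split(",", -1).toSeq))

val df = sqlContext.createDataFrame(rows, schema)
df.count()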




