Hmm, as I said in https://github.com/databricks/spark-csv/issues/245#issuecomment-177682354,
it sounds reasonable in a way, though to me this looks like a fairly narrow use case. How about using csvRdd() (https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/CsvParser.scala#L143-L162)? I think you can do something like the below. Note that csvRdd() takes an RDD[String], so the (LongWritable, Text) pairs coming out of newAPIHadoopFile() need to be mapped to plain strings first:

val rdd = sc.newAPIHadoopFile(
    "/file.csv.lzo",
    classOf[com.hadoop.mapreduce.LzoTextInputFormat],
    classOf[org.apache.hadoop.io.LongWritable],
    classOf[org.apache.hadoop.io.Text])
  .map { case (_, line) => line.toString }  // csvRdd() expects RDD[String]

val df = new CsvParser()
  .csvRdd(sqlContext, rdd)

2016-01-30 10:04 GMT+09:00 syepes <sye...@gmail.com>:
> Well, looking at the src it looks like it's not implemented:
>
> https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/util/TextFile.scala#L34-L36
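
One more note on the snippet above: if I remember correctly, without a schema csvRdd() gives you all-string columns (C0, C1, ...) unless you turn on inference, so if you already know the column layout you can pass an explicit schema via withSchema() to get typed columns without an extra scan. A rough sketch; the field names and types here are just placeholders:

import org.apache.spark.sql.types._

// Hypothetical schema; replace with the actual columns of your CSV.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = true)))

val typedDf = new CsvParser()
  .withSchema(schema)  // skip inference, use the declared types directly
  .csvRdd(sqlContext, rdd)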