Hi All,

Currently I am following this blog:
http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
It shows that I can use hdfs dfs -text to read LZO files.
But I want to know how to read LZO files with Spark.
I put hadoop-lzo.jar into spark/jars and followed this guide:
https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/reading-lzo-files.md

Here is my script:

// Read the LZO file with the splittable LzoTextInputFormat from hadoop-lzo
val files = sc.newAPIHadoopFile("hdfs://<my_path_to_file>",
  classOf[com.hadoop.mapreduce.LzoTextInputFormat],
  classOf[org.apache.hadoop.io.LongWritable], classOf[org.apache.hadoop.io.Text])
// Keep only the line contents (values); the keys are byte offsets
val lzoRDD = files.map(_._2.toString)
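
To inspect the result, a minimal check (using the lzoRDD defined above) would be:

// Print the first few decompressed lines, and count the records
lzoRDD.take(5).foreach(println)
println(lzoRDD.count())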

But the result is null.
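
As a sanity check, should a plain sc.textFile call also be able to decompress the file, given that the LZO codec from hadoop-lzo is on the classpath? A sketch of what I mean (the path is the same placeholder as above):

// Sketch: read through the default TextInputFormat, relying on the registered LZO codec
val plain = sc.textFile("hdfs://<my_path_to_file>")
plain.take(5).foreach(println)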

Does anyone have experience with this?

Sean Sun

