Hi all,

Currently I follow this blog http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/ so that I can use hdfs dfs -text to read LZO files. But I would like to know how to read an LZO file with Spark. I put hadoop-lzo.jar into spark/jars and followed https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/reading-lzo-files.md.
Here is my script:

    val files = sc.newAPIHadoopFile(
      "hdfs://<my_path_to_file>",
      classOf[com.hadoop.mapreduce.LzoTextInputFormat],
      classOf[org.apache.hadoop.io.LongWritable],
      classOf[org.apache.hadoop.io.Text])
    val lzoRDD = files.map(_._2.toString)

The resulting RDD comes back empty. Does anyone have experience with this?

Sean Sun
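P.S. In case the problem is in my setup rather than the call itself, here is the fuller self-contained version I am testing. This is just a minimal sketch, assuming Spark with hadoop-lzo on the classpath of both the driver and the executors; the HDFS path is still a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.hadoop.io.{LongWritable, Text}
    import com.hadoop.mapreduce.LzoTextInputFormat

    object LzoRead {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("lzo-read")
        val sc = new SparkContext(conf)

        // Placeholder path; the .lzo file must exist and hadoop-lzo must be
        // visible to both the driver and the executors.
        val files = sc.newAPIHadoopFile(
          "hdfs://<my_path_to_file>",
          classOf[LzoTextInputFormat],
          classOf[LongWritable],
          classOf[Text])

        // Keep only the line text; the key is the byte offset within the file.
        val lzoRDD = files.map(_._2.toString)
        lzoRDD.take(10).foreach(println)

        sc.stop()
      }
    }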