> System.setProperty("spark.io.compression.codec",
> "com.hadoop.compression.lzo.LzopCodec")

This spark.io.compression.codec setting is completely different from the codecs
used for reading/writing from HDFS. It controls how Spark compresses its own
internal (non-HDFS) intermediate output, such as shuffle data.
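To make the distinction concrete, here is a sketch (assuming a Spark version from around this era, where properties were set via System.setProperty; the LZFCompressionCodec class name is Spark's built-in codec, while the Hadoop-side property name is the standard one from core-site.xml):

```scala
// Spark's internal compression: must be a Spark codec, not a Hadoop one.
// Setting it to com.hadoop.compression.lzo.LzopCodec (a Hadoop codec) will not work.
System.setProperty("spark.io.compression.codec",
  "org.apache.spark.io.LZFCompressionCodec")

// HDFS read/write codecs are configured on the Hadoop side instead,
// e.g. in core-site.xml:
//   <property>
//     <name>io.compression.codecs</name>
//     <value>...,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
//   </property>
```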

> Hope this helps and someone can help read a LZO file

Spark just uses the regular Hadoop FileSystem API, so any issues with reading
LZO files would be Hadoop issues. I would search the Hadoop issue tracker and
look for information on using LZO files with Hadoop/Hive; whatever works for
them should magically work for Spark as well.

This looks like a good place to start:

https://github.com/twitter/hadoop-lzo

IANAE, but I would try passing this input format:

https://github.com/twitter/hadoop-lzo/blob/master/src/main/java/com/hadoop/mapreduce/LzoTextInputFormat.java

to the SparkContext.hadoopFile method.
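A sketch of what that might look like (untested; assumes the hadoop-lzo jar and its native libraries are on the classpath, and the path is hypothetical). Note that since LzoTextInputFormat lives in the com.hadoop.mapreduce package, it is a new-API InputFormat and so would go through SparkContext.newAPIHadoopFile rather than hadoopFile; hadoop-lzo's com.hadoop.mapred.DeprecatedLzoTextInputFormat would be the old-API counterpart:

```scala
import com.hadoop.mapreduce.LzoTextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}

// sc is an existing SparkContext; the path is a placeholder.
val lzoFile = sc.newAPIHadoopFile(
  "hdfs:///path/to/data.lzo",
  classOf[LzoTextInputFormat],   // splits on LZO block boundaries if indexed
  classOf[LongWritable],         // key: byte offset
  classOf[Text])                 // value: line of text

// Convert the Hadoop Writables into plain strings.
val lines = lzoFile.map { case (_, text) => text.toString }
```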

- Stephen
