Thanks for your suggestion. I will try this and update by late evening.

regards
Rajeev
Rajeev Srivastava
Silverline Design Inc
2118 Walsh Ave, Suite 204
Santa Clara, CA 95050
cell: 408-409-0940

On Mon, Dec 16, 2013 at 11:24 AM, Andrew Ash <[email protected]> wrote:

> Hi Rajeev,
>
> It looks like you're using the com.hadoop.mapred.DeprecatedLzoTextInputFormat
> input format above, while Stephen referred to
> com.hadoop.mapreduce.LzoTextInputFormat.
>
> I think the way to use this in Spark would be to use the
> SparkContext.hadoopFile() or SparkContext.newAPIHadoopFile() methods with
> the path and the InputFormat as parameters. Can you give those a shot?
>
> Andrew
>
>
> On Wed, Dec 11, 2013 at 8:59 PM, Rajeev Srivastava
> <[email protected]> wrote:
>
>> Hi Stephen,
>> I tried the same LZO file with a simple Hadoop streaming script, and it
>> seems to work fine:
>>
>> HADOOP_HOME=/usr/lib/hadoop
>> /usr/bin/hadoop jar /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop-mapreduce/hadoop-streaming.jar \
>>   -libjars /opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/hadoop-lzo-cdh4-0.4.15-gplextras.jar \
>>   -input /tmp/ldpc.sstv3.lzo \
>>   -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat \
>>   -output wc_test \
>>   -mapper 'cat' \
>>   -reducer 'wc -l'
>>
>> This means Hadoop is able to handle the LZO file correctly.
>>
>> Can you suggest what I should do in Spark to make it work?
>>
>> regards
>> Rajeev
>>
>>
>> Rajeev Srivastava
>> Silverline Design Inc
>> 2118 Walsh Ave, Suite 204
>> Santa Clara, CA 95050
>> cell: 408-409-0940
>>
>>
>> On Tue, Dec 10, 2013 at 1:20 PM, Stephen Haberman
>> <[email protected]> wrote:
>>
>>> > System.setProperty("spark.io.compression.codec",
>>> >   "com.hadoop.compression.lzo.LzopCodec")
>>>
>>> This spark.io.compression.codec is a completely different setting than
>>> the codecs that are used for reading/writing from HDFS. (It is for
>>> compressing Spark's internal/non-HDFS intermediate output.)
>>>
>>> > Hope this helps and someone can help read a LZO file
>>>
>>> Spark just uses the regular Hadoop File System API, so any issues with
>>> reading LZO files would be Hadoop issues. I would search the Hadoop issue
>>> tracker and look for information on using LZO files with Hadoop/Hive;
>>> whatever works for them should magically work for Spark as well.
>>>
>>> This looks like a good place to start:
>>>
>>> https://github.com/twitter/hadoop-lzo
>>>
>>> IANAE, but I would try passing one of these:
>>>
>>> https://github.com/twitter/hadoop-lzo/blob/master/src/main/java/com/hadoop/mapreduce/LzoTextInputFormat.java
>>>
>>> to the SparkContext.hadoopFile method.
>>>
>>> - Stephen
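[Editor's note: for readers finding this thread later, the approach Andrew and Stephen suggest above might look roughly like the sketch below. This is untested and assumes a 2013-era Spark (0.8/0.9) API, that the hadoop-lzo jar and its native libraries are on the Spark classpath, and the example path /tmp/ldpc.sstv3.lzo from the thread.]

```scala
import org.apache.spark.SparkContext
import org.apache.hadoop.io.{LongWritable, Text}
import com.hadoop.mapreduce.LzoTextInputFormat

// Sketch only: hadoop-lzo (jar + native libs) must already be on the
// classpath, e.g. via SPARK_CLASSPATH or ADD_JARS.
val sc = new SparkContext("local", "lzo-test")

// LzoTextInputFormat lives in org.apache.hadoop.mapreduce (the "new" API),
// so it pairs with newAPIHadoopFile rather than hadoopFile.
val lines = sc.newAPIHadoopFile(
  "/tmp/ldpc.sstv3.lzo",   // LZO file from the streaming job above
  classOf[LzoTextInputFormat],
  classOf[LongWritable],   // key: byte offset of the line
  classOf[Text])           // value: the line contents

// Rough Spark equivalent of the `wc -l` streaming job in the thread.
println(lines.count())
```

DeprecatedLzoTextInputFormat (the com.hadoop.mapred one used in the streaming job) would instead go with sc.hadoopFile, since it implements the old mapred API.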
