Issac,

I have all these entries in my core-site.xml and as I mentioned before my Pig 
jobs are running just fine. And the JAVA_LIBRARY_PATH already points to the lzo 
lib directory. 
Not sure what to change/add and where.
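
For reference, this is roughly what I have in spark-env.sh (the paths here are 
placeholders, not my actual locations):

export SPARK_CLASSPATH=/path/to/hadoop-lzo.jar:/path/to/lzo/lib:$SPARK_CLASSPATH
export JAVA_LIBRARY_PATH=/path/to/hadoop-lzo/native/lib:/path/to/lzo/lib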

Thanks,
Vipul



On Jan 22, 2014, at 1:37 PM, Issac Buenrostro <[email protected]> wrote:

> You need a core-site.xml file in the classpath with these lines
> 
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> 
> <configuration>
> 
>   <property>
>     <name>io.compression.codecs</name>
>     
> <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
>   </property>
>   <property>
>     <name>io.compression.codec.lzo.class</name>
>     <value>com.hadoop.compression.lzo.LzoCodec</value>
>   </property>
> 
> </configuration>
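> 
> To get that file onto Spark's classpath, something like this in spark-env.sh 
> should do it (just a sketch; point it at whatever directory holds your 
> core-site.xml):
> 
> export SPARK_CLASSPATH=/path/to/dir/containing/core-site.xml:$SPARK_CLASSPATH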
> 
> 
> I also added both the native libraries path and the path to the lzo library to 
> JAVA_LIBRARY_PATH, but I don't know if this is necessary. This is the command 
> I used on Mac:
> 
> export JAVA_LIBRARY_PATH=/Users/*/hadoop-lzo/target/native/Mac_OS_X-x86_64-64/lib:/usr/local/Cellar/lzo/2.06/lib
> 
> 
> On Wed, Jan 22, 2014 at 12:28 PM, Vipul Pandey <[email protected]> wrote:
> 
>> Have you tried looking at the HBase and Cassandra examples under the spark 
>> example project? These use custom InputFormats and may provide guidance as 
>> to how to go about using the relevant Protobuf inputformat.
> 
> 
> Thanks for the pointer, Nick. I will look at it once I get past the LZO stage. 
> 
> 
> Issac,
> 
> How did you get Spark to use the LZO native libraries? I have a fully 
> functional Hadoop deployment with Pig and Scalding crunching the LZO files, 
> but even after adding the LZO library folder to SPARK_CLASSPATH I get the 
> following error: 
> 
> java.io.IOException: No codec for file hdfs://abc.xxx.com:8020/path/to/lzo/file.lzo found, cannot run
>       at com.twitter.elephantbird.mapreduce.input.LzoRecordReader.initialize(LzoRecordReader.java:80)
>       at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:86)
> 
> 
> 
> Thanks
> Vipul
> 
> On Jan 21, 2014, at 9:32 AM, Issac Buenrostro <[email protected]> wrote:
> 
>> Hi Vipul,
>> 
>> I use something like this to read from LZO-compressed text files; it may be 
>> helpful:
>> 
>> import com.twitter.elephantbird.mapreduce.input.LzoTextInputFormat
>> import org.apache.hadoop.io.{LongWritable, Text}
>> import org.apache.hadoop.mapreduce.Job
>> import org.apache.spark.SparkContext
>> 
>> val sc = new SparkContext(sparkMaster, "lzoreader", sparkDir, List(config.getString("spark.jar")))
>> // newAPIHadoopFile returns (LongWritable, Text) pairs; keep just the text lines
>> sc.newAPIHadoopFile(logFile, classOf[LzoTextInputFormat], classOf[LongWritable], classOf[Text],
>>   new Job().getConfiguration()).map(line => line._2)
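>> 
>> For protobuf records (which I think is what you're ultimately after) I haven't 
>> tried it myself, but I'd expect something along these lines with elephant-bird's 
>> MultiInputFormat. Treat it as a sketch: MyProto stands in for your generated 
>> protobuf class, and the writable types may need tweaking.
>> 
>> import com.twitter.elephantbird.mapreduce.input.MultiInputFormat
>> import com.twitter.elephantbird.mapreduce.io.BinaryWritable
>> import org.apache.hadoop.io.LongWritable
>> import org.apache.hadoop.mapreduce.Job
>> 
>> val job = new Job()
>> // tell elephant-bird which protobuf class the lzo files contain
>> MultiInputFormat.setClassConf(classOf[MyProto], job.getConfiguration)
>> sc.newAPIHadoopFile(logFile, classOf[MultiInputFormat[MyProto]], classOf[LongWritable],
>>   classOf[BinaryWritable[MyProto]], job.getConfiguration)
>>   .map(pair => pair._2.get)  // BinaryWritable.get() unwraps the protobuf message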
>> 
>> Additionally I had to compile LZO native libraries, so keep that in mind.
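>> 
>> Roughly what that build looked like for me on Mac (a sketch, assuming lzo from 
>> Homebrew and the hadoop-lzo sources from github.com/twitter/hadoop-lzo; exact 
>> paths will depend on your install):
>> 
>> brew install lzo
>> git clone https://github.com/twitter/hadoop-lzo.git
>> cd hadoop-lzo
>> C_INCLUDE_PATH=/usr/local/Cellar/lzo/2.06/include LIBRARY_PATH=/usr/local/Cellar/lzo/2.06/lib mvn clean package
>> # native libraries land under target/native/; that directory is what goes on JAVA_LIBRARY_PATH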
>> 
>> 
>> On Tue, Jan 21, 2014 at 6:57 AM, Nick Pentreath <[email protected]> 
>> wrote:
>> Hi Vipul
>> 
>> Have you tried looking at the HBase and Cassandra examples under the spark 
>> example project? These use custom InputFormats and may provide guidance as 
>> to how to go about using the relevant Protobuf inputformat.
>> 
>> 
>> 
>> 
>> On Mon, Jan 20, 2014 at 11:48 PM, Vipul Pandey <[email protected]> wrote:
>> Any suggestions, anyone? 
>> Core team / contributors / spark-developers - any thoughts?
>> 
>> On Jan 17, 2014, at 4:45 PM, Vipul Pandey <[email protected]> wrote:
>> 
>>> Hi All,
>>> 
>>> Can someone please share (sample) code to read LZO-compressed protobufs 
>>> from HDFS (using elephant-bird)? I've been trying whatever I see in the forum 
>>> and on the web, but none of it seems comprehensive to me. 
>>> 
>>> I'm using Spark 0.8.0. My Pig scripts are able to read the protobufs just fine, 
>>> so the Hadoop layer is set up alright. It would be really helpful if someone 
>>> could list out what needs to be done with/in Spark. 
>>> 
>>> ~Vipul
>>> 
>> 
>> 
>> 
>> 
>> 
>> -- 
>> --
>> Issac Buenrostro
>> Software Engineer | 
>> [email protected] | (617) 997-3350
>> www.ooyala.com | blog | @ooyala
> 
> 
> 
> 
> -- 
> --
> Issac Buenrostro
> Software Engineer | 
> [email protected] | (617) 997-3350
> www.ooyala.com | blog | @ooyala
