> Have you tried looking at the HBase and Cassandra examples under the Spark 
> examples project? These use custom InputFormats and may provide guidance as to 
> how to go about using the relevant protobuf InputFormat.


Thanks for the pointer Nick, I will look at it once I get past the LZO stage. 


Issac,

How did you get Spark to use the LZO native libraries? I have a fully 
functional Hadoop deployment with Pig and Scalding crunching the LZO files, but 
even after adding the LZO library folder to SPARK_CLASSPATH I get the following 
error: 

java.io.IOException: No codec for file hdfs://abc.xxx.com:8020/path/to/lzo/file.lzo found, cannot run
        at com.twitter.elephantbird.mapreduce.input.LzoRecordReader.initialize(LzoRecordReader.java:80)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:86)
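
That "No codec ... found" message from elephant-bird's LzoRecordReader usually means the LZO codec classes or native libraries are not visible to the JVM that Spark starts. A minimal sketch of the usual wiring, where every path is a hypothetical placeholder to be adjusted to the actual installation:

```shell
# Hypothetical paths; point these at your actual hadoop-lzo install.
export SPARK_CLASSPATH="$SPARK_CLASSPATH:/opt/hadoop-lzo/hadoop-lzo.jar"

# The native libs (libgplcompression, liblzo2) must be on the JVM library path.
export SPARK_LIBRARY_PATH="/opt/hadoop/lib/native"

# The Hadoop configuration on Spark's classpath also needs the codec
# registered, e.g. in core-site.xml:
#   io.compression.codecs = com.hadoop.compression.lzo.LzoCodec,
#                           com.hadoop.compression.lzo.LzopCodec
#   io.compression.codec.lzo.class = com.hadoop.compression.lzo.LzoCodec
```

If Pig and Scalding already read these files on the same boxes, the jar and native libs are likely installed already and only the Spark-side environment variables are missing.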



Thanks
Vipul

On Jan 21, 2014, at 9:32 AM, Issac Buenrostro <[email protected]> wrote:

> Hi Vipul,
> 
> I use something like this to read from LZO compressed text files, it may be 
> helpful:
> 
> import com.twitter.elephantbird.mapreduce.input.LzoTextInputFormat
> import org.apache.hadoop.io.{LongWritable, Text}
> import org.apache.hadoop.mapreduce.Job
> 
> val sc = new SparkContext(sparkMaster, "lzoreader", sparkDir, List(config.getString("spark.jar")))
> sc.newAPIHadoopFile(logFile, classOf[LzoTextInputFormat], classOf[LongWritable], classOf[Text],
>   new Job().getConfiguration()).map(line => line._2)
> 
> Additionally I had to compile LZO native libraries, so keep that in mind.
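
To get at the protobuf payloads the thread originally asked about, the same pattern can be extended with elephant-bird's protobuf input format. A hedged sketch only: MyProto stands in for the protoc-generated message class, and the LzoProtobufBlockInputFormat/ProtobufWritable usage should be verified against the elephant-bird version in use:

```scala
import com.twitter.elephantbird.mapreduce.input.LzoProtobufBlockInputFormat
import com.twitter.elephantbird.mapreduce.io.ProtobufWritable
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.mapreduce.Job

// MyProto is a placeholder for your protoc-generated message class.
val job = new Job()
// Tell the input format which protobuf class to deserialize into.
LzoProtobufBlockInputFormat.setClassConf(classOf[MyProto], job.getConfiguration)

val protos = sc.newAPIHadoopFile(logFile,
    classOf[LzoProtobufBlockInputFormat[MyProto]],
    classOf[LongWritable],
    classOf[ProtobufWritable[MyProto]],
    job.getConfiguration)
  .map(_._2.get)   // unwrap each ProtobufWritable into the message itself
```

As with the text case above, this assumes the LZO codec and native libraries are already visible to Spark; the input-format wiring alone won't fix a missing codec.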
> 
> 
> On Tue, Jan 21, 2014 at 6:57 AM, Nick Pentreath <[email protected]> 
> wrote:
> Hi Vipul
> 
> Have you tried looking at the HBase and Cassandra examples under the Spark 
> examples project? These use custom InputFormats and may provide guidance as to 
> how to go about using the relevant protobuf InputFormat.
> 
> 
> 
> 
> On Mon, Jan 20, 2014 at 11:48 PM, Vipul Pandey <[email protected]> wrote:
> Any suggestions, anyone? 
> Core team / contributors / spark-developers - any thoughts?
> 
> On Jan 17, 2014, at 4:45 PM, Vipul Pandey <[email protected]> wrote:
> 
>> Hi All,
>> 
>> Can someone please share (sample) code to read LZO-compressed protobufs from 
>> HDFS (using elephant-bird)? I'm trying whatever I see in the forum and on 
>> the web, but it doesn't seem comprehensive to me. 
>> 
>> I'm using Spark 0.8.0. My Pig scripts are able to read protobufs just fine, so 
>> the Hadoop layer is set up alright. It would be really helpful if someone could 
>> list out what needs to be done with/in Spark. 
>> 
>> ~Vipul
>> 
> 
> 
> 
> 
> 
> -- 
> --
> Issac Buenrostro
> Software Engineer | 
> [email protected] | (617) 997-3350
> www.ooyala.com | blog | @ooyala
