Re: Lzo + Protobuf

2014-01-29 Thread Issac Buenrostro
Good! I'll keep your experience in mind in case we have problems in the future :)

Re: Lzo + Protobuf

2014-01-28 Thread Vipul Pandey
I got this to run, maybe in a tad twisted way. Here is what I did to get to read LZO-compressed Protobufs in Spark (I'm on 0.8.0):

- I added Hadoop's conf folder to the Spark classpath (in spark-env.sh) on all the nodes, and in the shell as well, but that didn't help either. So I just added the prop…
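The message above is cut off at "added the prop…", which presumably refers to setting the compression-codec properties on the Hadoop configuration directly. A minimal sketch of what that could look like in 0.8-era Scala, assuming hadoop-lzo and its native libraries are already available to the JVM (the master URL and app name are illustrative):

```scala
import org.apache.spark.SparkContext

val sc = new SparkContext("local", "lzo-setup") // illustrative master/app name

// Register the LZO codecs on the Hadoop configuration used by this
// SparkContext, mirroring what a core-site.xml entry would do.
val conf = sc.hadoopConfiguration
conf.set("io.compression.codecs",
  "org.apache.hadoop.io.compress.DefaultCodec," +
  "com.hadoop.compression.lzo.LzoCodec," +
  "com.hadoop.compression.lzo.LzopCodec")
conf.set("io.compression.codec.lzo.class",
  "com.hadoop.compression.lzo.LzoCodec")
```

This is a sketch of a workaround, not the poster's confirmed fix; it depends on the hadoop-lzo jar and native libs being on every executor's classpath.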

Re: Lzo + Protobuf

2014-01-22 Thread Vipul Pandey
Issac, I have all these entries in my core-site.xml, and as I mentioned before my Pig jobs are running just fine. The JAVA_LIBRARY_PATH already points to the lzo lib directory. Not sure what to change/add and where. Thanks, Vipul
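For reference, the classpath and native-library wiring being discussed here typically lives in spark-env.sh on each node. A sketch under the assumption of a standard hadoop-lzo install (every path below is an illustrative placeholder, not from the thread):

```shell
# spark-env.sh -- illustrative paths; adjust to your installation.
# Put Hadoop's conf dir (for core-site.xml) and the LZO/elephant-bird
# jars on the Spark classpath:
export SPARK_CLASSPATH="${SPARK_CLASSPATH:-}:/etc/hadoop/conf:/path/to/hadoop-lzo.jar:/path/to/elephant-bird.jar"
# Point the JVM at the native LZO libraries:
export SPARK_LIBRARY_PATH="${SPARK_LIBRARY_PATH:-}:/path/to/hadoop-lzo/lib/native"
export JAVA_LIBRARY_PATH="${JAVA_LIBRARY_PATH:-}:/path/to/hadoop-lzo/lib/native"
```

The same file must be present on the driver and on all worker nodes, which matches the "in all the nodes" step described earlier in the thread.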

Re: Lzo + Protobuf

2014-01-22 Thread Issac Buenrostro
You need a core-site.xml file on the classpath with these lines: io.compression.codecs org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,or…
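The property list above arrives flattened and truncated in the archive; in core-site.xml form, such an entry conventionally looks like the following (the second property is a common companion setting and an assumption here, since the original message is cut off):

```xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```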

Re: Lzo + Protobuf

2014-01-22 Thread Vipul Pandey
> Have you tried looking at the HBase and Cassandra examples under the spark example project? These use custom InputFormats and may provide guidance as to how to go about using the relevant Protobuf inputformat.

Thanks for the pointer, Nick. I will look at it once I get past the LZO stage.

Re: Lzo + Protobuf

2014-01-21 Thread Issac Buenrostro
Hi Vipul, I use something like this to read from LZO-compressed text files; it may be helpful:

import com.twitter.elephantbird.mapreduce.input.LzoTextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job

val sc = new SparkContext(sparkMaster, "lzorea…
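The snippet is cut off mid-line in the archive. A sketch of how such a read typically continues, assuming Spark 0.8's `newAPIHadoopFile` (which accepts a mapreduce-API InputFormat); the app name and HDFS path are illustrative, not from the original message:

```scala
import com.twitter.elephantbird.mapreduce.input.LzoTextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.SparkContext

// sparkMaster and the input path are illustrative placeholders.
val sc = new SparkContext("local", "lzoreader")
val lines = sc.newAPIHadoopFile(
  "hdfs:///data/input.lzo",
  classOf[LzoTextInputFormat],
  classOf[LongWritable],
  classOf[Text])

// Convert the Hadoop Text values to Strings; Writables are reused by
// Hadoop, so copy them out before caching or collecting.
val text = lines.map { case (_, line) => line.toString }
```

This reads LZO-compressed text, which is the stepping stone the thread uses before moving on to protobuf payloads.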

Re: Lzo + Protobuf

2014-01-21 Thread Nick Pentreath
Hi Vipul, Have you tried looking at the HBase and Cassandra examples under the spark example project? These use custom InputFormats and may provide guidance as to how to go about using the relevant Protobuf inputformat.

Re: Lzo + Protobuf

2014-01-20 Thread Vipul Pandey
Any suggestions, anyone? Core team / contributors / spark developers: any thoughts?

Lzo + Protobuf

2014-01-17 Thread Vipul Pandey
Hi All, Can someone please share (sample) code to read LZO-compressed protobufs from HDFS (using elephant-bird)? I'm trying whatever I see in the forum and on the web, but it doesn't seem comprehensive to me. I'm using Spark 0.8.0. My Pig scripts are able to read protobuf just fine, so the Hado…
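For the question asked here, a hedged sketch of one way to read LZO-compressed protobufs with elephant-bird in 0.8-era Scala: `MultiInputFormat` can dispatch to the LZO block/line formats, with the record class registered on the Hadoop configuration. `MyProto` stands in for a hypothetical protoc-generated message class, and the path and app name are illustrative; this is an assumption about the setup, not code confirmed by the thread:

```scala
import com.twitter.elephantbird.mapreduce.input.MultiInputFormat
import com.twitter.elephantbird.mapreduce.io.ProtobufWritable
import org.apache.hadoop.io.LongWritable
import org.apache.spark.SparkContext

// MyProto is a hypothetical protobuf class generated by protoc.
val sc = new SparkContext("local", "lzo-protobuf")
val conf = sc.hadoopConfiguration

// Tell elephant-bird which protobuf class the input contains.
MultiInputFormat.setClassConf(classOf[MyProto], conf)

val records = sc.newAPIHadoopFile(
  "hdfs:///data/events.lzo",
  classOf[MultiInputFormat[MyProto]],
  classOf[LongWritable],
  classOf[ProtobufWritable[MyProto]],
  conf)

// Unwrap the writables to get the protobuf messages themselves.
val protos = records.map { case (_, w) => w.get() }
```

As with the text example earlier in the thread, the hadoop-lzo codecs must already be registered (core-site.xml or programmatically) and the native LZO libraries present on every node.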