John, any chance that still applies to trunk? Looks like there was some work left in that one. Seems like a good idea though...
On Tue, Sep 24, 2013 at 6:21 AM, John Meagher <[email protected]> wrote:

> There's a patch available to allow using any available javax.script
> language to do the conversion from any Java object type in the
> sequence file to pig types. See
> https://issues.apache.org/jira/browse/PIG-1777
>
> On Tue, Sep 24, 2013 at 5:22 AM, Dmitriy Ryaboy <[email protected]>
> wrote:
> > I assume by scala you mean scalding?
> > If so, yeah, scalding should be much easier for working with custom
> > data types.
> >
> > Pig doesn't handle generic "objects" well. You have to write
> > converters to and from, like the ones we created in ElephantBird for
> > Protocol Buffers and Thrift (and a bunch of writables, as Pradeep
> > pointed out).
> >
> > D
> >
> > On Tue, Sep 17, 2013 at 9:20 AM, Yang <[email protected]> wrote:
> >
> >> Thanks Pradeep.
> >>
> >> it seems in this case just using scala/cascalog is easier for my
> >> purposes. I tried out scala yesterday, works fine for me in local
> >> mode
> >>
> >> On Mon, Sep 16, 2013 at 7:47 PM, Pradeep Gollakota
> >> <[email protected]> wrote:
> >>
> >> > It doesn't look like the SequenceFileLoader from the piggybank has
> >> > much support. The elephant bird version looks like it does what
> >> > you need it to do.
> >> >
> >> > https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java
> >> >
> >> > You'll have to write the converters from your types to Pig data
> >> > types and pass them into the constructor of the SequenceFileLoader.
> >> >
> >> > Hope this helps!
> >> >
> >> > On Mon, Sep 16, 2013 at 6:56 PM, Pradeep Gollakota
> >> > <[email protected]> wrote:
> >> >
> >> > > That's correct...
> >> > >
> >> > > The "load ... AS (k:chararray, v:chararray);" doesn't actually do
> >> > > what you think it does. The AS statement tells Pig what the
> >> > > schema types are, so it will call the appropriate LoadCaster
> >> > > method to get it into the right type. A LoadCaster object defines
> >> > > how to map byte[] into appropriate Pig datatypes. If the LoadFunc
> >> > > is not schema aware and you don't have the schema defined when
> >> > > you load, everything will be loaded as a bytearray.
> >> > >
> >> > > The problem you have is that the custom writable isn't a Pig
> >> > > datatype. I don't think you'll be able to do this without writing
> >> > > some custom code. I'll take a look at the source code for the
> >> > > SequenceFileLoader and see if there's a way to specify your own
> >> > > LoadCaster. If there is, then you'll just have to write a custom
> >> > > LoadCaster and specify it in the configuration. If not, you'll
> >> > > have to extend and roll out your own SequenceFileLoader.
> >> > >
> >> > > On Mon, Sep 16, 2013 at 6:43 PM, Yang <[email protected]> wrote:
> >> > >
> >> > >> I think my custom type has toString(), well at least writable()
> >> > >> says it's writable to bytes, so supposedly if I force it to
> >> > >> bytes or string, pig should be able to cast
> >> > >> like
> >> > >>
> >> > >> load ... AS ( k:chararray, v:chararray);
> >> > >>
> >> > >> but this actually fails
> >> > >>
> >> > >> On Mon, Sep 16, 2013 at 6:22 PM, Pradeep Gollakota
> >> > >> <[email protected]> wrote:
> >> > >>
> >> > >> > The problem is that pig only speaks its data types. So you
> >> > >> > need to tell it how to translate from your custom writable to
> >> > >> > a pig datatype.
> >> > >> >
> >> > >> > Apparently elephant-bird has some support for doing this type
> >> > >> > of thing...
> >> > >> > take a look at this SO post
> >> > >> >
> >> > >> > http://stackoverflow.com/questions/16540651/apache-pig-can-we-convert-a-custom-writable-object-to-pig-format
> >> > >> >
> >> > >> > On Mon, Sep 16, 2013 at 5:37 PM, Yang <[email protected]> wrote:
> >> > >> >
> >> > >> > > I tried to do a quick and dirty inspection of some of our
> >> > >> > > data feeds, which are encoded in gzipped SequenceFile.
> >> > >> > >
> >> > >> > > basically I did
> >> > >> > >
> >> > >> > > a = load 'myfile' using ......SequenceFileLoader() AS ( mykey, myvalue);
> >> > >> > >
> >> > >> > > but it gave me some error:
> >> > >> > > 2013-09-16 17:34:28,915 [Thread-5] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> >> > >> > > 2013-09-16 17:34:28,915 [Thread-5] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> >> > >> > > 2013-09-16 17:34:28,915 [Thread-5] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> >> > >> > > 2013-09-16 17:34:28,961 [Thread-5] WARN org.apache.pig.piggybank.storage.SequenceFileLoader - Unable to translate key class com.mycompany.model.VisitKey to a Pig datatype
> >> > >> > > 2013-09-16 17:34:28,962 [Thread-5] WARN org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
> >> > >> > > 2013-09-16 17:34:28,963 [Thread-5] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> >> > >> > > org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class com.mycompany.model.VisitKey to a Pig datatype
> >> > >> > >     at org.apache.pig.piggybank.storage.SequenceFileLoader.setKeyType(SequenceFileLoader.java:78)
> >> > >> > >     at org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:133)
> >> > >> > >
> >> > >> > > in the pig file, I have already REGISTERED the jar that
> >> > >> > > contains the class com.mycompany.model.VisitKey
> >> > >> > >
> >> > >> > > if PIG doesn't work, the only other approach is probably to
> >> > >> > > use some of the newer "pseudo-scripting " languages like
> >> > >> > > cascalog or scala
> >> > >> > > thanks
> >> > >> > > Yang
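
For anyone reading this thread later, the "write a converter from your writable to Pig types" idea can be sketched roughly as below. This is only an illustration under stated assumptions: the VisitKey here is a hypothetical stand-in (the real field layout of com.mycompany.model.VisitKey is not shown in this thread), the field names visitorId and timestamp are invented, and the code uses plain java.io streams and a List<Object> in place of Hadoop's Writable and Pig's Tuple so that it runs without either library. It does not reproduce the elephant-bird or piggybank APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

public class VisitKeyConverterSketch {

    // Hypothetical stand-in for com.mycompany.model.VisitKey.
    // The two fields are assumptions made for illustration only.
    static final class VisitKey {
        final String visitorId;
        final long timestamp;

        VisitKey(String visitorId, long timestamp) {
            this.visitorId = visitorId;
            this.timestamp = timestamp;
        }

        // Same shape as a Writable's write(DataOutput): fields in a fixed order.
        void write(DataOutputStream out) throws IOException {
            out.writeUTF(visitorId);
            out.writeLong(timestamp);
        }

        // Reads the fields back in the order write() emitted them.
        static VisitKey read(DataInputStream in) throws IOException {
            return new VisitKey(in.readUTF(), in.readLong());
        }
    }

    // Serializes a key to bytes, as it would sit inside a sequence file record.
    static byte[] serialize(VisitKey key) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            key.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // The "converter": decodes the custom type and exposes its fields as
    // simple values (String ~ chararray, Long ~ long) that a loader could
    // place into a tuple.
    static List<Object> toTupleFields(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            VisitKey key = VisitKey.read(in);
            return Arrays.<Object>asList(key.visitorId, key.timestamp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = serialize(new VisitKey("visitor-1", 1379377468915L));
        System.out.println(toTupleFields(raw));
    }
}
```

The point of the sketch is that the loader cannot guess how to unpack an arbitrary class, so something has to name each field and map it to a type Pig understands. In a real setup that logic would live in whatever converter or LoadCaster hook the chosen SequenceFileLoader accepts.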
