John, any chance that still applies to trunk? Looks like there was some
work left in that one. Seems like a good idea though...


On Tue, Sep 24, 2013 at 6:21 AM, John Meagher <[email protected]> wrote:

> There's a patch available to allow using any available javax.script
> language to do the conversion from any Java object type in the
> sequence file to pig types.  See
> https://issues.apache.org/jira/browse/PIG-1777
>
> On Tue, Sep 24, 2013 at 5:22 AM, Dmitriy Ryaboy <[email protected]> wrote:
> > I assume by Scala you mean Scalding?
> > If so, yeah, Scalding should be much easier for working with custom data
> > types.
> >
> > Pig doesn't handle generic "objects" well. You have to write converters to
> > and from, like the ones we created in Elephant Bird for Protocol Buffers and
> > Thrift (and a bunch of Writables, as Pradeep pointed out).
> >
> > D
> >
> >
> > On Tue, Sep 17, 2013 at 9:20 AM, Yang <[email protected]> wrote:
> >
> >> Thanks Pradeep.
> >>
> >> It seems that in this case just using Scala/Cascalog is easier for my
> >> purposes. I tried out Scala yesterday, and it works fine for me in local
> >> mode.
> >>
> >>
> >> On Mon, Sep 16, 2013 at 7:47 PM, Pradeep Gollakota <[email protected]> wrote:
> >>
> >> > It doesn't look like the SequenceFileLoader from the piggybank has much
> >> > support for custom Writable types. The Elephant Bird version looks like it
> >> > does what you need it to do.
> >> >
> >> >
> >> > https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java
> >> >
> >> > You'll have to write converters from your types to Pig data types and
> >> > pass them into the constructor of the SequenceFileLoader.
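
Pig hands loader arguments to the constructor as strings. One way a loader can accept converters is therefore to take a converter's class name and instantiate it reflectively. The following is a rough sketch of that pattern. It assumes a hypothetical `Converter` interface and a hypothetical `VisitKeyToLong` converter. Elephant Bird's real converter interfaces and argument syntax differ, so check its SequenceFileLoader source for the actual contract.

```java
public class PluggableLoaderSketch {

    // Hypothetical custom key that a fixed, built-in type mapping would not recognize.
    static final class VisitKey {
        final long visitorId;

        VisitKey(long visitorId) {
            this.visitorId = visitorId;
        }
    }

    // Hypothetical converter contract (not Elephant Bird's actual interface).
    public interface Converter {
        Object convert(Object raw);
    }

    // Example converter; reflective instantiation requires a public no-arg constructor.
    public static final class VisitKeyToLong implements Converter {
        public VisitKeyToLong() {
        }

        @Override
        public Object convert(Object raw) {
            return ((VisitKey) raw).visitorId;  // boxed to Long, i.e. a Pig long
        }
    }

    private final Converter keyConverter;

    // Loader arguments arrive as strings, so the converter is identified by class name.
    public PluggableLoaderSketch(String keyConverterClassName) {
        try {
            Class<? extends Converter> cls =
                Class.forName(keyConverterClassName).asSubclass(Converter.class);
            this.keyConverter = cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "Cannot instantiate converter " + keyConverterClassName, e);
        }
    }

    // Delegates translation of a raw key to the configured converter.
    public Object translateKey(Object raw) {
        return keyConverter.convert(raw);
    }

    public static void main(String[] args) {
        PluggableLoaderSketch loader =
            new PluggableLoaderSketch(VisitKeyToLong.class.getName());
        System.out.println(loader.translateKey(new VisitKey(7L)));  // 7
    }
}
```

Naming converters by class keeps the loader generic. The tradeoff is that a typo in the class name only fails at load time.
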
> >> >
> >> > Hope this helps!
> >> >
> >> >
> >> > On Mon, Sep 16, 2013 at 6:56 PM, Pradeep Gollakota <[email protected]> wrote:
> >> >
> >> > > That's correct...
> >> > >
> >> > > The "load ... AS (k:chararray, v:chararray);" doesn't actually do what you
> >> > > think it does. The AS clause tells Pig what the schema types are, so it
> >> > > will call the appropriate LoadCaster method to get each field into the
> >> > > right type. A LoadCaster object defines how to map byte[] into the
> >> > > appropriate Pig datatypes. If the LoadFunc is not schema aware and you
> >> > > don't have the schema defined when you load, everything will be loaded as
> >> > > a bytearray.
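
Forcing a custom Writable to bytes doesn't yield a usable chararray, because those bytes are whatever its `write(DataOutput)` method emits, not its `toString()`. Here is a JDK-only sketch with a hypothetical key layout (a long plus a string). Decoding as UTF-8 only approximates what a text-based LoadCaster would do for a chararray cast; no Pig classes are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WritableBytesSketch {

    // Hypothetical key with a Writable-style write(DataOutput); the field layout is an assumption.
    static final class VisitKey {
        final long visitorId;
        final String site;

        VisitKey(long visitorId, String site) {
            this.visitorId = visitorId;
            this.site = site;
        }

        void write(DataOutput out) throws IOException {
            out.writeLong(visitorId);  // 8 raw big-endian bytes, not decimal digits
            out.writeUTF(site);        // 2-byte length prefix, then modified UTF-8
        }

        @Override
        public String toString() {
            return visitorId + ":" + site;
        }
    }

    // Serializes the key exactly as its write() method would, into a byte[].
    static byte[] serialize(VisitKey key) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            key.write(out);
        } catch (IOException e) {
            throw new IllegalStateException("in-memory write failed", e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        VisitKey key = new VisitKey(42L, "example.com");
        byte[] raw = serialize(key);
        // Decoding the binary form as UTF-8 approximates a text-based chararray cast.
        String asText = new String(raw, StandardCharsets.UTF_8);
        System.out.println(raw.length);                     // 21
        System.out.println(asText.equals(key.toString()));  // false
    }
}
```
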
> >> > >
> >> > > The problem you have is that the custom Writable isn't a Pig datatype. I
> >> > > don't think you'll be able to do this without writing some custom code.
> >> > > I'll take a look at the source code for the SequenceFileLoader and see if
> >> > > there's a way to specify your own LoadCaster. If there is, then you'll
> >> > > just have to write a custom LoadCaster and specify it in the
> >> > > configuration. If not, you'll have to extend it or roll your own
> >> > > SequenceFileLoader.
> >> > >
> >> > >
> >> > > On Mon, Sep 16, 2013 at 6:43 PM, Yang <[email protected]> wrote:
> >> > >
> >> > >> I think my custom type has toString(); at the very least, being a
> >> > >> Writable means it can be written to bytes. So supposedly if I force it
> >> > >> to bytes or a string, Pig should be able to cast it, like
> >> > >>
> >> > >> load ... AS ( k:chararray, v:chararray);
> >> > >>
> >> > >> but this actually fails.
> >> > >>
> >> > >>
> >> > >> On Mon, Sep 16, 2013 at 6:22 PM, Pradeep Gollakota <[email protected]> wrote:
> >> > >>
> >> > >> > The problem is that Pig only speaks its own data types, so you need to
> >> > >> > tell it how to translate from your custom Writable to a Pig datatype.
> >> > >> >
> >> > >> > Apparently Elephant Bird has some support for doing this type of
> >> > >> > thing; take a look at this SO post:
> >> > >> >
> >> > >> > http://stackoverflow.com/questions/16540651/apache-pig-can-we-convert-a-custom-writable-object-to-pig-format
> >> > >> >
> >> > >> >
> >> > >> > On Mon, Sep 16, 2013 at 5:37 PM, Yang <[email protected]> wrote:
> >> > >> >
> >> > >> > > I tried to do a quick and dirty inspection of some of our data feeds,
> >> > >> > > which are stored as gzip-compressed SequenceFiles.
> >> > >> > >
> >> > >> > > Basically I did
> >> > >> > >
> >> > >> > > a = load 'myfile' using ......SequenceFileLoader() AS ( mykey, myvalue);
> >> > >> > >
> >> > >> > > but it gave me some error:
> >> > >> > > 2013-09-16 17:34:28,915 [Thread-5] INFO  org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> >> > >> > > 2013-09-16 17:34:28,915 [Thread-5] INFO  org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> >> > >> > > 2013-09-16 17:34:28,915 [Thread-5] INFO  org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> >> > >> > > 2013-09-16 17:34:28,961 [Thread-5] WARN  org.apache.pig.piggybank.storage.SequenceFileLoader - Unable to translate key class com.mycompany.model.VisitKey to a Pig datatype
> >> > >> > > 2013-09-16 17:34:28,962 [Thread-5] WARN  org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
> >> > >> > > 2013-09-16 17:34:28,963 [Thread-5] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> >> > >> > > org.apache.pig.backend.BackendException: ERROR 0: Unable to translate class com.mycompany.model.VisitKey to a Pig datatype
> >> > >> > >     at org.apache.pig.piggybank.storage.SequenceFileLoader.setKeyType(SequenceFileLoader.java:78)
> >> > >> > >     at org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:133)
> >> > >> > >
> >> > >> > >
> >> > >> > > In the Pig script, I have already REGISTERed the jar that contains the
> >> > >> > > class com.mycompany.model.VisitKey.
> >> > >> > >
> >> > >> > >
> >> > >> > > If Pig doesn't work, the only other approach is probably to use one of
> >> > >> > > the newer "pseudo-scripting" languages like Cascalog or Scala.
> >> > >> > > Thanks,
> >> > >> > > Yang
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
>
