Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

pig Wed, 22 Sep 2010 12:01:52 -0700

[foo]$ echo $CLASSPATH
:/usr/lib/elephant-bird/lib/*

This has been set for both user foo and hadoop but I still get the same
error.  Is this the correct environment variable to be setting?
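
One way to narrow this down is to check what the wildcard entry actually resolves to, and whether any of those jars contains the class named in the stack trace. The sketch below is only an illustration: the function names list_classpath_jars and find_class_in_jars are made up for this example, it assumes unzip is installed, and it only looks at lowercase *.jar files directly inside a "dir/*" entry (the JVM's wildcard is likewise non-recursive).

```shell
#!/bin/sh
# Hypothetical helpers, not part of Pig or elephant-bird.

# Print every path a Java-style classpath string would contribute.
# Entries ending in "/*" are expanded to the *.jar files in that directory;
# other entries are printed unchanged if they exist on disk.
list_classpath_jars() {
    cp=$1
    old_ifs=$IFS
    IFS=:
    set -f              # no globbing while splitting on ":"
    set -- $cp
    set +f
    IFS=$old_ifs
    for entry in "$@"; do
        [ -n "$entry" ] || continue   # skip empty entries such as a leading ":"
        case $entry in
            */\*)
                dir=${entry%/\*}
                for jar in "$dir"/*.jar; do
                    [ -f "$jar" ] && echo "$jar"
                done
                ;;
            *)
                [ -e "$entry" ] && echo "$entry"
                ;;
        esac
    done
    return 0
}

# Read jar paths on stdin and print those whose listing contains the
# given fully qualified class name (for example com.google.common.collect.Maps).
find_class_in_jars() {
    class_file=$(echo "$1" | tr . /).class
    while read -r jar; do
        if unzip -l "$jar" 2>/dev/null | grep -qF "$class_file"; then
            echo "$jar"
        fi
    done
    return 0
}
```

Running list_classpath_jars "$CLASSPATH" | find_class_in_jars com.google.common.collect.Maps should print the jar that provides the missing class, if any jar on that path does. It may also be worth checking whether the bin/pig launcher looks at CLASSPATH at all: as far as I know it builds its own classpath and takes extra entries from PIG_CLASSPATH, but treat that as an assumption to verify against your copy of the script.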


Thank you!

~Ed


On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> elephant-bird/lib/* (the * is important)
>
> On Wed, Sep 22, 2010 at 11:42 AM, pig <hadoopn...@gmail.com> wrote:
>
> > Well I thought that would be a simple enough fix but no luck so far.
> >
> > I've added the elephant-bird/lib directory (which I made world readable
> > and executable) to the CLASSPATH, JAVA_LIBRARY_PATH and HADOOP_CLASSPATH
> > as both the user running grunt and the hadoop user (sort of a shotgun
> > approach).
> >
> > I still get the error where it complains about "no gplcompression", and
> > the log has an error where it can't find com.google.common.collect.Maps.
> >
> > Are these two separate problems or is it one problem that is causing two
> > different errors?  Thank you for the help!
> >
> > ~Ed
> >
> > On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> > wrote:
> >
> > > You need the jars in elephant-bird's lib/ on your classpath to run
> > > Elephant-Bird.
> > >
> > >
> > > On Wed, Sep 22, 2010 at 10:35 AM, pig <hadoopn...@gmail.com> wrote:
> > >
> > > > Thank you for pointing out the 0.7 branch.  I'm giving the 0.7 branch
> > > > a shot and have run into a problem when trying to run the following
> > > > test pig script:
> > > >
> > > > REGISTER elephant-bird-1.0.jar
> > > > A = LOAD '/user/foo/input' USING
> > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
> > > > B = LIMIT A 100;
> > > > DUMP B;
> > > >
> > > > When I try to run this I get the following error:
> > > >
> > > > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> > > >  ....
> > > > ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop
> > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error.  could not instantiate 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
> > > >
> > > > Looking at the log file it gives the following:
> > > >
> > > > java.lang.RuntimeException: could not instantiate 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
> > > > ...
> > > > Caused by: java.lang.reflect.InvocationTargetException
> > > > ...
> > > > Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
> > > > ...
> > > > Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Maps
> > > >
> > > > What is confusing me is that LZO compression and decompression work
> > > > fine when I'm running a normal Java-based map-reduce program, so I feel
> > > > as though the libraries have to be in the right place with the right
> > > > settings for java.library.path.  Otherwise how would normal Java
> > > > map-reduce work?  Is there some other location I need to set
> > > > JAVA_LIBRARY_PATH for pig to pick it up?  My understanding was that it
> > > > would get this from hadoop-env.sh.  Is the missing
> > > > com.google.common.collect.Maps the real problem here?  Thank you for
> > > > any help!
> > > >
> > > > ~Ed
> > > >
> > > > On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ed,
> > > > > Elephant-bird only works with 0.6 at the moment. There's a branch for
> > > > > 0.7 that I haven't tested: http://github.com/hirohanin/elephant-bird/
> > > > > Try it, let me know if it works.
> > > > >
> > > > > -D
> > > > >
> > > > > On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I have a small cluster up and running with LZO compressed files in
> > > > > > it.  I'm using the lzo compression libraries available at
> > > > > > http://github.com/kevinweil/hadoop-lzo (thank you for maintaining
> > > > > > this!)
> > > > > >
> > > > > > So far everything works fine when I write regular map-reduce jobs.
> > > > > > I can read in lzo files and write out lzo files without any problem.
> > > > > >
> > > > > > I'm also using Pig 0.7 and it appears to be able to read LZO files
> > > > > > out of the box using the default LoadFunc (PigStorage).  However, I
> > > > > > am currently testing a large LZO file (20GB) which I indexed using
> > > > > > the LzoIndexer, and Pig does not appear to be making use of the
> > > > > > indexes.  The pig scripts that I've run so far only have 3 mappers
> > > > > > when processing the 20GB file.  My understanding was that there
> > > > > > should be 1 map for each block (256MB blocks), so about 80 mappers
> > > > > > when processing the 20GB lzo file.  Does Pig 0.7 support indexed
> > > > > > lzo files with the default load function?
> > > > > >
> > > > > > If not, I was looking at elephant-bird and noticed it is only
> > > > > > compatible with Pig 0.6 and not 0.7+.  Is that accurate?  What
> > > > > > would be the recommended solution for processing indexed lzo files
> > > > > > using Pig 0.7?
> > > > > >
> > > > > > Thank you for any assistance!
> > > > > >
> > > > > > ~Ed
> > > > > >
> > > > >
> > > >
> > >
> >
>
