Hi Gerrit,
I don't know about GPB, but I am using LZO, with
com.twitter.elephantbird.pig8.load.LzoTextLoader.
Thanks
chaitanya
Please find attached the hadoop mapred/ core site configuration files and a
pig-error-log.
The error log closely resembles the one in the
http://www.mail-archive.com/[email protected]/msg01009.html mail thread,
but the suggestions made there don't seem to work.
Please find my environment config / versions below:
*csharma $ hadoop version*
*Hadoop 0.20.2-cdh3u0*
Subversion -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14
Compiled by root on Sat Mar 26 00:12:30 UTC 2011
From source with checksum 6c1f62dddc4eac69b6b973c18bbc0f55
*csharma $ pig -version*
*Apache Pig version 0.8.0-cdh3u0 (rexported)*
compiled Mar 25 2011, 16:16:24
For LZO I am using the following, with build output snippets:
*csharma@hadoop-lzo $ git remote -v*
origin https://github.com/kevinweil/hadoop-lzo.git (fetch)
origin https://github.com/kevinweil/hadoop-lzo.git (push)
csharma@hadoop-lzo $ tree build/hadoop-lzo-0.4.10/lib/
build/hadoop-lzo-0.4.10/lib/
|-- commons-logging-1.0.4.jar
|-- commons-logging-api-1.0.4.jar
|-- junit-3.8.1.jar
`-- native
    `-- Linux-i386-32
        |-- libgplcompression.a
        |-- libgplcompression.la
        |-- libgplcompression.so -> libgplcompression.so.0.0.0
        |-- libgplcompression.so.0 -> libgplcompression.so.0.0.0
        `-- libgplcompression.so.0.0.0
csharma@hadoop-lzo $ ls -l build/
drwxr-xr-x 6 csharma csharma 4096 2011-04-18 08:11 hadoop-lzo-0.4.10
-rw-r--r-- 1 csharma csharma 59855 2011-04-18 08:11 hadoop-lzo-0.4.10.jar
-rw-r--r-- 1 csharma csharma 1810286 2011-04-18 08:11 hadoop-lzo-0.4.10.tar.gz
*For Pig 0.8 support, Elephant Bird:*
*csharma@elephant-bird-gerritjvv $ git remote -v*
origin https://github.com/gerritjvv/elephant-bird.git (fetch)
origin https://github.com/gerritjvv/elephant-bird.git (push)
https://github.com/dvryaboy/elephant-bird/tree/pig-08
*csharma@ubuntu:~/Projects/elephant-bird-dvryaboy$ git remote -v*
origin https://github.com/dvryaboy/elephant-bird.git (fetch)
origin https://github.com/dvryaboy/elephant-bird.git (push)
Also, I have been having a lot of problems building Elephant Bird *without*
thrift / protobuf.
Now I did find this project,
http://code.google.com/p/hadoop-gpl-packing/, saying
EB can handle even Pig 0.8.
This confuses me - can I or can I not use Elephant Bird with Pig 0.8?
-------------------------------------------------------------------
I created a sample LZO-compressed data file, clean.txt.lzo, then indexed it
to verify that LZO is installed correctly on my pseudo cluster:
*hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-0.4.10.jar
com.hadoop.compression.lzo.DistributedLzoIndexer /data/clean.txt.lzo*
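
A further sanity check that could be run alongside the indexer, as a rough sketch only: it looks for libgplcompression.so in the directory that the attached mapred configuration passes as -Djava.library.path. The helper name check_native_lzo is made up for illustration, and the check assumes the task JVMs load the library from that directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of hadoop-lzo): report whether the LZO
# native library is present in the given directory.
check_native_lzo() {
    dir="$1"
    lib="$dir/libgplcompression.so"
    # -e follows symlinks, so a dangling libgplcompression.so link
    # is reported as missing.
    if [ -e "$lib" ]; then
        echo "found: $lib"
    else
        echo "missing: $lib"
        return 1
    fi
}

# Directory taken from -Djava.library.path in mapred.child.java.opts.
check_native_lzo /usr/lib/hadoop/lib/native/Linux-i386-32 || true
```

If this prints "missing", the build output under build/hadoop-lzo-0.4.10/lib/native/Linux-i386-32 has probably not been copied into that directory.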
But when running Pig jobs with the pig-lzo-loaders from Elephant Bird, I've
always run into problems; the following either just crashes or keeps
running forever, spitting out GBs of garbage data.
*grunt> register /usr/lib/hadoop/lib/elephant-bird-2.0-SNAPSHOT.jar;*
*grunt> d = load '/data/clean.txt.lzo' using
com.twitter.elephantbird.pig8.load.LzoTextLoader();*
|
|--Throws a boatload of errors, attached.
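
Since this thread mentions loaders under three different packages (pig8.load, pig.load and mapreduce.input), one way to see which of them the registered jar really contains is to filter its class listing. This is only a sketch: list_lzo_loaders is a made-up helper name, and the pattern assumes the loader class names start with Lzo and end in Loader or Storage.

```shell
#!/bin/sh
# Hypothetical helper: read a jar listing on stdin (one entry per line,
# as printed by `jar tf`) and print matching classes in dotted form.
list_lzo_loaders() {
    grep -E 'Lzo[A-Za-z0-9]*(Loader|Storage)\.class$' \
        | sed -e 's/\.class$//' -e 's#/#.#g'
}

# Jar path as registered in the grunt session; skipped when the jar
# or the JDK `jar` tool is not available.
jar_path=/usr/lib/hadoop/lib/elephant-bird-2.0-SNAPSHOT.jar
if [ -f "$jar_path" ] && command -v jar >/dev/null 2>&1; then
    jar tf "$jar_path" | list_lzo_loaders
fi
```

A class name passed to "using" in the load statement would need to appear in that output.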
Please let me know how my configuration / versions are conflicting, and
which I should change to get this to work.
On Mon, Apr 18, 2011 at 2:52 PM, Gerrit Jansen van Vuuren <
[email protected]> wrote:
> also,
>
> If you're using GPB and LZO, I use
> the com.twitter.elephantbird.pig.proto.LzoProtobuffB64LinePigStore .
>
>
> On Mon, Apr 18, 2011 at 8:47 PM, Gerrit Jansen van Vuuren <
> [email protected]> wrote:
>
>> Hi,
>>
>> I've used LzoTextLoader in the past and it seems to hang. Haven't looked
>> into why.
>>
>> Please try the com.twitter.elephantbird.pig.load.LzoPigStorage.
>>
>> Cheers,
>> Gerrit
>>
>> On Mon, Apr 18, 2011 at 8:23 PM, Chaitanya Sharma
>> <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am trying to get LZO support for my little pig - 0.8 project ,
>>> I'm using https://github.com/gerritjvv/elephant-bird.git for the
>>> pig-lzo-loaders; and https://github.com/kevinweil/hadoop-lzo for the
>>> hadoop
>>> lzo support.
>>>
>>> The pig-loader,
>>> com.twitter.elephantbird.mapreduce.input.LzoTextLoader doesn't seem to
>>> be up to the job, and the map-reduce job created doesn't proceed
>>> any further than 0%.
>>>
>>>
>>> Has anyone been successful at this? If so, please share your experience.
>>>
>>> Or what patch levels / git source branches should I be using to get this
>>> to
>>> work?
>>>
>>>
>>> Thanks,
>>> Chaitanya
>>>
>>
>>
>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop-0.20/cache/${user.name}</value>
</property>
<!-- OOZIE proxy user setting -->
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<!-- compression codec -->
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<!-- hadoop lzo compression codec. -->
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
</configuration>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
<!-- Enable Hue plugins -->
<property>
<name>mapred.jobtracker.plugins</name>
<value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
<description>Comma-separated list of jobtracker plug-ins to be activated.
</description>
</property>
<property>
<name>jobtracker.thrift.address</name>
<value>0.0.0.0:9290</value>
</property>
<!-- native libraries! -->
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m -Xms1024m -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-i386-32</value>
<final>true</final>
</property>
<!-- for intermediate compression setup -->
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
<final>true</final>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
</configuration>