Hi Gerrit,

I don't know about GPB, but I am using LZO, loading it with
com.twitter.elephantbird.pig8.load.LzoTextLoader.

Thanks
chaitanya


Please find attached the hadoop mapred/ core site configuration files and a
pig-error-log.

The error log closely resembles the one in the
http://www.mail-archive.com/[email protected]/msg01009.html mail thread,
but the suggestions made there don't seem to work.

Please find my environment config /versions below,



*csharma $ hadoop version*

*Hadoop 0.20.2-cdh3u0*

Subversion  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14

Compiled by root on Sat Mar 26 00:12:30 UTC 2011

From source with checksum 6c1f62dddc4eac69b6b973c18bbc0f55



*csharma $ pig -version*

*Apache Pig version 0.8.0-cdh3u0 (rexported)*

compiled Mar 25 2011, 16:16:24



For LZO I am using the following, with build output snippets:

*csharma@hadoop-lzo $ git remote -v*

origin     https://github.com/kevinweil/hadoop-lzo.git (fetch)

origin     https://github.com/kevinweil/hadoop-lzo.git (push)



csharma@hadoop-lzo $ tree build/hadoop-lzo-0.4.10/lib/

build/hadoop-lzo-0.4.10/lib/

|-- commons-logging-1.0.4.jar

|-- commons-logging-api-1.0.4.jar

|-- junit-3.8.1.jar

`-- native

    `-- Linux-i386-32

        |-- libgplcompression.a

        |-- libgplcompression.la

        |-- libgplcompression.so -> libgplcompression.so.0.0.0

        |-- libgplcompression.so.0 -> libgplcompression.so.0.0.0

        `-- libgplcompression.so.0.0.0



csharma@hadoop-lzo $ ls -l build/

drwxr-xr-x 6 csharma csharma    4096 2011-04-18 08:11 hadoop-lzo-0.4.10

-rw-r--r-- 1 csharma csharma   59855 2011-04-18 08:11 hadoop-lzo-0.4.10.jar

-rw-r--r-- 1 csharma csharma 1810286 2011-04-18 08:11 hadoop-lzo-0.4.10.tar.gz





*For pig-0.8 support, elephant bird:*

*csharma@elephant-bird-gerritjvv $ git remote -v*

origin     https://github.com/gerritjvv/elephant-bird.git (fetch)

origin     https://github.com/gerritjvv/elephant-bird.git (push)



I also tried the pig-08 branch, https://github.com/dvryaboy/elephant-bird/tree/pig-08 :

*csharma@ubuntu:~/Projects/elephant-bird-dvryaboy$ git remote -v*

origin     https://github.com/dvryaboy/elephant-bird.git (fetch)

origin     https://github.com/dvryaboy/elephant-bird.git (push)



Also, I've been having a lot of problems building elephant bird *without*
thrift / protobuf.



Now I did find this project, http://code.google.com/p/hadoop-gpl-packing/,
which says EB can handle even Pig 0.8.

This confuses me - can I or can I not use Elephant Bird with Pig 0.8?

-------------------------------------------------------------------



I created a sample LZO-compressed data file, clean.txt.lzo, then indexed the
lzo file to verify the LZO installation on my pseudo-distributed cluster:

*hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-0.4.10.jar com.hadoop.compression.lzo.DistributedLzoIndexer /data/clean.txt.lzo*
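
As a sanity check after the indexer runs, an index file should appear next to
the data (the .lzo.index suffix is my understanding of the hadoop-lzo
convention):

```
csharma $ hadoop fs -ls /data/
```

If indexing succeeded, both clean.txt.lzo and clean.txt.lzo.index should be
listed, so I believe the indexing side is fine.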



but when running pig jobs with the pig-lzo-loaders from elephant bird, I've
always run into problems; the following either just crashes or keeps
running forever, spitting GBs of garbage data:

*grunt> register /usr/lib/hadoop/lib/elephant-bird-2.0-SNAPSHOT.jar;*

*grunt> d = load '/data/clean.txt.lzo' using com.twitter.elephantbird.pig8.load.LzoTextLoader();*

This throws a boatload of errors; log attached.
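
For comparison, per Gerrit's suggestion in the thread below, I plan to try
the LzoPigStorage loader next; in grunt that would be (untested on my end so
far, same jar and data paths as above):

```
grunt> register /usr/lib/hadoop/lib/elephant-bird-2.0-SNAPSHOT.jar;
grunt> d = load '/data/clean.txt.lzo' using com.twitter.elephantbird.pig.load.LzoPigStorage();
grunt> dump d;
```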




Please let me know how my configuration / versions conflict, and what I
should change to get this working.






On Mon, Apr 18, 2011 at 2:52 PM, Gerrit Jansen van Vuuren <
[email protected]> wrote:

> also,
>
> If you're using GPB and LZO, I use
> the com.twitter.elephantbird.pig.proto.LzoProtobuffB64LinePigStore.
>
>
> On Mon, Apr 18, 2011 at 8:47 PM, Gerrit Jansen van Vuuren <
> [email protected]> wrote:
>
>> Hi,
>>
>> I've used LzoTextLoader in the past and it seems to hang. Haven't looked
>> into why.
>>
>> Please try the com.twitter.elephantbird.pig.load.LzoPigStorage.
>>
>> Cheers,
>>  Gerrit
>>
>> On Mon, Apr 18, 2011 at 8:23 PM, Chaitanya Sharma
>> <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am trying to get LZO support for my little pig-0.8 project.
>>> I'm using https://github.com/gerritjvv/elephant-bird.git for the
>>> pig-lzo-loaders, and https://github.com/kevinweil/hadoop-lzo for the
>>> hadoop lzo support.
>>>
>>> The pig-loader,
>>> com.twitter.elephantbird.mapreduce.input.LzoTextLoader, doesn't seem to
>>> be up to the job, and the map-reduce job created doesn't proceed
>>> any further than 0%.
>>>
>>>
>>> Has anyone been successful at this? If so, please share your experience.
>>>
>>> Or what patch levels / git source branches should I be using to get
>>> this to work?
>>>
>>>
>>> Thanks,
>>> Chaitanya
>>>
>>
>>
>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>

  <property>
     <name>hadoop.tmp.dir</name>
     <value>/var/lib/hadoop-0.20/cache/${user.name}</value>
  </property>

  <!-- OOZIE proxy user setting -->
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>

  <!-- compression codec -->
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>

  <!-- hadoop lzo compression codec. -->
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

</configuration>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>

  <!-- Enable Hue plugins -->
  <property>
    <name>mapred.jobtracker.plugins</name>
    <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
    <description>Comma-separated list of jobtracker plug-ins to be activated.
    </description>
  </property>
  <property>
    <name>jobtracker.thrift.address</name>
    <value>0.0.0.0:9290</value>
  </property>

  <!-- native libraries! -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m -Xms1024m -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-i386-32</value>
    <final>true</final>
  </property>

  <!-- for intermediate compression setup -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <final>true</final>
  </property>

  <property>
    <name>mapred.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

</configuration>
