It worked after adding one more dependency jar:

REGISTER /path/to/lephant-bird/elephant-bird/guava-r07.jar

Thanks, Andy!
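
For anyone hitting the same NoClassDefFoundError, here is a rough sketch of how the pieces from this thread might fit together in one script. All jar locations are placeholders (the guava location in particular is an assumption; point it at wherever guava-r07.jar lives in your setup), and $INPUT_PATH is expected to be supplied as a Pig parameter.

```pig
-- Sketch only: every path below is a placeholder, not a real location.
-- Elephant Bird itself, which provides SequenceFileLoader and the converters.
REGISTER 'hdfs:///path/to/jars/com.twitter-elephant-bird-*.jar';

-- Mahout jars, so that VectorWritable, Vector and their dependencies
-- can be resolved when VectorWritableConverter is class-loaded.
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-collections-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-math-*.jar';
REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-core-*.jar';

-- The extra dependency that was still missing in my case (assumed path).
REGISTER '/path/to/guava-r07.jar';

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';

-- Load SequenceFile<IntWritable, VectorWritable> data as (key, value) pairs.
pair = LOAD '$INPUT_PATH' USING $SEQFILE_LOADER (
  '-c $INT_CONVERTER',
  '-c $VECTOR_CONVERTER -- -sparse'
);
```

Saved under a hypothetical name such as load_vectors.pig, this could be run with something like `pig -param INPUT_PATH=/path/to/seqfiles load_vectors.pig`.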

On May 15, 2012, at 8:29 AM, Andy Schlaikjer wrote:

> Looking at my setup, I register Mahout jars for mahout-collections,
> mahout-math, and mahout-core when using VectorWritableConverter, so the set
> of register statements might look something like this:
> 
> {{{
> 
> REGISTER 'hdfs:///path/to/jars/com.twitter-elephant-bird-*.jar';
> REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-collections-*.jar';
> REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-math-*.jar';
> REGISTER 'hdfs:///path/to/jars/org.apache.mahout-mahout-core-*.jar';
> 
> }}}
> 
> 
> On Tue, May 15, 2012 at 8:15 AM, Andy Schlaikjer wrote:
> 
>> Yohan, Sounds like you're almost there--
>> 
>> You need to register both EB and Mahout jars so that when
>> SequenceFileLoader class-loads VectorWritableConverter, the Mahout
>> VectorWritable and Vector classes (and all of their dependencies) are also
>> available.
>> 
>> Andy
>> 
>> 
>> On Tue, May 15, 2012 at 7:59 AM, Yohan Chin wrote:
>> 
>>> Andy,
>>> thanks for your response.
>>> 
>>> I've tried it again with your suggestion, but I still get an error (see
>>> below). It seems I need to resolve the Mahout class dependency that is
>>> used in VectorWritableConverter.
>>> 
>>> When I set up elephant-bird, I followed
>>> https://github.com/kevinweil/elephant-bird and completed the quick-start
>>> and the protocol-buffer and thrift 0.5 dependencies,
>>> so I got path/to/build/elephant-bird-2.2.3-SNAPSHOT.jar.
>>> 
>>> In the Pig code, I register path/to/build/elephant-bird-2.2.3-SNAPSHOT.jar.
>>> 
>>> Should I set up the Mahout class dependencies separately?
>>> 
>>> Thanks!
>>> 
>>> 
>>> Error message:
>>> 
>>> Unexpected internal error. could not instantiate
>>> 'com.twitter.elephantbird.pig.load.SequenceFileLoader' with arguments '[-c
>>> com.twitter.elephantbird.pig.util.IntWritableConverter, -c
>>> com.twitter.elephantbird.pig.mahout.VectorWritableConverter -- -sparse]'
>>> 
>>> 
>>> Caused by: java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
>>>       at java.lang.Class.forName0(Native Method)
>>>       at java.lang.Class.forName(Class.java:247)
>>>       at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>>       at com.twitter.elephantbird.pig.load.SequenceFileLoader.getWritableConverter(SequenceFileLoader.java:233)
>>>       at com.twitter.elephantbird.pig.load.SequenceFileLoader.<init>(SequenceFileLoader.java:152)
>>>       at com.twitter.elephantbird.pig.load.SequenceFileLoader.<init>(SequenceFileLoader.java:175)
>>>       ... 21 more
>>> Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>>       at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>       at java.security.AccessController.doPrivileged(Native Method)
>>>       at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>> 
>>> 
>>> On May 15, 2012, at 7:01 AM, Andy Schlaikjer wrote:
>>> 
>>>> Yohan, that's a typo in the VectorWritableConverter javadoc. I'll update
>>>> it today.
>>>> 
>>>> The SequenceFileStorage and ...Loader classes are in separate packages:
>>>> 
>>>> com.twitter.elephantbird.pig.*load*.SequenceFileLoader
>>>> <https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java>
>>>> 
>>>> com.twitter.elephantbird.pig.*store*.SequenceFileStorage
>>>> <https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java>
>>>> 
>>>> 
>>>> Both of these classes rely on the WritableConverter interface
>>>> <https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/util/WritableConverter.java>.
>>>> They classload converters at runtime, given the classname of the
>>>> converters you'd like to use for key and value Writable instances. When
>>>> dealing with SequenceFile<IntWritable, VectorWritable> data, do this:
>>>> 
>>>> {{{
>>>> 
>>>> %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
>>>> %declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
>>>> %declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';
>>>> 
>>>> pair = LOAD '$INPUT_PATH' USING $SEQFILE_LOADER (
>>>> '-c $INT_CONVERTER',
>>>> '-c $VECTOR_CONVERTER -- -sparse'
>>>> );
>>>> 
>>>> }}}
>>>> 
>>>> Hope this helps!
>>>> 
>>>> Andy
>>>> 
>>>> 
>>>> On Mon, May 14, 2012 at 11:57 PM, Ted Dunning wrote:
>>>>> Sounds like a class path issue.
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>> On May 15, 2012, at 2:43 AM, Yohan Chin wrote:
>>>>> 
>>>>>> 
>>>>>> Hi,
>>>>>> Recently, I've been trying to use elephant-bird to load Mahout results
>>>>>> into Pig.
>>>>>> I was able to build elephant-bird and get the .jar file,
>>>>>> and followed the instructions below (written by Andy Schlaikjer):
>>>>>> 
>>>>>> https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/mahout/VectorWritableConverter.java
>>>>>> ex)
>>>>>> pair = LOAD '$data' USING com.twitter.elephantbird.pig.store.SequenceFileLoader (
>>>>>> '-c $INT_CONVERTER',
>>>>>> '-c $VECTOR_CONVERTER -- -dense -cardinality 2'
>>>>>> );
>>>>>> However, there is no SequenceFileLoader in the store folder, and
>>>>>> load/SequenceFileLoader.java doesn't import
>>>>>> "com.twitter.elephantbird.pig.mahout.VectorWritableConverter".
>>>>>> 
>>>>>> Are there any points I've missed?
>>>>>> 
>>>>>> Thanks a lot for this awesome API!
>>>>>> 
>>> 
>>> 
>> 
