Yohan,

Sounds like you're almost there. You need to register both the Elephant Bird (EB) jar and the Mahout jars, so that when SequenceFileLoader class-loads VectorWritableConverter, the Mahout VectorWritable and Vector classes (and all of their dependencies) are also available.
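A minimal sketch of what that registration could look like in the Pig script. The Mahout jar paths and file names below are placeholders (assumptions, not taken from this thread). Which Mahout jars are needed depends on the Mahout version and build, so substitute the jars that contain org.apache.mahout.math.Vector and VectorWritable, plus whatever they depend on:

```pig
-- Elephant Bird jar, using the path from the quoted message.
REGISTER path/to/build/elephant-bird-2.2.3-SNAPSHOT.jar;

-- Hypothetical placeholders: point these at the Mahout jars from your own install.
REGISTER path/to/mahout/mahout-math.jar;
REGISTER path/to/mahout/mahout-core.jar;

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';

-- Same LOAD as in the quoted example; only the REGISTER lines above are new.
pair = LOAD '$INPUT_PATH' USING $SEQFILE_LOADER (
  '-c $INT_CONVERTER',
  '-c $VECTOR_CONVERTER -- -sparse'
);
```

INPUT_PATH is expected to be supplied as a Pig parameter (for example with -param on the command line). If a NoClassDefFoundError still names a different class, that class's jar most likely needs a REGISTER line as well.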
Andy

On Tue, May 15, 2012 at 7:59 AM, Yohan Chin <[email protected]> wrote:
> Andy,
> thanks for your response.
>
> I've tried it again with your suggestion.
> still error (as below). seems like, need to solve "mahout class"
> dependency which used in VectorWritableConverter.
>
> When I set-up elephant-bird, followed
> "https://github.com/kevinweil/elephant-bird" and completed quick-start and
> protocol-buffer, thrift 0.5 dependencies.
> so got path/to/build/elephant-bird-2.2.3-SNAPSHOT.jar
>
> in the pig code, register path/to/build/elephant-bird-2.2.3-SNAPSHOT.jar
>
> Should I set-up for mahout-class dependencies separately?
>
> Thanks!
>
>
> error message)
>
> Unexpected internal error. could not instantiate
> 'com.twitter.elephantbird.pig.load.SequenceFileLoader' with arguments '[-c
> com.twitter.elephantbird.pig.util.IntWritableConverter, -c
> com.twitter.elephantbird.pig.mahout.VectorWritableConverter -- -sparse]'
>
>
> Caused by: java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:247)
>     at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>     at com.twitter.elephantbird.pig.load.SequenceFileLoader.getWritableConverter(SequenceFileLoader.java:233)
>     at com.twitter.elephantbird.pig.load.SequenceFileLoader.<init>(SequenceFileLoader.java:152)
>     at com.twitter.elephantbird.pig.load.SequenceFileLoader.<init>(SequenceFileLoader.java:175)
>     ... 21 more
> Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>
>
> On May 15, 2012, at 7:01 AM, Andy Schlaikjer wrote:
>
> > Yohan, that's a typo in VectorWritableConverter javadoc. I'll update today.
> >
> > The SequenceFileStorage and ...Loader classes are in separate packages:
> >
> > com.twitter.elephantbird.pig.*load*.SequenceFileLoader
> > <https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java>
> > com.twitter.elephantbird.pig.*store*.SequenceFileStorage
> > <https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java>
> >
> > Both of these classes rely on the WritableConverter
> > <https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/util/WritableConverter.java>
> > interface.
> > They classload converters at runtime, given the classname of the
> > converters you'd like to use for key and value Writable instances. When
> > dealing with SequenceFile<IntWritable, VectorWritable> data, do this:
> >
> > {{{
> >
> > %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> > %declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
> > %declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';
> >
> > pair = LOAD '$INPUT_PATH' USING $SEQFILE_LOADER (
> >   '-c $INT_CONVERTER',
> >   '-c $VECTOR_CONVERTER -- -sparse'
> > );
> >
> > }}}
> >
> > Hope this helps!
> >
> > Andy
> >
> >
> > On Mon, May 14, 2012 at 11:57 PM, Ted Dunning <[email protected]> wrote:
> >> Sounds like a class path issue.
> >>
> >> Sent from my iPhone
> >>
> >> On May 15, 2012, at 2:43 AM, Yohan Chin <[email protected]> wrote:
> >>
> >>>
> >>> Hi,
> >>> Recently, I've tried to utilize elephant-bird for loading mahout result into pig.
> >>> I could install elephant-bird and got .jar file.
> >>> and followed instructions as appears in below; (written by Andy Schlaikjer)
> >>> https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/mahout/VectorWritableConverter.java
> >>> ex)
> >>> pair = LOAD '$data' USING com.twitter.elephantbird.pig.store.SequenceFileLoader (
> >>>   '-c $INT_CONVERTER',
> >>>   '-c $VECTOR_CONVERTER -- -dense -cardinality 2'
> >>> );
> >>> however, there is no sequenceFileLoader in store folder, and
> >>> load/sequencefileloader.java doesn't import
> >>> "com.twitter.elephantbird.pig.mahout.VectorWritableConverter"
> >>>
> >>> Is there any points I've missed?
> >>>
> >>> Thanks a lot for this awesome api!
> >>>
