Re: Pig & Cassandra integration

Jeremy Hanna Tue, 02 Aug 2011 07:01:13 -0700

AFAIK, Amazon still uses Pig 0.6 on EMR, though they've said in discussion 
threads that they are in the process of upgrading.
http://aws.amazon.com/elasticmapreduce/faqs/#pig-7
https://forums.aws.amazon.com/thread.jspa?messageID=233903&#249998


Pig 0.6 doesn't have the LoadFunc/StoreFunc API that was introduced in 0.7. 
 That's the extension point that Cassandra's CassandraStorage uses.
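
To make the version constraint concrete, here is a minimal sketch of a pre-flight check you could run before submitting a job. The function names and the (0, 7) threshold constant are made up for illustration; the threshold just encodes the statement above that the extension point arrived in Pig 0.7. It is not part of Pig or Cassandra.

```python
import re

# Assumption: CassandraStorage needs the load/store API that arrived in Pig 0.7.
MIN_PIG_FOR_CASSANDRA_STORAGE = (0, 7)


def parse_version(text):
    """Turn a dotted version such as '0.8.1-SNAPSHOT' into a tuple of ints."""
    # Drop any qualifier after the first dash, then collect the numeric parts.
    numbers = re.findall(r"\d+", text.split("-")[0])
    return tuple(int(n) for n in numbers)


def supports_cassandra_storage(pig_version):
    """Return True if the major.minor version is at least 0.7."""
    return parse_version(pig_version)[:2] >= MIN_PIG_FOR_CASSANDRA_STORAGE
```

Under these assumptions, calling supports_cassandra_storage("0.6.0") returns False for the EMR default mentioned above, while "0.8.0" passes.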

I've heard that you can just deploy a newer version of Pig yourself on your EMR 
cluster, but I haven't tried that.  We went with our own cluster on EC2 so that 
we could control versions, after we got some odd errors with EMR that we 
couldn't track down or reproduce.

Sorry I can't be of more help there.

On Aug 2, 2011, at 7:40 AM, Shai Harel wrote:

> Jeremy, were you able to make it run on Amazon Elastic MapReduce
> machines?
> 
> I've tried copying the jars (both Pig's and Cassandra's) to the new machine,
> setting the PIG_HOME environment variable,
> and even adding the Hadoop config files to the classpath,
> and I'm getting this error:
> 
> Error before Pig is launched
> ----------------------------
> ERROR 2999: Unexpected internal error. Failed to create DataStorage
> 
> java.lang.RuntimeException: Failed to create DataStorage
>        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
>        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
>        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:213)
>        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:133)
>        at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
>        at org.apache.pig.PigServer.<init>(PigServer.java:225)
>        at org.apache.pig.PigServer.<init>(PigServer.java:214)
>        at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
>        at org.apache.pig.Main.run(Main.java:462)
>        at org.apache.pig.Main.main(Main.java:107)
> Caused by: java.io.IOException: Call to ip-10-56-51-167.eu-west-1.compute.internal/10.56.51.167:9000 failed on local exception: java.io.EOFException
>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
>        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>        at $Proxy0.getProtocolVersion(Unknown Source)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
>        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
>        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
>        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
>        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
>        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
>        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
>        ... 9 more
> Caused by: java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:812)
>        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:720)
> ================================================================================
> 
> Amazon claims to run Hadoop 0.20; what am I doing wrong?
> 
> 
> 
> On Mon, Aug 1, 2011 at 5:55 PM, Jeremy Hanna 
> <[email protected]>wrote:
> 
>> Ah - just saw this, glad you got it working - cheers.
>> 
>> On Aug 1, 2011, at 5:43 AM, Shai Harel wrote:
>> 
>>> Hey all, I've successfully fixed this problem.
>>> I was missing the Cassandra jars,
>>> so you actually need to build Cassandra (ant) and then you need to jar it
>>> (ant jar),
>>> and only then will it work.
>>> 
>>> BTW, if you have Hue installed, remove it first!
>>> 
>>> 
>>> 
>>> On Mon, Aug 1, 2011 at 12:41 PM, Shai Harel <[email protected]> wrote:
>>> 
>>>> Thanks for the help. I've tried to be conservative and I'm using Pig 0.8
>>>> and Cassandra 0.8, and I'm still getting this error:
>>>> 
>>>> Pig Stack Trace
>>>> ---------------
>>>> ERROR 2998: Unhandled internal error. Could not initialize class
>>>> org.apache.cassandra.thrift.SliceRange
>>>> 
>>>> java.lang.NoClassDefFoundError: Could not initialize class
>>>> org.apache.cassandra.thrift.SliceRange
>>>> 
>>>>   at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
>>>>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
>>>>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
>>>>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
>>>>   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>>>>   at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>>>>   at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>>>>   at org.apache.pig.PigServer.store(PigServer.java:816)
>>>>   at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>>>>   at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>>   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>>   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>>   at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>>   at org.apache.pig.Main.run(Main.java:465)
>>>>   at org.apache.pig.Main.main(Main.java:107)
>>>> 
>>>> does anyone else have this problem?
>>>> 
>>>> 
>>>> 
>>>> On Sun, Jul 31, 2011 at 2:04 PM, Jeremy Hanna <[email protected]> wrote:
>>>> 
>>>>> Try following this and see if it helps getting started:
>>>>> https://github.com/jeromatron/pygmalion/wiki/Getting-Started
>>>>> 
>>>>> I haven't tried it with 0.9 yet but I plan to this week.  We use the
>>>>> CassandraStorage jar in production.  If you can, validate your data with
>>>>> Cassandra's schema validators.  CassandraStorage gets the schema from
>>>>> Cassandra and tries to unmarshal the data into Pig data types with the
>>>>> schema information.
>>>>> 
>>>>> See if that helps.
>>>>> 
>>>>> On Jul 31, 2011, at 9:48 AM, Shai Harel wrote:
>>>>> 
>>>>>> Hey all, I've been trying to query Cassandra using my Pig script,
>>>>>> so I used the contrib jar from Cassandra, and I'm getting the following
>>>>>> error, which looks like some kind of Thrift failure:
>>>>>> 
>>>>>> ERROR 2998: Unhandled internal error.
>>>>>> org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>>>>>> 
>>>>>> java.lang.NoSuchMethodError:
>>>>>> org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>>>>>>  at org.apache.cassandra.thrift.SliceRange.<clinit>(SliceRange.java:149)
>>>>>>  at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
>>>>>>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
>>>>>>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
>>>>>>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
>>>>>>  at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>>>>>>  at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>>>>>>  at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>>>>>>  at org.apache.pig.PigServer.store(PigServer.java:816)
>>>>>>  at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>>>>>>  at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>>>>  at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>>>>  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>>  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>>>>  at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>>>>  at org.apache.pig.Main.run(Main.java:465)
>>>>>>  at org.apache.pig.Main.main(Main.java:107)
>>>>>> 
>>>>>> 
>>>>>> Has anyone managed to get this up and running?
>>>>>> I'm considering rewriting CassandraStorage.jar using Hector.
>>>>>> Any thoughts about that?
>>>>> 
>>>>> 
>>>> 
>> 
>>
