Hmmm, if it's the same error then it's still not picking up your PIG_RPC_PORT
variable: the NumberFormatException: null coming from ConfigHelper.getRpcPort
means the port setting is null when the map task reads it.
If you're running this in <cassandra_src>/contrib/pig:
'bin/pig_cassandra -x local myscript.pig'
then you should only need to set PIG_HOME and the other environment variables
for connecting to Cassandra.
If you want to run it against a cluster, what I've done is keep a Hadoop
configuration locally, point PIG_CONF_DIR at <hadoop_home>/conf, and put those
three variables in mapred-site.xml like this:
<property>
  <name>cassandra.thrift.address</name>
  <value>123.45.67.89</value>
</property>
<property>
  <name>cassandra.thrift.port</name>
  <value>9160</value>
</property>
<property>
  <name>cassandra.partitioner.class</name>
  <value>org.apache.cassandra.dht.RandomPartitioner</value>
</property>
I would make sure you can get it to run locally first though.
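
For reference, a minimal local run would look roughly like this (the PIG_HOME
path below is just a placeholder for wherever your Pig 0.8 install lives):

export PIG_HOME=/path/to/pig-0.8.0                                   # placeholder: your Pig install
export PIG_INITIAL_ADDRESS=localhost                                 # cassandra.thrift.address
export PIG_RPC_PORT=9160                                             # cassandra.thrift.port
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner    # cassandra.partitioner.class
cd <cassandra_src>/contrib/pig
bin/pig_cassandra -x local myscript.pig

Once that works, running against the cluster should just be a matter of
pointing PIG_CONF_DIR at the Hadoop conf dir carrying the three properties
above.
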
On Apr 5, 2011, at 10:29 AM, Fabio Souto wrote:
> Hi,
>
> I had a bad environment variable
> PIG_PARTITIONER=RandomPartitioner
> instead of
> PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
> but I corrected this and it's still not working. I get the same error.
>
> Just in case, this is what I have in my ~/.bash_profile:
>
> export HADOOPDIR=/etc/hadoop-0.20/conf
> export HADOOP_CLASSPATH=/usr/cassandra/lib/*:$HADOOP_CLASSPATH
> export CLASSPATH=$HADOOPDIR:$CLASSPATH
>
> export PIG_CONF_DIR=$HADOOPDIR
> export PIG_CLASSPATH=/etc/hadoop/conf
> export PIG_CONF_DIR=$HADOOPDIR
>
> export PIG_INITIAL_ADDRESS=localhost
> export PIG_RPC_PORT=9160
> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>
>
> BTW I'm using the pig version that comes with Cassandra, the one in
> cassandra/contrib/pig
>
> Thanks for your time Jeremy! :)
> Fabio
>
> On 05/04/2011, at 17:04, Jeremy Hanna wrote:
>
>> Fabio,
>>
>> It looks like you need to set your environment variables to connect to
>> cassandra. Check out the readme. Quoting here:
>> Finally, set the following as environment variables (uppercase,
>> underscored), or as Hadoop configuration variables (lowercase, dotted):
>> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
>> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to
>> connect to
>> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>>
>> So you'll probably want to do:
>> export PIG_INITIAL_ADDRESS=localhost
>> export PIG_RPC_PORT=9160
>> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>>
>> All the best, and let me know if this doesn't work,
>>
>> Jeremy
>>
>> On Apr 5, 2011, at 9:38 AM, Fabio Souto wrote:
>>
>>> Hi Jeremy,
>>>
>>> Of course, here it is:
>>>
>>> Backend error message
>>> ---------------------
>>> java.lang.NumberFormatException: null
>>> at java.lang.Integer.parseInt(Integer.java:417)
>>> at java.lang.Integer.parseInt(Integer.java:499)
>>> at
>>> org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>>> at
>>> org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown
>>> Source)
>>> at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown
>>> Source)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>> at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>> at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>>
>>> Pig Stack Trace
>>> ---------------
>>> ERROR 2997: Unable to recreate exception from backed error:
>>> java.lang.NumberFormatException: null
>>>
>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
>>> open iterator for alias A. Backend error : Unable to recreate exception
>>> from backed error: java.lang.NumberFormatException: null
>>> at org.apache.pig.PigServer.openIterator(PigServer.java:742)
>>> at
>>> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>> at
>>> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>> at
>>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>> at
>>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>> at org.apache.pig.Main.run(Main.java:465)
>>> at org.apache.pig.Main.main(Main.java:107)
>>> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
>>> 2997: Unable to recreate exception from backed error:
>>> java.lang.NumberFormatException: null
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>>> at
>>> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>>> at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>>> at org.apache.pig.PigServer.store(PigServer.java:816)
>>> at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>>> ... 7 more
>>> ================================================================================
>>>
>>>
>>> Thanks for all,
>>> Fabio
>>>
>>>
>>> On 05/04/2011, at 16:19, Jeremy Hanna wrote:
>>>
>>>> Fabio,
>>>>
>>>> Could you post the full stack trace that's found in the pig_<long
>>>> number>.log in the directory where you ran pig?
>>>>
>>>> Thanks,
>>>>
>>>> Jeremy
>>>>
>>>> On Apr 5, 2011, at 8:42 AM, Fabio Souto wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have installed Pig 0.8.0 and Cassandra 0.7.4 and I'm not able to read
>>>>> data from Cassandra. I wrote a simple query just to test:
>>>>>
>>>>> grunt> A = LOAD 'cassandra://msg_keyspace/messages' USING
>>>>> org.apache.cassandra.hadoop.pig.CassandraStorage();
>>>>>
>>>>> grunt> dump A;
>>>>>
>>>>>
>>>>> And I'm getting the following error:
>>>>> ==========================================================================
>>>>> 2011-04-05 15:33:57,669 [main] INFO
>>>>> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
>>>>> script: UNKNOWN
>>>>> 2011-04-05 15:33:57,669 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
>>>>> pig.usenewlogicalplan is set to true. New logical plan will be used.
>>>>> 2011-04-05 15:33:57,819 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
>>>>> A:
>>>>> Store(hdfs://localhost/tmp/temp2037710644/tmp-29784200:org.apache.pig.impl.io.InterStorage)
>>>>> - scope-1 Operator Key: scope-1)
>>>>> 2011-04-05 15:33:57,850 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
>>>>> File concatenation threshold: 100 optimistic? false
>>>>> 2011-04-05 15:33:57,877 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>>>>> - MR plan size before optimization: 1
>>>>> 2011-04-05 15:33:57,877 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>>>>> - MR plan size after optimization: 1
>>>>> 2011-04-05 15:33:57,969 [main] INFO
>>>>> org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
>>>>> to the job
>>>>> 2011-04-05 15:33:57,990 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>>>>> - mapred.job.reduce.markreset.buffer.percent is not set, set to default
>>>>> 0.3
>>>>> 2011-04-05 15:34:03,376 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>>>>> - Setting up single store job
>>>>> 2011-04-05 15:34:03,416 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>> - 1 map-reduce job(s) waiting for submission.
>>>>> 2011-04-05 15:34:03,929 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>> - 0% complete
>>>>> 2011-04-05 15:34:04,597 [Thread-5] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
>>>>> input paths (combined) to process : 1
>>>>> 2011-04-05 15:34:05,942 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>> - HadoopJobId: job_201104051459_0008
>>>>> 2011-04-05 15:34:05,943 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>> - More information at:
>>>>> http://localhost:50030/jobdetails.jsp?jobid=job_201104051459_0008
>>>>> 2011-04-05 15:34:35,912 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>> - job job_201104051459_0008 has failed! Stop running all dependent jobs
>>>>> 2011-04-05 15:34:35,918 [main] INFO
>>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>>> - 100% complete
>>>>> 2011-04-05 15:34:35,931 [main] ERROR
>>>>> org.apache.pig.tools.pigstats.PigStats - ERROR 2997: Unable to recreate
>>>>> exception from backed error: java.lang.NumberFormatException: null
>>>>> 2011-04-05 15:34:35,931 [main] ERROR
>>>>> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>>>>> 2011-04-05 15:34:35,933 [main] INFO
>>>>> org.apache.pig.tools.pigstats.PigStats - Script Statistics:
>>>>>
>>>>> HadoopVersion PigVersion UserId StartedAt FinishedAt
>>>>> Features
>>>>> 0.20.2-CDH3B4 0.8.0-SNAPSHOT root 2011-04-05 15:33:57
>>>>> 2011-04-05 15:34:35 UNKNOWN
>>>>>
>>>>> Failed!
>>>>>
>>>>> Failed Jobs:
>>>>> JobId Alias Feature Message Outputs
>>>>> job_201104051459_0008 A MAP_ONLY Message: Job failed!
>>>>> Error - NA hdfs://localhost/tmp/temp2037710644/tmp-29784200,
>>>>>
>>>>> Input(s):
>>>>> Failed to read data from "cassandra://msg_keyspace/messages"
>>>>>
>>>>> Output(s):
>>>>> Failed to produce result in
>>>>> "hdfs://localhost/tmp/temp2037710644/tmp-29784200"
>>>>> ==========================================================================
>>>>>
>>>>> Any idea how to fix this?
>>>>> Cheers
>>>>
>>>
>>
>