Re: Reading Cassandra Data From Pig/Hadoop

2014-05-30 Thread Kevin Burton
There's a pig-with-cassandra script somewhere you should be using.

It adds the jars, etc.

One issue is that you need to call REGISTER on the jars from your Pig
scripts.
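
For example, a minimal sketch of the REGISTER lines at the top of a Pig script
(the jar names, paths, and versions below are illustrative; use the ones
shipped with your Cassandra install):

-- Illustrative paths/versions only; point these at the jars from your Cassandra install.
REGISTER /opt/cassandra/lib/apache-cassandra-1.2.15.jar;
REGISTER /opt/cassandra/lib/cassandra-thrift-1.2.15.jar;
REGISTER /opt/cassandra/lib/libthrift-0.7.0.jar;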

Honestly, someone should write up an example Pig setup with modern Hadoop, all
the right REGISTER commands, real URL-encoded UPDATE queries, and explain the
whole thing.

It took me about two days to get working, and there are also gotchas in the Pig
scripts themselves.

And the fact that the output from CQL is not encoded as tuples, but the input
must be, is insane and maddening and VERY prone to error.
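
As a sketch of that input-side encoding (the keyspace, table, and column names
here are made up; the output_query is a URL-encoded prepared UPDATE statement):

-- Hypothetical keyspace/table/columns, shown only to illustrate the tuple encoding.
-- Key columns go in as a tuple of (name, value) pairs; the remaining values are
-- bound, in order, to the ?s of the URL-encoded UPDATE in output_query.
rows     = LOAD 'cql://mykeyspace/mytable' USING CqlStorage();
to_store = FOREACH rows GENERATE TOTUPLE(TOTUPLE('id', id)), TOTUPLE(score);
STORE to_store INTO
  'cql://mykeyspace/mytable?output_query=UPDATE+mykeyspace.mytable+SET+score+%3D+%3F'
  USING CqlStorage();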




On Fri, May 30, 2014 at 10:10 AM, James Schappet 
wrote:

> To specify your cassandra cluster, you only need to define one node:
>
> In you profile or batch command set and export these variables:
>
> export PIG_HOME=
>
> export PIG_INITIAL_ADDRESS=localhost
>
> export PIG_RPC_PORT=9160
>
> # the partitioner must match your cassandra partitioner
> export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
>
>
>
>
> http://www.schappet.com/pig_cassandra_bulk_load/
>
> —Jimmy
>
>
>
> On May 30, 2014, at 11:50 AM, Alex McLintock  wrote:
>
> I am reasonably experienced with Hadoop and Pig but less so with
> Cassandra. I have been banging my head against the wall as all the
> documentation assumes I know something...
>
> I am using Apache's tarball of Cassandra 1.something and I see that there
> are some example pig scripts and a shell script to run them with the
> cassandra jars.
>
> What I don't understand is how you tell the pig script which machine the
> cassandra cluster talks to. You only specify the keyspace right - which
> roughly corresponds to the database/table, but not which cluster.
>
> Can you tell what I have missed? Does the hadoop nodes HAVE to be on the
> same machines as the Cassandra nodes?
>
> I am using CQL storage I think.
>
> eg
>
>
>
> -- CqlStorage
> libdata = LOAD 'cql://libdata/libout' USING CqlStorage();
>
> book_by_mail = FILTER libdata BY C_OUT_TY == 'BM';
>
> etc etc
>
>
>
> Thanks all...
>
>
>
>
>
>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile


War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.


Re: Reading Cassandra Data From Pig/Hadoop

2014-05-30 Thread James Schappet
To specify your cassandra cluster, you only need to define one node:

In your profile or batch script, set and export these variables:

export PIG_HOME=

export PIG_INITIAL_ADDRESS=localhost

export PIG_RPC_PORT=9160

# the partitioner must match your cassandra partitioner

export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
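
For example (a sketch, assuming the Cassandra jars are registered or Pig is
launched via the bundled pig_cassandra wrapper), the script itself never names
a host; PIG_INITIAL_ADDRESS is what points it at the cluster:

-- No host or port appears here; they come from the PIG_* environment variables above.
libdata = LOAD 'cql://libdata/libout' USING CqlStorage();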




http://www.schappet.com/pig_cassandra_bulk_load/

—Jimmy 



On May 30, 2014, at 11:50 AM, Alex McLintock  wrote:

> I am reasonably experienced with Hadoop and Pig but less so with Cassandra. I 
> have been banging my head against the wall as all the documentation assumes I 
> know something...
> 
> I am using Apache's tarball of Cassandra 1.something and I see that there are 
> some example pig scripts and a shell script to run them with the cassandra 
> jars. 
> 
> What I don't understand is how you tell the pig script which machine the 
> cassandra cluster talks to. You only specify the keyspace right - which 
> roughly corresponds to the database/table, but not which cluster. 
> 
> Can you tell what I have missed? Does the hadoop nodes HAVE to be on the same 
> machines as the Cassandra nodes?
> 
> I am using CQL storage I think.
> 
> eg
> 
> 
> -- CqlStorage
> libdata = LOAD 'cql://libdata/libout' USING CqlStorage();
> 
> book_by_mail = FILTER libdata BY C_OUT_TY == 'BM';
> 
> etc etc
> 
> 
> 
> Thanks all...
> 
> 
> 
> 



Reading Cassandra Data From Pig/Hadoop

2014-05-30 Thread Alex McLintock
I am reasonably experienced with Hadoop and Pig but less so with Cassandra.
I have been banging my head against the wall as all the documentation
assumes I know something...

I am using Apache's tarball of Cassandra 1.something and I see that there
are some example pig scripts and a shell script to run them with the
cassandra jars.

What I don't understand is how you tell the pig script which machine(s) to
talk to for the Cassandra cluster. You only specify the keyspace, right, which
roughly corresponds to the database/table, but not which cluster.

Can you tell me what I have missed? Do the Hadoop nodes HAVE to be on the
same machines as the Cassandra nodes?

I am using CQL storage, I think.

eg

-- CqlStorage
libdata = LOAD 'cql://libdata/libout' USING CqlStorage();
book_by_mail = FILTER libdata BY C_OUT_TY == 'BM';
etc etc


Thanks all...


Re: pig + hadoop

2011-04-20 Thread pob
>> > oh yeah - that's what's going on.  what I do is on the machine that I
>> run the pig script from, I set the PIG_CONF variable to my HADOOP_HOME/conf
>> directory and in my mapred-site.xml file found there, I set the three
>> variables.
>> >
>> > I don't use environment variables when I run against a cluster.
>> >
>> > On Apr 19, 2011, at 9:54 PM, Jeffrey Wang wrote:
>> >
>> >> Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error
>> for a while before I added that.
>> >>
>> >> -Jeffrey
>> >>
>> >> From: pob [mailto:peterob...@gmail.com]
>> >> Sent: Tuesday, April 19, 2011 6:42 PM
>> >> To: user@cassandra.apache.org
>> >> Subject: Re: pig + hadoop
>> >>
>> >> Hey Aaron,
>> >>
>> >> I read it, and all of 3 env variables was exported. The results are
>> same.
>> >>
>> >> Best,
>> >> P
>> >>
>> >> 2011/4/20 aaron morton 
>> >> Am guessing but here goes. Looks like the cassandra RPC port is not
>> set, did you follow these steps in contrib/pig/README.txt
>> >>
>> >> Finally, set the following as environment variables (uppercase,
>> >> underscored), or as Hadoop configuration variables (lowercase, dotted):
>> >> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening
>> on
>> >> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to
>> connect to
>> >> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>> >>
>> >> Hope that helps.
>> >> Aaron
>> >>
>> >>
>> >> On 20 Apr 2011, at 11:28, pob wrote:
>> >>
>> >>
>> >> Hello,
>> >>
>> >> I did cluster configuration by
>> http://wiki.apache.org/cassandra/HadoopSupport. When I run pig
>> example-script.pig
>> >> -x local, everything is fine and i get correct results.
>> >>
>> >> Problem is occurring with -x mapreduce
>> >>
>> >> Im getting those errors :>
>> >>
>> >>
>> >> 2011-04-20 01:24:21,791 [main] ERROR
>> org.apache.pig.tools.pigstats.PigStats - ERROR:
>> java.lang.NumberFormatException: null
>> >> 2011-04-20 01:24:21,792 [main] ERROR
>> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>> >> 2011-04-20 01:24:21,793 [main] INFO
>>  org.apache.pig.tools.pigstats.PigStats - Script Statistics:
>> >>
>> >> Input(s):
>> >> Failed to read data from "cassandra://Keyspace1/Standard1"
>> >>
>> >> Output(s):
>> >> Failed to produce result in
>> "hdfs://ip:54310/tmp/temp-1383865669/tmp-1895601791"
>> >>
>> >> Counters:
>> >> Total records written : 0
>> >> Total bytes written : 0
>> >> Spillable Memory Manager spill count : 0
>> >> Total bags proactively spilled: 0
>> >> Total records proactively spilled: 0
>> >>
>> >> Job DAG:
>> >> job_201104200056_0005   ->  null,
>> >> null->  null,
>> >> null
>> >>
>> >>
>> >> 2011-04-20 01:24:21,793 [main] INFO
>>  
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>> - Failed!
>> >> 2011-04-20 01:24:21,803 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1066: Unable to open iterator for alias topnames. Backend error :
>> java.lang.NumberFormatException: null
>> >>
>> >>
>> >>
>> >> 
>> >> thats from jobtasks web management - error  from task directly:
>> >>
>> >> java.lang.RuntimeException: java.lang.NumberFormatException: null
>> >> at
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:123)
>> >> at
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
>> >> at
>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>> >> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
>> >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> >> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> >> Caused by: java.lang.NumberFormatException: null
>> >> at java.lang.Integer.parseInt(Integer.java:417)
>> >> at java.lang.Integer.parseInt(Integer.java:499)
>> >> at
>> org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>> >> at
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:105)
>> >> ... 5 more
>> >>
>> >>
>> >>
>> >> Any suggestions where should be problem?
>> >>
>> >> Thanks,
>> >>
>> >>
>> >>
>> >
>>
>>
>


Re: pig + hadoop

2011-04-20 Thread pob
Hi,

everything works fine with Cassandra 0.7.5, but when I tried 0.7.3 different
errors showed up; strangely, the task still finished successfully.


2011-04-20 11:45:40,674 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201104201139_0004_m_00_3: Error: java.lang.ClassNotF
oundException: org.apache.thrift.TException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
at
org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:153)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:105)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)



2011-04-20 11:45:43,629 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201104201139_0004_m_01_3: org.apache.pig.backend.exe
cutionengine.ExecException: ERROR 2044: The type null cannot be collected as
a Key type
at
org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:143)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:105)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


2011-04-20 11:42:49,498 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201104201139_0001_m_00_1: Error: java.lang.ClassNotF
oundException: org.apache.commons.lang.ArrayUtils
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at
org.apache.cassandra.utils.ByteBufferUtil.<clinit>(ByteBufferUtil.java:75)
at org.apache.cassandra.hadoop.pig.CassandraStorage.<clinit>(Unknown
Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
at
org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:153)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:105)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)





2011/4/20 Jeremy Hanna 

> Just as an example:
>
>   <property>
>     <name>cassandra.thrift.address</name>
>     <value>10.12.34.56</value>
>   </property>
>   <property>
>     <name>cassandra.thrift.port</name>
>     <value>9160</value>
>   </property>
>   <property>
>     <name>cassandra.partitioner.class</name>
>     <value>org.apache.cassandra.dht.RandomPartitioner</value>
>   </property>
>
>
> On Apr 19, 2011, at 10:28 PM, Jeremy Hanna wrote:
>
> > oh yeah - that's what's going on.  what I do is on the machine that I run
> the pig script from, I set the PIG_CONF variable to my HADOOP_HOME/conf
> directory and in my mapred-site.xml file found there, I set the three
> variables.
> >
> > I don't use environment variables when I run against a cluster.
> >
> > On Apr 19, 2011, at 9:54 PM, Jeffrey Wang wrote:
> >
> >> Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error
> for a while before I added that.
> >>
> >> -Jeffrey
> >>

Re: pig + hadoop

2011-04-20 Thread pob
Hi,

that was the problem! Thanks, you should put that in the documentation.


Thanks for help!


Best,
P

2011/4/20 Jeremy Hanna 

> Just as an example:
>
>   <property>
>     <name>cassandra.thrift.address</name>
>     <value>10.12.34.56</value>
>   </property>
>   <property>
>     <name>cassandra.thrift.port</name>
>     <value>9160</value>
>   </property>
>   <property>
>     <name>cassandra.partitioner.class</name>
>     <value>org.apache.cassandra.dht.RandomPartitioner</value>
>   </property>
>
>
> On Apr 19, 2011, at 10:28 PM, Jeremy Hanna wrote:
>
> > oh yeah - that's what's going on.  what I do is on the machine that I run
> the pig script from, I set the PIG_CONF variable to my HADOOP_HOME/conf
> directory and in my mapred-site.xml file found there, I set the three
> variables.
> >
> > I don't use environment variables when I run against a cluster.
> >
> > On Apr 19, 2011, at 9:54 PM, Jeffrey Wang wrote:
> >
> >> Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error
> for a while before I added that.
> >>
> >> -Jeffrey
> >>
> >> From: pob [mailto:peterob...@gmail.com]
> >> Sent: Tuesday, April 19, 2011 6:42 PM
> >> To: user@cassandra.apache.org
> >> Subject: Re: pig + hadoop
> >>
> >> Hey Aaron,
> >>
> >> I read it, and all of 3 env variables was exported. The results are
> same.
> >>
> >> Best,
> >> P
> >>
> >> 2011/4/20 aaron morton 
> >> Am guessing but here goes. Looks like the cassandra RPC port is not set,
> did you follow these steps in contrib/pig/README.txt
> >>
> >> Finally, set the following as environment variables (uppercase,
> >> underscored), or as Hadoop configuration variables (lowercase, dotted):
> >> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening
> on
> >> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to
> connect to
> >> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
> >>
> >> Hope that helps.
> >> Aaron
> >>
> >>
> >> On 20 Apr 2011, at 11:28, pob wrote:
> >>
> >>
> >> Hello,
> >>
> >> I did cluster configuration by
> http://wiki.apache.org/cassandra/HadoopSupport. When I run pig
> example-script.pig
> >> -x local, everything is fine and i get correct results.
> >>
> >> Problem is occurring with -x mapreduce
> >>
> >> Im getting those errors :>
> >>
> >>
> >> 2011-04-20 01:24:21,791 [main] ERROR
> org.apache.pig.tools.pigstats.PigStats - ERROR:
> java.lang.NumberFormatException: null
> >> 2011-04-20 01:24:21,792 [main] ERROR
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> >> 2011-04-20 01:24:21,793 [main] INFO
>  org.apache.pig.tools.pigstats.PigStats - Script Statistics:
> >>
> >> Input(s):
> >> Failed to read data from "cassandra://Keyspace1/Standard1"
> >>
> >> Output(s):
> >> Failed to produce result in
> "hdfs://ip:54310/tmp/temp-1383865669/tmp-1895601791"
> >>
> >> Counters:
> >> Total records written : 0
> >> Total bytes written : 0
> >> Spillable Memory Manager spill count : 0
> >> Total bags proactively spilled: 0
> >> Total records proactively spilled: 0
> >>
> >> Job DAG:
> >> job_201104200056_0005   ->  null,
> >> null->  null,
> >> null
> >>
> >>
> >> 2011-04-20 01:24:21,793 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Failed!
> >> 2011-04-20 01:24:21,803 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1066: Unable to open iterator for alias topnames. Backend error :
> java.lang.NumberFormatException: null
> >>
> >>
> >>
> >> 
> >> thats from jobtasks web management - error  from task directly:
> >>
> >> java.lang.RuntimeException: java.lang.NumberFormatException: null
> >> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:123)
> >> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
> >> at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
> >> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
> >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >> Caused by: java.lang.NumberFormatException: null
> >> at java.lang.Integer.parseInt(Integer.java:417)
> >> at java.lang.Integer.parseInt(Integer.java:499)
> >> at
> org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
> >> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:105)
> >> ... 5 more
> >>
> >>
> >>
> >> Any suggestions where should be problem?
> >>
> >> Thanks,
> >>
> >>
> >>
> >
>
>


Re: pig + hadoop

2011-04-19 Thread Jeremy Hanna
Just as an example:

  <property>
    <name>cassandra.thrift.address</name>
    <value>10.12.34.56</value>
  </property>
  <property>
    <name>cassandra.thrift.port</name>
    <value>9160</value>
  </property>
  <property>
    <name>cassandra.partitioner.class</name>
    <value>org.apache.cassandra.dht.RandomPartitioner</value>
  </property>
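
A minimal Pig sketch, assuming the Cassandra jars are on Pig's classpath (e.g.
via the bundled pig_cassandra wrapper): with those properties in
mapred-site.xml, the load URL only names the keyspace and column family:

-- Host, port, and partitioner come from mapred-site.xml (or the PIG_* variables),
-- not from the load URL.
rows = LOAD 'cassandra://Keyspace1/Standard1'
       USING org.apache.cassandra.hadoop.pig.CassandraStorage();
DUMP rows;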


On Apr 19, 2011, at 10:28 PM, Jeremy Hanna wrote:

> oh yeah - that's what's going on.  what I do is on the machine that I run the 
> pig script from, I set the PIG_CONF variable to my HADOOP_HOME/conf directory 
> and in my mapred-site.xml file found there, I set the three variables.
> 
> I don't use environment variables when I run against a cluster.
> 
> On Apr 19, 2011, at 9:54 PM, Jeffrey Wang wrote:
> 
>> Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error for 
>> a while before I added that.
>> 
>> -Jeffrey
>> 
>> From: pob [mailto:peterob...@gmail.com] 
>> Sent: Tuesday, April 19, 2011 6:42 PM
>> To: user@cassandra.apache.org
>> Subject: Re: pig + hadoop
>> 
>> Hey Aaron,
>> 
>> I read it, and all of 3 env variables was exported. The results are same.
>> 
>> Best,
>> P
>> 
>> 2011/4/20 aaron morton 
>> Am guessing but here goes. Looks like the cassandra RPC port is not set, did 
>> you follow these steps in contrib/pig/README.txt
>> 
>> Finally, set the following as environment variables (uppercase,
>> underscored), or as Hadoop configuration variables (lowercase, dotted):
>> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on 
>> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to 
>> connect to
>> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>> 
>> Hope that helps. 
>> Aaron
>> 
>> 
>> On 20 Apr 2011, at 11:28, pob wrote:
>> 
>> 
>> Hello, 
>> 
>> I did cluster configuration by 
>> http://wiki.apache.org/cassandra/HadoopSupport. When I run pig 
>> example-script.pig 
>> -x local, everything is fine and i get correct results.
>> 
>> Problem is occurring with -x mapreduce 
>> 
>> Im getting those errors :>
>> 
>> 
>> 2011-04-20 01:24:21,791 [main] ERROR org.apache.pig.tools.pigstats.PigStats 
>> - ERROR: java.lang.NumberFormatException: null
>> 2011-04-20 01:24:21,792 [main] ERROR 
>> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>> 2011-04-20 01:24:21,793 [main] INFO  org.apache.pig.tools.pigstats.PigStats 
>> - Script Statistics: 
>> 
>> Input(s):
>> Failed to read data from "cassandra://Keyspace1/Standard1"
>> 
>> Output(s):
>> Failed to produce result in 
>> "hdfs://ip:54310/tmp/temp-1383865669/tmp-1895601791"
>> 
>> Counters:
>> Total records written : 0
>> Total bytes written : 0
>> Spillable Memory Manager spill count : 0
>> Total bags proactively spilled: 0
>> Total records proactively spilled: 0
>> 
>> Job DAG:
>> job_201104200056_0005   ->  null,
>> null->  null,
>> null
>> 
>> 
>> 2011-04-20 01:24:21,793 [main] INFO  
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>  - Failed!
>> 2011-04-20 01:24:21,803 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
>> ERROR 1066: Unable to open iterator for alias topnames. Backend error : 
>> java.lang.NumberFormatException: null
>> 
>> 
>> 
>> 
>> thats from jobtasks web management - error  from task directly:
>> 
>> java.lang.RuntimeException: java.lang.NumberFormatException: null
>> at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:123)
>> at 
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
>> at 
>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.NumberFormatException: null
>> at java.lang.Integer.parseInt(Integer.java:417)
>> at java.lang.Integer.parseInt(Integer.java:499)
>> at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>> at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:105)
>> ... 5 more
>> 
>> 
>> 
>> Any suggestions where should be problem?
>> 
>> Thanks,
>> 
>> 
>> 
> 



Re: pig + hadoop

2011-04-19 Thread Jeremy Hanna
oh yeah - that's what's going on.  What I do: on the machine that I run the
pig script from, I set the PIG_CONF variable to my HADOOP_HOME/conf directory,
and in the mapred-site.xml file found there I set the three variables.

I don't use environment variables when I run against a cluster.

On Apr 19, 2011, at 9:54 PM, Jeffrey Wang wrote:

> Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error for a 
> while before I added that.
>  
> -Jeffrey
>  
> From: pob [mailto:peterob...@gmail.com] 
> Sent: Tuesday, April 19, 2011 6:42 PM
> To: user@cassandra.apache.org
> Subject: Re: pig + hadoop
>  
> Hey Aaron,
>  
> I read it, and all of 3 env variables was exported. The results are same.
>  
> Best,
> P
> 
> 2011/4/20 aaron morton 
> Am guessing but here goes. Looks like the cassandra RPC port is not set, did 
> you follow these steps in contrib/pig/README.txt
>  
> Finally, set the following as environment variables (uppercase,
> underscored), or as Hadoop configuration variables (lowercase, dotted):
> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on 
> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to 
> connect to
> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>  
> Hope that helps. 
> Aaron
>  
>  
> On 20 Apr 2011, at 11:28, pob wrote:
> 
> 
> Hello, 
>  
> I did cluster configuration by 
> http://wiki.apache.org/cassandra/HadoopSupport. When I run pig 
> example-script.pig 
> -x local, everything is fine and i get correct results.
>  
> Problem is occurring with -x mapreduce 
>  
> Im getting those errors :>
>  
>  
> 2011-04-20 01:24:21,791 [main] ERROR org.apache.pig.tools.pigstats.PigStats - 
> ERROR: java.lang.NumberFormatException: null
> 2011-04-20 01:24:21,792 [main] ERROR 
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2011-04-20 01:24:21,793 [main] INFO  org.apache.pig.tools.pigstats.PigStats - 
> Script Statistics: 
>  
> Input(s):
> Failed to read data from "cassandra://Keyspace1/Standard1"
>  
> Output(s):
> Failed to produce result in 
> "hdfs://ip:54310/tmp/temp-1383865669/tmp-1895601791"
>  
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
>  
> Job DAG:
> job_201104200056_0005   ->  null,
> null->  null,
> null
>  
>  
> 2011-04-20 01:24:21,793 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Failed!
> 2011-04-20 01:24:21,803 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1066: Unable to open iterator for alias topnames. Backend error : 
> java.lang.NumberFormatException: null
>  
>  
>  
> 
> thats from jobtasks web management - error  from task directly:
>  
> java.lang.RuntimeException: java.lang.NumberFormatException: null
> at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:123)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.NumberFormatException: null
> at java.lang.Integer.parseInt(Integer.java:417)
> at java.lang.Integer.parseInt(Integer.java:499)
> at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
> at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:105)
> ... 5 more
>  
>  
>  
> Any suggestions where should be problem?
>  
> Thanks,
>  
>  
>  



RE: pig + hadoop

2011-04-19 Thread Jeffrey Wang
Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error for a 
while before I added that.

-Jeffrey

From: pob [mailto:peterob...@gmail.com]
Sent: Tuesday, April 19, 2011 6:42 PM
To: user@cassandra.apache.org
Subject: Re: pig + hadoop

Hey Aaron,

I read it, and all of 3 env variables was exported. The results are same.

Best,
P
2011/4/20 aaron morton <aa...@thelastpickle.com>
Am guessing but here goes. Looks like the cassandra RPC port is not set, did 
you follow these steps in contrib/pig/README.txt

Finally, set the following as environment variables (uppercase,
underscored), or as Hadoop configuration variables (lowercase, dotted):
* PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
* PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect 
to
* PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner

Hope that helps.
Aaron


On 20 Apr 2011, at 11:28, pob wrote:


Hello,

I did cluster configuration by http://wiki.apache.org/cassandra/HadoopSupport. 
When I run pig example-script.pig
-x local, everything is fine and i get correct results.

Problem is occurring with -x mapreduce

Im getting those errors :>


2011-04-20 01:24:21,791 [main] ERROR org.apache.pig.tools.pigstats.PigStats - 
ERROR: java.lang.NumberFormatException: null
2011-04-20 01:24:21,792 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil 
- 1 map reduce job(s) failed!
2011-04-20 01:24:21,793 [main] INFO  org.apache.pig.tools.pigstats.PigStats - 
Script Statistics:

Input(s):
Failed to read data from "cassandra://Keyspace1/Standard1"

Output(s):
Failed to produce result in "hdfs://ip:54310/tmp/temp-1383865669/tmp-1895601791"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201104200056_0005   ->  null,
null->  null,
null


2011-04-20 01:24:21,793 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Failed!
2011-04-20 01:24:21,803 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1066: Unable to open iterator for alias topnames. Backend error : 
java.lang.NumberFormatException: null




thats from jobtasks web management - error  from task directly:

java.lang.RuntimeException: java.lang.NumberFormatException: null
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:123)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:417)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:105)
... 5 more



Any suggestions where should be problem?

Thanks,





Re: pig + hadoop

2011-04-19 Thread pob
and one more thing...

2011-04-20 04:09:23,412 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_201104200406_0001/attempt_201104200406_0001_m_02_0/output/file.out
in any of the configured local directories


2011/4/20 pob 

> Thats from jobtracker:
>
>
> 2011-04-20 03:36:39,519 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing rack-local task task_201104200331_0002_m_00
> 2011-04-20 03:36:42,521 INFO org.apache.hadoop.mapred.TaskInProgress: Error
> from attempt_201104200331_0002_m_00_3: java.lang.NumberFormatException:
> null
> at java.lang.Integer.parseInt(Integer.java:417)
> at java.lang.Integer.parseInt(Integer.java:499)
> at
> org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:250)
> at
> org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown
> Source)
> at
> org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
> and tasktracker
>
> 2011-04-20 03:33:10,942 INFO org.apache.hadoop.mapred.TaskTracker:  Using
> MemoryCalculatorPlugin :
> org.apache.hadoop.util.LinuxMemoryCalculatorPlugin@3c1fc1a6
> 2011-04-20 03:33:10,945 WARN org.apache.hadoop.mapred.TaskTracker:
> TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is
> disabled.
> 2011-04-20 03:33:10,946 INFO org.apache.hadoop.mapred.IndexCache:
> IndexCache created with max memory = 10485760
> 2011-04-20 03:33:11,069 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201104200331_0001_m_00_1 task's
> state:UNASSIGNED
> 2011-04-20 03:33:11,072 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201104200331_0001_m_00_1
> 2011-04-20 03:33:11,072 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 2 and trying to launch
> attempt_201104200331_0001_m_00_1
> 2011-04-20 03:33:11,986 INFO org.apache.hadoop.mapred.JvmManager: In
> JvmRunner constructed JVM ID: jvm_201104200331_0001_m_-926908110
> 2011-04-20 03:33:11,986 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner jvm_201104200331_0001_m_-926908110 spawned.
> 2011-04-20 03:33:12,400 INFO org.apache.hadoop.mapred.TaskTracker: JVM with
> ID: jvm_201104200331_0001_m_-926908110 given task:
> attempt_201104200331_0001_m_00_1
> 2011-04-20 03:33:12,895 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201104200331_0001_m_00_1 0.0%
> 2011-04-20 03:33:12,918 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201104200331_0001_m_-926908110 exited. Number of tasks it ran: 0
> 2011-04-20 03:33:15,919 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_201104200331_0001_m_00_1 done; removing files.
> 2011-04-20 03:33:15,920 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 2
> 2011-04-20 03:33:38,090 INFO org.apache.hadoop.mapred.TaskTracker: Received
> 'KillJobAction' for job: job_201104200331_0001
> 2011-04-20 03:36:32,199 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201104200331_0002_m_00_2 task's
> state:UNASSIGNED
> 2011-04-20 03:36:32,199 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201104200331_0002_m_00_2
> 2011-04-20 03:36:32,199 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 2 and trying to launch
> attempt_201104200331_0002_m_00_2
> 2011-04-20 03:36:32,813 INFO org.apache.hadoop.mapred.JvmManager: In
> JvmRunner constructed JVM ID: jvm_201104200331_0002_m_-134007035
> 2011-04-20 03:36:32,814 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner jvm_201104200331_0002_m_-134007035 spawned.
> 2011-04-20 03:36:33,214 INFO org.apache.hadoop.mapred.TaskTracker: JVM with
> ID: jvm_201104200331_0002_m_-134007035 given task:
> attempt_201104200331_0002_m_00_2
> 2011-04-20 03:36:33,711 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201104200331_0002_m_00_2 0.0%
> 2011-04-20 03:36:33,731 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201104200331_0002_m_-134007035 exited. Number of tasks it ran: 0
> 2011-04-20 03:36:36,732 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_201104200331_0002_m_00_2 done; removing files.
> 2011-04-20 03:36:36,733 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 2
> 2011-04-20 03:36:50,210 INFO org.apache.hadoop.mapred.TaskTracker: Received
> 'KillJobAction' for job: job_201104200331_0002

Re: pig + hadoop

2011-04-19 Thread pob
Thats from jobtracker:


2011-04-20 03:36:39,519 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing rack-local task task_201104200331_0002_m_00
2011-04-20 03:36:42,521 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201104200331_0002_m_00_3: java.lang.NumberFormatException:
null
at java.lang.Integer.parseInt(Integer.java:417)
at java.lang.Integer.parseInt(Integer.java:499)
at
org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:250)
at
org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown
Source)
at
org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


and tasktracker

2011-04-20 03:33:10,942 INFO org.apache.hadoop.mapred.TaskTracker:  Using
MemoryCalculatorPlugin :
org.apache.hadoop.util.LinuxMemoryCalculatorPlugin@3c1fc1a6
2011-04-20 03:33:10,945 WARN org.apache.hadoop.mapred.TaskTracker:
TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is
disabled.
2011-04-20 03:33:10,946 INFO org.apache.hadoop.mapred.IndexCache: IndexCache
created with max memory = 10485760
2011-04-20 03:33:11,069 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction (registerTask): attempt_201104200331_0001_m_00_1 task's
state:UNASSIGNED
2011-04-20 03:33:11,072 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
launch : attempt_201104200331_0001_m_00_1
2011-04-20 03:33:11,072 INFO org.apache.hadoop.mapred.TaskTracker: In
TaskLauncher, current free slots : 2 and trying to launch
attempt_201104200331_0001_m_00_1
2011-04-20 03:33:11,986 INFO org.apache.hadoop.mapred.JvmManager: In
JvmRunner constructed JVM ID: jvm_201104200331_0001_m_-926908110
2011-04-20 03:33:11,986 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
jvm_201104200331_0001_m_-926908110 spawned.
2011-04-20 03:33:12,400 INFO org.apache.hadoop.mapred.TaskTracker: JVM with
ID: jvm_201104200331_0001_m_-926908110 given task:
attempt_201104200331_0001_m_00_1
2011-04-20 03:33:12,895 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201104200331_0001_m_00_1 0.0%
2011-04-20 03:33:12,918 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201104200331_0001_m_-926908110 exited. Number of tasks it ran: 0
2011-04-20 03:33:15,919 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_201104200331_0001_m_00_1 done; removing files.
2011-04-20 03:33:15,920 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 2
2011-04-20 03:33:38,090 INFO org.apache.hadoop.mapred.TaskTracker: Received
'KillJobAction' for job: job_201104200331_0001
2011-04-20 03:36:32,199 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction (registerTask): attempt_201104200331_0002_m_00_2 task's
state:UNASSIGNED
2011-04-20 03:36:32,199 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
launch : attempt_201104200331_0002_m_00_2
2011-04-20 03:36:32,199 INFO org.apache.hadoop.mapred.TaskTracker: In
TaskLauncher, current free slots : 2 and trying to launch
attempt_201104200331_0002_m_00_2
2011-04-20 03:36:32,813 INFO org.apache.hadoop.mapred.JvmManager: In
JvmRunner constructed JVM ID: jvm_201104200331_0002_m_-134007035
2011-04-20 03:36:32,814 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
jvm_201104200331_0002_m_-134007035 spawned.
2011-04-20 03:36:33,214 INFO org.apache.hadoop.mapred.TaskTracker: JVM with
ID: jvm_201104200331_0002_m_-134007035 given task:
attempt_201104200331_0002_m_00_2
2011-04-20 03:36:33,711 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201104200331_0002_m_00_2 0.0%
2011-04-20 03:36:33,731 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201104200331_0002_m_-134007035 exited. Number of tasks it ran: 0
2011-04-20 03:36:36,732 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_201104200331_0002_m_00_2 done; removing files.
2011-04-20 03:36:36,733 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 2
2011-04-20 03:36:50,210 INFO org.apache.hadoop.mapred.TaskTracker: Received
'KillJobAction' for job: job_201104200331_0002




2011/4/20 pob 

> ad2. it works with -x local , so there cant be issue with
> pig->DB(Cassandra).
>
> im using pig-0.8 from official site + hadoop-0.20.2 from offic. site.
>
>
> thx
>
>
> 2011/4/20 aaron morton 
>
>> Am guessing but here goes. Looks like the cassandra RPC port is not set,
>> did you follow these steps in contrib/pig/README.txt
>>
>> Finally, set the following as environment variables (uppercase,
>> underscored), or as Hadoop configuration variables (lowercase, dotted):

Re: pig + hadoop

2011-04-19 Thread pob
Ad 2: it works with -x local, so there can't be an issue with
Pig -> DB (Cassandra).

I'm using pig-0.8 from the official site + hadoop-0.20.2 from the official site.


thx


2011/4/20 aaron morton 

> Am guessing but here goes. Looks like the cassandra RPC port is not set,
> did you follow these steps in contrib/pig/README.txt
>
> Finally, set the following as environment variables (uppercase,
> underscored), or as Hadoop configuration variables (lowercase, dotted):
> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to
> connect to
> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>
> Hope that helps.
> Aaron
>
>
> On 20 Apr 2011, at 11:28, pob wrote:
>
> Hello,
>
> I did cluster configuration by
> http://wiki.apache.org/cassandra/HadoopSupport. When I run
> pig example-script.pig
> -x local, everything is fine and i get correct results.
>
> Problem is occurring with -x mapreduce
>
> Im getting those errors :>
>
>
> 2011-04-20 01:24:21,791 [main] ERROR org.apache.pig.tools.pigstats.PigStats
> - ERROR: java.lang.NumberFormatException: null
> 2011-04-20 01:24:21,792 [main] ERROR
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2011-04-20 01:24:21,793 [main] INFO  org.apache.pig.tools.pigstats.PigStats
> - Script Statistics:
>
> Input(s):
> Failed to read data from "cassandra://Keyspace1/Standard1"
>
> Output(s):
> Failed to produce result in "
> hdfs://ip:54310/tmp/temp-1383865669/tmp-1895601791"
>
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
>
> Job DAG:
> job_201104200056_0005   ->  null,
> null->  null,
> null
>
>
> 2011-04-20 01:24:21,793 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Failed!
> 2011-04-20 01:24:21,803 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1066: Unable to open iterator for alias topnames. Backend error :
> java.lang.NumberFormatException: null
>
>
>
> 
> thats from jobtasks web management - error  from task directly:
>
> java.lang.RuntimeException: java.lang.NumberFormatException: null
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:123)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
> at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>  at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.NumberFormatException: null
> at java.lang.Integer.parseInt(Integer.java:417)
>  at java.lang.Integer.parseInt(Integer.java:499)
> at
> org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>  at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:105)
> ... 5 more
>
>
>
> Any suggestions where should be problem?
>
> Thanks,
>
>
>


Re: pig + hadoop

2011-04-19 Thread pob
Hey Aaron,

I read it, and all of 3 env variables was exported. The results are same.

Best,
P

2011/4/20 aaron morton 

> Am guessing but here goes. Looks like the cassandra RPC port is not set,
> did you follow these steps in contrib/pig/README.txt
>
> Finally, set the following as environment variables (uppercase,
> underscored), or as Hadoop configuration variables (lowercase, dotted):
> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to
> connect to
> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>
> Hope that helps.
> Aaron
>
>
> On 20 Apr 2011, at 11:28, pob wrote:
>
> Hello,
>
> I did cluster configuration by
> http://wiki.apache.org/cassandra/HadoopSupport. When I run
> pig example-script.pig
> -x local, everything is fine and i get correct results.
>
> Problem is occurring with -x mapreduce
>
> Im getting those errors :>
>
>
> 2011-04-20 01:24:21,791 [main] ERROR org.apache.pig.tools.pigstats.PigStats
> - ERROR: java.lang.NumberFormatException: null
> 2011-04-20 01:24:21,792 [main] ERROR
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2011-04-20 01:24:21,793 [main] INFO  org.apache.pig.tools.pigstats.PigStats
> - Script Statistics:
>
> Input(s):
> Failed to read data from "cassandra://Keyspace1/Standard1"
>
> Output(s):
> Failed to produce result in "
> hdfs://ip:54310/tmp/temp-1383865669/tmp-1895601791"
>
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
>
> Job DAG:
> job_201104200056_0005   ->  null,
> null->  null,
> null
>
>
> 2011-04-20 01:24:21,793 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Failed!
> 2011-04-20 01:24:21,803 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1066: Unable to open iterator for alias topnames. Backend error :
> java.lang.NumberFormatException: null
>
>
>
> 
> thats from jobtasks web management - error  from task directly:
>
> java.lang.RuntimeException: java.lang.NumberFormatException: null
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:123)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
> at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>  at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.NumberFormatException: null
> at java.lang.Integer.parseInt(Integer.java:417)
>  at java.lang.Integer.parseInt(Integer.java:499)
> at
> org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>  at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:105)
> ... 5 more
>
>
>
> Any suggestions where should be problem?
>
> Thanks,
>
>
>


Re: pig + hadoop

2011-04-19 Thread aaron morton
Am guessing, but here goes. It looks like the Cassandra RPC port is not set; did
you follow these steps in contrib/pig/README.txt?

Finally, set the following as environment variables (uppercase,
underscored), or as Hadoop configuration variables (lowercase, dotted):
* PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on 
* PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect 
to
* PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner

Hope that helps. 
Aaron


On 20 Apr 2011, at 11:28, pob wrote:

> Hello, 
> 
> I did cluster configuration by 
> http://wiki.apache.org/cassandra/HadoopSupport. When I run pig 
> example-script.pig 
> -x local, everything is fine and i get correct results.
> 
> Problem is occurring with -x mapreduce 
> 
> Im getting those errors :>
> 
> 
> 2011-04-20 01:24:21,791 [main] ERROR org.apache.pig.tools.pigstats.PigStats - 
> ERROR: java.lang.NumberFormatException: null
> 2011-04-20 01:24:21,792 [main] ERROR 
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2011-04-20 01:24:21,793 [main] INFO  org.apache.pig.tools.pigstats.PigStats - 
> Script Statistics: 
> 
> Input(s):
> Failed to read data from "cassandra://Keyspace1/Standard1"
> 
> Output(s):
> Failed to produce result in 
> "hdfs://ip:54310/tmp/temp-1383865669/tmp-1895601791"
> 
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
> 
> Job DAG:
> job_201104200056_0005   ->  null,
> null->  null,
> null
> 
> 
> 2011-04-20 01:24:21,793 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Failed!
> 2011-04-20 01:24:21,803 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1066: Unable to open iterator for alias topnames. Backend error : 
> java.lang.NumberFormatException: null
> 
> 
> 
> 
> thats from jobtasks web management - error  from task directly:
> 
> java.lang.RuntimeException: java.lang.NumberFormatException: null
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:123)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
>   at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.NumberFormatException: null
>   at java.lang.Integer.parseInt(Integer.java:417)
>   at java.lang.Integer.parseInt(Integer.java:499)
>   at 
> org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:105)
>   ... 5 more
> 
> 
> 
> Any suggestions where should be problem?
> 
> Thanks,
> 



pig + hadoop

2011-04-19 Thread pob
Hello,

I did the cluster configuration following
http://wiki.apache.org/cassandra/HadoopSupport. When I run
pig example-script.pig -x local, everything is fine and I get correct results.

The problem occurs with -x mapreduce.

I'm getting these errors :>


2011-04-20 01:24:21,791 [main] ERROR org.apache.pig.tools.pigstats.PigStats
- ERROR: java.lang.NumberFormatException: null
2011-04-20 01:24:21,792 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-04-20 01:24:21,793 [main] INFO  org.apache.pig.tools.pigstats.PigStats
- Script Statistics:

Input(s):
Failed to read data from "cassandra://Keyspace1/Standard1"

Output(s):
Failed to produce result in
"hdfs://ip:54310/tmp/temp-1383865669/tmp-1895601791"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201104200056_0005   ->  null,
null->  null,
null


2011-04-20 01:24:21,793 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2011-04-20 01:24:21,803 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1066: Unable to open iterator for alias topnames. Backend error :
java.lang.NumberFormatException: null




That's from the job tasks web UI - the error from the task directly:

java.lang.RuntimeException: java.lang.NumberFormatException: null
at
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:123)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:417)
at java.lang.Integer.parseInt(Integer.java:499)
at
org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
at
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:105)
... 5 more



Any suggestions as to where the problem might be?

Thanks,