> Instead, I get an error from CassandraStorage that the initial address isn't 
> set (on the slave, the master is ok). 
Can you post the full error?

Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/01/2013, at 11:15 AM, William Oberman <ober...@civicscience.com> wrote:

> Anyone ever try to read or write directly between EMR <-> Cassandra?  
> 
> I'm running various Cassandra resources in EC2, so the "physical connection" 
> part is pretty easy using security groups.  But, I'm having some 
> configuration issues.  I have managed to get Cassandra + Hadoop working in 
> the past using a DIY Hadoop cluster, and looking at the configurations in the 
> two environments (EMR vs DIY), I'm not sure what's different that is causing 
> my failures...  I should probably note I'm using the Pig integration of 
> Cassandra.
> 
> Versions: Hadoop 1.0.3, Pig 0.10, Cassandra 1.1.7.
> 
> I'm 99% sure I have classpaths working (because I didn't at first, and now 
> EMR can find and instantiate CassandraStorage on master and slaves).  What 
> isn't working are the system variables.  In my DIY cluster, all I needed to 
> do was:
> -------
> export PIG_INITIAL_ADDRESS=XXX
> export PIG_RPC_PORT=9160
> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
> ----------
> And the task trackers somehow magically picked up the values (I never 
> questioned how/why).  But, in EMR, they do not.  Instead, I get an error from 
> CassandraStorage that the initial address isn't set (on the slave, the master 
> is ok).  
> 
> My DIY cluster used CDH3, which was Hadoop 0.20.something.  So, maybe the 
> problem is a different version of Hadoop?  
> 
> Looking at the CassandraStorage class, I realize I have no idea how it used 
> to work, since it only seems to look at System variables.  Those variables 
> are set on the Job.getConfiguration object.  I don't know how that part of 
> Hadoop works though... do variables that get set on Job on the master get 
> propagated to the task threads?  I do know that on my DIY cluster, I do NOT 
> set those system variables on the slaves...
> 
> Thanks!
> 
> will
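
For reference, a minimal sketch of a guard around the three exports quoted above, to run in the shell that launches Pig. The function name check_pig_env is hypothetical and not part of Pig or Cassandra. It only inspects the environment of the shell it runs in, so it says nothing about what the task JVMs on the slaves actually see.

```shell
#!/bin/sh
# Report which of the connection variables from the message above are
# missing from the current shell environment.
# check_pig_env is a hypothetical helper name used for illustration only.
check_pig_env() {
    missing=""
    for name in PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "unset:$missing"
        return 1
    fi
    echo "all set"
}
```

Calling check_pig_env right before the pig command on the master at least rules out a missing export on the submitting side, which narrows the problem to how the values reach the tasks.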
