> Instead, I get an error from CassandraStorage that the initial address isn't
> set (on the slave, the master is ok).

Can you post the full error?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/01/2013, at 11:15 AM, William Oberman <ober...@civicscience.com> wrote:

> Anyone ever try to read or write directly between EMR <-> Cassandra?
>
> I'm running various Cassandra resources in Ec2, so the "physical connection"
> part is pretty easy using security groups. But, I'm having some
> configuration issues. I have managed to get Cassandra + Hadoop working in
> the past using a DIY hadoop cluster, and looking at the configurations in the
> two environments (EMR vs DIY), I'm not sure what's different that is causing
> my failures... I should probably note I'm using the Pig integration of
> Cassandra.
>
> Versions: Hadoop 1.0.3, Pig 0.10, Cassandra 1.1.7.
>
> I'm 99% sure I have classpaths working (because I didn't at first, and now
> EMR can find and instantiate CassandraStorage on master and slaves). What
> isn't working are the system variables. In my DIY cluster, all I needed to
> do was:
> -------
> export PIG_INITIAL_ADDRESS=XXX
> export PIG_RPC_PORT=9160
> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
> ----------
> And the task trackers somehow magically picked up the values (I never
> questioned how/why). But, in EMR, they do not. Instead, I get an error from
> CassandraStorage that the initial address isn't set (on the slave, the master
> is ok).
>
> My DIY cluster used CDH3, which was hadoop 0.20.something. So, maybe the
> problem is a different version of hadoop?
>
> Looking at the CassandraStorage class, I realize I have no idea how it used
> to work, since it only seems to look at System variables. Those variables
> are set on the Job.getConfiguration object. I don't know how that part of
> hadoop works though... do variables that get set on Job on the master get
> propagated to the task threads? I do know that on my DIY cluster, I do NOT
> set those system variables on the slaves...
>
> Thanks!
>
> will
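
The environment-variable behaviour described in the quoted message can be illustrated outside Hadoop. The sketch below is only an analogy, not a diagnosis of the EMR setup: it assumes a POSIX shell, uses a made-up address (10.0.0.1) in place of the elided XXX, and uses `env -i` as a stand-in for a task process that was not started from the shell where the variables were exported.

```shell
#!/bin/sh
# Hypothetical value: the original message elides the real address as XXX.
export PIG_INITIAL_ADDRESS=10.0.0.1
export PIG_RPC_PORT=9160

# A child process started from this shell inherits the exported variables.
inherited=$(/bin/sh -c 'echo "${PIG_INITIAL_ADDRESS:-unset}"')

# A process started with an empty environment (standing in for a task
# process launched elsewhere) does not see them.
clean=$(env -i /bin/sh -c 'echo "${PIG_INITIAL_ADDRESS:-unset}"')

echo "child of this shell: $inherited"
echo "clean environment:   $clean"
```

If the slaves on EMR behave like the second case, the variables would have to reach the task environment some other way. How the DIY cluster managed that is not clear from the thread, so the full error and stack trace from the slave would help narrow it down.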