Sorry for such a late reply. I don't always keep up with the mailing
list.

> Is the following scenario covered by 2388? I have a test cluster of 6
> nodes with a replication factor of 3. Each server can execute Hadoop
> tasks. One Cassandra node is down for the test.
> 
> The job is kicked off from the node 1 jobtracker.
> A task is executed on node 1, and fails because the local Cassandra
> instance is down.
> Retry on node 6: this tries to connect to node 1 and fails.
> Retry on node 5: this tries to connect to node 1 and fails.
> Retry on node 4: this tries to connect to node 1 and fails.
> After 4 failures the task is killed and the job fails.
> 
> Nodes 2 and 3, which contain the other replicas, never run the task.
> The node selection seems to be random. I can modify the Cassandra code
> to check connectivity in ColumnFamilyRecordReader, but I suspect this
> is fixing the wrong problem.

There are two problems here:

1) Hadoop's jobtracker isn't giving preference to tasktrackers that
would provide data locality when assigning tasks.

2) Connections to replica nodes are never attempted directly; instead
the task must fail and be resubmitted to another tasktracker, which
hopefully is a replica node.

> [snip] but this comment from mck seems to say it should work
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%
> 3C1315253057.7466.222.camel@localhost%3E

Not in your case.
ColumnFamilyInputFormat splits the query into InputSplits. This is done
via the API calls describe_ring and describe_splits. Each of these
InputSplits (ColumnFamilySplit) has a list of locations, which are the
replica nodes.
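
For illustration, here is a stripped-down sketch of what such a split
amounts to. The class and field names are mine, not the real
ColumnFamilySplit (which lives in the Cassandra source and carries more
detail); the point is just that each split serializes a token range plus
the replica endpoints it got from describe_ring/describe_splits, and
exposes them through getLocations():

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputSplit;

// Simplified stand-in for ColumnFamilySplit: a token range plus the
// replica endpoints reported for it by describe_ring/describe_splits.
public class ReplicaAwareSplit extends InputSplit implements Writable {

    private String startToken;
    private String endToken;
    private String[] replicaEndpoints;  // the split's "locations"

    public ReplicaAwareSplit() {}  // no-arg constructor for deserialization

    public ReplicaAwareSplit(String startToken, String endToken,
                             String[] replicaEndpoints) {
        this.startToken = startToken;
        this.endToken = endToken;
        this.replicaEndpoints = replicaEndpoints;
    }

    // Hadoop's scheduler consults this when trying to assign the task
    // to a data-local tasktracker.
    @Override
    public String[] getLocations() {
        return replicaEndpoints;
    }

    @Override
    public long getLength() {
        return 0;  // unknown for a token range
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(startToken);
        out.writeUTF(endToken);
        out.writeInt(replicaEndpoints.length);
        for (String endpoint : replicaEndpoints)
            out.writeUTF(endpoint);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        startToken = in.readUTF();
        endToken = in.readUTF();
        replicaEndpoints = new String[in.readInt()];
        for (int i = 0; i < replicaEndpoints.length; i++)
            replicaEndpoints[i] = in.readUTF();
    }
}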

Now, Hadoop is supposed to give preference to tasktrackers based on the
split's locations when assigning tasks. This is problem (1). I haven't
seen it actually work. The closest information I found is
http://abel-perez.com/hadoop-task-assignment
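
To be concrete about what (1) should mean: when the jobtracker hands
out a task, a data-local assignment is one where the tasktracker's host
appears in the split's location list. Something like this hypothetical
check (this is my illustration, not actual JobTracker code):

// Hypothetical sketch of the locality check the jobtracker is supposed
// to apply when matching a tasktracker against a split's locations.
static boolean isDataLocal(String taskTrackerHost, String[] splitLocations) {
    for (String location : splitLocations) {
        if (location.equals(taskTrackerHost))
            return true;  // the tasktracker holds a replica of this split
    }
    return false;
}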

Problem (2) is that ColumnFamilyRecordReader.getLocation() returns the
address from the current split's list of locations that matches the
localhost. This gives preference to data locality. If none of the
locations is local then it simply returns the first location in the
list, which explains why your use case doesn't work.

One fix for you to experiment with is to increase the allowed task
failures (I think it is mapred.max.tracker.failures) to the number of
nodes you have. Then each node would be tried, in random order, before
the task is killed and the job fails.
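
To make that behaviour concrete, the selection logic amounts to
something like the sketch below. This is from memory and simplified,
not the actual ColumnFamilyRecordReader source, and pickLocation is my
name for it:

import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch from memory of the selection logic: prefer a location that
// matches the local host, otherwise blindly take the first replica in
// the list -- which is why a task on a non-replica node keeps hitting
// the same (possibly dead) endpoint.
static String pickLocation(String[] locations) throws UnknownHostException {
    InetAddress local = InetAddress.getLocalHost();
    for (String location : locations) {
        if (location.equals(local.getHostName())
                || location.equals(local.getHostAddress()))
            return location;  // data-local replica found
    }
    return locations[0];  // no local replica: first one wins
}

The workaround is then a one-liner in your job setup, assuming that
property name is the right one for your Hadoop version, e.g.
conf.setInt("mapred.max.tracker.failures", 6) on your job
configuration.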

~mck


-- 
"Friendship with the upright, with the truthful and with the well
informed is beneficial. Friendship with those who flatter, with those
who are meek and who compromise with principles, and with those who talk
cleverly is harmful." Confucius 

| http://github.com/finn-no | http://tech.finn.no |
