Sorry for such a late reply. I'm not always keeping up with the mailing list.
> Is the following scenario covered by 2388? I have a test cluster of 6
> nodes with a replication factor of 3. Each server can execute hadoop
> tasks. 1 cassandra node is down for the test.
>
> The job is kicked off from node 1 jobtracker.
> A task is executed from node 1, and fails because the local cassandra
> instance is down
> retry on node 6, this tries to connect to node 1 and fails
> retry on node 5, this tries to connect to node 1 and fails
> retry on node 4, this tries to connect to node 1 and fails
> After 4 failures the task is killed and the job fails.
>
> Node 2 and 3 which contain the other replicas never run the task. The
> node selection seems to be random. I can modify the cassandra code to
> check connectivity in ColumnFamilyRecordReader but I suspect this is
> fixing the wrong problem.

There are two problems here:
1) Hadoop's jobtracker isn't assigning tasks to the tasktrackers that would provide data locality.
2) Replica nodes are never connected to directly; instead the task must fail and be re-submitted to another tasktracker, which is hopefully a replica node.

> [snip] but this comment from mck seems to say it should work
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3C1315253057.7466.222.camel@localhost%3E

Not in your case.

ColumnFamilyInputFormat splits the query into InputSplits, via the api calls describe_ring and describe_splits. Each of these InputSplits (ColumnFamilySplit) has a list of locations, which are the replica nodes. Hadoop is then supposed to preference sending tasks to tasktrackers based on the split's locations. This is problem (1), and I haven't seen it actually work. The closest information I found is http://abel-perez.com/hadoop-task-assignment

Problem (2) is that ColumnFamilyRecordReader.getLocation() returns the address, out of the current split's list of locations, that matches the localhost. This preferences data locality.
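To make that concrete, the selection behaves roughly like the sketch below. The class and method names here are hypothetical and simplified for illustration; this is not the actual Cassandra source, just the behavior as I described it: prefer a replica address bound to a local interface, otherwise fall back to the first entry.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

public class SplitLocationChooser {

    // Pick a replica address for a split: prefer one bound to a local
    // interface, otherwise fall back to the first entry in the list.
    static String chooseLocation(List<String> locations) {
        for (String loc : locations) {
            try {
                InetAddress addr = InetAddress.getByName(loc);
                if (addr.isLoopbackAddress()
                        || NetworkInterface.getByInetAddress(addr) != null) {
                    return loc; // data-local: this host holds a replica
                }
            } catch (UnknownHostException | SocketException e) {
                // unresolvable entry: skip it and keep looking
            }
        }
        return locations.get(0); // no local replica: arbitrary first pick
    }

    public static void main(String[] args) {
        // No replica is local here (192.0.2.0/24 is reserved TEST-NET),
        // so the first address wins, even if that node happens to be down.
        System.out.println(chooseLocation(
                Arrays.asList("192.0.2.1", "192.0.2.2", "192.0.2.3"))); // → 192.0.2.1
        // A loopback entry is preferred over earlier remote entries.
        System.out.println(chooseLocation(
                Arrays.asList("192.0.2.1", "127.0.0.1"))); // → 127.0.0.1
    }
}
```

Note the fallback in the last line of chooseLocation: it never checks whether the chosen node is actually up, which is exactly what bites you when a non-local replica is down.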
If none of the locations is local, it simply returns the first location in the list. This explains why your use case doesn't work.

One fix for you to experiment with is to increase the allowed task failures (I think it is mapred.max.tracker.failures) to the number of nodes you have. Then each node would be (randomly) tried before the task is killed and the job fails.

~mck

--
"Friendship with the upright, with the truthful and with the well informed is beneficial. Friendship with those who flatter, with those who are meek and who compromise with principles, and with those who talk cleverly is harmful." Confucius

| http://github.com/finn-no | http://tech.finn.no |
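P.S. If you experiment with raising the failure limit, the mapred-site.xml fragment would look something like the following. As above, I'm not certain of the exact property name, so verify it against the mapred-default.xml shipped with your Hadoop version before relying on it:

```xml
<!-- mapred-site.xml: allow up to 6 failures so every node in a
     6-node cluster can be tried before the job is failed.
     Property name assumed from memory; check your Hadoop version. -->
<property>
  <name>mapred.max.tracker.failures</name>
  <value>6</value>
</property>
```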