Hello,

One of the nodes in our Analytics DC is dead, but ColumnFamilyInputFormat (CFIF) still assigns Hadoop input splits to it. This leads to many failed tasks and, consequently, a failed job.
* Tasks fail with: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Failed to open a transport to XX.75:9160. (expected, since the node is dead)
* The job fails with: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201404180250_4207_m_000079

We use RF=2 and CL=LOCAL_ONE for Hadoop jobs, on C* 1.2.16. Is this expected behavior? I checked the CFIF code, and it always assigns input splits to all ring nodes, regardless of whether a node is dead or alive.

What we do to fix this is patch CFIF to blacklist the dead node, but that is not a very automatic procedure. Am I missing something here?

Cheers,

--
*Paulo Motta*
Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200
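P.S. For anyone curious, the gist of our blacklist patch is just endpoint filtering before splits are handed to Hadoop. The sketch below is a simplified, self-contained illustration, not the actual CFIF code: the `DEAD_NODES` set, the `filterSplits` method, and the token-range-to-replicas map are all stand-ins for what CFIF derives from `describe_ring`. With RF=2, each range should still have one live replica after filtering.

```java
import java.util.*;

public class BlacklistSplits {
    // Illustrative blacklist; "XX.75" is the dead node from our failing job.
    static final Set<String> DEAD_NODES = new HashSet<>(Arrays.asList("XX.75"));

    // Given token range -> replica endpoints, drop blacklisted endpoints and
    // any range left with no live replica. In real CFIF this would happen
    // where split locations are computed from the ring description.
    static Map<String, List<String>> filterSplits(Map<String, List<String>> splits) {
        Map<String, List<String>> live = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : splits.entrySet()) {
            List<String> replicas = new ArrayList<>(e.getValue());
            replicas.removeAll(DEAD_NODES);
            if (!replicas.isEmpty()) {
                live.put(e.getKey(), replicas);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Map<String, List<String>> splits = new LinkedHashMap<>();
        splits.put("range-1", Arrays.asList("XX.75", "XX.76"));
        splits.put("range-2", Arrays.asList("XX.76", "XX.77"));
        // range-1 loses its dead replica but survives via XX.76
        System.out.println(filterSplits(splits));
    }
}
```

The non-automatic part is maintaining `DEAD_NODES` by hand, which is exactly what we would like to avoid.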