In one of our test clusters we had a damaged commit log disk on one of the nodes.
The cluster has replication factor = 2, and we write with consistency level = ONE, so we expected writes to be unaffected by such a failure. What actually happened is that the client writing with CL.ONE got stuck. The client could resume writing once we stopped the server with the faulty disk (another indication that this is not a replication factor or consistency level issue). We are running Cassandra 0.7.6, and the client we're using is Hector. Can anyone explain what happened here? Why did the client get stuck when the commit log disk on one of the servers was damaged, and why could it resume writing once we took that server offline?