Re: 0.8 loosing nodes?

2011-04-25 Thread Terje Marthinussen
Got just enough time to look at this today to verify that: sometimes nodes (under pressure) fail to send heartbeats for long enough to get marked as dead by other nodes (why is a good question, which I need to check better; it does not seem to be GC). The node does however start sending

Re: 0.8 loosing nodes?

2011-04-25 Thread Jonathan Ellis
I bet the problem is with the other tasks on the executor that Gossip heartbeat runs on. I see at least two that could cause blocking: hint cleanup post-delivery and flush-expired-memtables, both of which call forceFlush which will block if the flush queue + threads are full. We've run into this
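
To illustrate the kind of stall described above, here is a minimal sketch, not Cassandra code. It assumes a single-threaded scheduled executor shared by a periodic "heartbeat" task and one task that blocks, the way a forceFlush call might when the flush queue is full. The class name SharedExecutorStall, the method maxHeartbeatGapMillis, and all timings are hypothetical and chosen only for the demonstration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; it does not use or mirror any Cassandra API.
public class SharedExecutorStall {

    // Runs a 50 ms "heartbeat" and one blocking task on the same
    // single-threaded executor, then returns the longest gap (in ms)
    // observed between two consecutive heartbeats.
    static long maxHeartbeatGapMillis(long blockMillis) throws InterruptedException {
        ScheduledExecutorService shared = Executors.newSingleThreadScheduledExecutor();
        AtomicLong lastBeat = new AtomicLong(System.nanoTime());
        AtomicLong maxGapNanos = new AtomicLong(0);

        // Stand-in for the gossip heartbeat.
        shared.scheduleWithFixedDelay(() -> {
            long now = System.nanoTime();
            long gap = now - lastBeat.getAndSet(now);
            maxGapNanos.accumulateAndGet(gap, Math::max);
        }, 0, 50, TimeUnit.MILLISECONDS);

        // Stand-in for a task that blocks (assumption: something like a
        // flush waiting on a full queue). It occupies the only thread.
        shared.schedule(() -> {
            try {
                Thread.sleep(blockMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 100, TimeUnit.MILLISECONDS);

        // Let the blocking task finish and a few heartbeats resume.
        Thread.sleep(blockMillis + 400);
        shared.shutdownNow();
        shared.awaitTermination(1, TimeUnit.SECONDS);
        return TimeUnit.NANOSECONDS.toMillis(maxGapNanos.get());
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("max heartbeat gap (ms): " + maxHeartbeatGapMillis(1000));
    }
}
```

In this sketch the heartbeat gap grows to roughly the time the blocking task holds the thread. If a failure detector on another node used a timeout shorter than that gap, it would mark this node as down even though the process is alive. Giving the heartbeat its own executor would be one way to avoid the stall here.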

0.8 loosing nodes?

2011-04-24 Thread Terje Marthinussen
World as seen from .81 in the below ring:
.81  Up    Normal  85.55 GB  8.33%  Token(bytes[30])
.82  Down  Normal  83.23 GB  8.33%  Token(bytes[313230])
.83  Up    Normal  70.43 GB  8.33%  Token(bytes[313437])
.84  Up    Normal  81.7 GB   8.33%