The last patch on that ticket is what we're running in prod. It's
working well for us with disk_failure_mode: readwrite. In the case of
filesystem errors the node shuts off thrift and gossip. While the
gossip is propagating we can continue to serve some reads out of the
caches.

-ryan

On Tue, Aug 2, 2011 at 9:27 AM, Jim Ancona <j...@anconafamily.com> wrote:
> On Mon, Aug 1, 2011 at 6:12 PM, Ryan King <r...@twitter.com> wrote:
>> On Fri, Jul 29, 2011 at 12:02 PM, Chris Burroughs
>> <chris.burrou...@gmail.com> wrote:
>>> On 07/25/2011 01:53 PM, Ryan King wrote:
>>>> Actually, I was wrong: our patch will disable gossip and thrift but
>>>> leave the process running:
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-2118
>>>>
>>>> If people are interested in that, I can make sure it's up to date with
>>>> our latest version.
>>>
>>> Thanks Ryan.
>>>
>>> /me expresses interest.
>
> /me too!
>
>>>
>>> Zombie nodes when the file system does something "interesting" are not fun.
>>
>> In our experience, this only gets triggered on hardware failures that
>> would otherwise seriously degrade performance or cause lots of
>> errors.
>>
>> After the node's traffic coalesces, we get an alert, which we can then
>> deal with.
>>
>> -ryan
>>
>
