I will let committers or anyone who has knowledge of Cassandra internals answer this.
From what I understand, you should be able to insert data on any up node with your configuration...

Alain

2013/2/14 Traian Fratean <[email protected]>

> You're right regarding data availability on that node. And my config,
> being the default one, is not suited for a cluster.
> What I don't get is that my .67 node was down and I was trying to insert
> into the .66 node, as can be seen from the stacktrace. Long story short:
> when node .67 was down I could not insert into any machine in the cluster.
> Not what I was expecting.
>
> Thank you for the reply!
> Traian.
>
> 2013/2/14 Alain RODRIGUEZ <[email protected]>
>
>> Hi Traian,
>>
>> There is your problem. You are using RF=1, meaning that each node is
>> responsible for its own range, and nothing more. So when a node goes
>> down, do the math: you just can't read 1/5 of your data.
>>
>> This is very good for performance, since each node owns its own part of
>> the data and any write or read needs to reach only one node, but it
>> leaves you with single points of failure, and removing the SPOF is a
>> main point of using C*. So you have poor availability and poor
>> consistency.
>>
>> A usual configuration with 5 nodes would be RF=3 and both CL (R&W) =
>> QUORUM.
>>
>> This will replicate your data to 2 nodes in addition to the natural
>> endpoint (a total of 3 of the 5 nodes owning any given piece of data),
>> and any read or write will need to reach at least 2 of them before
>> being considered successful, ensuring strong consistency.
>>
>> This configuration allows you to shut down a node (crash or
>> configuration update/rolling restart) without degrading the service (at
>> least allowing you to reach any data), but at the cost of more data on
>> each node.
>>
>> Alain
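[Editor's note: the client-side half of this advice is not shown in the thread. Below is a minimal sketch of it against the Astyanax 1.56.x API as the editor understands it; the cluster name and pool name are placeholders, and the seed host is the .66 node from the stacktrace. It overrides Astyanax's default consistency level (CL_ONE) with QUORUM for both reads and writes.]

    import com.netflix.astyanax.AstyanaxContext;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
    import com.netflix.astyanax.model.ConsistencyLevel;
    import com.netflix.astyanax.thrift.ThriftFamilyFactory;

    public class QuorumClientSketch {
        public static void main(String[] args) {
            AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                .forCluster("TestCluster")          // placeholder name
                .forKeyspace("TestSpace")           // keyspace from the thread
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
                    // QUORUM instead of the CL_ONE defaults discussed below:
                    .setDefaultReadConsistencyLevel(ConsistencyLevel.CL_QUORUM)
                    .setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_QUORUM))
                .withConnectionPoolConfiguration(
                    new ConnectionPoolConfigurationImpl("MyPool")
                        .setPort(9160)
                        .setSeeds("10.60.15.66:9160"))  // any live node
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());
            context.start();
            Keyspace keyspace = context.getClient();
            // ... reads and writes through 'keyspace' now default to QUORUM ...
            context.shutdown();
        }
    }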
>>
>> 2013/2/14 Traian Fratean <[email protected]>
>>
>>> I am using defaults for both RF and CL. As the keyspace was created
>>> using cassandra-cli, the default RF should be 1, as I get it from below:
>>>
>>> [default@TestSpace] describe;
>>> Keyspace: TestSpace:
>>>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>>>   Durable Writes: true
>>>     Options: [datacenter1:1]
>>>
>>> As for the CL, it is the Astyanax default, which is ONE for both reads
>>> and writes.
>>>
>>> Traian.
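[Editor's note: raising the RF shown in the describe output above (datacenter1:1) to the RF=3 Alain suggests is a schema change, not shown in the thread. A hedged sketch using Astyanax's Cluster interface, with method names assumed from the 1.56.x API; 'cluster' would be built like the context in the earlier sketch but with buildCluster(ThriftFamilyFactory.getInstance()). In cassandra-cli this would be roughly "update keyspace TestSpace with strategy_options = {datacenter1:3};". After changing the RF, running "nodetool repair" on each node is needed so existing rows are copied to their new replicas.]

    import java.util.HashMap;
    import java.util.Map;
    import com.netflix.astyanax.Cluster;
    import com.netflix.astyanax.ddl.KeyspaceDefinition;

    // Three replicas in the single datacenter reported by describe.
    Map<String, String> opts = new HashMap<String, String>();
    opts.put("datacenter1", "3");

    KeyspaceDefinition def = cluster.makeKeyspaceDefinition()
        .setName("TestSpace")
        .setStrategyClass("org.apache.cassandra.locator.NetworkTopologyStrategy")
        .setStrategyOptions(opts);
    cluster.updateKeyspace(def);   // assumed API; then repair each node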
>>>
>>> 2013/2/13 Alain RODRIGUEZ <[email protected]>
>>>
>>>> We probably need more info, like the RF of your cluster and the CL of
>>>> your reads and writes. Could you also tell us whether you use vnodes
>>>> or not?
>>>>
>>>> I heard that Astyanax was not running very smoothly on 1.2.0, but a
>>>> bit better on 1.2.1. Yet, Netflix didn't release a version of Astyanax
>>>> for C* 1.2.
>>>>
>>>> Alain
>>>>
>>>> 2013/2/13 Traian Fratean <[email protected]>
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a cluster of 5 nodes running Cassandra 1.2.0 and a Java client
>>>>> with Astyanax 1.56.21.
>>>>> When a node (10.60.15.67 - *different* from the one in the stacktrace
>>>>> below) went down, I got TokenRangeOfflineException and no other data
>>>>> got inserted into *any other* node of the cluster.
>>>>>
>>>>> Am I having a configuration issue, or is this supposed to happen?
>>>>>
>>>>> com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor.trackError(CountingConnectionPoolMonitor.java:81) -
>>>>> com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException:
>>>>> TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160,
>>>>> latency=2057(2057), attempts=1] UnavailableException()
>>>>> com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException:
>>>>> TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160,
>>>>> latency=2057(2057), attempts=1] UnavailableException()
>>>>>   at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
>>>>>   at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
>>>>>   at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27)
>>>>>   at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:140)
>>>>>   at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
>>>>>   at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:255)
>>>>>
>>>>> Thank you,
>>>>> Traian.
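[Editor's note: a sketch of the kind of write that fails above, with the behaviour as the editor understands it. It reuses the 'keyspace' built in the first sketch; the column family and row key are hypothetical.]

    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.serializers.StringSerializer;

    // Hypothetical column family, for illustration only.
    ColumnFamily<String, String> CF_TEST = new ColumnFamily<String, String>(
        "TestCF", StringSerializer.get(), StringSerializer.get());

    MutationBatch m = keyspace.prepareMutationBatch();
    m.withRow(CF_TEST, "some-row-key").putColumn("col", "value", null);
    try {
        m.execute();
    } catch (ConnectionException e) {
        // With RF=1 and CL=ONE, a key whose single replica is the down node
        // cannot meet the consistency level through ANY coordinator, so the
        // server returns UnavailableException, which Astyanax surfaces as
        // the TokenRangeOfflineException seen in the stacktrace above.
    }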
