Hi,

I have a 20-node cluster running v1.0.7, split between 5 data centres, each
with an RF of 2, containing a ~1TB unique dataset (~10TB of total data).

I’ve had some intermittent data consistency & availability issues with a new
data centre (3 nodes, RF=2) that I brought online late last year: I’d
request data and nothing would be returned; I would then re-request the data
and it would be returned correctly, i.e. read-repair appeared to be
occurring.  However, running repairs on the nodes didn’t resolve this: I
tried general ‘repair’ commands as well as targeted keyspace commands, and
neither altered the behaviour.

After a lot of fruitless investigation, I decided to wipe &
re-install/re-populate the nodes.  The re-install & repair operations are
now complete: I see the expected amount of data on the nodes, but I am
still seeing the same behaviour, i.e. I only get data after one failed
attempt.
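
For anyone wanting to reproduce the symptom, here is a rough sketch of how the "miss, then hit" rate could be measured. The `fetch` callable is a placeholder for whatever client read call is in use, and `FakeReplicaPair` is purely a toy stand-in that I made up to make the sketch runnable; neither is part of Cassandra or of any client library.

```python
def first_miss_rate(keys, fetch):
    """Fraction of keys whose first read is empty but whose retry succeeds."""
    miss_then_hit = 0
    for key in keys:
        # Second fetch only runs when the first one returned nothing.
        if fetch(key) is None and fetch(key) is not None:
            miss_then_hit += 1
    return miss_then_hit / len(keys) if keys else 0.0


class FakeReplicaPair:
    """Toy stand-in (hypothetical): a stale key returns None once, then the value."""

    def __init__(self, data, stale_keys):
        self.data = data
        self.stale = set(stale_keys)

    def get(self, key):
        if key in self.stale:
            # Pretend the first read hit the stale replica and triggered a repair.
            self.stale.discard(key)
            return None
        return self.data.get(key)


store = FakeReplicaPair({"a": 1, "b": 2, "c": 3, "d": 4}, stale_keys=["b"])
rate = first_miss_rate(["a", "b", "c", "d"], store.get)
print(rate)
```

Run against a sample of keys known to exist, a non-zero rate would put a number on how often the first read comes back empty.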

When I run repair commands, I don’t see any errors in the logs.

I see the expected ‘AntiEntropySessions’ count in ‘nodetool tpstats’ during
repair sessions.

I see a number of dropped ‘MUTATION’ operations: just under 5% of the
total ‘MutationStage’ count.
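
To be explicit about how I arrive at that percentage, here is a minimal sketch of the calculation. It assumes the usual ‘nodetool tpstats’ layout (a ‘MutationStage’ row whose fourth column is the completed count, and a ‘MUTATION’ row in the dropped-messages table); the numbers in `SAMPLE` are invented for illustration and are not my real output.

```python
def dropped_mutation_ratio(tpstats_text):
    """Dropped MUTATION messages as a fraction of completed MutationStage tasks."""
    completed = None
    dropped = None
    for line in tpstats_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MutationStage" and len(fields) >= 4:
            # Assumed columns: name, active, pending, completed, ...
            completed = int(fields[3])
        elif fields[0] == "MUTATION" and len(fields) >= 2:
            # Assumed columns: message type, dropped count
            dropped = int(fields[1])
    if not completed or dropped is None:
        raise ValueError("MutationStage/MUTATION rows missing or zero completed")
    return dropped / completed


# Invented sample in the assumed tpstats layout.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0         500000         0                 0
MutationStage                     0         0        1000000         0                 0
AntiEntropySessions               1         1             12         0                 0

Message type           Dropped
READ                         0
MUTATION                 48000
"""

ratio = dropped_mutation_ratio(SAMPLE)
print(f"{ratio:.1%} of mutations dropped")
```

The same function could be fed the captured output from each node to compare the dropped ratio across the cluster.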

Questions:

- Could anybody suggest anything specific to look at to see
why the repair operations aren’t having the desired effect?

- Would increasing the logging level to ‘DEBUG’ show read-repair
activity (to confirm that this is happening, when, & for what proportion of
total requests)?

- Is there something obvious that I could be missing here?

Many thanks,

Brian
