Hi,
I have a 20-node cluster running v1.0.7, split between 5 data centres, each with an RF of 2, containing a ~1TB unique dataset (~10TB of total data).

I’ve had intermittent data consistency and availability issues with a new data centre (3 nodes, RF=2) that I brought online late last year: I’d request data and nothing would be returned; I would then re-request the data and it would be returned correctly, i.e. read-repair appeared to be occurring. However, running repairs on the nodes didn’t resolve this: I tried general ‘repair’ commands as well as targeted keyspace commands, and neither altered the behaviour.

After a lot of fruitless investigation, I decided to wipe and re-install/re-populate the nodes. The re-install and repair operations are now complete. I see the expected amount of data on the nodes, but I am still seeing the same behaviour, i.e. I only get data after one failed attempt.

When I run repair commands, I don’t see any errors in the logs. I see the expected ‘AntiEntropySessions’ count in ‘nodetool tpstats’ during repair sessions. I also see a number of dropped ‘MUTATION’ operations: just under 5% of the total ‘MutationStage’ count.

Questions:

- Could anybody suggest anything specific to look at to see why the repair operations aren’t having the desired effect?
- Would increasing the logging level to ‘DEBUG’ show read-repair activity (to confirm that this is happening, when, and for what proportion of total requests)?
- Is there something obvious that I could be missing here?

Many thanks,
Brian