[jira] [Commented] (CASSANDRA-6887) LOCAL_ONE read repair only does local repair, in spite of global digest queries

2014-06-23 Thread Duncan Sands (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040518#comment-14040518
 ] 

Duncan Sands commented on CASSANDRA-6887:
-

This was a while ago, but I think it was this at the time:

  'class': 'NetworkTopologyStrategy',
  'DC3': '1',
  'DC2': '3',
  'DC1': '3'

There is a small chance that I've swapped DC2 and DC3 here.

read_repair_chance, dclocal_read_repair_chance and speculative_retry are all at 
their defaults.

 LOCAL_ONE read repair only does local repair, in spite of global digest 
 queries
 ---

 Key: CASSANDRA-6887
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.6, x86-64 ubuntu precise
Reporter: Duncan Sands
Assignee: Aleksey Yeschenko
 Fix For: 2.0.9

 Attachments: 6887-2.0.txt


 I have a cluster spanning two data centres.  Almost all of the writing (and a 
 lot of reading) is done in DC1.  DC2 is used for running the occasional 
 analytics query.  Reads in both data centres use LOCAL_ONE.  Read repair 
 settings are set to the defaults on all column families.
 I had a long network outage between the data centres; it lasted longer than 
 the hints window, so after it was over DC2 didn't have the latest 
 information.  Even after reading data many many times in DC2, the returned 
 data was still out of date: read repair was not correcting it.
 I then investigated using cqlsh in DC2, with tracing on.
 What I saw was:
   - with consistency ONE, after about 10 read requests a digest request would 
 be sent to many nodes (spanning both data centres), and the data in DC2 would 
 be repaired.
  - with consistency LOCAL_ONE, after about 10 read requests a digest request 
 would be sent to many nodes (spanning both data centres), but the data in DC2 
 would not be repaired.  This is in spite of digest requests being sent to 
 DC1, as shown by the tracing.
 So it looks like digest requests are being sent to both data centres, but 
 replies from outside the local data centre are ignored when using LOCAL_ONE.
 The same data is being queried all the time in DC1 with consistency 
 LOCAL_ONE, but this didn't result in the data in DC2 being read repaired 
 either.  This is a slightly different case to what I described above: in that 
 case the local node was out of date and the remote node had the latest data, 
 while here it is the other way round.
 It could be argued that you don't want cross data centre read repair when 
 using LOCAL_ONE.  But then why bother sending cross data centre digest 
 requests?  And if only doing local read repair is how it is supposed to work 
 then it would be good to document this somewhere.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6887) LOCAL_ONE read repair only does local repair, in spite of global digest queries

2014-05-30 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013805#comment-14013805
 ] 

Aleksey Yeschenko commented on CASSANDRA-6887:
--

[~baldrick] LOCAL_* or not should not have any impact here, really - there is 
no distinction in the codebase based on this. An async read repair will be 
triggered once all the requests are received, and the data will be repaired.

That said, can you post more info on the affected table(s)?
1. The keyspace RF, in each DC
2. The configured read_repair_chance and dclocal_read_repair_chance values for 
the affected tables
3. speculative_retry setting for the affected tables

Thanks.

 LOCAL_ONE read repair only does local repair, in spite of global digest 
 queries
 ---

 Key: CASSANDRA-6887
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.6, x86-64 ubuntu precise
Reporter: Duncan Sands
Assignee: Aleksey Yeschenko
 Fix For: 2.0.9, 2.1.0

 Attachments: 6887-2.0.txt




[jira] [Commented] (CASSANDRA-6887) LOCAL_ONE read repair only does local repair, in spite of global digest queries

2014-05-29 Thread Duncan Sands (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012201#comment-14012201
 ] 

Duncan Sands commented on CASSANDRA-6887:
-

With these changes would DC local read repair (aka dclocal_read_repair_chance) 
still kick in with LOCAL_* requests?

 LOCAL_ONE read repair only does local repair, in spite of global digest 
 queries
 ---

 Key: CASSANDRA-6887
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.6, x86-64 ubuntu precise
Reporter: Duncan Sands
Assignee: Aleksey Yeschenko
 Fix For: 2.0.9, 2.1 rc1

 Attachments: 6887-2.0.txt




[jira] [Commented] (CASSANDRA-6887) LOCAL_ONE read repair only does local repair, in spite of global digest queries

2014-05-29 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012319#comment-14012319
 ] 

Sylvain Lebresne commented on CASSANDRA-6887:
-

I don't think that patch is what we want to do. We have 
dclocal_read_repair_chance for exactly that kind of reason. If you don't want 
any digest sent to a remote DC, then you should set read_repair_chance 
to 0 and configure dclocal_read_repair_chance to whatever you want. Maybe that's 
something that needs documenting, and in fact I'd be in favor of making it the 
default (i.e. instead of having read_repair_chance=0.1 and 
dclocal_read_repair_chance=0, switching to read_repair_chance=0 and 
dclocal_read_repair_chance=0.1). If you have only one DC, nothing changes 
compared to the current default, and if you have multiple DCs, I can agree that 
not crossing DC boundaries for read repair is a better default. But I'm not in 
favor of removing the option altogether.

That said, the behaviour described above does sound like a bug. Since by 
default read_repair_chance is 0.1, there should be cross-DC digest queries 
every 10 queries or so, and those *should* repair. If that's not the case, we 
should fix it.
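
The "every 10 queries or so" figure comes from reading read_repair_chance as a 
per-read probability. A minimal Python sketch of that decision, assuming a 
single random roll is compared first against read_repair_chance and then 
against dclocal_read_repair_chance. The function names and string labels are 
illustrative only and are not Cassandra's API:

```python
import random


def read_repair_decision(roll, read_repair_chance=0.1,
                         dclocal_read_repair_chance=0.0):
    """Classify one read, given a uniform roll in [0, 1).

    Assumption: one roll is checked against both chances in turn.
    """
    if roll < read_repair_chance:
        return "GLOBAL"    # digests go to replicas in every DC
    if roll < dclocal_read_repair_chance:
        return "DC_LOCAL"  # digests stay in the coordinator's DC
    return "NONE"          # no extra digest requests for this read


def global_fraction(reads, seed=0, **chances):
    """Fraction of simulated reads that trigger a GLOBAL read repair."""
    rng = random.Random(seed)
    hits = sum(
        read_repair_decision(rng.random(), **chances) == "GLOBAL"
        for _ in range(reads)
    )
    return hits / reads
```

With the defaults above, global_fraction(100000) comes out close to 0.1, i.e. 
roughly one read in ten. Swapping the two values (read_repair_chance=0, 
dclocal_read_repair_chance=0.1) keeps the same rate but yields DC_LOCAL 
instead of GLOBAL.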

 LOCAL_ONE read repair only does local repair, in spite of global digest 
 queries
 ---

 Key: CASSANDRA-6887
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.6, x86-64 ubuntu precise
Reporter: Duncan Sands
Assignee: Aleksey Yeschenko
 Fix For: 2.0.9, 2.1 rc1

 Attachments: 6887-2.0.txt




[jira] [Commented] (CASSANDRA-6887) LOCAL_ONE read repair only does local repair, in spite of global digest queries

2014-05-29 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012348#comment-14012348
 ] 

Aleksey Yeschenko commented on CASSANDRA-6887:
--

[~slebresne] I would agree with you entirely had this comment been made before 
we had LOCAL_* CLs and eager read retries.

Relying on dc/global read repair chance now, however, is not the most 
logical/expected behavior and violates the principle of least surprise.

Currently, without the attached patch:
1. LOCAL_* requests will potentially use a replica from another DC for an eager 
retry - and this behavior can *not* be disabled via read repair chance tuning
2. All three of RRD.NONE/GLOBAL/DC_LOCAL can cause a request to go to a 
different DC for LOCAL_* CL queries, depending on the # of live replicas in the 
local DC - CL#filterForQuery() is too late a point to filter the replicas, it 
must be done before that.
done before that.
3. A user might *not* want to have read_repair_chance set to 0 entirely, and 
also use both LOCAL_* and regular consistency levels - on per-query basis.

So I still think that LOCAL_* CLs should not allow any aspect of the query to 
involve a non-local DC - that's arguably the whole point of LOCAL_* CLs, and 
also, arguably, the expected/least surprising behavior.

Actually, after spending a bit more time in the code, I'd say that LOCAL_* CLs 
are flat out broken/not properly implemented for read requests, with the sole 
exception of CL#assureSufficientLiveNodes() that does work properly.

 LOCAL_ONE read repair only does local repair, in spite of global digest 
 queries
 ---

 Key: CASSANDRA-6887
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.6, x86-64 ubuntu precise
Reporter: Duncan Sands
Assignee: Aleksey Yeschenko
 Fix For: 2.0.9, 2.1 rc1

 Attachments: 6887-2.0.txt




[jira] [Commented] (CASSANDRA-6887) LOCAL_ONE read repair only does local repair, in spite of global digest queries

2014-05-29 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012367#comment-14012367
 ] 

Sylvain Lebresne commented on CASSANDRA-6887:
-

bq. Relying on dc/global read repair chance now, however, is not the most 
logical/expected behavior and violates the principle of least surprise.

Again, I do think we should change the default because I agree the current 
default violates the principle of least surprise. But I disagree that crippling 
Cassandra by making it impossible to have global read repair with LOCAL CL 
helps in any way. Unless we have a strong reason to do it, but I don't think 
that's the case.

bq. LOCAL_* requests will potentially use a replica from another DC for an 
eager retry

I can agree that it's a problem. But surely that can easily be fixed separately 
from the rest.

bq. All three of RRD.NONE/GLOBAL/DC_LOCAL can cause a request to go to a 
different DC for LOCAL_* CL queries, depending on the # of live replicas in the 
local DC

I don't think that's true. For RRD.NONE/DC_LOCAL, CL.filterForQuery will only 
include non-local nodes if the number of live local nodes is < blockFor. But in 
that case CL.assureSufficientLiveNodes will throw an UnavailableException.
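
To make that argument concrete, here is a simplified Python model of the two 
checks. It is a sketch of the reasoning only, not the Cassandra source: the 
function names mirror the methods mentioned above, but the signatures, the 
(endpoint, dc) tuples and the Unavailable class are assumptions made for 
illustration.

```python
class Unavailable(Exception):
    """Stand-in for UnavailableException in this model."""


def assure_sufficient_live_nodes(live, local_dc, block_for):
    # Count only replicas in the coordinator's DC, as a LOCAL_* CL should.
    local = [node for node in live if node[1] == local_dc]
    if len(local) < block_for:
        raise Unavailable(f"need {block_for} local replicas, have {len(local)}")


def filter_for_query(live, local_dc, block_for):
    # Prefer local replicas; fall back to remote ones only when the local
    # DC cannot satisfy block_for on its own.
    local = [node for node in live if node[1] == local_dc]
    if len(local) >= block_for:
        return local
    remote = [node for node in live if node[1] != local_dc]
    return local + remote[: block_for - len(local)]


def targets_for_read(live, local_dc, block_for):
    # The availability check runs first, so the remote fallback in
    # filter_for_query is never reached when the check is correct.
    assure_sufficient_live_nodes(live, local_dc, block_for)
    return filter_for_query(live, local_dc, block_for)
```

In this model, targets_for_read either returns local replicas only or raises 
Unavailable, which is the claim being made: the fallback branch matters only 
if the availability check lets an under-replicated local DC through.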

bq. So I still think that LOCAL_* CLs should not allow any aspect of the query 
to involve a non-local DC - that's arguably the whole point of LOCAL_* CLs

I don't entirely agree. The point of LOCAL_* CLs is to get proper latencies in 
multi-DC setups. There is nothing wrong with having asynchronous 
read-repair across DCs when you use them. In fact, that might even be a good 
idea.

bq. 3. A user might not want to have read_repair_chance set to 0 entirely, and 
also use both LOCAL_* and regular consistency levels - on per-query basis.

Theoretically, you're correct. But in my experience, if you have a multi-DC 
setup, you'll always want your queries to be LOCAL_* (except maybe for a few 
maintenance queries where you want to ensure across-DC consistency, but they're 
probably not live queries and you probably don't care about read-repair on 
those). I think the scenario "I want to do only LOCAL CL queries but I'd like 
to do cross-DC read-repair asynchronously for 1% of my queries (and maybe 
dc-local read-repair for 10% of them)" is much more useful and common than some 
weird case where you mix local and non-local CLs on the same table while still 
caring about read-repair.

 LOCAL_ONE read repair only does local repair, in spite of global digest 
 queries
 ---

 Key: CASSANDRA-6887
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.6, x86-64 ubuntu precise
Reporter: Duncan Sands
Assignee: Aleksey Yeschenko
 Fix For: 2.0.9, 2.1 rc1

 Attachments: 6887-2.0.txt



[jira] [Commented] (CASSANDRA-6887) LOCAL_ONE read repair only does local repair, in spite of global digest queries

2014-05-29 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012372#comment-14012372
 ] 

Aleksey Yeschenko commented on CASSANDRA-6887:
--

bq. I don't think that's true. For RRD.NONE/DC_LOCAL, CL.filterForQuery will 
only include non-local nodes if the number of live local nodes is < blockFor. 
But in that case CL.assureSufficientLiveNodes will throw an 
UnavailableException.

Actually, I was both right and wrong here. Wrong about 
CL#assureSufficientLiveNodes() working properly, and right about all RRDs 
causing a LOCAL query to go to a different DC. Only for LOCAL_ONE though, b/c 
unlike CL#isSufficientLiveNodes(), CL#assureSufficientLiveNodes() does not 
handle LOCAL_ONE properly. So if there are live replicas - but none in the 
local DC - the request will go to a different one, potentially, with all RRDs.
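
The LOCAL_ONE gap described here can be modelled in a few lines of Python. 
This is a hypothetical sketch under the assumption that the faulty check 
counts every live replica regardless of DC; lenient_check, strict_check and 
pick_target are made-up names, not Cassandra methods.

```python
def lenient_check(live, local_dc, block_for=1):
    # Assumed faulty behaviour: any live replica satisfies LOCAL_ONE.
    return len(live) >= block_for


def strict_check(live, local_dc, block_for=1):
    # Expected behaviour: only replicas in the local DC count.
    return sum(1 for _, dc in live if dc == local_dc) >= block_for


def pick_target(live, local_dc):
    # Prefer a local replica, otherwise fall back to the first remote one.
    local = [node for node in live if node[1] == local_dc]
    return (local or live)[0]


# Only a DC2 replica is alive; the coordinator sits in DC1.
live = [("c", "DC2")]
if lenient_check(live, "DC1"):
    target = pick_target(live, "DC1")  # the read leaves the local DC
```

With the strict check the same request would be rejected as unavailable 
instead of being served from DC2.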

 LOCAL_ONE read repair only does local repair, in spite of global digest 
 queries
 ---

 Key: CASSANDRA-6887
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.6, x86-64 ubuntu precise
Reporter: Duncan Sands
Assignee: Aleksey Yeschenko
 Fix For: 2.0.9, 2.1 rc1

 Attachments: 6887-2.0.txt




[jira] [Commented] (CASSANDRA-6887) LOCAL_ONE read repair only does local repair, in spite of global digest queries

2014-05-29 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012448#comment-14012448
 ] 

Sylvain Lebresne commented on CASSANDRA-6887:
-

bq. Only for LOCAL_ONE though, b/c unlike CL#isSufficientLiveNodes(), 
CL#assureSufficientLiveNodes() does not handle LOCAL_ONE properly. So if there 
are live replicas - but none in the local DC - the request will go to a 
different one, potentially, with all RRDs.

Fair enough, but surely we can agree it's just a bug in 
assureSufficientLiveNodes. That doesn't invalidate the fact that allowing read 
repair globally even when LOCAL CLs are used can be useful and shouldn't be 
removed imo, even if I 100% agree that it shouldn't be the default.

 LOCAL_ONE read repair only does local repair, in spite of global digest 
 queries
 ---

 Key: CASSANDRA-6887
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.6, x86-64 ubuntu precise
Reporter: Duncan Sands
Assignee: Aleksey Yeschenko
 Fix For: 2.0.9, 2.1.0

 Attachments: 6887-2.0.txt




[jira] [Commented] (CASSANDRA-6887) LOCAL_ONE read repair only does local repair, in spite of global digest queries

2014-05-29 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012456#comment-14012456
 ] 

Aleksey Yeschenko commented on CASSANDRA-6887:
--

True. I'll branch out the LOCAL_ONE issue into a separate ticket (1.2+) and new 
defaults into a separate ticket, too (2.0 or 2.1?) then, since both are 
CHANGES.txt-worthy.

I'll look into what's causing the problem in the issue description.

 LOCAL_ONE read repair only does local repair, in spite of global digest 
 queries
 ---

 Key: CASSANDRA-6887
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.6, x86-64 ubuntu precise
Reporter: Duncan Sands
Assignee: Aleksey Yeschenko
 Fix For: 2.0.9, 2.1.0

 Attachments: 6887-2.0.txt




[jira] [Commented] (CASSANDRA-6887) LOCAL_ONE read repair only does local repair, in spite of global digest queries

2014-05-26 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008944#comment-14008944
 ] 

Aleksey Yeschenko commented on CASSANDRA-6887:
--

I think your last point is correct. It's more logical and less unexpected for 
Cassandra to *not* send digest queries to another DC when using LOCAL_* CLs.

 LOCAL_ONE read repair only does local repair, in spite of global digest 
 queries
 ---

 Key: CASSANDRA-6887
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.6, x86-64 ubuntu precise
Reporter: Duncan Sands
Assignee: Aleksey Yeschenko
 Fix For: 2.0.9

