[jira] [Commented] (CASSANDRA-2870) dynamic snitch + read repair off can cause LOCAL_QUORUM reads to return spurious UnavailableException
[ https://issues.apache.org/jira/browse/CASSANDRA-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064609#comment-13064609 ]

Patrick Mackinlay commented on CASSANDRA-2870:
----------------------------------------------

In the default configuration of 0.7.6-2 (and other versions), LOCAL_QUORUM reads don't work. This is not a minor bug and should be fixed in the next release. By "default configuration" I mean the tarball distributed from the Cassandra website. The fact that it is not a regression just shows that this functionality was never properly tested.

dynamic snitch + read repair off can cause LOCAL_QUORUM reads to return spurious UnavailableException
-----------------------------------------------------------------------------------------------------

                Key: CASSANDRA-2870
                URL: https://issues.apache.org/jira/browse/CASSANDRA-2870
            Project: Cassandra
         Issue Type: Bug
         Components: Core
   Affects Versions: 0.7.0
           Reporter: Jonathan Ellis
           Assignee: Jonathan Ellis
           Priority: Minor
            Fix For: 0.7.8, 0.8.2
        Attachments: 2870.txt

When Read Repair is off, we want to avoid doing requests to more nodes than necessary to satisfy the ConsistencyLevel. ReadCallback does this here:

{code}
this.endpoints = repair || resolver instanceof RowRepairResolver
               ? endpoints
               : endpoints.subList(0, Math.min(endpoints.size(), blockfor)); // min so as to not throw exception until assureSufficient is called
{code}

You can see that it is assuming that the endpoints list is sorted in order of preferred-ness for the read. Then the LOCAL_QUORUM code in DatacenterReadCallback checks to see if we have enough nodes to do the read:

{code}
int localEndpoints = 0;
for (InetAddress endpoint : endpoints)
{
    if (localdc.equals(snitch.getDatacenter(endpoint)))
        localEndpoints++;
}

if (localEndpoints < blockfor)
    throw new UnavailableException();
{code}

So if repair is off (so we truncate our endpoints list) AND the dynamic snitch has decided that nodes in another DC are to be preferred over local ones, we'll throw UE even if all the replicas are healthy.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
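The failure mode described in the issue can be sketched in plain Java. This is an illustrative, simplified model only: the `Endpoint` record, the DC names, and the helper methods `truncate` and `sufficientLocal` are hypothetical stand-ins for Cassandra's ReadCallback truncation and DatacenterReadCallback availability check, not the actual classes. With RF=3 per DC and LOCAL_QUORUM (blockfor = 2), a dynamic-snitch ordering that scores two remote-DC replicas as "closest" means truncation keeps only remote endpoints, so the local-DC count comes up short even though every replica is healthy.

```java
import java.util.Arrays;
import java.util.List;

public class LocalQuorumBugSketch {
    // Hypothetical stand-in for an endpoint with its datacenter.
    public record Endpoint(String addr, String dc) {}

    // Mimics ReadCallback when read repair is off: keep only the first
    // blockfor endpoints of the snitch-sorted list.
    public static List<Endpoint> truncate(List<Endpoint> sorted, int blockfor) {
        return sorted.subList(0, Math.min(sorted.size(), blockfor));
    }

    // Mimics DatacenterReadCallback: count endpoints in the local DC;
    // if fewer than blockfor, Cassandra throws UnavailableException.
    public static boolean sufficientLocal(List<Endpoint> endpoints, String localDc, int blockfor) {
        int local = 0;
        for (Endpoint e : endpoints)
            if (localDc.equals(e.dc()))
                local++;
        return local >= blockfor;
    }

    public static void main(String[] args) {
        int blockfor = 2; // LOCAL_QUORUM with RF=3 in the local DC

        // Dynamic snitch has scored two DC2 replicas as "closer" than the
        // DC1 replicas, so they sort first. All five replicas are healthy.
        List<Endpoint> sorted = Arrays.asList(
            new Endpoint("10.1.0.1", "DC2"),
            new Endpoint("10.1.0.2", "DC2"),
            new Endpoint("10.0.0.1", "DC1"),
            new Endpoint("10.0.0.2", "DC1"),
            new Endpoint("10.0.0.3", "DC1"));

        // Full list passes the check; the truncated list (read repair off)
        // contains only DC2 nodes, producing the spurious failure.
        System.out.println(sufficientLocal(sorted, "DC1", blockfor));                      // true
        System.out.println(sufficientLocal(truncate(sorted, blockfor), "DC1", blockfor));  // false
    }
}
```

The sketch shows why the two pieces of code are individually reasonable but unsound when composed: truncation happens before the per-datacenter check, using a global preference order that knows nothing about LOCAL_QUORUM's local-DC requirement.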
[ https://issues.apache.org/jira/browse/CASSANDRA-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064621#comment-13064621 ]

Jonathan Ellis commented on CASSANDRA-2870:
-------------------------------------------

It will be fixed in 0.7.8; 0.7.7 entered the release process before this was reported.
[ https://issues.apache.org/jira/browse/CASSANDRA-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061846#comment-13061846 ]

Sylvain Lebresne commented on CASSANDRA-2870:
---------------------------------------------

+1
[ https://issues.apache.org/jira/browse/CASSANDRA-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062053#comment-13062053 ]

Hudson commented on CASSANDRA-2870:
-----------------------------------

Integrated in Cassandra-0.7 #526 (See [https://builds.apache.org/job/Cassandra-0.7/526/])

fix possibility of spurious UnavailableException for LOCAL_QUORUM reads with dynamic snitch and read repair disabled
patch by jbellis; reviewed by slebresne for CASSANDRA-2870

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1144380
Files :
* /cassandra/branches/cassandra-0.7/CHANGES.txt
* /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/ReadCallback.java
* /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/DatacenterReadCallback.java
[ https://issues.apache.org/jira/browse/CASSANDRA-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062256#comment-13062256 ]

Jeremy Hanna commented on CASSANDRA-2870:
-----------------------------------------

This also appears to affect 0.7.6, even when read repair is not off. I didn't set read repair on my CFs (it defaults to 100%) and tried a simple row-count Pig script using read consistency LOCAL_QUORUM, and it fails with UnavailableException. If that's the case, I would think the priority should be higher and it should go into 0.7.7. Any thoughts?
[ https://issues.apache.org/jira/browse/CASSANDRA-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062308#comment-13062308 ]

Jonathan Ellis commented on CASSANDRA-2870:
-------------------------------------------

This has been present since LOCAL_QUORUM was introduced, so it's not a new regression. And a reasonable workaround exists (disable the dynamic snitch). So no, I don't think we should hold up 0.7.7 for this.
[ https://issues.apache.org/jira/browse/CASSANDRA-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062324#comment-13062324 ]

Jeremy Hanna commented on CASSANDRA-2870:
-----------------------------------------

Okay - it just seemed like a higher-priority issue with the scope expanded. We'll probably just disable the dynamic snitch until the fix is in a release.