[jira] [Updated] (CASSANDRA-15142) Fix errors on repairing empty keyspace

2019-05-27 Thread Stefan Podkowinski (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-15142:
---
Test and Documentation Plan: 
[CircleCI|https://circleci.com/workflow-run/45ecd7af-ec77-4090-bf07-278c78e43e30]
  (was: * 
[CASSANDRA-15142|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-15142]
* 
[CircleCI|https://circleci.com/workflow-run/45ecd7af-ec77-4090-bf07-278c78e43e30])

> Fix errors on repairing empty keyspace
> --
>
> Key: CASSANDRA-15142
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15142
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Tool/nodetool
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Normal
>
> Running repairs on empty keyspaces will produce a rather confusing error in 
> trunk:
> {noformat}
> ERROR [Repair-Task:1] 2019-05-24 10:36:20,323 RepairRunnable.java:274 - 
> Repair 014607d0-7dff-11e9-9256-158db058ccc5 failed:
> java.lang.IllegalArgumentException: repair sessions cannot operate on 
> multiple keyspaces
> ▸  at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:135)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService$ParentRepairSession.(ActiveRepairService.java:566)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService.registerParentRepairSession(ActiveRepairService.java:484)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:395)
> ▸  at 
> org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:269)
> ▸  at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ▸  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ▸  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ▸  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ▸  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ▸  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> ▸  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Let's ignore empty keyspaces and return a success return status instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15142) Fix errors on repairing empty keyspace

2019-05-27 Thread Stefan Podkowinski (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849001#comment-16849001
 ] 

Stefan Podkowinski commented on CASSANDRA-15142:


* 
[CASSANDRA-15142|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-15142]
* 
[CircleCI|https://circleci.com/workflow-run/45ecd7af-ec77-4090-bf07-278c78e43e30]

> Fix errors on repairing empty keyspace
> --
>
> Key: CASSANDRA-15142
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15142
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Tool/nodetool
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Normal
>
> Running repairs on empty keyspaces will produce a rather confusing error in 
> trunk:
> {noformat}
> ERROR [Repair-Task:1] 2019-05-24 10:36:20,323 RepairRunnable.java:274 - 
> Repair 014607d0-7dff-11e9-9256-158db058ccc5 failed:
> java.lang.IllegalArgumentException: repair sessions cannot operate on 
> multiple keyspaces
> ▸  at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:135)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService$ParentRepairSession.(ActiveRepairService.java:566)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService.registerParentRepairSession(ActiveRepairService.java:484)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:395)
> ▸  at 
> org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:269)
> ▸  at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ▸  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ▸  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ▸  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ▸  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ▸  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> ▸  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Let's ignore empty keyspaces and return a success return status instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15142) Fix errors on repairing empty keyspace

2019-05-27 Thread Stefan Podkowinski (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-15142:
---
Test and Documentation Plan: 
* 
[CASSANDRA-15142|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-15142]
* 
[CircleCI|https://circleci.com/workflow-run/45ecd7af-ec77-4090-bf07-278c78e43e30]
 Status: Patch Available  (was: Open)

> Fix errors on repairing empty keyspace
> --
>
> Key: CASSANDRA-15142
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15142
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Tool/nodetool
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Normal
>
> Running repairs on empty keyspaces will produce a rather confusing error in 
> trunk:
> {noformat}
> ERROR [Repair-Task:1] 2019-05-24 10:36:20,323 RepairRunnable.java:274 - 
> Repair 014607d0-7dff-11e9-9256-158db058ccc5 failed:
> java.lang.IllegalArgumentException: repair sessions cannot operate on 
> multiple keyspaces
> ▸  at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:135)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService$ParentRepairSession.(ActiveRepairService.java:566)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService.registerParentRepairSession(ActiveRepairService.java:484)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:395)
> ▸  at 
> org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:269)
> ▸  at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ▸  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ▸  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ▸  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ▸  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ▸  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> ▸  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Let's ignore empty keyspaces and return a success return status instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14812) Multiget Thrift query returns null records after digest mismatch

2019-05-27 Thread mck (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848873#comment-16848873
 ] 

mck edited comment on CASSANDRA-14812 at 5/27/19 12:01 PM:
---

[~benedict], I have reviewed the patch and tested the python reproducible on 
3.0.18 and 3.11.4, working with and failing without the patch applied.

I'm not competent on this area, but I am jumping in to help as we too are 
seeing users unable to upgrade because of this fault.

Review questions/points are: 
 - is there a way to replicate the test for the CQL equivalent? While this bug 
does not impact CQL it is my understanding that CQL queries with `IN` clauses 
will still be going through this code path… I've attached the reproducible 
script rewritten for CQL, is it applicable? Should it be added as a dtest? (i 
don't think so but double-checking)
 - I understand overriding {{`filter(..)`}} for the NONE impl, although at 
first it is not intuitive that  {{`DataLimits.NONE`}} is also used in thrift 
queries…
 - fyi the circleci results are here: 
https://circleci.com/workflow-run/3dd0d7f3-fa79-4118-80d8-247e85db40ea ; are 
these failures of concern?
 - {{"The branch I have uploaded also has a back port of CASSANDRA-14821"}}. I 
am confused… where is this?
 - a rebased commit for the 3.0 branch is here 
[mck/cassandra-3.0_14812|https://github.com/thelastpickle/cassandra/commits/mck/cassandra-3.0_14812]
 - the change in {{BasePartitions}} and the interactions from different 
{{StoppingTransformation}} subclasses is a bit harder to grok… It makes sense 
that the {{while}} loop does not need to continue in the situation where, 
{{stop}} has "leaked" and not been signalled, but where 
{{stopChild.isSignalled}} was. But not returning false in that same situation 
seems odd…? Do you want me to test the different cql interactions here (per 
partition, grouping, paging)?




was (Author: michaelsembwever):
[~benedict], I have reviewed the patch and tested the python reproducible on 
3.0.18 and 3.11.4, working with and failing without the patch applied.

I'm not competent on this area, but I am jumping in to help as we too are 
seeing users unable to upgrade because of this fault.

Review questions/points are: 
 - is there a way to replicate the test for the CQL equivalent? While this bug 
does not impact CQL it is my understanding that CQL queries with `IN` clauses 
will still be going through this code path… I've attached the reproducible 
script rewritten for CQL, is it applicable? Should it be added as a dtest? (i 
don't think so but double-checking)
 - I understand overriding {{`filter(..)`}} for the NONE impl, although at 
first it is not intuitive that  {{`DataLimits.NONE`}} is also used in thrift 
queries…
 - fyi the circleci results are here: 
https://circleci.com/workflow-run/3dd0d7f3-fa79-4118-80d8-247e85db40ea ; are 
these failures of concern?
 - {{"The branch I have uploaded also has a back port of CASSANDRA-14821"}}. I 
am confused… where is this?
 - a rebased commit for the 3.0 branch is here 
[mck/cassandra-3.0_14812|https://github.com/thelastpickle/cassandra/commits/mck/cassandra-3.0_14812]
 - the change in {{BasePartitions}} and the interactions from different 
{{StoppingTransformation}} subclasses is a bit harder to grok… It makes that 
the {{while}} loop does not need to continue in the situation where, {{stop}} 
has "leaked" and not been signalled, but where {{stopChild.isSignalled}} was. 
But not returning false in that same situation seems odd…? Do you want me to 
test the different cql interactions here (per partition, grouping, paging)?



> Multiget Thrift query returns null records after digest mismatch
> 
>
> Key: CASSANDRA-14812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14812
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Messaging/Thrift
>Reporter: Sivukhin Nikita
>Assignee: Benedict
>Priority: Urgent
> Fix For: 3.0.x, 3.11.x
>
> Attachments: repro_script.py, requirements.txt, 
> small_repro_script.py, small_repro_script_cql.py
>
>
> It seems that in Cassandra 3.0.0 a nasty bug was introduced in {{multiget}} 
> Thrift query processing logic. When one tries to read data from several 
> partitions with a single {{multiget}} query and {{DigestMismatch}} exception 
> is raised during this query processing, request coordinator prematurely 
> terminates response stream right at the point where the first 
> \{{DigestMismatch}} error is occurring. This leads to situation where clients 
> "do not see" some data contained in the database.
> We managed to reproduce this bug in all versions of Cassandra starting with 
> v3.0.0. The pre-release version 3.0.0-rc2 works correctly. It looks like 

[jira] [Updated] (CASSANDRA-14812) Multiget Thrift query returns null records after digest mismatch

2019-05-27 Thread mck (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-14812:

Attachment: small_repro_script_cql.py

> Multiget Thrift query returns null records after digest mismatch
> 
>
> Key: CASSANDRA-14812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14812
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Messaging/Thrift
>Reporter: Sivukhin Nikita
>Assignee: Benedict
>Priority: Urgent
> Fix For: 3.0.x, 3.11.x
>
> Attachments: repro_script.py, requirements.txt, 
> small_repro_script.py, small_repro_script_cql.py
>
>
> It seems that in Cassandra 3.0.0 a nasty bug was introduced in {{multiget}} 
> Thrift query processing logic. When one tries to read data from several 
> partitions with a single {{multiget}} query and {{DigestMismatch}} exception 
> is raised during this query processing, request coordinator prematurely 
> terminates response stream right at the point where the first 
> \{{DigestMismatch}} error is occurring. This leads to situation where clients 
> "do not see" some data contained in the database.
> We managed to reproduce this bug in all versions of Cassandra starting with 
> v3.0.0. The pre-release version 3.0.0-rc2 works correctly. It looks like 
> [refactoring of iterator transformation 
> hierarchy|https://github.com/apache/cassandra/commit/609497471441273367013c09a1e0e1c990726ec7]
>  related to CASSANDRA-9975 triggers incorrect behaviour.
> When concatenated iterator is returned from the 
> [StorageProxy.fetchRows(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/service/StorageProxy.java#L1770],
>  Cassandra starts to consume this combined iterator. Because of 
> {{DigestMismatch}} exception some elements of this combined iterator contain 
> additional {{ThriftCounter}}, that was added during 
> [DataResolver.resolve(...)|https://github.com/apache/cassandra/blob/ee9e06b5a75c0be954694b191ea4170456015b98/src/java/org/apache/cassandra/service/reads/DataResolver.java#L120]
>  execution. While consuming iterator for many partitions Cassandra calls 
> [BaseIterator.tryGetMoreContents(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/db/transform/BaseIterator.java#L115]
>  method that must switch from one partition iterator to another in case of 
> exhaustion of the former. In this case all Transformations contained in the 
> next iterator are applied to the combined BaseIterator that enumerates 
> partitions sequence which is wrong. This behaviour causes BaseIterator to 
> stop enumeration after it fully consumes partition with {{DigestMismatch}} 
> error, because this partition iterator has additional {{ThriftCounter}} data 
> limit.
> The attachment contains the python2 script [^small_repro_script.py] that 
> reproduces this bug within 3-nodes ccmlib controlled cluster. Also, there is 
> an extended version of this script - [^repro_script.py] - that contains more 
> logging information and provides the ability to test behavior for many 
> Cassandra versions (to run all test cases from repro_script.py you can call 
> {{python -m unittest2 -v repro_script.ThriftMultigetTestCase}}). All the 
> necessary dependencies contained in the [^requirements.txt]
>  
> This bug is critical in our production environment because we can't permit 
> any data skip.
> Any ideas about a patch for this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14812) Multiget Thrift query returns null records after digest mismatch

2019-05-27 Thread mck (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848873#comment-16848873
 ] 

mck edited comment on CASSANDRA-14812 at 5/27/19 12:00 PM:
---

[~benedict], I have reviewed the patch and tested the python reproducible on 
3.0.18 and 3.11.4, working with and failing without the patch applied.

I'm not competent on this area, but I am jumping in to help as we too are 
seeing users unable to upgrade because of this fault.

Review questions/points are: 
 - is there a way to replicate the test for the CQL equivalent? While this bug 
does not impact CQL it is my understanding that CQL queries with `IN` clauses 
will still be going through this code path… I've attached the reproducible 
script rewritten for CQL, is it applicable? Should it be added as a dtest? (i 
don't think so but double-checking)
 - I understand overriding {{`filter(..)`}} for the NONE impl, although at 
first it is not intuitive that  {{`DataLimits.NONE`}} is also used in thrift 
queries…
 - fyi the circleci results are here: 
https://circleci.com/workflow-run/3dd0d7f3-fa79-4118-80d8-247e85db40ea ; are 
these failures of concern?
 - {{"The branch I have uploaded also has a back port of CASSANDRA-14821"}}. I 
am confused… where is this?
 - a rebased commit for the 3.0 branch is here 
[mck/cassandra-3.0_14812|https://github.com/thelastpickle/cassandra/commits/mck/cassandra-3.0_14812]
 - the change in {{BasePartitions}} and the interactions from different 
{{StoppingTransformation}} subclasses is a bit harder to grok… It makes that 
the {{while}} loop does not need to continue in the situation where, {{stop}} 
has "leaked" and not been signalled, but where {{stopChild.isSignalled}} was. 
But not returning false in that same situation seems odd…? Do you want me to 
test the different cql interactions here (per partition, grouping, paging)?




was (Author: michaelsembwever):
[~benedict], I have reviewed the patch and tested the python reproducible on 
3.0.18 and 3.11.4, working with and failing without the patch applied.

I'm not competent on this area, but I am jumping in to help as we too are 
seeing users unable to upgrade because of this fault.

Review questions/points are: 
 - is there a way to replicate the test for the CQL equivalent? While this bug 
does not impact CQL it is my understanding that CQL queries with `IN` clauses 
will still be going through this code path… I've attached the reproducible 
script rewritten for CQL, is it applicable? Should it be added as a dtest? 
 - I understand overriding {{`filter(..)`}} for the NONE impl, although at 
first it is not intuitive that  {{`DataLimits.NONE`}} is also used in thrift 
queries…
 - fyi the circleci results are here: 
https://circleci.com/workflow-run/3dd0d7f3-fa79-4118-80d8-247e85db40ea ; are 
these failures of concern?
 - {{"The branch I have uploaded also has a back port of CASSANDRA-14821"}}. I 
am confused… where is this?
 - a rebased commit for the 3.0 branch is here 
[mck/cassandra-3.0_14812|https://github.com/thelastpickle/cassandra/commits/mck/cassandra-3.0_14812]
 - the change in {{BasePartitions}} and the interactions from different 
{{StoppingTransformation}} subclasses is a bit harder to grok… It makes that 
the {{while}} loop does not need to continue in the situation where, {{stop}} 
has "leaked" and not been signalled, but where {{stopChild.isSignalled}} was. 
But not returning false in that same situation seems odd…? Do you want me to 
test the different cql interactions here (per partition, grouping, paging)?



> Multiget Thrift query returns null records after digest mismatch
> 
>
> Key: CASSANDRA-14812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14812
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Messaging/Thrift
>Reporter: Sivukhin Nikita
>Assignee: Benedict
>Priority: Urgent
> Fix For: 3.0.x, 3.11.x
>
> Attachments: repro_script.py, requirements.txt, 
> small_repro_script.py, small_repro_script_cql.py
>
>
> It seems that in Cassandra 3.0.0 a nasty bug was introduced in {{multiget}} 
> Thrift query processing logic. When one tries to read data from several 
> partitions with a single {{multiget}} query and {{DigestMismatch}} exception 
> is raised during this query processing, request coordinator prematurely 
> terminates response stream right at the point where the first 
> \{{DigestMismatch}} error is occurring. This leads to situation where clients 
> "do not see" some data contained in the database.
> We managed to reproduce this bug in all versions of Cassandra starting with 
> v3.0.0. The pre-release version 3.0.0-rc2 works correctly. It looks like 
> [refactoring of iterator transformation 
> 

[jira] [Comment Edited] (CASSANDRA-14812) Multiget Thrift query returns null records after digest mismatch

2019-05-27 Thread mck (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848873#comment-16848873
 ] 

mck edited comment on CASSANDRA-14812 at 5/27/19 11:59 AM:
---

[~benedict], I have reviewed the patch and tested the python reproducible on 
3.0.18 and 3.11.4, working with and failing without the patch applied.

I'm not competent on this area, but I am jumping in to help as we too are 
seeing users unable to upgrade because of this fault.

Review questions/points are: 
 - is there a way to replicate the test for the CQL equivalent? While this bug 
does not impact CQL it is my understanding that CQL queries with `IN` clauses 
will still be going through this code path… I've attached the reproducible 
script rewritten for CQL, is it applicable? Should it be added as a dtest? 
 - I understand overriding {{`filter(..)`}} for the NONE impl, although at 
first it is not intuitive that  {{`DataLimits.NONE`}} is also used in thrift 
queries…
 - fyi the circleci results are here: 
https://circleci.com/workflow-run/3dd0d7f3-fa79-4118-80d8-247e85db40ea ; are 
these failures of concern?
 - {{"The branch I have uploaded also has a back port of CASSANDRA-14821"}}. I 
am confused… where is this?
 - a rebased commit for the 3.0 branch is here 
[mck/cassandra-3.0_14812|https://github.com/thelastpickle/cassandra/commits/mck/cassandra-3.0_14812]
 - the change in {{BasePartitions}} and the interactions from different 
{{StoppingTransformation}} subclasses is a bit harder to grok… It makes that 
the {{while}} loop does not need to continue in the situation where, {{stop}} 
has "leaked" and not been signalled, but where {{stopChild.isSignalled}} was. 
But not returning false in that same situation seems odd…? Do you want me to 
test the different cql interactions here (per partition, grouping, paging)?




was (Author: michaelsembwever):
[~benedict], I have reviewed the patch and tested the python reproducible on 
3.0.18 and 3.11.4, working with and failing without the patch applied.

I'm not competent on this area, but I am jumping in to help as we too are 
seeing users unable to upgrade because of this fault.

Review questions/points are: 
 - is there a way to replicate the test for the CQL equivalent? While this bug 
does not impact CQL it is my understanding that CQL queries with `IN` clauses 
will still be going through this code path… I've attached the reproducible 
script rewritten for CQL, is it applicable? Should it be added as a dtest? XXX
 - I understand overriding {{`filter(..)`}} for the NONE impl, although at 
first it is not intuitive that  {{`DataLimits.NONE`}} is also used in thrift 
queries…
 - fyi the circleci results are here: 
https://circleci.com/workflow-run/3dd0d7f3-fa79-4118-80d8-247e85db40ea ; are 
these failures of concern?
 - {{"The branch I have uploaded also has a back port of CASSANDRA-14821"}}. I 
am confused… where is this?
 - a rebased commit for the 3.0 branch is here 
[mck/cassandra-3.0_14812|https://github.com/thelastpickle/cassandra/commits/mck/cassandra-3.0_14812]
 - the change in {{BasePartitions}} and the interactions from different 
{{StoppingTransformation}} subclasses is a bit harder to grok… It makes that 
the {{while}} loop does not need to continue in the situation where, {{stop}} 
has "leaked" and not been signalled, but where {{stopChild.isSignalled}} was. 
But not returning false in that same situation seems odd…? Do you want me to 
test the different cql interactions here (per partition, grouping, paging)?



> Multiget Thrift query returns null records after digest mismatch
> 
>
> Key: CASSANDRA-14812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14812
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Messaging/Thrift
>Reporter: Sivukhin Nikita
>Assignee: Benedict
>Priority: Urgent
> Fix For: 3.0.x, 3.11.x
>
> Attachments: repro_script.py, requirements.txt, small_repro_script.py
>
>
> It seems that in Cassandra 3.0.0 a nasty bug was introduced in {{multiget}} 
> Thrift query processing logic. When one tries to read data from several 
> partitions with a single {{multiget}} query and {{DigestMismatch}} exception 
> is raised during this query processing, request coordinator prematurely 
> terminates response stream right at the point where the first 
> \{{DigestMismatch}} error is occurring. This leads to situation where clients 
> "do not see" some data contained in the database.
> We managed to reproduce this bug in all versions of Cassandra starting with 
> v3.0.0. The pre-release version 3.0.0-rc2 works correctly. It looks like 
> [refactoring of iterator transformation 
> 

[jira] [Commented] (CASSANDRA-14812) Multiget Thrift query returns null records after digest mismatch

2019-05-27 Thread mck (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848873#comment-16848873
 ] 

mck commented on CASSANDRA-14812:
-

[~benedict], I have reviewed the patch and tested the python reproducible on 
3.0.18 and 3.11.4, working with and failing without the patch applied.

I'm not competent on this area, but I am jumping in to help as we too are 
seeing users unable to upgrade because of this fault.

Review questions/points are: 
 - is there a way to replicate the test for the CQL equivalent? While this bug 
does not impact CQL it is my understanding that CQL queries with `IN` clauses 
will still be going through this code path… I've attached the reproducible 
script rewritten for CQL, is it applicable? Should it be added as a dtest? XXX
 - I understand overriding {{`filter(..)`}} for the NONE impl, although at 
first it is not intuitive that  {{`DataLimits.NONE`}} is also used in thrift 
queries…
 - fyi the circleci results are here: 
https://circleci.com/workflow-run/3dd0d7f3-fa79-4118-80d8-247e85db40ea ; are 
these failures of concern?
 - {{"The branch I have uploaded also has a back port of CASSANDRA-14821"}}. I 
am confused… where is this?
 - a rebased commit for the 3.0 branch is here 
[mck/cassandra-3.0_14812|https://github.com/thelastpickle/cassandra/commits/mck/cassandra-3.0_14812]
 - the change in {{BasePartitions}} and the interactions from different 
{{StoppingTransformation}} subclasses is a bit harder to grok… It makes that 
the {{while}} loop does not need to continue in the situation where, {{stop}} 
has "leaked" and not been signalled, but where {{stopChild.isSignalled}} was. 
But not returning false in that same situation seems odd…? Do you want me to 
test the different cql interactions here (per partition, grouping, paging)?



> Multiget Thrift query returns null records after digest mismatch
> 
>
> Key: CASSANDRA-14812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14812
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Messaging/Thrift
>Reporter: Sivukhin Nikita
>Assignee: Benedict
>Priority: Urgent
> Fix For: 3.0.x, 3.11.x
>
> Attachments: repro_script.py, requirements.txt, small_repro_script.py
>
>
> It seems that in Cassandra 3.0.0 a nasty bug was introduced in {{multiget}} 
> Thrift query processing logic. When one tries to read data from several 
> partitions with a single {{multiget}} query and {{DigestMismatch}} exception 
> is raised during this query processing, request coordinator prematurely 
> terminates response stream right at the point where the first 
> \{{DigestMismatch}} error is occurring. This leads to situation where clients 
> "do not see" some data contained in the database.
> We managed to reproduce this bug in all versions of Cassandra starting with 
> v3.0.0. The pre-release version 3.0.0-rc2 works correctly. It looks like 
> [refactoring of iterator transformation 
> hierarchy|https://github.com/apache/cassandra/commit/609497471441273367013c09a1e0e1c990726ec7]
>  related to CASSANDRA-9975 triggers incorrect behaviour.
> When concatenated iterator is returned from the 
> [StorageProxy.fetchRows(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/service/StorageProxy.java#L1770],
>  Cassandra starts to consume this combined iterator. Because of 
> {{DigestMismatch}} exception some elements of this combined iterator contain 
> additional {{ThriftCounter}}, that was added during 
> [DataResolver.resolve(...)|https://github.com/apache/cassandra/blob/ee9e06b5a75c0be954694b191ea4170456015b98/src/java/org/apache/cassandra/service/reads/DataResolver.java#L120]
>  execution. While consuming iterator for many partitions Cassandra calls 
> [BaseIterator.tryGetMoreContents(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/db/transform/BaseIterator.java#L115]
>  method that must switch from one partition iterator to another in case of 
> exhaustion of the former. In this case all Transformations contained in the 
> next iterator are applied to the combined BaseIterator that enumerates 
> partitions sequence which is wrong. This behaviour causes BaseIterator to 
> stop enumeration after it fully consumes partition with {{DigestMismatch}} 
> error, because this partition iterator has additional {{ThriftCounter}} data 
> limit.
> The attachment contains the python2 script [^small_repro_script.py] that 
> reproduces this bug within 3-nodes ccmlib controlled cluster. Also, there is 
> an extended version of this script - [^repro_script.py] - that contains more 
> logging information and provides the ability to test behavior 

[jira] [Updated] (CASSANDRA-15142) Fix errors on repairing empty keyspace

2019-05-27 Thread Stefan Podkowinski (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-15142:
---
 Severity: Low
   Complexity: Low Hanging Fruit
Discovered By: User Report
 Bug Category: Parent values: Correctness(12982)Level 1 values: Semantic 
Failure(12988)
   Status: Open  (was: Triage Needed)

> Fix errors on repairing empty keyspace
> --
>
> Key: CASSANDRA-15142
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15142
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Tool/nodetool
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Normal
>
> Running repairs on empty keyspaces will produce a rather confusing error in 
> trunk:
> {noformat}
> ERROR [Repair-Task:1] 2019-05-24 10:36:20,323 RepairRunnable.java:274 - 
> Repair 014607d0-7dff-11e9-9256-158db058ccc5 failed:
> java.lang.IllegalArgumentException: repair sessions cannot operate on 
> multiple keyspaces
> ▸  at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:135)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService$ParentRepairSession.(ActiveRepairService.java:566)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService.registerParentRepairSession(ActiveRepairService.java:484)
> ▸  at 
> org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:395)
> ▸  at 
> org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:269)
> ▸  at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ▸  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ▸  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ▸  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ▸  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ▸  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> ▸  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Let's ignore empty keyspaces and return a success return status instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15142) Fix errors on repairing empty keyspace

2019-05-27 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-15142:
--

 Summary: Fix errors on repairing empty keyspace
 Key: CASSANDRA-15142
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15142
 Project: Cassandra
  Issue Type: Bug
  Components: Consistency/Repair, Tool/nodetool
Reporter: Stefan Podkowinski
Assignee: Stefan Podkowinski


Running repairs on empty keyspaces will produce a rather confusing error in 
trunk:

{noformat}
ERROR [Repair-Task:1] 2019-05-24 10:36:20,323 RepairRunnable.java:274 - Repair 
014607d0-7dff-11e9-9256-158db058ccc5 failed:
java.lang.IllegalArgumentException: repair sessions cannot operate on multiple 
keyspaces
▸  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:135)
▸  at 
org.apache.cassandra.service.ActiveRepairService$ParentRepairSession.(ActiveRepairService.java:566)
▸  at 
org.apache.cassandra.service.ActiveRepairService.registerParentRepairSession(ActiveRepairService.java:484)
▸  at 
org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:395)
▸  at 
org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:269)
▸  at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
▸  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
▸  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
▸  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
▸  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
▸  at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
▸  at java.lang.Thread.run(Thread.java:748)
{noformat}

Let's ignore empty keyspaces and return a success return status instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15133) Node restart causes unnecessary token metadata update

2019-05-27 Thread Ted Petersson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848639#comment-16848639
 ] 

Ted Petersson commented on CASSANDRA-15133:
---

Added some minor comments to your patch on github

> Node restart causes unnecessary token metadata update
> -
>
> Key: CASSANDRA-15133
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15133
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> Restarting a node causes gossip generation update. When it propagates the 
> message to the cluster, every node blindly update its local token metadata 
> even it is not changed. Update token metadata is expensive for large vnode 
> cluster and causes token metadata cache unnecessarily invalided.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org