[jira] [Commented] (CASSANDRA-15318) sendMessagesToNonlocalDC() should shuffle targets

2019-10-08 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947206#comment-16947206
 ] 

Jon Meredith commented on CASSANDRA-15318:
--

Created a new branch against trunk now that CASSANDRA-15319 is merged.

[Branch|https://github.com/jonmeredith/cassandra/tree/CASSANDRA-15318] 
[GitHub PR|https://github.com/apache/cassandra/pull/363]
[CircleCI|https://circleci.com/workflow-run/cbd7e18d-e452-4a8c-aa84-29f1285d38ba]

> sendMessagesToNonlocalDC() should shuffle targets
> -
>
> Key: CASSANDRA-15318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15318
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> To better spread load and reduce the impact of a node failure before 
> detection (or other issues like host replacement), when forwarding 
> messages to other data centers the forwarding non-local dc nodes should be 
> selected at random rather than always selecting the first node in the list of 
> endpoints for a token.
>  
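A minimal sketch of the proposed behaviour (illustrative only; `pickForwarder` and the endpoint list are invented names for this example, not the Cassandra internals):

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class ForwarderChoice
{
    // Hypothetical helper: choose which remote-DC replica forwards the
    // message within its data center. The old behaviour was equivalent to
    // endpoints.get(0); choosing uniformly at random spreads load and limits
    // the impact of a single not-yet-detected down node.
    static <T> T pickForwarder(List<T> endpoints)
    {
        int i = ThreadLocalRandom.current().nextInt(endpoints.size());
        return endpoints.get(i);
    }

    public static void main(String[] args)
    {
        List<String> endpoints = List.of("10.0.1.1", "10.0.1.2", "10.0.1.3");
        String forwarder = pickForwarder(endpoints);
        // Any of the three endpoints may be chosen.
        System.out.println("forwarder=" + forwarder);
    }
}
```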



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15319) Add support for network topology and tracing to in-JVM dtests.

2019-10-01 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942237#comment-16942237
 ] 

Jon Meredith commented on CASSANDRA-15319:
--

Thanks for the review, I've renamed as requested.

> Add support for network topology and tracing to in-JVM dtests.
> --
>
> Key: CASSANDRA-15319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> While working on CASSANDRA-15318, testing it properly with an in-JVM test 
> requires setting up the network topology and tracing requests to check which 
> nodes performed forwarding.
>   
> In support of testing, make it possible to create in-JVM clusters with nodes 
> appearing in different datacenter/racks and add support for executing queries 
> with tracing enabled.






[jira] [Commented] (CASSANDRA-15329) in-JVM dtest fails on java 11 since system ClassLoader is not a URLClassLoader

2019-09-25 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938046#comment-16938046
 ] 

Jon Meredith commented on CASSANDRA-15329:
--

+1 from me, this makes it possible to run the in-JVM dtests with JDK11.  I 
still get some failures (including a SIGBUS inside IntelliJ when running the 
DistributedReadWriteTest suite, which may be a local setup issue), and StreamingTest 
fails when run as {{ant test-jvm-dtest}}, but it's much better than all tests 
failing.

> in-JVM dtest fails on java 11 since system ClassLoader is not a URLClassLoader
> --
>
> Key: CASSANDRA-15329
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15329
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Attachments: CASSANDRA-15329.patch
>
>
> When running the in-JVM dtests on Java 11, they fail while trying to cast 
> the Versions.class.getClassLoader to URLClassLoader, which is no longer the 
> default ClassLoader on java 11.
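On Java 9+ the application class loader is an internal `AppClassLoader` that no longer extends `URLClassLoader`, so the cast throws `ClassCastException`. One portable workaround (sketched here under the assumption that the `java.class.path` property covers the paths needed; this is not necessarily what the attached patch does) rebuilds the URL list from the classpath property:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Paths;

public class ClasspathUrls
{
    // On Java 8 this worked because the system class loader was a URLClassLoader;
    // on Java 11 it throws ClassCastException:
    //   URL[] urls = ((URLClassLoader) SomeClass.class.getClassLoader()).getURLs();
    //
    // Portable alternative: derive the URLs from the java.class.path property.
    static URL[] classpathUrls() throws MalformedURLException
    {
        String[] entries = System.getProperty("java.class.path").split(File.pathSeparator);
        URL[] urls = new URL[entries.length];
        for (int i = 0; i < entries.length; i++)
            urls[i] = Paths.get(entries[i]).toUri().toURL();
        return urls;
    }

    public static void main(String[] args) throws MalformedURLException
    {
        for (URL u : classpathUrls())
            System.out.println(u);
    }
}
```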






[jira] [Commented] (CASSANDRA-15319) Add support for network topology and tracing to in-JVM dtests.

2019-09-24 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937275#comment-16937275
 ] 

Jon Meredith commented on CASSANDRA-15319:
--

[2.2|https://circleci.com/workflow-run/b10df64e-eb8c-43e4-81d6-cea5f8afda8c]
[3.0|https://circleci.com/workflow-run/3f31614b-dbeb-4f18-8995-dd8deb3b47ce]
[3.11|https://circleci.com/workflow-run/6e9c2cff-901e-4e74-a9a6-225d80ba510f]
[trunk|https://circleci.com/workflow-run/cb161886-9ce3-4d08-aa19-111f1ebce953]

> Add support for network topology and tracing to in-JVM dtests.
> --
>
> Key: CASSANDRA-15319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> While working on CASSANDRA-15318, testing it properly with an in-JVM test 
> requires setting up the network topology and tracing requests to check which 
> nodes performed forwarding.
>   
> In support of testing, make it possible to create in-JVM clusters with nodes 
> appearing in different datacenter/racks and add support for executing queries 
> with tracing enabled.






[jira] [Commented] (CASSANDRA-15319) Add support for network topology and tracing to in-JVM dtests.

2019-09-24 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937273#comment-16937273
 ] 

Jon Meredith commented on CASSANDRA-15319:
--

Thanks for the feedback from both of you.  I've merged in Dinesh's suggestions, 
which clean things up nicely, and in the process discovered some places where 
IInstance.getMessageVersion/IInstance.getSchemaVersion were being called on 
Instances before they had run startup().  With all that fixed, 
the in-JVM single-version and upgrade-version tests now pass on my local 
machine on all supported versions (there are no 2.2 upgrade tests).

Branches pushed up to let CircleCI have another swing at it.

> Add support for network topology and tracing to in-JVM dtests.
> --
>
> Key: CASSANDRA-15319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> While working on CASSANDRA-15318, testing it properly with an in-JVM test 
> requires setting up the network topology and tracing requests to check which 
> nodes performed forwarding.
>   
> In support of testing, make it possible to create in-JVM clusters with nodes 
> appearing in different datacenter/racks and add support for executing queries 
> with tracing enabled.






[jira] [Commented] (CASSANDRA-15327) Deleted data can re-appear if range movement streaming time exceeds gc_grace_seconds

2019-09-17 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931641#comment-16931641
 ] 

Jon Meredith commented on CASSANDRA-15327:
--

Thanks for double-checking [~jbaker200]. 

> Deleted data can re-appear if range movement streaming time exceeds 
> gc_grace_seconds
> 
>
> Key: CASSANDRA-15327
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15327
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Leon Zaruvinsky
>Priority: Normal
> Fix For: 2.2.15, 2.1.x
>
> Attachments: CASSANDRA-15327-2.1.txt, CASSANDRA-15327-2.2.txt
>
>
> Hey,
> We've come across a scenario in production (noticed on Cassandra 2.2.14) 
> where data that is deleted from Cassandra at consistency {{ALL}} can be 
> resurrected.  I've added a reproduction in a comment.
> If a {{delete}} is issued during a range movement (i.e. bootstrap, 
> decommission, move), and {{gc_grace_seconds}} is surpassed before the stream 
> is finished, then the tombstones from the {{delete}} can be purged from the 
> recipient node before the data is streamed. Once the move is complete, the 
> data now exists on the recipient node without a tombstone.
> We noticed this because our bootstrapping time occasionally exceeds our 
> configured gc_grace_seconds, so we lose the consistency guarantee.  As an 
> operator, it would be great to not have to worry about this edge case.
> I've attached a patch that we have tested and successfully used in 
> production, and haven't noticed any ill effects.  Happy to submit patches for 
> more recent versions, I'm not sure how cleanly this will actually merge since 
> there was some refactoring to this logic in 3.x.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15327) Deleted data can re-appear if range movement streaming time exceeds gc_grace_seconds

2019-09-16 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930962#comment-16930962
 ] 

Jon Meredith commented on CASSANDRA-15327:
--

Thanks for reporting the issue.  From what you're describing, Cassandra is 
behaving as designed: gc_grace_seconds should be set long enough that you 
can complete repair on the cluster.

Configuring an absolute gc_grace_seconds is not ideal, as you cannot always control how long 
repair will take in the face of outages and other issues.
CASSANDRA-6434 implements a better method in 3.0, where gcgs is automatically 
set based on repair times.


> Deleted data can re-appear if range movement streaming time exceeds 
> gc_grace_seconds
> 
>
> Key: CASSANDRA-15327
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15327
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Leon Zaruvinsky
>Priority: Normal
> Fix For: 2.2.15, 2.1.x
>
> Attachments: CASSANDRA-15327-2.1.txt, CASSANDRA-15327-2.2.txt
>
>
> Hey,
> We've come across a scenario in production (noticed on Cassandra 2.2.14) 
> where data that is deleted from Cassandra at consistency {{ALL}} can be 
> resurrected.  I've added a reproduction in a comment.
> If a {{delete}} is issued during a range movement (i.e. bootstrap, 
> decommission, move), and {{gc_grace_seconds}} is surpassed before the stream 
> is finished, then the tombstones from the {{delete}} can be purged from the 
> recipient node before the data is streamed. Once the move is complete, the 
> data now exists on the recipient node without a tombstone.
> We noticed this because our bootstrapping time occasionally exceeds our 
> configured gc_grace_seconds, so we lose the consistency guarantee.  As an 
> operator, it would be great to not have to worry about this edge case.
> I've attached a patch that we have tested and successfully used in 
> production, and haven't noticed any ill effects.  Happy to submit patches for 
> more recent versions, I'm not sure how cleanly this will actually merge since 
> there was some refactoring to this logic in 3.x.






[jira] [Updated] (CASSANDRA-15327) Deleted data can re-appear if range movement streaming time exceeds gc_grace_seconds

2019-09-16 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15327:
-
Resolution: Won't Fix
Status: Resolved  (was: Open)

> Deleted data can re-appear if range movement streaming time exceeds 
> gc_grace_seconds
> 
>
> Key: CASSANDRA-15327
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15327
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Leon Zaruvinsky
>Priority: Normal
> Fix For: 2.2.15, 2.1.x
>
> Attachments: CASSANDRA-15327-2.1.txt, CASSANDRA-15327-2.2.txt
>
>
> Hey,
> We've come across a scenario in production (noticed on Cassandra 2.2.14) 
> where data that is deleted from Cassandra at consistency {{ALL}} can be 
> resurrected.  I've added a reproduction in a comment.
> If a {{delete}} is issued during a range movement (i.e. bootstrap, 
> decommission, move), and {{gc_grace_seconds}} is surpassed before the stream 
> is finished, then the tombstones from the {{delete}} can be purged from the 
> recipient node before the data is streamed. Once the move is complete, the 
> data now exists on the recipient node without a tombstone.
> We noticed this because our bootstrapping time occasionally exceeds our 
> configured gc_grace_seconds, so we lose the consistency guarantee.  As an 
> operator, it would be great to not have to worry about this edge case.
> I've attached a patch that we have tested and successfully used in 
> production, and haven't noticed any ill effects.  Happy to submit patches for 
> more recent versions, I'm not sure how cleanly this will actually merge since 
> there was some refactoring to this logic in 3.x.






[jira] [Commented] (CASSANDRA-15318) sendMessagesToNonlocalDC() should shuffle targets

2019-09-09 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925876#comment-16925876
 ] 

Jon Meredith commented on CASSANDRA-15318:
--

Linked is a patch to shuffle the node used for forwarding messages to remote 
DCs. The MessageForwardingTest has been updated to verify that the remote node 
is picked as the forwarder at least once (it doesn't check fairness; I didn't 
want to risk flaky tests due to randomness).

trunk [changes | https://github.com/jonmeredith/cassandra/pull/1 ] [trunk 
CircleCI | 
https://circleci.com/workflow-run/c418-f747-4665-be7a-7a97ffcc]

(The PR is against my local CASSANDRA-15319 branch to make the diff clear; I can 
retarget/reopen as needed.)

> sendMessagesToNonlocalDC() should shuffle targets
> -
>
> Key: CASSANDRA-15318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15318
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> To better spread load and reduce the impact of a node failure before 
> detection (or other issues like host replacement), when forwarding 
> messages to other data centers the forwarding non-local dc nodes should be 
> selected at random rather than always selecting the first node in the list of 
> endpoints for a token.
>  






[jira] [Commented] (CASSANDRA-15319) Add support for network topology and tracing to in-JVM dtests.

2019-09-09 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925779#comment-16925779
 ] 

Jon Meredith commented on CASSANDRA-15319:
--

Here's a set of PRs that introduces rack awareness for in-JVM dtests and adds 
support for tracing. The MessageForwardingTest uses both of them to send 100 
messages and verifies that the remote nodes receive the right number of 
messages (this test found a double-forwarding bug introduced with TR, which 
[~iamaleksey] has already fixed on trunk).

There are a few other small cleanups to the in-JVM code, including closing the 
test cluster in the RepairTest.

2.2 [changes |https://github.com/apache/cassandra/pull/354] | [CircleCI 
|https://circleci.com/workflow-run/92d23778-f8dc-4317-a780-4b88f244ebc7]
 3.0 [changes |https://github.com/apache/cassandra/pull/355] | [CircleCI 
|https://circleci.com/workflow-run/12d525a2-9937-40f8-b602-79909eaaa0ac]
 3.11 [changes |https://github.com/apache/cassandra/pull/356] | [CircleCI 
|https://circleci.com/workflow-run/28e20dcd-1d06-4129-801f-5865caf6c915]
 trunk [changes |https://github.com/apache/cassandra/pull/357] | [CircleCI 
|https://circleci.com/workflow-run/cf57fe2e-0f0f-4d74-bad5-32c7d7fedeee]

> Add support for network topology and tracing to in-JVM dtests.
> --
>
> Key: CASSANDRA-15319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15319
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While working on CASSANDRA-15318, testing it properly with an in-JVM test 
> requires setting up the network topology and tracing requests to check which 
> nodes performed forwarding.
>   
> In support of testing, make it possible to create in-JVM clusters with nodes 
> appearing in different datacenter/racks and add support for executing queries 
> with tracing enabled.






[jira] [Created] (CASSANDRA-15319) Add support for network topology and tracing to in-JVM dtests.

2019-09-08 Thread Jon Meredith (Jira)
Jon Meredith created CASSANDRA-15319:


 Summary: Add support for network topology and tracing to in-JVM 
dtests.
 Key: CASSANDRA-15319
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15319
 Project: Cassandra
  Issue Type: Improvement
  Components: Test/dtest
Reporter: Jon Meredith


While working on CASSANDRA-15318, testing it properly with an in-JVM test 
requires setting up the network topology and tracing requests to check which 
nodes performed forwarding.
  
In support of testing, make it possible to create in-JVM clusters with nodes 
appearing in different datacenter/racks and add support for executing queries 
with tracing enabled.






[jira] [Created] (CASSANDRA-15318) sendMessagesToNonlocalDC() should shuffle targets

2019-09-08 Thread Jon Meredith (Jira)
Jon Meredith created CASSANDRA-15318:


 Summary: sendMessagesToNonlocalDC() should shuffle targets
 Key: CASSANDRA-15318
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15318
 Project: Cassandra
  Issue Type: Improvement
  Components: Messaging/Internode
Reporter: Jon Meredith
Assignee: Jon Meredith


To better spread load and reduce the impact of a node failure before detection 
(or other issues like host replacement), when forwarding messages to 
other data centers the forwarding non-local dc nodes should be selected at 
random rather than always selecting the first node in the list of endpoints for 
a token.
 






[jira] [Comment Edited] (CASSANDRA-15277) Make it possible to resize concurrent read / write thread pools at runtime

2019-08-26 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916085#comment-16916085
 ] 

Jon Meredith edited comment on CASSANDRA-15277 at 8/26/19 7:30 PM:
---

I went ahead and changed SEPExecutor to treat negative permits as a resize in 
progress, with workers giving up their work permits; added support for nodetool 
to get/set concurrency; tried to clear up confusion with the 
JMXEnabledThreadPoolExecutor pool size getters/setters; and removed the 
unnecessary JMXConfigurableThreadPoolExecutor, as it no longer adds anything to 
JMXEnabledThreadPoolExecutor.

Also, switched the code to be based on CASSANDRA-15227, as there are changes to 
Stage/StageManager that impact this work and it seemed easier to make the 
changes afterwards.


was (Author: jmeredithco):
I went ahead and changed SEPExecutor to treat negative permits as a resize in 
progress, with workers giving up their work permits; added support for nodetool 
to get/set concurrency; tried to clear up confusion with the 
JMXEnabledThreadPoolExecutor pool size getters/setters; and removed the 
unnecessary JMXConfigurableThreadPoolExecutor, as it no longer adds anything to 
JMXEnabledThreadPoolExecutor.

> Make it possible to resize concurrent read / write thread pools at runtime
> --
>
> Key: CASSANDRA-15277
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15277
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Other
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To better mitigate cluster overload the executor services for various stages 
> should be configurable at runtime (probably as a JMX hot property). 
> Related to CASSANDRA-5044, this would add the capability to resize to 
> multiThreadedLowSignalStage pools based on SEPExecutor.






[jira] [Commented] (CASSANDRA-15277) Make it possible to resize concurrent read / write thread pools at runtime

2019-08-26 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916085#comment-16916085
 ] 

Jon Meredith commented on CASSANDRA-15277:
--

I went ahead and changed SEPExecutor to treat negative permits as a resize in 
progress, with workers giving up their work permits; added support for nodetool 
to get/set concurrency; tried to clear up confusion with the 
JMXEnabledThreadPoolExecutor pool size getters/setters; and removed the 
unnecessary JMXConfigurableThreadPoolExecutor, as it no longer adds anything to 
JMXEnabledThreadPoolExecutor.
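The negative-permits idea can be modelled in isolation. This is an illustrative sketch, not the SEPExecutor source; `ResizablePermits` and its methods are invented names:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ResizablePermits
{
    // permits >= 0 : normal operation
    // permits <  0 : a shrink is pending; |permits| workers should give up
    //                their work permit and stop instead of taking more work
    final AtomicInteger permits;

    ResizablePermits(int initial) { permits = new AtomicInteger(initial); }

    // Called by the resizer: ask delta workers to stop.
    void shrink(int delta) { permits.addAndGet(-delta); }

    // Called by a worker before taking more work. Returns false if this
    // worker should exit because a resize asked it to give up its permit.
    boolean tryKeepWorking()
    {
        while (true)
        {
            int p = permits.get();
            if (p >= 0)
                return true;                  // no shrink pending
            if (permits.compareAndSet(p, p + 1))
                return false;                 // absorbed one unit of shrink
        }
    }

    public static void main(String[] args)
    {
        ResizablePermits pool = new ResizablePermits(0);
        pool.shrink(2);                       // ask two workers to stop
        System.out.println(pool.tryKeepWorking());  // first worker exits
        System.out.println(pool.tryKeepWorking());  // second worker exits
        System.out.println(pool.tryKeepWorking());  // remaining workers carry on
    }
}
```

The compare-and-set loop makes the hand-off race-free: each exiting worker consumes exactly one unit of the pending shrink.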

> Make it possible to resize concurrent read / write thread pools at runtime
> --
>
> Key: CASSANDRA-15277
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15277
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Other
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To better mitigate cluster overload the executor services for various stages 
> should be configurable at runtime (probably as a JMX hot property). 
> Related to CASSANDRA-5044, this would add the capability to resize to 
> multiThreadedLowSignalStage pools based on SEPExecutor.






[jira] [Commented] (CASSANDRA-15272) Enhance & reenable RepairTest

2019-08-21 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912466#comment-16912466
 ] 

Jon Meredith commented on CASSANDRA-15272:
--

CASSANDRA-15170 changes are now merged; you should be good to commit when ready.

> Enhance & reenable RepairTest
> -
>
> Key: CASSANDRA-15272
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15272
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Normal
>
> Currently the In-JVM RepairTest is not enabled on trunk (See for more info: 
> CASSANDRA-13938). This patch enables the In JVM RepairTest. It adds a new 
> test that tests the compression=off path for SSTables. It will help catch any 
> regressions in repair on this path. This does not fix the issue with the 
> compressed sstable streaming (CASSANDRA-13938). That should be addressed in 
> the original ticket.






[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-19 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910709#comment-16910709
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Switched to the simpler/safer StreamingInboundHandler and clearly documented 
its limitations.  Also found a possible issue with close() causing the helper 
I/O thread to poll forever.

[CircleCI 
run|https://circleci.com/workflow-run/e1440c87-8d12-479e-9119-8116a0a09152]

 

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.
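The HintsCatalog leak arises because `Files.list` returns a `Stream` backed by an open directory handle that is only released by `close()`. A minimal, generic illustration of the try-with-resources fix (not the Cassandra patch itself):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListDirectory
{
    // Leaky: Files.list(dir).collect(...) never closes the underlying
    // directory stream, so each call holds a file handle until GC.
    //
    // Fixed: wrap the stream in try-with-resources so the handle is
    // released as soon as the listing is consumed.
    static List<Path> entries(Path dir) throws IOException
    {
        try (Stream<Path> paths = Files.list(dir))
        {
            return paths.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException
    {
        Path tmp = Files.createTempDirectory("hints");
        Files.createFile(tmp.resolve("a.hints"));
        System.out.println(entries(tmp));
    }
}
```

A leaked handle per call is invisible in normal usage but adds up when each in-JVM dtest Instance triggers one, as the description notes.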






[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-16 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909454#comment-16909454
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

I tracked the problem down to the StreamingInboundHandler shutdown code.  I 
couldn't work out why it was failing the dtests, but perhaps the code called in 
{{session.messageReceived}} didn't handle thread interruption well.

I've added a commit to revert the original change, and pushed up a new one that 
re-implements the set tracking the active handlers, switching to tracking 
with a weak reference as you suggested.
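Weak-reference tracking of active handlers is commonly done with a set view over a `WeakHashMap`, so a handler that is no longer referenced anywhere else can still be garbage collected even if shutdown bookkeeping misses it. A generic sketch (the handler type here is a placeholder, not the actual Netty/Cassandra type):

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

public class HandlerRegistry
{
    // Set backed by a WeakHashMap: elements are held weakly, so once
    // nothing else references a handler, the GC may reclaim it and it
    // silently drops out of the set.
    static final Set<Object> active =
        Collections.newSetFromMap(new WeakHashMap<>());

    public static void main(String[] args)
    {
        Object handler = new Object();
        active.add(handler);
        System.out.println(active.contains(handler)); // strongly reachable, still tracked

        // During shutdown, close whatever is still live:
        for (Object h : active)
            System.out.println("closing " + h);
    }
}
```

In concurrent code the set would normally be wrapped with `Collections.synchronizedSet` (or iterated over a snapshot) since `WeakHashMap` is not thread-safe.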

Here's a clean(ish) [CircleCI 
run|https://circleci.com/workflow-run/5ed4f520-c378-4760-9e42-c000b5c1946b]; 
{{test_simple_repair_order_preserving - repair_tests.repair_test.TestRepair}} 
failed, but there were flaky-test comments in there too, so I'm not sure how 
reliable that test is.

If you're happy with the change, could you please squash the two new commits into 
the merge commit when you push to origin?


> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-16 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909356#comment-16909356
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

I've broken the trunk changes up into smaller commits; the issue is with the 
changes to StreamingInboundHandler, which would only affect trunk.

This commit breaks the Python dtests: 
https://github.com/jonmeredith/cassandra/commit/da07569f65ae0eb248488295d5b0a70a8039ee6a

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.






[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-15 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908724#comment-16908724
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

2.2-3.11 look good.  Lots of errors on trunk - unclear whether it's transient 
cloud-infrastructure trouble or an issue with the PR. Rerunning and will 
investigate further tomorrow, please hold off on merging.




[jira] [Comment Edited] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-15 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908274#comment-16908274
 ] 

Jon Meredith edited comment on CASSANDRA-15170 at 8/16/19 3:37 AM:
---

Alright, squashed it all down to a commit against 2.2 and merge commits up 
through to the current origin.  There's a CircleCI commit tacked on the 
end of each which needs to be dropped before merge.

2.2 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-2.2] | 
[CircleCI|https://circleci.com/workflow-run/db122979-d47a-4d3a-a0e5-0cedef411a46]
3.0 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-3.0] | 
[CircleCI|https://circleci.com/workflow-run/370d6e4c-6ae8-4167-967a-a88e7b906582]
3.11 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-3.11] 
| 
[CircleCI|https://circleci.com/workflow-run/aecdc4ac-c267-435e-8d2e-4f238e02d9be]
trunk | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-trunk] 
| [CircleCI 
J8|https://circleci.com/workflow-run/eb1ad032-7b76-4aa2-a975-4ba6496bb74f] | 
[CircleCI 
J11|https://circleci.com/workflow-run/c8da2408-6caa-474c-894b-e41c029e2139]

Thanks for all the effort reviewing/rereviewing.


was (Author: jmeredithco):
Apologies, messed up the 3.11 [CircleCI 
link|https://circleci.com/workflow-run/aecdc4ac-c267-435e-8d2e-4f238e02d9be]




[jira] [Comment Edited] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-15 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908274#comment-16908274
 ] 

Jon Meredith edited comment on CASSANDRA-15170 at 8/16/19 3:23 AM:
---

Apologies, messed up the 3.11 [CircleCI 
link|https://circleci.com/workflow-run/aecdc4ac-c267-435e-8d2e-4f238e02d9be]


was (Author: jmeredithco):
Alright, squashed it all down to a commit against 2.2 and merge commits up 
through to the current origin.  There's a CircleCI commit tacked on the 
end of each which needs to be dropped before merge.

 2.2 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-2.2] | 
[CircleCI|https://circleci.com/workflow-run/db122979-d47a-4d3a-a0e5-0cedef411a46]
 3.0 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-3.0] | 
[CircleCI|https://circleci.com/workflow-run/370d6e4c-6ae8-4167-967a-a88e7b906582]
 3.11 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-3.11] 
| 
[CircleCI|https://circleci.com/workflow-run/370d6e4c-6ae8-4167-967a-a88e7b906582]
 trunk | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-trunk] 
| [CircleCI 
J8|https://circleci.com/workflow-run/eb1ad032-7b76-4aa2-a975-4ba6496bb74f] | 
[CircleCI 
J11|https://circleci.com/workflow-run/c8da2408-6caa-474c-894b-e41c029e2139]

Thanks for all the effort reviewing/rereviewing.




[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-15 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908274#comment-16908274
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Alright, squashed it all down to a commit against 2.2 and merge commits up 
through to the current origin.  There's a CircleCI commit tacked on the 
end of each which needs to be dropped before merge.

 2.2 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-2.2] | 
[CircleCI|https://circleci.com/workflow-run/db122979-d47a-4d3a-a0e5-0cedef411a46]
 3.0 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-3.0] | 
[CircleCI|https://circleci.com/workflow-run/370d6e4c-6ae8-4167-967a-a88e7b906582]
 3.11 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-3.11] 
| 
[CircleCI|https://circleci.com/workflow-run/370d6e4c-6ae8-4167-967a-a88e7b906582]
 trunk | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-trunk] 
| [CircleCI 
J8|https://circleci.com/workflow-run/eb1ad032-7b76-4aa2-a975-4ba6496bb74f] | 
[CircleCI 
J11|https://circleci.com/workflow-run/c8da2408-6caa-474c-894b-e41c029e2139]

Thanks for all the effort reviewing/rereviewing.




[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-14 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907479#comment-16907479
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

I've updated each of the branches for the inconsistent use of 
timeouts/ExecutorUtils you pointed out, and also fixed my own nit of units -> 
unit.

The only thing I noticed while doing that was that trunk was missing a couple 
of synchronized modifiers in {{SharedExecutorPool}} on {{newExecutor}} and 
{{shutdownAndWait}} that had been added in the 3.x and below series.  Paranoia 
made me include them, but I wanted to check with you if you remembered why they 
were added and if there was a reason they were not merged up to trunk?




[jira] [Commented] (CASSANDRA-15272) Enhance & reenable RepairTest

2019-08-13 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906691#comment-16906691
 ] 

Jon Meredith commented on CASSANDRA-15272:
--

Thanks Dinesh, that's perfect.

+1 once CASSANDRA-15170 merges (once you update the imports)

> Enhance & reenable RepairTest
> -
>
> Key: CASSANDRA-15272
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15272
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Normal
>
> Currently the In-JVM RepairTest is not enabled on trunk (See for more info: 
> CASSANDRA-13938). This patch enables the In JVM RepairTest. It adds a new 
> test that tests the compression=off path for SSTables. It will help catch any 
> regressions in repair on this path. This does not fix the issue with the 
> compressed sstable streaming (CASSANDRA-13938). That should be addressed in 
> the original ticket.






[jira] [Updated] (CASSANDRA-15277) Make it possible to resize concurrent read / write thread pools at runtime

2019-08-13 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15277:
-
 Complexity: Normal
Change Category: Operability

> Make it possible to resize concurrent read / write thread pools at runtime
> --
>
> Key: CASSANDRA-15277
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15277
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Other
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To better mitigate cluster overload the executor services for various stages 
> should be configurable at runtime (probably as a JMX hot property). 
> Related to CASSANDRA-5044, this would add the capability to resize to 
> multiThreadedLowSignalStage pools based on SEPExecutor.






[jira] [Commented] (CASSANDRA-15277) Make it possible to resize concurrent read / write thread pools at runtime

2019-08-13 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906591#comment-16906591
 ] 

Jon Meredith commented on CASSANDRA-15277:
--

Branch: https://github.com/jonmeredith/cassandra/tree/CASSANDRA-15277
Pull request: https://github.com/apache/cassandra/pull/340 - Add support for 
resizing the SEPExecutor thread pools used by some of the work stages.

This version has the smallest change to the SEPExecutor itself and introduces a 
new flag to make the workers release and re-acquire work permits while the 
thread setting the size adds/discards work permits to get the desired maximum 
concurrency.

There are two other design choices I could explore.

1) Convert the work permit representation to signed and have worker threads 
return permits while it is non-positive. This allows the resizing thread to 
exit immediately.

2) Save introducing the resizing volatile boolean by dedicating a bit in 
`permits` to mark when resizing is taking place - it gets checked anyway, but 
this would be a slightly larger change and would reduce the maximum number of 
work permits representable.
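The permit juggling described above can be sketched as follows. This is a hypothetical illustration built on `java.util.concurrent.Semaphore`, not the real SEPExecutor: workers take a permit per task, and the resizing thread grows the pool by releasing extra permits and shrinks it by acquiring (and discarding) permits as workers hand them back.

```java
import java.util.concurrent.Semaphore;

public class ResizablePermitsSketch {
    private final Semaphore workPermits;
    private int size;

    ResizablePermitsSketch(int initialSize) {
        workPermits = new Semaphore(initialSize);
        size = initialSize;
    }

    // Growing releases permits immediately; shrinking blocks until enough
    // permits have been returned by workers, mirroring the resizing-thread
    // behaviour discussed in the comment above.
    synchronized void setMaximumConcurrency(int newSize) {
        if (newSize > size)
            workPermits.release(newSize - size);
        else
            workPermits.acquireUninterruptibly(size - newSize);
        size = newSize;
    }

    int availablePermits() { return workPermits.availablePermits(); }

    public static void main(String[] args) {
        ResizablePermitsSketch pool = new ResizablePermitsSketch(4);
        pool.setMaximumConcurrency(8);
        System.out.println(pool.availablePermits()); // prints 8
        pool.setMaximumConcurrency(2);               // no tasks running, shrinks at once
        System.out.println(pool.availablePermits()); // prints 2
    }
}
```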




[jira] [Created] (CASSANDRA-15277) Make it possible to resize concurrent read / write thread pools at runtime

2019-08-13 Thread Jon Meredith (JIRA)
Jon Meredith created CASSANDRA-15277:


 Summary: Make it possible to resize concurrent read / write thread 
pools at runtime
 Key: CASSANDRA-15277
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15277
 Project: Cassandra
  Issue Type: Improvement
  Components: Local/Other
Reporter: Jon Meredith
Assignee: Jon Meredith


To better mitigate cluster overload the executor services for various stages 
should be configurable at runtime (probably as a JMX hot property). 

Related to CASSANDRA-5044, this would add the capability to resize to 
multiThreadedLowSignalStage pools based on SEPExecutor.
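A "JMX hot property" for pool size could look something like the following minimal sketch. The `StageMXBean` interface and the `demo:type=Stage` object name are invented for illustration and are not Cassandra's actual MBeans:

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxResizeDemo {
    // An MXBean exposing a writable pool-size attribute.
    public interface StageMXBean {
        int getCorePoolSize();
        void setCorePoolSize(int size);
    }

    public static class Stage implements StageMXBean {
        private volatile int corePoolSize = 4;
        public int getCorePoolSize() { return corePoolSize; }
        public void setCorePoolSize(int size) { corePoolSize = size; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo:type=Stage,name=MutationStage");
        server.registerMBean(new Stage(), name);
        // An operator (e.g. via nodetool or jconsole) would set the attribute
        // remotely; here we go through the same server to show the round trip.
        server.setAttribute(name, new Attribute("CorePoolSize", 8));
        System.out.println(server.getAttribute(name, "CorePoolSize")); // prints 8
    }
}
```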







[jira] [Commented] (CASSANDRA-15272) Enhance & reenable RepairTest

2019-08-13 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906419#comment-16906419
 ] 

Jon Meredith commented on CASSANDRA-15272:
--

+1 from me.

It's probably worth waiting until CASSANDRA-15170 lands first and it'll only 
need a minor tweak to change GOSSIP/NETWORK import path.

Only one nit: instead of commenting out the compressed test, what do you think 
about using @Ignore instead?

{{@Ignore("test requires CASSANDRA-13938 to be merged")}}

That gives somebody looking at the output from test runners a better chance of 
noticing it is disabled for a reason and enabling it in future.




[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-12 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905590#comment-16905590
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Pushed and updated - upgrade tests now work on 3.0, 3.x and trunk
 * fixed internode messaging to serialize/deserialize messages starting from 
the Cassandra magic number.
 * added missing call to shutdown() to the AbstractCluster.Wrapper.

 

That should be everything now and is good for final review, I have no further 
planned changes.




[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904092#comment-16904092
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Thanks for explaining the issue with the termination, I've gone through and 
fixed the shutdowns to match trunk and pushed commits.




[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904008#comment-16904008
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Thanks for the review, and apologies for the messy history. I've just pushed 
deltas to each branch this time.

Losing the Feature enum was just a misunderstanding of change history - I was 
matching trunk instead of 3.0.  Agree the enum is nicer and have implemented 
that across releases instead.

On the IsolatedExecutor shutdown executor: when the dtests run successfully 
there isn't a problem with shutting down. It's just that when there were issues 
with the instance class loaders not being released, there were several cases 
where the nodeX_shutdown thread was still around, also acting as a GC root to 
the instance class loader.  Sadly I can't reproduce it now and have lost the 
heap dump showing an example, but I know that when I was debugging, the change 
left fewer active threads in the heap dump, which made it easier to analyze.

On to the minor items.

* For NanoTimeToCurrentTimeMillis, I've left it alone on 2.2/3.0 as trying to 
minimize change on older versions.  3.11 and above use the 
FastScheduledTaskExecutor for it.

* The ColumnFamilyStore.shutdownExecutorsAndWait uses the builder so it can add 
the perDiskflushExecutors, so I just made the code as similar as possible. 
Happy to change if you'd really like ImmutableList.of.

* The shutdown->shutdownNow change on the InfiniteLoopExecutor was just to 
match trunk naming, there's no functional change.

* We could introduce the shutdown-and-wait helper, but it would mean 4 variants 
of it; I don't mind it as it is at the moment.





[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-08 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903511#comment-16903511
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Hopefully this is all the feedback addressed.  I've also unified the API 
between all of the active branches so things should be stable again post 
internode messaging overhaul.  I ended up keeping my original IsolatedExecutor 
change to make the special shutdown executors as the 
{{Executors.newSingleThreadExecutor}} threads hang around a lot longer making 
debugging harder.
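The special shutdown executor mentioned above can be illustrated with the following hypothetical sketch (not the actual IsolatedExecutor code): with a core pool size of 0, the worker thread dies once the keep-alive expires, so it no longer pins the instance class loader the way the long-lived core thread of `Executors.newSingleThreadExecutor()` can.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ShutdownExecutorSketch {
    // At most one worker thread, created on demand and reclaimed after 2s
    // of idleness; with corePoolSize 0 no thread outlives the keep-alive.
    static ExecutorService shortLivedSingleThread() {
        return new ThreadPoolExecutor(0, 1, 2, TimeUnit.SECONDS,
                                      new LinkedBlockingQueue<Runnable>());
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = shortLivedSingleThread();
        Future<Integer> result = executor.submit(() -> 40 + 2);
        System.out.println(result.get()); // prints 42
        executor.shutdown();
    }
}
```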

2.2 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v6-2.2]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v6-2%2E2]
 3.0 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v6-3.0]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v6-3%2E0]
 3.11 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v6-3.11]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v6-3%2E11]
 trunk | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v6-trunk]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v6-trunk]





[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-04 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899690#comment-16899690
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

I've updated the branches and this should be ready to review.  Once you're 
happy with it we can update the commit message and squash the fixup in; I just 
didn't have the heart to redo all the merging up again.

2.2 | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v4-2.2] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v4-2%2E2]
3.0 | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v4-3.0] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v4-3%2E0]
3.11 | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v4-3.11] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v4-3%2E11]
trunk | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v4-trunk] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v4-trunk]

Unit tests / in-jvm-dtest are passing on 2.2-3.11.  There's a failure on trunk 
for {{org.apache.cassandra.net.ConnectionTest.testCloseIfEndpointDown}}, which 
I suspect is due to the growth in the powerset of connection options and is 
unrelated to the in-jvm changes.

To document the discussion we had off-ticket.

{quote}
Making {{ResourceLeakTest.doTest}} to be configurable, could also later 
automatically loop through all in-jvm dtests and run them a dozen times or so 
to see if leaks are occurring. Perhaps on each loop, we could dump the threads, 
heap utilisation and files, and check they are not growing? That way the test 
can become one that actually fails if leaks are detected, and not produce heap 
dumps etc. unless it is so detected (and perhaps preferably only produce heap 
dumps if no thread leaks are detected)
{quote}

I agree, that would be nice.  I'd rather tackle that as a separate piece of 
work under a new ticket (it may make sense to do at the same time as 
CASSANDRA-15171). It's painful trying to keep all the variations of this in 
sync at the moment.
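The loop described in the quote could be sketched roughly as follows (names entirely illustrative; this is not the actual ResourceLeakTest code): run a test action repeatedly and fail as soon as the live thread count keeps growing past a baseline.

```java
public class LeakLoopSketch
{
    /** Runs an action repeatedly and reports whether the live thread count stays stable. */
    static boolean threadCountStable(Runnable action, int iterations)
    {
        int baseline = -1;
        for (int i = 0; i < iterations; i++)
        {
            action.run();
            System.gc(); // encourage collection of per-iteration garbage
            int live = Thread.activeCount();
            if (baseline < 0)
                baseline = live;          // first pass establishes the baseline
            else if (live > baseline + 5) // small slack for background threads
                return false;             // a growing thread count suggests a leak
        }
        return true;
    }

    public static void main(String[] args)
    {
        System.out.println("stable: " + threadCountStable(() -> {}, 10));
    }
}
```

Heap utilisation and file-descriptor checks could be layered in the same way, only producing dumps once a growth threshold is crossed.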

{quote}
IsolatedExecutor not using NamedThreadFactory
{quote}

I added a comment to explain, but using NamedThreadFactory was obscuring some 
exceptions while debugging, as it sometimes called 
FastThreadLocal.removeAll() before it was initialized and crashed (although 
perhaps with the class loader unloading moved it would not be an issue now; I 
can't remember how to reproduce).

{quote}
I'm anyway unclear why we are using `CompletableFuture` here, when we return a 
normal `Future`
{quote}

Good point, fixed up with your suggestion.








[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-23 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890996#comment-16890996
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Here's the latest - I had all the updated branches passing on macOS and a 
container based on the CircleCI image.  I need to make one more pass over it 
and would like to move the gossip and streaming shutdown from trunk to 2.2 and 
merge forward.

2.2 | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v3-2.2] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v3-2%2E2]
3.0 | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v3-3.0] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v3-3%2E0]
3.11 | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v3-3.11] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v3-3%2E11]
trunk | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v3-trunk] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v3-trunk]







[jira] [Created] (CASSANDRA-15240) Reinstate support for native libraries for in-JVM dtests

2019-07-19 Thread Jon Meredith (JIRA)
Jon Meredith created CASSANDRA-15240:


 Summary: Reinstate support for native libraries for in-JVM dtests
 Key: CASSANDRA-15240
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15240
 Project: Cassandra
  Issue Type: Improvement
  Components: Test/dtest
Reporter: Jon Meredith


While working on CASSANDRA-15170, native libraries for libc functions, epoll 
support, and OpenSSL were observed holding gcroots to the instance class 
loaders when in-JVM dtest {{with(NETWORK)}} support was enabled. The solution 
for CASSANDRA-15170 was to disable native libraries to get everything working, 
but this is not ideal because in-JVM tests will not be testing the real code on 
that platform.

One proposed solution from [~ifesdjeen] and [~benedict] is to introduce an 
additional classloader per-Cassandra version that can be used for loading 
native libraries and share the {{CassandraVersionClassLoader}} by each instance 
of that version, enabling the {{InstanceClassLoader}} to be garbage collected.

{noformat}
CLibrary 
 com.sun.jna.Native.registeredClasses
 com.sun.jna.Native.options
 com.sun.jna.Native.registredLibraries

Netty
   io.netty.channel.ChannelException
   io.netty.channel.unix.DatagramSocketAddress
   io.netty.channel.unix.PeerCredentials

   io.netty.internal.tcnative.CertificateCallbackTask
   io.netty.internal.tcnative.CertificateVerifierTask
   io.netty.internal.tcnative.SSLPrivateKeyMethodDecryptTask
   io.netty.internal.tcnative.SSLPrivateKeyMethodSignTask
   io.netty.internal.tcnative.SSLPrivateKeyMethodTask
   io.netty.internal.tcnative.SSLTask
{noformat}
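A rough illustration of the proposed loader layout (all names hypothetical; this is not the dtest API): one loader per Cassandra version owns the native bindings and acts as the shared parent of every instance loader, so the JNI gcroots above pin the long-lived per-version loader rather than each InstanceClassLoader.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class SharedVersionLoaderSketch
{
    /** Two instance loaders sharing one per-version loader as their parent. */
    static boolean instancesShareParent()
    {
        // Hypothetical layout: the version loader would hold the native
        // bindings (JNA, netty-tcnative) for all instances of that version.
        URLClassLoader versionLoader =
            new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
        ClassLoader instance1 = new URLClassLoader(new URL[0], versionLoader);
        ClassLoader instance2 = new URLClassLoader(new URL[0], versionLoader);
        return instance1.getParent() == instance2.getParent();
    }

    public static void main(String[] args)
    {
        System.out.println("instances share version loader: " + instancesShareParent());
    }
}
```

Each instance loader then becomes collectible on cluster close, while the native libraries stay registered against the shared parent.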






[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-18 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888194#comment-16888194
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Still searching for issues on trunk.  The current issue I'm hitting is that the 
native libraries used by Netty for epoll support on Linux, and SSL in 
netty-tcnative hold JNI global gcroots to the instance class loader and prevent 
it from being garbage collected. Continuing to investigate.







[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-14 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884770#comment-16884770
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Thanks for the review - I forgot to remove a WIP commit on trunk that made the 
numClusters unused.

Renaming the log appenders makes it easier to spot left-over logging threads 
when instance class loaders do not shut down correctly.

I've seen similar exceptions too - I think they predate this fix. I reran the 
trunk tests and am still getting issues with metaspace.  I think the internode 
Netty listeners aren't shutting down as expected - it all happens in a future 
and perhaps is not being combined correctly with other futures to either 
execute it or cause shutdown to wait until complete before closing the class 
loader.
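The future-combining concern can be illustrated with a small standalone sketch (names illustrative): each subsystem returns a future, and teardown must join all of them before the class loader is closed, otherwise work keeps running after "shutdown".

```java
import java.util.concurrent.CompletableFuture;

public class CombinedShutdownSketch
{
    /** Combines independent subsystem shutdown futures so teardown waits for all. */
    static boolean shutdownComplete()
    {
        CompletableFuture<Void> messaging = CompletableFuture.runAsync(
            () -> System.out.println("messaging listeners closed"));
        CompletableFuture<Void> gossip = CompletableFuture.runAsync(
            () -> System.out.println("gossip stopped"));

        // Only proceed to close the instance class loader once every subsystem
        // future has completed; a future that is never joined can leave its
        // threads running and pin the loader.
        CompletableFuture.allOf(messaging, gossip).join();
        return messaging.isDone() && gossip.isDone();
    }

    public static void main(String[] args)
    {
        System.out.println("all shut down: " + shutdownComplete());
    }
}
```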

I'll investigate further, but I don't think the shutdown hooks come into play 
when using the in-jvm dtests as the node is initialized without calling 
org.apache.cassandra.service.CassandraDaemon#setup.

I'll ping this ticket when I get to the bottom of the listener threads not 
shutting down.







[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881425#comment-16881425
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Updated the branches with my latest efforts. [~ifesdjeen] this is ready to take 
a look at when you have some time. 







[jira] [Assigned] (CASSANDRA-15171) In-JVM dtests leak file handles after closing cluster

2019-06-18 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith reassigned CASSANDRA-15171:


Assignee: Jon Meredith

>  In-JVM dtests leak file handles after closing cluster
> --
>
> Key: CASSANDRA-15171
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15171
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> From the work in CASSANDRA-15170, reviewing the output from lsof shows that 
> file descriptors from closed clusters are still open after all instance 
> objects and the instance class loader have been garbage collected.
> To run sustained fuzz/property based testing, file handle leaks need to be 
> minimized.
> Reproduce by running ResourceLeakTest.looperTest included as part of 
> CASSANDRA-15170 and look at `build/test/*-lsof-final.txt`






[jira] [Created] (CASSANDRA-15171) In-JVM dtests leak file handles after closing cluster

2019-06-18 Thread Jon Meredith (JIRA)
Jon Meredith created CASSANDRA-15171:


 Summary:  In-JVM dtests leak file handles after closing cluster
 Key: CASSANDRA-15171
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15171
 Project: Cassandra
  Issue Type: Bug
  Components: Test/dtest
Reporter: Jon Meredith


From the work in CASSANDRA-15170, reviewing the output from lsof shows that 
file descriptors from closed clusters are still open after all instance 
objects and the instance class loader have been garbage collected.

To run sustained fuzz/property based testing, file handle leaks need to be 
minimized.

Reproduce by running ResourceLeakTest.looperTest included as part of 
CASSANDRA-15170 and look at `build/test/*-lsof-final.txt`






[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-06-18 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866992#comment-16866992
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Branches with fixes merged up through the releases.  Anything additional being 
added is a separate commit rather than in the merge commit to make it easier to 
spot.

Duplicating the process id stuff from HeapUtils was deliberate, but I'd be 
happy to refactor the getpid() functionality out somewhere common. I wanted to 
minimize the impact on 2.2 and thought duplicating the functions was the 
simplest path.

2.2 | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v2-2.2] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v2-2%2E2]
3.0 | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v2-3.0] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v2-3%2E0]
3.11 | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v2-3.11] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v2-3%2E11]
trunk | [Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v2-trunk] | [CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v2-trunk]







[jira] [Assigned] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-06-18 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith reassigned CASSANDRA-15170:


Assignee: Jon Meredith







[jira] [Created] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-06-18 Thread Jon Meredith (JIRA)
Jon Meredith created CASSANDRA-15170:


 Summary: Reduce the time needed to release in-JVM dtest cluster 
resources after close
 Key: CASSANDRA-15170
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
 Project: Cassandra
  Issue Type: Improvement
  Components: Test/dtest
Reporter: Jon Meredith


There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
once the cluster is closed.

IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool; sometimes 
this thread was still running 10s after the dtest cluster was closed.  Instead, 
switch to a ThreadPoolExecutor with a core pool size of 0 so that the thread 
executing the class loader close executes sooner.
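As a standalone sketch of that executor change (class and method names here are illustrative, not the actual IsolatedExecutor code): with a core pool size of 0, the worker thread that runs the class loader close is created on demand and torn down once idle, rather than lingering and pinning the loader.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ZeroCoreExecutorSketch
{
    /** Runs a task on a core-size-0 pool and reports the pool size once it terminates. */
    static int poolSizeAfterShutdown()
    {
        // Core pool size 0, max 1, short keep-alive: the worker is created on
        // demand and removed once idle, so no thread outlives the shutdown task.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            0, 1, 1, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        executor.submit(() -> System.out.println("running class loader close task"));
        executor.shutdown();
        try
        {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        return executor.getPoolSize();
    }

    public static void main(String[] args)
    {
        System.out.println("pool size after termination: " + poolSizeAfterShutdown());
    }
}
```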

If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
answering, it has to wait for a timeout before it exits. Instead it should 
check the isShutdown flag and terminate early if shutdown has been requested.
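The early-exit idea, sketched with illustrative names rather than the real OutboundTcpConnection internals: the connect/retry loop consults a shutdown flag before paying for another connect timeout.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ConnectLoopSketch
{
    final AtomicBoolean isShutdown = new AtomicBoolean(false);

    /** Counts connect attempts made before the loop gives up or is shut down. */
    int attemptsBeforeExit(int maxAttempts)
    {
        int attempts = 0;
        while (attempts < maxAttempts)
        {
            // Check the flag *before* paying for another connect timeout, so an
            // endpoint that never answers cannot delay cluster teardown.
            if (isShutdown.get())
                break;
            attempts++; // stands in for one failed connect() plus backoff
        }
        return attempts;
    }

    public static void main(String[] args)
    {
        ConnectLoopSketch conn = new ConnectLoopSketch();
        conn.isShutdown.set(true);
        System.out.println("attempts after shutdown requested: " + conn.attemptsBeforeExit(5));
    }
}
```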

In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
try-with-resources construct and leaks a file handle for the directory.  This 
doesn't matter for normal usage, but it leaks a file handle for each dtest 
Instance created.
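The leak-free pattern looks like this as a standalone sketch (HintsCatalog itself is not reproduced here): Files.list returns a Stream backed by an open directory handle, and try-with-resources guarantees that handle is closed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListDirSketch
{
    /** Lists a directory without leaking the underlying directory handle. */
    static List<Path> list(Path dir) throws IOException
    {
        // Files.list holds an open DirectoryStream until the Stream is closed;
        // try-with-resources releases the file descriptor deterministically.
        try (Stream<Path> entries = Files.list(dir))
        {
            return entries.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException
    {
        Path dir = Files.createTempDirectory("hints");
        Files.createFile(dir.resolve("a.hints"));
        System.out.println("entries: " + list(dir).size());
    }
}
```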

On trunk, Netty global event executor threads are still running and delay GC 
for the instance class loader.






[jira] [Commented] (CASSANDRA-15138) A cluster (RF=3) not recovering after two nodes are stopped

2019-05-23 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846736#comment-16846736
 ] 

Jon Meredith commented on CASSANDRA-15138:
--

Just spotted your email to the user mailing list - looks like C* 3.11.4

> A cluster (RF=3) not recovering after two nodes are stopped
> ---
>
> Key: CASSANDRA-15138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Hiroyuki Yamada
>Priority: Normal
>
> I faced a weird issue when recovering a cluster after two nodes are stopped.
>  It is easily reproducible and looks like a bug or an issue to fix.
>  The following are the steps to reproduce it.
> === STEPS TO REPRODUCE ===
>  * Create a 3-node cluster with RF=3
>     - node1(seed), node2, node3
>  * Start requests to the cluster with cassandra-stress (it continues
>  until the end)
>     - what we did: cassandra-stress mixed cl=QUORUM duration=10m
>  -errors ignore -node node1,node2,node3 -rate threads\>=16
>  threads\<=256
>  - (It doesn't have to be this many threads. Can be 1)
>  * Stop node3 normally (with systemctl stop or kill (without -9))
>     - the system is still available as expected because the quorum of nodes is
>  still available
>  * Stop node2 normally (with systemctl stop or kill (without -9))
>     - the system is NOT available as expected after it's stopped.
>     - the client gets `UnavailableException: Not enough replicas
>  available for query at consistency QUORUM`
>     - the client gets errors right away (so few ms)
>     - so far it's all expected
>  * Wait for 1 mins
>  * Bring up node2 back
>     - {color:#ff}The issue happens here.{color}
>     - the client gets ReadTimeoutException` or WriteTimeoutException
>  depending on if the request is read or write even after the node2 is
>  up
>     - the client gets errors after about 5000ms or 2000ms, which are
>  request timeout for write and read request
>     - what node1 reports with `nodetool status` and what node2 reports
>  are not consistent. (node2 thinks node1 is down)
>     - It takes very long time to recover from its state
> === STEPS TO REPRODUCE ===
> Some additional important information to note:
>  * If we don't start cassandra-stress, it doesn't cause the issue.
>  * Restarting node1 recovers its state right after it's restarted
>  * Setting a lower value for dynamic_snitch_reset_interval_in_ms (to 6
>  or something) fixes the issue
>  * If we `kill -9` the nodes, then it doesn't cause the issue.
>  * Hints seem unrelated. I tested with hints disabled and it didn't make any 
> difference.
>  






[jira] [Commented] (CASSANDRA-15138) A cluster (RF=3) not recovering after two nodes are stopped

2019-05-23 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846734#comment-16846734
 ] 

Jon Meredith commented on CASSANDRA-15138:
--

Thanks for the detailed steps in the report. Which versions of Cassandra have 
you reproduced the issue with?







[jira] [Commented] (CASSANDRA-15005) Configurable whilelist for UDFs

2019-05-10 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837698#comment-16837698
 ] 

Jon Meredith commented on CASSANDRA-15005:
--

I don't have a date I'm afraid. The custom function code is still pending a 
review that's queued up behind some other work. 4.0 is frozen for features, so 
it may have to land in 4.x (which should happen much sooner than the gap 
between 3.x/4.0). I'll update this ticket when I know more on either count.

> Configurable whilelist for UDFs
> ---
>
> Key: CASSANDRA-15005
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15005
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Interpreter
>Reporter: A. Soroka
>Priority: Low
>
> I would like to use the UDF system to distribute some simple calculations on 
> values. For some use cases, this would require access only to some Java API 
> classes that aren't on the (hardcoded) whitelist (e.g. 
> {{java.security.MessageDigest}}). In other cases, it would require access to 
> a little non-C* library code, pre-distributed to nodes by out-of-band means.
> As I understand the situation now, the whitelist for types UDFs can use is 
> hardcoded in Java in 
> [UDFunction|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java#L99].
> This ticket, then, is a request for a facility that would allow that list to 
> be extended via some kind of deployment-time configuration. I realize that 
> serious security concerns immediately arise for this kind of functionality, 
> but I hope that by restricting it (only used during startup, no exposing the 
> whitelist for introspection, etc.) it could be quite practical.
> I'd like very much to assist with this ticket if it is accepted. (I believe I 
> have sufficient Java skill to do that, but no real familiarity with C*'s 
> codebase, yet. :) )






[jira] [Commented] (CASSANDRA-15114) Cassandra does not follow user's disk_failure_policy when getWriteDirectory() runs out of disk space

2019-05-10 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837356#comment-16837356
 ] 

Jon Meredith commented on CASSANDRA-15114:
--

I think it is worth fixing, but only in the next major release rather than 
risking changes to logging infrastructure we don't know/understand for no 
direct user benefit.

> Cassandra does not follow user's disk_failure_policy when getWriteDirectory() 
> runs out of disk space
> 
>
> Key: CASSANDRA-15114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15114
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Low
> Fix For: 4.x
>
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following two 
> {{throw}} statements whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/compaction/writers/CompactionAwareWriter.java
> Line: 222 & 231
> {code:java}
> if (availableSpace < estimatedWriteSize)
> throw new RuntimeException(String.format("Not enough space to write %s to 
> %s (%s available)",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize),
>  d.location,
>  
> FBUtilities.prettyPrintMemory(availableSpace)));{code}
> {code:java}
> d = getDirectories().getWriteableLocation(estimatedWriteSize);
> if (d == null)
> throw new RuntimeException(String.format("Not enough disk space to store 
> %s",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize)));
> {code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> messages indicate that the Cassandra node is running out of disk space. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{getWriteDirectory()}} can run out of disk space because it 
> does not throw an accurate exception class (e.g., CASSANDRA-11448). Or, the 
> callers trying to handle other {{RuntimeException}} may accidentally (and 
> incorrectly) handle the out of disk space scenario.






[jira] [Commented] (CASSANDRA-15116) CommitLogArchiver.construct() throws a RuntimeException when it failed to create a directory

2019-05-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836707#comment-16836707
 ] 

Jon Meredith commented on CASSANDRA-15116:
--

This should only happen at startup, called somewhere under 
{{org.apache.cassandra.service.StartupChecks#checkSystemKeyspaceState}}. The 
exception is eventually passed all the way out, caught in 
{{org.apache.cassandra.service.CassandraDaemon#activate}}, logged (with no 
stack trace), and causes an exit. It would be nice to know whether this is a 
permissions issue, a missing directory, or an IO error. I can't see a way to 
get that out of the current {{java.io.File}} interface, but somebody could 
move it over to {{java.nio}} to try for better error reporting.
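As a hedged sketch of what that {{java.nio}} move could look like (the class and method below are hypothetical illustrations, not the Cassandra code): {{Files.createDirectories}} reports *why* creation failed through distinct {{IOException}} subclasses, whereas {{java.io.File#mkdirs}} only returns false on any failure.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not the Cassandra code.
public class CreateDirExample
{
    static String tryCreate(String dir)
    {
        Path path = Paths.get(dir);
        try
        {
            // succeeds (and is a no-op) if the directory already exists
            Files.createDirectories(path);
            return "created";
        }
        catch (AccessDeniedException e)
        {
            return "permission denied: " + e.getFile();
        }
        catch (IOException e)
        {
            return "io error: " + e;
        }
    }

    public static void main(String[] args)
    {
        String dir = args.length > 0 ? args[0]
                                     : System.getProperty("java.io.tmpdir") + "/example-dir";
        System.out.println(tryCreate(dir));
    }
}
```

The caller could then log the specific failure reason before exiting, instead of a generic "Unable to create directory" message.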

> CommitLogArchiver.construct() throws a RuntimeException when it failed to 
> create a directory
> 
>
> Key: CASSANDRA-15116
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15116
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Low
> Fix For: 4.x
>
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
> Line: 110
> {code:java}
> throw new RuntimeException("Unable to create directory: " + dir);{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{construct()}} failed to create a directory. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{construct()}} can fail to create a directory because it 
> does not throw any {{IOException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the directory 
> creation failure.






[jira] [Commented] (CASSANDRA-15117) CommitLogArchiver.maybeRestoreArchive() throws a RuntimeException when it failed to list a directory

2019-05-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836693#comment-16836693
 ] 

Jon Meredith commented on CASSANDRA-15117:
--

Like CASSANDRA-15116, this happens on startup and should exit the node non-zero 
with a log message about the directory. We could provide more information about 
why it failed (permissions issue, IO issue). I don't see an operational problem 
here, so any improvement to the error messages should land in 4.x.
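One way to surface that extra information, as a hypothetical sketch (these names are illustrations, not the Cassandra code): {{File#listFiles}} returns null on any failure, while {{Files.newDirectoryStream}} throws an {{IOException}} subclass that says why the listing failed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not the Cassandra code.
public class ListDirExample
{
    static String tryList(String dir)
    {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(Paths.get(dir)))
        {
            int count = 0;
            for (Path ignored : stream)
                count++;
            return "listed " + count + " entries";
        }
        catch (NoSuchFileException e)
        {
            return "missing directory: " + e.getFile();
        }
        catch (NotDirectoryException e)
        {
            return "not a directory: " + e.getFile();
        }
        catch (IOException e)
        {
            return "io error: " + e;
        }
    }
}
```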

> CommitLogArchiver.maybeRestoreArchive() throws a RuntimeException when it 
> failed to list a directory
> 
>
> Key: CASSANDRA-15117
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15117
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
> Line: 225
> {code:java}
> throw new RuntimeException("Unable to list directory " + dir);{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{maybeRestoreArchive()}} failed to list a directory. 
> This mismatch could be a problem. For example, the callers may miss the 
> possibility that {{maybeRestoreArchive()}} can fail to list a directory 
> because it does not throw any {{IOException}}. Or, the callers trying to 
> handle other {{RuntimeException}} may accidentally (and incorrectly) handle 
> the directory listing failure.






[jira] [Updated] (CASSANDRA-15117) CommitLogArchiver.maybeRestoreArchive() throws a RuntimeException when it failed to list a directory

2019-05-09 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15117:
-
 Severity: Low
   Complexity: Low Hanging Fruit
Fix Version/s: 4.x
  Component/s: Local/Commit Log

> CommitLogArchiver.maybeRestoreArchive() throws a RuntimeException when it 
> failed to list a directory
> 
>
> Key: CASSANDRA-15117
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15117
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Low
> Fix For: 4.x
>
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
> Line: 225
> {code:java}
> throw new RuntimeException("Unable to list directory " + dir);{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{maybeRestoreArchive()}} failed to list a directory. 
> This mismatch could be a problem. For example, the callers may miss the 
> possibility that {{maybeRestoreArchive()}} can fail to list a directory 
> because it does not throw any {{IOException}}. Or, the callers trying to 
> handle other {{RuntimeException}} may accidentally (and incorrectly) handle 
> the directory listing failure.






[jira] [Updated] (CASSANDRA-15116) CommitLogArchiver.construct() throws a RuntimeException when it failed to create a directory

2019-05-09 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15116:
-
 Severity: Low
   Complexity: Low Hanging Fruit
Discovered By: User Report
Fix Version/s: 4.x
  Component/s: Local/Commit Log

> CommitLogArchiver.construct() throws a RuntimeException when it failed to 
> create a directory
> 
>
> Key: CASSANDRA-15116
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15116
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Low
> Fix For: 4.x
>
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
> Line: 110
> {code:java}
> throw new RuntimeException("Unable to create directory: " + dir);{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{construct()}} failed to create a directory. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{construct()}} can fail to create a directory because it 
> does not throw any {{IOException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the directory 
> creation failure.






[jira] [Commented] (CASSANDRA-15115) WindowsFailedSnapshotTracker.deleteOldSnapshots() throws RuntimeException when it failed to create a file

2019-05-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836679#comment-16836679
 ] 

Jon Meredith commented on CASSANDRA-15115:
--

I don't have a test setup capable of reproducing failed snapshots on Windows.  
Tracing through the code, I think it will end up being handled in 
{{org.apache.cassandra.service.CassandraDaemon#activate}} (I'm not 100% 
certain how Windows daemonizes things, but it looks like it goes that way).  
At the moment the handler only logs the full exception on a configuration 
exception, so without adding extra logging there I don't think there's a lot 
of value in passing in the cause.

> WindowsFailedSnapshotTracker.deleteOldSnapshots() throws RuntimeException 
> when it failed to create a file
> -
>
> Key: CASSANDRA-15115
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15115
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/WindowsFailedSnapshotTracker.java
> Line: 98
> {code:java}
> throw new RuntimeException(String.format("Failed to create failed snapshot 
> tracking file [%s]. Aborting", TODELETEFILE));{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{deleteOldSnapshots()}} failed to create a file. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{deleteOldSnapshots()}} can fail to create a file because 
> it does not throw any {{IOException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the file 
> creation failure.






[jira] [Updated] (CASSANDRA-15115) WindowsFailedSnapshotTracker.deleteOldSnapshots() throws RuntimeException when it failed to create a file

2019-05-09 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15115:
-
Resolution: Not A Bug
Status: Resolved  (was: Triage Needed)

Although the exception generation/handling code could be improved, it is 
functionally correct.

> WindowsFailedSnapshotTracker.deleteOldSnapshots() throws RuntimeException 
> when it failed to create a file
> -
>
> Key: CASSANDRA-15115
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15115
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/WindowsFailedSnapshotTracker.java
> Line: 98
> {code:java}
> throw new RuntimeException(String.format("Failed to create failed snapshot 
> tracking file [%s]. Aborting", TODELETEFILE));{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{deleteOldSnapshots()}} failed to create a file. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{deleteOldSnapshots()}} can fail to create a file because 
> it does not throw any {{IOException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the file 
> creation failure.






[jira] [Commented] (CASSANDRA-15113) StorageService.decommission() throws RuntimeException when interrupted, which results in dead code in Decommission.execute()

2019-05-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836673#comment-16836673
 ] 

Jon Meredith commented on CASSANDRA-15113:
--

I agree the rethrown exception will not be caught / handled in 
{{org.apache.cassandra.tools.nodetool.Decommission}}, however 
{{org.apache.cassandra.tools.NodeTool}} (3.11, line 184) will catch all 
throwables and exit non-zero as desired.

Briefly hacking up the code shows {{nodetool}} exits non-zero on runtime 
exception.

{panel:title=nodetool output}
error: fake RuntimeException for CASSANDRA-15113
-- StackTrace --
java.lang.RuntimeException: fake RuntimeException for CASSANDRA-15113
at 
org.apache.cassandra.tools.nodetool.Decommission.execute(Decommission.java:33)
at 
org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:255)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:169)


Process finished with exit code 2
{panel}

The RuntimeException wrapping the two checked exceptions could pass on the 
inner exception as its cause, but again, with nothing actually broken I don't 
want to make changes in a minor release.
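A minimal sketch of that cause-preserving rethrow (a hypothetical stand-in, not the actual {{StorageService}} code): attaching the {{InterruptedException}} as the cause keeps the inner stack trace, and restoring the interrupt flag keeps callers informed.

```java
// Hypothetical sketch, not the actual StorageService code.
public class WrapExample
{
    static void decommission()
    {
        try
        {
            Thread.sleep(10); // stand-in for the real decommission work
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            // pass the cause so callers and logs keep the inner stack trace
            throw new RuntimeException("Node interrupted while decommissioning", e);
        }
    }
}
```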

> StorageService.decommission() throws RuntimeException when interrupted, which 
> results in dead code in Decommission.execute()
> 
>
> Key: CASSANDRA-15113
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15113
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: CASSANDRA-ROOT/src/java/org/apache/cassandra/service/StorageService.java
> Line: 4008 
> {code:java}
> try
> {
> ...
> Thread.sleep(timeout);
> ...
> unbootstrap(finishLeaving);
> }
> catch (InterruptedException e)
> {
> throw new RuntimeException("Node interrupted while decommissioning");
> }
> {code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that an interrupt has occurred. This mismatch could be a 
> For example, the callers may miss the possibility that 
> {{StorageService.decommission()}} can be interrupted because it does not 
> throw any {{InterruptedException}}. Or,  the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the interrupt.






[jira] [Updated] (CASSANDRA-15113) StorageService.decommission() throws RuntimeException when interrupted, which results in dead code in Decommission.execute()

2019-05-09 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15113:
-
Resolution: Not A Bug
Status: Resolved  (was: Triage Needed)

Although the exception generation/handling code could be improved, it is 
functionally correct.

> StorageService.decommission() throws RuntimeException when interrupted, which 
> results in dead code in Decommission.execute()
> 
>
> Key: CASSANDRA-15113
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15113
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: CASSANDRA-ROOT/src/java/org/apache/cassandra/service/StorageService.java
> Line: 4008 
> {code:java}
> try
> {
> ...
> Thread.sleep(timeout);
> ...
> unbootstrap(finishLeaving);
> }
> catch (InterruptedException e)
> {
> throw new RuntimeException("Node interrupted while decommissioning");
> }
> {code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that an interrupt has occurred. This mismatch could be a 
> For example, the callers may miss the possibility that 
> {{StorageService.decommission()}} can be interrupted because it does not 
> throw any {{InterruptedException}}. Or,  the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the interrupt.






[jira] [Updated] (CASSANDRA-15114) Cassandra does not follow user's disk_failure_policy when getWriteDirectory() runs out of disk space

2019-05-09 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15114:
-
 Severity: Low
   Complexity: Low Hanging Fruit
Discovered By: User Report
Fix Version/s: 4.x

> Cassandra does not follow user's disk_failure_policy when getWriteDirectory() 
> runs out of disk space
> 
>
> Key: CASSANDRA-15114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15114
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Low
> Fix For: 4.x
>
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following two 
> {{throw}} statements whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/compaction/writers/CompactionAwareWriter.java
> Line: 222 & 231
> {code:java}
> if (availableSpace < estimatedWriteSize)
> throw new RuntimeException(String.format("Not enough space to write %s to 
> %s (%s available)",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize),
>  d.location,
>  
> FBUtilities.prettyPrintMemory(availableSpace)));{code}
> {code:java}
> d = getDirectories().getWriteableLocation(estimatedWriteSize);
> if (d == null)
> throw new RuntimeException(String.format("Not enough disk space to store 
> %s",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize)));
> {code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> messages indicate that the Cassandra node is running out of disk space. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{getWriteDirectory()}} can run out of disk space because it 
> does not throw an accurate exception class (e.g., CASSANDRA-11448). Or, the 
> callers trying to handle other {{RuntimeException}} may accidentally (and 
> incorrectly) handle the out of disk space scenario.






[jira] [Commented] (CASSANDRA-15114) Cassandra does not follow user's disk_failure_policy when getWriteDirectory() runs out of disk space

2019-05-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836678#comment-16836678
 ] 

Jon Meredith commented on CASSANDRA-15114:
--

I've reproduced the out of space condition with a small ByteMan test, and done 
some archaeology.

Looking at the first example, Cassandra does not treat running out of space 
during compaction as a disk failure event (CASSANDRA-12385), so changing the thrown 
exception to be the more descriptive FSDiskFullWriteError would have it 
converted back to a RuntimeException here in 
[AbstractCompactionTask|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/compaction/AbstractCompactionTask.java#L63]

However, I do agree it would be more consistent for out of disk space 
conditions to be handled the same way between 
org.apache.cassandra.db.compaction.writers.CompactionAwareWriter#getWriteDirectory
 and org.apache.cassandra.db.Directories#getWriteableLocation.

I don't think the second exception will ever be thrown. 
{{org.apache.cassandra.db.Directories#getWriteableLocation}} should be marked 
{{@Nonnull}} as it throws FSDiskFullWriteError or FSWriteError if there is a 
problem with one of the candidate paths.  {{getWriteDirectory}} should just 
directly return the result from {{getDirectories().getWriteableLocation()}}.

I don't want to risk changes that could affect logging systems, but I think 
it's worth outputting the correct exception in 4.next.
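A self-contained sketch of that shape, using hypothetical stand-in classes (not Cassandra's actual {{Directories}} API): the location lookup throws a descriptive error instead of returning null, so the write-directory lookup can return its result directly and the vague second RuntimeException becomes unreachable.

```java
// Hypothetical stand-ins for the Cassandra classes discussed above.
class DiskFullError extends RuntimeException
{
    DiskFullError(long needed)
    {
        super("Not enough disk space to store " + needed + " bytes");
    }
}

class DataDirectories
{
    private final long availableSpace;

    DataDirectories(long availableSpace)
    {
        this.availableSpace = availableSpace;
    }

    // Never returns null: throws a descriptive error if no location has room.
    String getWriteableLocation(long estimatedWriteSize)
    {
        if (availableSpace < estimatedWriteSize)
            throw new DiskFullError(estimatedWriteSize);
        return "/var/lib/cassandra/data";
    }

    // The null check (and its generic RuntimeException) is no longer needed.
    String getWriteDirectory(long estimatedWriteSize)
    {
        return getWriteableLocation(estimatedWriteSize);
    }
}
```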

> Cassandra does not follow user's disk_failure_policy when getWriteDirectory() 
> runs out of disk space
> 
>
> Key: CASSANDRA-15114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15114
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following two 
> {{throw}} statements whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/compaction/writers/CompactionAwareWriter.java
> Line: 222 & 231
> {code:java}
> if (availableSpace < estimatedWriteSize)
> throw new RuntimeException(String.format("Not enough space to write %s to 
> %s (%s available)",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize),
>  d.location,
>  
> FBUtilities.prettyPrintMemory(availableSpace)));{code}
> {code:java}
> d = getDirectories().getWriteableLocation(estimatedWriteSize);
> if (d == null)
> throw new RuntimeException(String.format("Not enough disk space to store 
> %s",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize)));
> {code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> messages indicate that the Cassandra node is running out of disk space. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{getWriteDirectory()}} can run out of disk space because it 
> does not throw an accurate exception class (e.g., CASSANDRA-11448). Or, the 
> callers trying to handle other {{RuntimeException}} may accidentally (and 
> incorrectly) handle the out of disk space scenario.






[jira] [Updated] (CASSANDRA-15112) StorageService.rebuild() throws RuntimeException when interrupted

2019-05-09 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15112:
-
Fix Version/s: 4.x

> StorageService.rebuild() throws RuntimeException when interrupted
> -
>
> Key: CASSANDRA-15112
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15112
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/JMX, Tool/nodetool
>Reporter: eBugs
>Priority: Low
> Fix For: 4.x
>
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: CASSANDRA-ROOT/src/java/org/apache/cassandra/service/StorageService.java
> Line: 1313 
> {code:java}
> try
> {
> ...
> // wait for result
> resultFuture.get();
> }
> catch (InterruptedException e)
> {
> throw new RuntimeException("Interrupted while waiting on rebuild 
> streaming");
> }{code}
>   
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that an interrupt has occurred. This mismatch could be a 
> problem. For example, the callers may miss the possibility that 
> {{StorageService.rebuild()}} can be interrupted because it does not throw any 
> {{InterruptedException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the interrupt.
>  
> If throwing a {{RuntimeException}} is preferred, maybe it can wrap the cause 
> exception so that the inner call stack is preserved.






[jira] [Updated] (CASSANDRA-15112) StorageService.rebuild() throws RuntimeException when interrupted

2019-05-09 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15112:
-
 Severity: Low
   Complexity: Low Hanging Fruit
Discovered By: User Report
  Component/s: Tool/nodetool
   Observability/JMX

> StorageService.rebuild() throws RuntimeException when interrupted
> -
>
> Key: CASSANDRA-15112
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15112
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/JMX, Tool/nodetool
>Reporter: eBugs
>Priority: Low
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: CASSANDRA-ROOT/src/java/org/apache/cassandra/service/StorageService.java
> Line: 1313 
> {code:java}
> try
> {
> ...
> // wait for result
> resultFuture.get();
> }
> catch (InterruptedException e)
> {
> throw new RuntimeException("Interrupted while waiting on rebuild 
> streaming");
> }{code}
>   
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that an interrupt has occurred. This mismatch could be a 
> problem. For example, the callers may miss the possibility that 
> {{StorageService.rebuild()}} can be interrupted because it does not throw any 
> {{InterruptedException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the interrupt.
>  
> If throwing a {{RuntimeException}} is preferred, maybe it can wrap the cause 
> exception so that the inner call stack is preserved.






[jira] [Updated] (CASSANDRA-15111) StorageService.move() throws RuntimeException when interrupted

2019-05-09 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15111:
-
Component/s: (was: Cluster/Membership)
 Observability/JMX

> StorageService.move() throws RuntimeException when interrupted
> --
>
> Key: CASSANDRA-15111
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15111
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/JMX, Tool/nodetool
>Reporter: eBugs
>Priority: Low
> Fix For: 4.x
>
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: CASSANDRA-ROOT/src/java/org/apache/cassandra/service/StorageService.java
> Line: 4168
> {code:java}
> try
> {
> relocator.stream().get();
> }
> catch (ExecutionException | InterruptedException e)
> {
> throw new RuntimeException("Interrupted while waiting for stream/fetch 
> ranges to finish: " + e.getMessage());
> }
> {code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that an interrupt has occurred. This mismatch could be a 
> problem. For example, the callers may miss the possibility that 
> {{StorageService.move()}} can be interrupted because it does not throw any 
> {{InterruptedException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the interrupt.
>  
> If throwing a {{RuntimeException}} is preferred, maybe it can wrap the cause 
> exception so that the inner call stack is preserved.






[jira] [Commented] (CASSANDRA-15112) StorageService.rebuild() throws RuntimeException when interrupted

2019-05-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836663#comment-16836663
 ] 

Jon Meredith commented on CASSANDRA-15112:
--

Similar to CASSANDRA-15111, I agree it would be better to wrap the 
InterruptedException in the RuntimeException to retain the context, but cannot 
justify the risk of changing logging behavior in a minor release. Set to 
4.next.

> StorageService.rebuild() throws RuntimeException when interrupted
> -
>
> Key: CASSANDRA-15112
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15112
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: CASSANDRA-ROOT/src/java/org/apache/cassandra/service/StorageService.java
> Line: 1313 
> {code:java}
> try
> {
> ...
> // wait for result
> resultFuture.get();
> }
> catch (InterruptedException e)
> {
> throw new RuntimeException("Interrupted while waiting on rebuild 
> streaming");
> }{code}
>   
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that an interrupt has occurred. This mismatch could be a 
> problem. For example, the callers may miss the possibility that 
> {{StorageService.rebuild()}} can be interrupted because it does not throw any 
> {{InterruptedException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the interrupt.
>  
> If throwing a {{RuntimeException}} is preferred, maybe it can wrap the cause 
> exception so that the inner call stack is preserved.






[jira] [Updated] (CASSANDRA-15111) StorageService.move() throws RuntimeException when interrupted

2019-05-09 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15111:
-
 Severity: Low
   Complexity: Low Hanging Fruit
Discovered By: User Report
Fix Version/s: 4.x
  Component/s: Tool/nodetool
   Cluster/Membership

> StorageService.move() throws RuntimeException when interrupted
> --
>
> Key: CASSANDRA-15111
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15111
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Tool/nodetool
>Reporter: eBugs
>Priority: Low
> Fix For: 4.x
>
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: CASSANDRA-ROOT/src/java/org/apache/cassandra/service/StorageService.java
> Line: 4168
> {code:java}
> try
> {
> relocator.stream().get();
> }
> catch (ExecutionException | InterruptedException e)
> {
> throw new RuntimeException("Interrupted while waiting for stream/fetch 
> ranges to finish: " + e.getMessage());
> }
> {code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that an interrupt has occurred. This mismatch could be a 
> problem. For example, the callers may miss the possibility that 
> {{StorageService.move()}} can be interrupted because it does not throw any 
> {{InterruptedException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the interrupt.
>  
> If throwing a {{RuntimeException}} is preferred, maybe it can wrap the cause 
> exception so that the inner call stack is preserved.






[jira] [Commented] (CASSANDRA-15111) StorageService.move() throws RuntimeException when interrupted

2019-05-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836656#comment-16836656
 ] 

Jon Meredith commented on CASSANDRA-15111:
--

Thanks for trying out your tool on Cassandra and reporting your findings.

The {{StorageService.move}} call is invoked by {{nodetool}} when the operator 
wants to move the tokens owned by the node. The catch is being used to convert 
checked exceptions to unchecked ones. Looking at {{StorageServiceMBean}}, the 
interface has gone back and forth between checked and unchecked exceptions over 
time.

I can't see a way this could fail to be handled with current usage, so I don't 
think there's an immediate need to make a change. I also know that operational 
systems parse the log stream generated by Cassandra, and I don't want to risk 
breaking any tooling in a patch release that I have no way to audit.

As you say, the interrupted/execution exception could be wrapped (although I 
don't think it would ever be output anywhere, tracing through the default 
logging), and I agree it would be better to do that. I've tagged this issue 
for 4.next.

I didn't provoke an Interrupted/ExecutionException, but I did check that 
exceptions are making it through to the logs and nodetool correctly:

{panel:title=log}
ERROR [RMI TCP Connection(2)-x.x.x.x] 2019-05-09 10:29:36,965 
StorageService.java:4149 - Invalid request to move(Token); This node has more 
than one token and cannot be moved thusly.
{panel}

{panel:title=nodetool output}
$ bin/nodetool  move 0
error: This node has more than one token and cannot be moved thusly.
-- StackTrace --
java.lang.UnsupportedOperationException: This node has more than one token and 
cannot be moved thusly.
at 
org.apache.cassandra.service.StorageService.move(StorageService.java:4150)
at 
org.apache.cassandra.service.StorageService.move(StorageService.java:4125)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at 

[jira] [Commented] (CASSANDRA-15114) Cassandra does not follow user's disk_failure_policy when getWriteDirectory() runs out of disk space

2019-05-08 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835722#comment-16835722
 ] 

Jon Meredith commented on CASSANDRA-15114:
--

Thanks for filing the bug report. I'll be investigating over the next few days. 

> Cassandra does not follow user's disk_failure_policy when getWriteDirectory() 
> runs out of disk space
> 
>
> Key: CASSANDRA-15114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15114
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following two 
> {{throw}} statements whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/compaction/writers/CompactionAwareWriter.java
> Line: 222 & 231
> {code:java}
> if (availableSpace < estimatedWriteSize)
> throw new RuntimeException(String.format("Not enough space to write %s to 
> %s (%s available)",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize),
>  d.location,
>  
> FBUtilities.prettyPrintMemory(availableSpace)));{code}
> {code:java}
> d = getDirectories().getWriteableLocation(estimatedWriteSize);
> if (d == null)
> throw new RuntimeException(String.format("Not enough disk space to store 
> %s",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize)));
> {code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> messages indicate that the Cassandra node is running out of disk space. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{getWriteDirectory()}} can run out of disk space because it 
> does not throw an accurate exception class (e.g., CASSANDRA-11448). Or, the 
> callers trying to handle other {{RuntimeException}} may accidentally (and 
> incorrectly) handle the out of disk space scenario.
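One way to address the mismatch described above is a dedicated exception type that names the condition, so callers can catch "out of disk space" specifically instead of a bare {{RuntimeException}}. The sketch below is purely illustrative: {{NotEnoughSpaceException}} and {{checkSpace}} are hypothetical names, not Cassandra API.

```java
public class DiskSpaceExample {
    // Hypothetical dedicated exception: the class name now matches the error
    // condition, so handlers for other RuntimeExceptions won't swallow it.
    static class NotEnoughSpaceException extends RuntimeException {
        NotEnoughSpaceException(String message) { super(message); }
    }

    // Mirror of the availableSpace check quoted above, throwing the
    // condition-specific type instead of a generic RuntimeException.
    static void checkSpace(long availableSpace, long estimatedWriteSize, String location) {
        if (availableSpace < estimatedWriteSize)
            throw new NotEnoughSpaceException(String.format(
                    "Not enough space to write %d bytes to %s (%d available)",
                    estimatedWriteSize, location, availableSpace));
    }

    public static void main(String[] args) {
        checkSpace(100, 10, "/var/lib/cassandra/data"); // enough space: no exception
        try {
            checkSpace(10, 100, "/var/lib/cassandra/data");
        } catch (NotEnoughSpaceException e) {
            assert e.getMessage().contains("Not enough space");
        }
    }
}
```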






[jira] [Commented] (CASSANDRA-15117) CommitLogArchiver.maybeRestoreArchive() throws a RuntimeException when it failed to list a directory

2019-05-08 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835718#comment-16835718
 ] 

Jon Meredith commented on CASSANDRA-15117:
--

Thanks for filing the bug report. I'll be investigating over the next few days.

> CommitLogArchiver.maybeRestoreArchive() throws a RuntimeException when it 
> failed to list a directory
> 
>
> Key: CASSANDRA-15117
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15117
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
> Line: 225
> {code:java}
> throw new RuntimeException("Unable to list directory " + dir);{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{maybeRestoreArchive()}} failed to list a directory. 
> This mismatch could be a problem. For example, the callers may miss the 
> possibility that {{maybeRestoreArchive()}} can fail to list a directory 
> because it does not throw any {{IOException}}. Or, the callers trying to 
> handle other {{RuntimeException}} may accidentally (and incorrectly) handle 
> the directory listing failure.
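For the directory-listing case, note that {{java.io.File.listFiles()}} signals both "not a directory" and I/O failure by returning null. One hedged option, sketched below with an illustrative helper name ({{listOrThrow}} is not Cassandra code), is to surface that null as an I/O-typed unchecked exception ({{java.io.UncheckedIOException}}, available since Java 8), so the exception class matches the error condition while the method signature stays unchecked.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ListDirExample {
    // listFiles() returns null on failure; convert that into an exception
    // whose type reflects the I/O nature of the problem.
    static File[] listOrThrow(File dir) {
        File[] files = dir.listFiles();
        if (files == null)
            throw new UncheckedIOException(new IOException("Unable to list directory " + dir));
        return files;
    }

    public static void main(String[] args) {
        // The current working directory is always listable.
        assert listOrThrow(new File(".")) != null;
    }
}
```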






[jira] [Commented] (CASSANDRA-15116) CommitLogArchiver.construct() throws a RuntimeException when it failed to create a directory

2019-05-08 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835717#comment-16835717
 ] 

Jon Meredith commented on CASSANDRA-15116:
--

Thanks for filing the bug report. I'll be investigating over the next few days.

> CommitLogArchiver.construct() throws a RuntimeException when it failed to 
> create a directory
> 
>
> Key: CASSANDRA-15116
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15116
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
> Line: 110
> {code:java}
> throw new RuntimeException("Unable to create directory: " + dir);{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{construct()}} failed to create a directory. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{construct()}} can fail to create a directory because it 
> does not throw any {{IOException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the directory 
> creation failure.






[jira] [Commented] (CASSANDRA-15115) WindowsFailedSnapshotTracker.deleteOldSnapshots() throws RuntimeException when it failed to create a file

2019-05-08 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835716#comment-16835716
 ] 

Jon Meredith commented on CASSANDRA-15115:
--

Thanks for filing the bug report. I'll be investigating over the next few days.

> WindowsFailedSnapshotTracker.deleteOldSnapshots() throws RuntimeException 
> when it failed to create a file
> -
>
> Key: CASSANDRA-15115
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15115
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/WindowsFailedSnapshotTracker.java
> Line: 98
> {code:java}
> throw new RuntimeException(String.format("Failed to create failed snapshot 
> tracking file [%s]. Aborting", TODELETEFILE));{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{deleteOldSnapshots()}} failed to create a file. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{deleteOldSnapshots()}} can fail to create a file because 
> it does not throw any {{IOException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the file 
> creation failure.






[jira] [Assigned] (CASSANDRA-15117) CommitLogArchiver.maybeRestoreArchive() throws a RuntimeException when it failed to list a directory

2019-05-08 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith reassigned CASSANDRA-15117:


Assignee: Jon Meredith

> CommitLogArchiver.maybeRestoreArchive() throws a RuntimeException when it 
> failed to list a directory
> 
>
> Key: CASSANDRA-15117
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15117
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
> Line: 225
> {code:java}
> throw new RuntimeException("Unable to list directory " + dir);{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{maybeRestoreArchive()}} failed to list a directory. 
> This mismatch could be a problem. For example, the callers may miss the 
> possibility that {{maybeRestoreArchive()}} can fail to list a directory 
> because it does not throw any {{IOException}}. Or, the callers trying to 
> handle other {{RuntimeException}} may accidentally (and incorrectly) handle 
> the directory listing failure.






[jira] [Assigned] (CASSANDRA-15115) WindowsFailedSnapshotTracker.deleteOldSnapshots() throws RuntimeException when it failed to create a file

2019-05-08 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith reassigned CASSANDRA-15115:


Assignee: Jon Meredith

> WindowsFailedSnapshotTracker.deleteOldSnapshots() throws RuntimeException 
> when it failed to create a file
> -
>
> Key: CASSANDRA-15115
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15115
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/WindowsFailedSnapshotTracker.java
> Line: 98
> {code:java}
> throw new RuntimeException(String.format("Failed to create failed snapshot 
> tracking file [%s]. Aborting", TODELETEFILE));{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{deleteOldSnapshots()}} failed to create a file. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{deleteOldSnapshots()}} can fail to create a file because 
> it does not throw any {{IOException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the file 
> creation failure.






[jira] [Assigned] (CASSANDRA-15116) CommitLogArchiver.construct() throws a RuntimeException when it failed to create a directory

2019-05-08 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith reassigned CASSANDRA-15116:


Assignee: Jon Meredith

> CommitLogArchiver.construct() throws a RuntimeException when it failed to 
> create a directory
> 
>
> Key: CASSANDRA-15116
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15116
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following 
> {{throw}} statement whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
> Line: 110
> {code:java}
> throw new RuntimeException("Unable to create directory: " + dir);{code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> message indicates that {{construct()}} failed to create a directory. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{construct()}} can fail to create a directory because it 
> does not throw any {{IOException}}. Or, the callers trying to handle other 
> {{RuntimeException}} may accidentally (and incorrectly) handle the directory 
> creation failure.






[jira] [Assigned] (CASSANDRA-15114) Cassandra does not follow user's disk_failure_policy when getWriteDirectory() runs out of disk space

2019-05-07 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith reassigned CASSANDRA-15114:


Assignee: Jon Meredith

> Cassandra does not follow user's disk_failure_policy when getWriteDirectory() 
> runs out of disk space
> 
>
> Key: CASSANDRA-15114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15114
> Project: Cassandra
>  Issue Type: Bug
>Reporter: eBugs
>Assignee: Jon Meredith
>Priority: Normal
>
> Dear Cassandra developers, we are developing a tool to detect 
> exception-related bugs in Java. Our prototype has spotted the following two 
> {{throw}} statements whose exception class and error message indicate 
> different error conditions.
>  
> Version: Cassandra-3.11 (commit: 123113f7b887370a248669ee0db6fdf13df0146e) 
> File: 
> CASSANDRA-ROOT/src/java/org/apache/cassandra/db/compaction/writers/CompactionAwareWriter.java
> Line: 222 & 231
> {code:java}
> if (availableSpace < estimatedWriteSize)
> throw new RuntimeException(String.format("Not enough space to write %s to 
> %s (%s available)",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize),
>  d.location,
>  
> FBUtilities.prettyPrintMemory(availableSpace)));{code}
> {code:java}
> d = getDirectories().getWriteableLocation(estimatedWriteSize);
> if (d == null)
> throw new RuntimeException(String.format("Not enough disk space to store 
> %s",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize)));
> {code}
>  
> {{RuntimeException}} is usually used to represent errors in the program logic 
> (think of one of its subclasses, {{NullPointerException}}), while the error 
> messages indicate that the Cassandra node is running out of disk space. This 
> mismatch could be a problem. For example, the callers may miss the 
> possibility that {{getWriteDirectory()}} can run out of disk space because it 
> does not throw an accurate exception class (e.g., CASSANDRA-11448). Or, the 
> callers trying to handle other {{RuntimeException}} may accidentally (and 
> incorrectly) handle the out of disk space scenario.






[jira] [Created] (CASSANDRA-15110) Improve logging when repair has no neighbors

2019-05-03 Thread Jon Meredith (JIRA)
Jon Meredith created CASSANDRA-15110:


 Summary: Improve logging when repair has no neighbors
 Key: CASSANDRA-15110
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15110
 Project: Cassandra
  Issue Type: Bug
  Components: Consistency/Repair
Reporter: Jon Meredith


If RepairRunnable cannot find any neighbors - either due to the range or the 
filters supplied - it calls addRangeToNeighbors with an empty list, which 
triggers an NPE.

Help the operator understand what went wrong by logging the range, the 
unfiltered neighbors, and the filters.






[jira] [Commented] (CASSANDRA-15005) Configurable whilelist for UDFs

2019-04-02 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807919#comment-16807919
 ] 

Jon Meredith commented on CASSANDRA-15005:
--

Thanks for the docs and the additional tests - your modifications look good to 
me. I'll find somebody to review it and then we'll have to work out where to 
park it until trunk opens up for feature contributions.

Do you have any plans to use it before it lands in a public release?

> Configurable whilelist for UDFs
> ---
>
> Key: CASSANDRA-15005
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15005
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Interpreter
>Reporter: A. Soroka
>Priority: Low
>
> I would like to use the UDF system to distribute some simple calculations on 
> values. For some use cases, this would require access only to some Java API 
> classes that aren't on the (hardcoded) whitelist (e.g. 
> {{java.security.MessageDigest}}). In other cases, it would require access to 
> a little non-C* library code, pre-distributed to nodes by out-of-band means.
> As I understand the situation now, the whitelist for types UDFs can use is 
> hardcoded in java in 
> [UDFunction|[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java#L99].]
> This ticket, then, is a request for a facility that would allow that list to 
> be extended via some kind of deployment-time configuration. I realize that 
> serious security concerns immediately arise for this kind of functionality, 
> but I hope that by restricting it (only used during startup, no exposing the 
> whitelist for introspection, etc.) it could be quite practical.
> I'd like very much to assist with this ticket if it is accepted. (I believe I 
> have sufficient Java skill to do that, but no real familiarity with C*'s 
> codebase, yet. :) )






[jira] [Commented] (CASSANDRA-15064) Wrong ordering for timeuuid fields

2019-03-29 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805425#comment-16805425
 ] 

Jon Meredith commented on CASSANDRA-15064:
--

Thinking a little more on this: if you went with using server-side now(), you'd 
need to ensure the same node was used to coordinate all of the INSERTs, and I'm 
not sure what the default behavior of the Go client is, so perhaps doing 
something client-side would be best.

> Wrong ordering for timeuuid fields
> --
>
> Key: CASSANDRA-15064
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15064
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Andreas Andersen
>Assignee: Jon Meredith
>Priority: Normal
> Attachments: example.cql
>
>
> Hi!
> We're seeing some strange behavior for the ordering of timeuuid fields. They 
> seem to be sorted in the wrong order when the clock_seq_low field in a 
> timeuuid goes from 7f to 80. Consider the following example:
> {noformat}
> cqlsh:test> show version; 
> [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4] 
> cqlsh:test> CREATE TABLE t ( 
>     ... partition   int, 
>     ... t   timeuuid, 
>     ... i   int, 
>     ...  
>     ... PRIMARY KEY(partition, t) 
>     ... ) 
>     ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t    | i 
> ---+--+--- 
>  1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>  1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>  1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>  1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>  1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>  
> (5 rows) 
> cqlsh:test>
> {noformat}
> The expected behavior is that the rows are returned in the same order as they 
> were inserted (we inserted them with their clustering key in an ascending 
> order). Instead, the order "wraps" in the middle.
> This issue only arises when the 9th octet (clock_seq_low) in the uuid goes 
> from 7f to 80. A guess would be that the comparison is implemented as a 
> signed integer instead of an unsigned integer, as 0x7f = 127 and 0x80 = -128. 
> According to the RFC, the field should be treated as an unsigned integer: 
> [https://tools.ietf.org/html/rfc4122#section-4.1.2]
> Changing the field from a timeuuid to a uuid gives the expected correct 
> behavior:
> {noformat}
> cqlsh:test> CREATE TABLE t ( 
>     ... partition   int, 
>     ... t   uuid, 
>     ... i   int, 
>     ...  
>     ... PRIMARY KEY(partition, t) 
>     ... ) 
>     ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t    | i 
> ---+--+--- 
>  1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>  1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>  1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>  1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>  1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>  
> (5 rows) 
> cqlsh:test>{noformat}
>  
>  






[jira] [Commented] (CASSANDRA-15064) Wrong ordering for timeuuid fields

2019-03-29 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805069#comment-16805069
 ] 

Jon Meredith commented on CASSANDRA-15064:
--

Thanks for the additional information. I'm not familiar with the Go client, so 
thanks for the link. The Go client takes a different approach to UUID 
generation than the server-side now() function, which, as I mentioned above, 
generates the clockseq once at startup time - that limits now() to generating 
a single UUID per 100ns (10 million per second).

I agree with you that you'll get a wrapping problem with your UUID solution: it 
just takes the lower 14 bits of the 32-bit clockseq counter, which obviously 
wraps.

What do you think about these two possibilities:

# Insert using CQL now() - probably easiest
# Write an alternate UUIDFromTime

*Insert Using CQL now()*

{noformat}
INSERT INTO timeline(user_id, image_timestamp, image_id) VALUES(?, now(), ?)
{noformat}

That will generate timestamps that always sort as you expect; however, you'll 
need an average insertion rate of less than 10,000,000 / second / coordinator, 
and it moves the cost of UUID generation from client to server, which should be 
small.
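To sanity-check that figure, here is a trivial sketch (uuidsPerSecond is a hypothetical helper for illustration, not gocql or Cassandra code): one UUID per 100ns tick works out to 10 million per second.

```go
package main

import "fmt"

// uuidsPerSecond returns the ceiling implied by now()'s 100ns timestamp
// resolution: at most one UUID per 100ns tick per coordinator.
func uuidsPerSecond() int {
	const nsPerSecond = 1000000000 // 1e9 ns in a second
	const nsPerTick = 100          // timeuuid timestamps tick every 100ns
	return nsPerSecond / nsPerTick
}

func main() {
	fmt.Println(uuidsPerSecond()) // 10000000
}
```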

*Write an alternate UUIDFromTime to match server-side now()*

It should be straightforward to provide an alternative UUIDFromTime with the 
same semantics as CQL now().

# Initialize clockseq once at startup.
# Keep an atomic 64-bit counter, lastNanos, to store the 60-bit UUID timestamp.
# Atomically update it to be monotonically increasing.


Something like this (apologies for the pidgin Go - I don't normally write it 
and have not tested this fragment):

{noformat}
var currentNanos = atomic.LoadInt64(&lastNanos)

// Compute nextNanos in 100ns units the same way as
// https://github.com/gocql/gocql/blame/master/uuid.go#L126
var nextNanos = int64(utcTime.Unix()-timeBase)*10000000 +
	int64(utcTime.Nanosecond()/100)

if nextNanos <= currentNanos {
	// If the clock is drifting backwards, just pick the monotonic next timestamp.
	nextNanos = currentNanos + 1
}
if !atomic.CompareAndSwapInt64(&lastNanos, currentNanos, nextNanos) {
	// If unable to swap, another thread has generated a new UUID, so the
	// time should be close to when this thread would have; just take the
	// next timestamp rather than loop forever in case of high contention.
	nextNanos = atomic.AddInt64(&lastNanos, 1)
}
{noformat}


What do you think?


As for this JIRA, I propose to submit documentation improvements so that users 
don't rely on TimeUUID clockseq sort order, and to file an issue against the 
gocql client driver in case they missed this JIRA.


> Wrong ordering for timeuuid fields
> --
>
> Key: CASSANDRA-15064
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15064
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Andreas Andersen
>Assignee: Jon Meredith
>Priority: Normal
> Attachments: example.cql
>
>
> Hi!
> We're seeing some strange behavior for the ordering of timeuuid fields. They 
> seem to be sorted in the wrong order when the clock_seq_low field in a 
> timeuuid goes from 7f to 80. Consider the following example:
> {noformat}
> cqlsh:test> show version; 
> [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4] 
> cqlsh:test> CREATE TABLE t ( 
>     ... partition   int, 
>     ... t   timeuuid, 
>     ... i   int, 
>     ...  
>     ... PRIMARY KEY(partition, t) 
>     ... ) 
>     ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t    | i 
> ---+--+--- 
>  1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>  1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>  1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>  1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>  1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>  
> (5 rows) 
> cqlsh:test>
> {noformat}
> The expected behavior is that the rows are returned in the same order as they 
> were inserted (we inserted them with their clustering key in an ascending 
> order). Instead, the order "wraps" in the middle.
> 

[jira] [Updated] (CASSANDRA-15064) Wrong ordering for timeuuid fields

2019-03-26 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15064:
-
Component/s: Cluster/Schema

> Wrong ordering for timeuuid fields
> --
>
> Key: CASSANDRA-15064
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15064
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Andreas Andersen
>Assignee: Jon Meredith
>Priority: Normal
> Attachments: example.cql
>
>
> Hi!
> We're seeing some strange behavior for the ordering of timeuuid fields. They 
> seem to be sorted in the wrong order when the clock_seq_low field in a 
> timeuuid goes from 7f to 80. Consider the following example:
> {noformat}
> cqlsh:test> show version; 
> [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4] 
> cqlsh:test> CREATE TABLE t ( 
>     ... partition   int, 
>     ... t   timeuuid, 
>     ... i   int, 
>     ...  
>     ... PRIMARY KEY(partition, t) 
>     ... ) 
>     ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t    | i 
> ---+--+--- 
>  1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>  1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>  1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>  1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>  1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>  
> (5 rows) 
> cqlsh:test>
> {noformat}
> The expected behavior is that the rows are returned in the same order as they 
> were inserted (we inserted them with their clustering key in an ascending 
> order). Instead, the order "wraps" in the middle.
> This issue only arises when the 9th octet (clock_seq_low) in the uuid goes 
> from 7f to 80. A guess would be that the comparison is implemented as a 
> signed integer instead of an unsigned integer, as 0x7f = 127 and 0x80 = -128. 
> According to the RFC, the field should be treated as an unsigned integer: 
> [https://tools.ietf.org/html/rfc4122#section-4.1.2]
> Changing the field from a timeuuid to a uuid gives the expected correct 
> behavior:
> {noformat}
> cqlsh:test> CREATE TABLE t ( 
>     ... partition   int, 
>     ... t   uuid, 
>     ... i   int, 
>     ...  
>     ... PRIMARY KEY(partition, t) 
>     ... ) 
>     ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t    | i 
> ---+--+--- 
>  1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>  1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>  1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>  1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>  1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>  
> (5 rows) 
> cqlsh:test>{noformat}
>  
>  






[jira] [Commented] (CASSANDRA-15064) Wrong ordering for timeuuid fields

2019-03-26 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802209#comment-16802209
 ] 

Jon Meredith commented on CASSANDRA-15064:
--

I can confirm the behavior you're seeing; however, I don't think it's something 
that should be changed.

Cassandra uses the ClockSeq in the manner described in section 4.1.5 as a 
mechanism to avoid generating duplicate UUIDs if the server is restarted with 
the clock in the past, or comes up with a different node id [1]. As an aside, 
if the time reported by the OS via System.currentTimeMillis goes backwards 
while the server is running, the *timestamp* (not clock seq) will be 
constrained to be 100ns later than the previous timestamp returned until time 
catches up (or Cassandra is restarted).

The node id and clock sequence are initialized on server startup and should not 
change during execution, so I was wondering if you generated them 
synthetically. Here are the UUIDs you supplied, broken into components by 
java.util.UUID:

{noformat}
1 UUID 84e2c963-4ef9-11e9-b57e-f0def1d0755e = variant[0b10] version[0b1] 
timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x357e]
2 UUID 84e2c963-4ef9-11e9-b57f-f0def1d0755e = variant[0b10] version[0b1] 
timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x357f]
3 UUID 84e2c963-4ef9-11e9-b580-f0def1d0755e = variant[0b10] version[0b1] 
timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x3580]
4 UUID 84e2c963-4ef9-11e9-b581-f0def1d0755e = variant[0b10] version[0b1] 
timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x3581]
5 UUID 84e2c963-4ef9-11e9-b582-f0def1d0755e = variant[0b10] version[0b1] 
timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x3582]
{noformat}
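The decomposition above can be reproduced with a few lines of standard-library Go. decompose is a hypothetical helper written for this comment (not gocql or Cassandra code) that splits a version-1 UUID per RFC 4122, masking the version bits out of time_hi and the variant bits out of clock_seq:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// decompose splits a version-1 (time-based) UUID string into the fields the
// comparators look at. Byte layout per RFC 4122:
// time_low(4) time_mid(2) time_hi_and_version(2) clock_seq(2) node(6).
func decompose(u string) (timestamp uint64, clockSeq uint16, node uint64) {
	raw, err := hex.DecodeString(strings.ReplaceAll(u, "-", ""))
	if err != nil || len(raw) != 16 {
		panic("not a UUID: " + u)
	}
	timeLow := uint64(raw[0])<<24 | uint64(raw[1])<<16 | uint64(raw[2])<<8 | uint64(raw[3])
	timeMid := uint64(raw[4])<<8 | uint64(raw[5])
	timeHi := uint64(raw[6]&0x0f)<<8 | uint64(raw[7]) // mask off the version nibble
	timestamp = timeHi<<48 | timeMid<<32 | timeLow
	clockSeq = uint16(raw[8]&0x3f)<<8 | uint16(raw[9]) // mask off the variant bits
	for _, b := range raw[10:] {
		node = node<<8 | uint64(b)
	}
	return
}

func main() {
	ts, cs, n := decompose("84e2c963-4ef9-11e9-b57e-f0def1d0755e")
	fmt.Printf("timestamp=0x%016x clockseq=0x%04x node=0x%012x\n", ts, cs, n)
}
```

Running it on the first UUID from the report reproduces the timestamp 0x01e94ef984e2c963 and clockseq 0x357e shown above.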

The reason the UUID [3] and TimeUUID [4] types compare differently for the 
clock sequence (and node) is that they have slightly different comparison 
functions. The TimeUUID treats clk_seq_hi as a signed byte and then 
clk_seq_low/node as a series of unsigned bytes (not a single integer), however 
the UUIDType comparison just treats it as a single signed long. It looks like 
this behavior was preserved during the performance refactor in CASSANDRA-8730.
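The signed/unsigned difference is easy to demonstrate: comparing the clk_seq_low byte 0x7f against 0x80 flips order depending on which interpretation is used. A minimal sketch (these are illustrative comparators, not the actual Cassandra ones):

```go
package main

import "fmt"

// compareSigned mirrors a comparator that treats the byte as signed (int8):
// 0x80 becomes -128 and sorts before 0x7f (= 127).
func compareSigned(a, b byte) int {
	return int(int8(a)) - int(int8(b))
}

// compareUnsigned treats the byte as unsigned, as RFC 4122 specifies for the
// clock sequence: 0x80 (= 128) sorts after 0x7f (= 127).
func compareUnsigned(a, b byte) int {
	return int(a) - int(b)
}

func main() {
	fmt.Println(compareSigned(0x7f, 0x80) < 0)   // false: 127 > -128
	fmt.Println(compareUnsigned(0x7f, 0x80) < 0) // true: 127 < 128
}
```

This is exactly the wrap in the report: rows with clk_seq_low 0x80-0x82 sort before 0x7e-0x7f under the signed interpretation.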

As for the correct ordering: although the spec does say to treat the clock 
sequence as an unsigned integer, it doesn't really convey any meaning as it is 
assigned from a random number generator. (FWIW, the RFC also recommends 
lexical ordering by time_low, time_mid, then time_hi, which seemed odd to me 
but is confirmed by reading the uuid_compare function in Appendix A.)

For both TimeUUID and UUID the primary ordering is by the 100ns-resolution 
timestamp first, which is what we want when dealing with timestamps, then 
deterministically on the node/clockseq (although unfortunately in a different 
order depending on TimeUUID/UUID).

Changing the comparison function affects the order data is stored on disk and 
would require users to perform some kind of migration (automated or manual). 
Although it would be nice to make them both the same, and perhaps to treat the 
clockseq as an unsigned int to reduce surprise, I don't think the value is 
there to make any change.

Just in case I've missed the point, would you mind answering a couple of 
questions:

# How did you notice this issue - was it with synthetic tests or through 
production usage?
# Why does it matter to you? Are you trying to integrate with an external 
system that expects the TimeUUIDs to be ordered in a specific way?


[1] The node id for Cassandra is generated from a hash of local IP addresses, 
as the comments claim the MAC address was not easily accessible from Java. I'm 
not sure if the order of the hashed addresses is stable, so it's possible the 
same node with the same networking could come up with a different node id.  
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/utils/UUIDGen.java#L349
  The method has been changed for trunk, but I'm assuming you're not running it.

[2] Initializing clock seq 
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/utils/UUIDGen.java#L295

[3] UUID comparison 
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/marshal/UUIDType.java#L90

[4] TimeUUID comparison 
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/marshal/TimeUUIDType.java#L44


> Wrong ordering for timeuuid fields
> --
>
> Key: CASSANDRA-15064
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15064
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Andreas Andersen
>Assignee: Jon Meredith
>Priority: Normal
> Attachments: example.cql
>
>
> Hi!
> We're seeing some strange behavior for the ordering of 

[jira] [Assigned] (CASSANDRA-15064) Wrong ordering for timeuuid fields

2019-03-26 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith reassigned CASSANDRA-15064:


Assignee: Jon Meredith

> Wrong ordering for timeuuid fields
> --
>
> Key: CASSANDRA-15064
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15064
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Andreas Andersen
>Assignee: Jon Meredith
>Priority: Normal
> Attachments: example.cql
>
>
> Hi!
> We're seeing some strange behavior for the ordering of timeuuid fields. They 
> seem to be sorted in the wrong order when the clock_seq_low field in a 
> timeuuid goes from 7f to 80. Consider the following example:
> {noformat}
> cqlsh:test> show version; 
> [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4] 
> cqlsh:test> CREATE TABLE t ( 
>     ... partition   int, 
>     ... t   timeuuid, 
>     ... i   int, 
>     ...  
>     ... PRIMARY KEY(partition, t) 
>     ... ) 
>     ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t    | i 
> ---+--+--- 
>  1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>  1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>  1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>  1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>  1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>  
> (5 rows) 
> cqlsh:test>
> {noformat}
> The expected behavior is that the rows are returned in the same order as they 
> were inserted (we inserted them with their clustering key in an ascending 
> order). Instead, the order "wraps" in the middle.
> This issue only arises when the 9th octet (clock_seq_low) in the uuid goes 
> from 7f to 80. A guess would be that the comparison is implemented as a 
> signed integer instead of an unsigned integer, as 0x7f = 127 and 0x80 = -128. 
> According to the RFC, the field should be treated as an unsigned integer: 
> [https://tools.ietf.org/html/rfc4122#section-4.1.2]
> Changing the field from a timeuuid to a uuid gives the expected correct 
> behavior:
> {noformat}
> cqlsh:test> CREATE TABLE t ( 
>     ... partition   int, 
>     ... t   uuid, 
>     ... i   int, 
>     ...  
>     ... PRIMARY KEY(partition, t) 
>     ... ) 
>     ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t    | i 
> ---+--+--- 
>  1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>  1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>  1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>  1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>  1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>  
> (5 rows) 
> cqlsh:test>{noformat}
>  
>  






[jira] [Commented] (CASSANDRA-15005) Configurable whilelist for UDFs

2019-03-15 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793650#comment-16793650
 ] 

Jon Meredith commented on CASSANDRA-15005:
--

Apologies for the lack of instructions, I was rushing yesterday.

The custom functions are enabled in the config file under ‘custom_fcts’,
very similar to your list of whitelisted functions.

There are a couple of examples under test/unit/com/example, and the
test/conf/cassandra.conf file has been updated to enable them. The
functions appear in the system keyspace.

You should be able to scan through the files in
https://github.com/jonmeredith/cassandra/tree/CASSANDRA-15005-3.0/src/java/org/apache/cassandra/cql3/functions
for examples of functions and how to serialize/deserialize the byte buffers
for the arguments.

I’m going to be mostly offline until 23rd March, but would be happy to help
when I’m back if you get stuck.




> Configurable whilelist for UDFs
> ---
>
> Key: CASSANDRA-15005
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15005
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Interpreter
>Reporter: A. Soroka
>Priority: Low
>
> I would like to use the UDF system to distribute some simple calculations on 
> values. For some use cases, this would require access only to some Java API 
> classes that aren't on the (hardcoded) whitelist (e.g. 
> {{java.security.MessageDigest}}). In other cases, it would require access to 
> a little non-C* library code, pre-distributed to nodes by out-of-band means.
> As I understand the situation now, the whitelist for types UDFs can use is 
> hardcoded in java in 
> [UDFunction|[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java#L99].]
> This ticket, then, is a request for a facility that would allow that list to 
> be extended via some kind of deployment-time configuration. I realize that 
> serious security concerns immediately arise for this kind of functionality, 
> but I hope that by restricting it (only used during startup, no exposing the 
> whitelist for introspection, etc.) it could be quite practical.
> I'd like very much to assist with this ticket if it is accepted. (I believe I 
> have sufficient Java skill to do that, but no real familiarity with C*'s 
> codebase, yet. :) )






[jira] [Commented] (CASSANDRA-15005) Configurable whilelist for UDFs

2019-03-14 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793216#comment-16793216
 ] 

Jon Meredith commented on CASSANDRA-15005:
--

That sounds like a very reasonable way to get started if you want to modify the 
whitelist.

I've made a bit of progress on loading custom functions instead of extending 
the whitelist, here's a branch against 3.0

[https://github.com/jonmeredith/cassandra/tree/CASSANDRA-15005-3.0]

I'm not sure if the convention of exporting an `all()` method will stay, but it 
matches the classes that define functions.  I'm also not sure what version this 
could land in with the current freeze on trunk.

> Configurable whilelist for UDFs
> ---
>
> Key: CASSANDRA-15005
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15005
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Interpreter
>Reporter: A. Soroka
>Priority: Low
>
> I would like to use the UDF system to distribute some simple calculations on 
> values. For some use cases, this would require access only to some Java API 
> classes that aren't on the (hardcoded) whitelist (e.g. 
> {{java.security.MessageDigest}}). In other cases, it would require access to 
> a little non-C* library code, pre-distributed to nodes by out-of-band means.
> As I understand the situation now, the whitelist for types UDFs can use is 
> hardcoded in java in 
> [UDFunction|[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java#L99].]
> This ticket, then, is a request for a facility that would allow that list to 
> be extended via some kind of deployment-time configuration. I realize that 
> serious security concerns immediately arise for this kind of functionality, 
> but I hope that by restricting it (only used during startup, no exposing the 
> whitelist for introspection, etc.) it could be quite practical.
> I'd like very much to assist with this ticket if it is accepted. (I believe I 
> have sufficient Java skill to do that, but no real familiarity with C*'s 
> codebase, yet. :) )






[jira] [Commented] (CASSANDRA-15005) Configurable whilelist for UDFs

2019-02-25 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777561#comment-16777561
 ] 

Jon Meredith commented on CASSANDRA-15005:
--

Sorry for the delay getting back to you. That's exactly the distinction: being 
able to load functions written like 
[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java#L170]
 from an external jar and have them wired up.

I'm hoping to have some time to see if it's feasible or not this week, I'll 
post back here when I find out.

> Configurable whilelist for UDFs
> ---
>
> Key: CASSANDRA-15005
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15005
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Interpreter
>Reporter: A. Soroka
>Priority: Minor
>
> I would like to use the UDF system to distribute some simple calculations on 
> values. For some use cases, this would require access only to some Java API 
> classes that aren't on the (hardcoded) whitelist (e.g. 
> {{java.security.MessageDigest}}). In other cases, it would require access to 
> a little non-C* library code, pre-distributed to nodes by out-of-band means.
> As I understand the situation now, the whitelist for types UDFs can use is 
> hardcoded in java in 
> [UDFunction|[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java#L99].]
> This ticket, then, is a request for a facility that would allow that list to 
> be extended via some kind of deployment-time configuration. I realize that 
> serious security concerns immediately arise for this kind of functionality, 
> but I hope that by restricting it (only used during startup, no exposing the 
> whitelist for introspection, etc.) it could be quite practical.
> I'd like very much to assist with this ticket if it is accepted. (I believe I 
> have sufficient Java skill to do that, but no real familiarity with C*'s 
> codebase, yet. :) )






[jira] [Commented] (CASSANDRA-15009) In-JVM Testing tooling for paging

2019-02-25 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776971#comment-16776971
 ] 

Jon Meredith commented on CASSANDRA-15009:
--

+1 to merge - I'm fine with the nulls the way they are; it was just a question 
on convention. If it's unambiguous and causes no warnings with the 
compiler/findbugs, it's fine.

> In-JVM Testing tooling for paging
> -
>
> Key: CASSANDRA-15009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15009
> Project: Cassandra
>  Issue Type: Test
>Reporter: Alex Petrov
>Assignee: Jon Meredith
>Priority: Major
>
> Add distributed pager to in-jvm distributed tests to allow realistic pager 
> tests.






[jira] [Commented] (CASSANDRA-9384) Update jBCrypt dependency to version 0.4

2019-02-20 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773169#comment-16773169
 ] 

Jon Meredith commented on CASSANDRA-9384:
-

Small patches get the most comments...

For the trunk patch, what do you think about adding a 'Please recreate user 
passwords after changing this setting' note to the log message? There's very 
little documentation about the setting, and it might save somebody from 
wondering why everything is broken after an upgrade.

For the 2.1 patch, I agree bcrypt shouldn't be updated.  What do you think 
about changing the log message to

logger.warn("!!! IMPORTANT !!! ...")

Otherwise they'll get a "WARN !!! WARNING ..." message.

> Update jBCrypt dependency to version 0.4
> 
>
> Key: CASSANDRA-9384
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9384
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Dinesh Joshi
>Priority: Major
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.11.x
>
>
> https://bugzilla.mindrot.org/show_bug.cgi?id=2097
> Although the bug tracker lists it as NEW/OPEN, the release notes for 0.4 
> indicate that this is now fixed, so we should update.
> Thanks to [~Bereng] for identifying the issue.






[jira] [Commented] (CASSANDRA-15005) Configurable whitelist for UDFs

2019-02-19 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772473#comment-16772473
 ] 

Jon Meredith commented on CASSANDRA-15005:
--

You've found the correct place in the code for the whitelist (and blacklist) 
for functions.  I'm interested in extending the functions available in CQL at 
the moment, although I'm not sure whether I want to add UDFs or add additional 
functionality contained in jars distributed out of band, as you describe.

Would being able to add functions through distributed jars be a possible 
alternative for your use case?
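The facility being discussed - a built-in whitelist optionally extended at deployment time - could be sketched roughly like this (purely illustrative: the class, constructor, and configuration mechanism are invented and are not Cassandra's actual UDFunction code):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch only: combine the hardcoded whitelist of allowed
// package/class prefixes with extra prefixes supplied by deployment-time
// configuration, read once at startup and never exposed for introspection.
public final class UdfWhitelist
{
    private final Set<String> allowedPrefixes;

    public UdfWhitelist(Set<String> builtIn, Set<String> configured)
    {
        Set<String> merged = new LinkedHashSet<>(builtIn);
        merged.addAll(configured); // extension happens only at startup
        this.allowedPrefixes = merged;
    }

    public boolean isAllowed(String className)
    {
        return allowedPrefixes.stream().anyMatch(className::startsWith);
    }
}
```

Checking class names against prefixes mirrors how the hardcoded list is consulted during UDF compilation, but the configuration hook here is an assumption, not an existing feature.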

> Configurable whitelist for UDFs
> ---
>
> Key: CASSANDRA-15005
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15005
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Interpreter
>Reporter: A. Soroka
>Priority: Minor
>
> I would like to use the UDF system to distribute some simple calculations on 
> values. For some use cases, this would require access only to some Java API 
> classes that aren't on the (hardcoded) whitelist (e.g. 
> {{java.security.MessageDigest}}). In other cases, it would require access to 
> a little non-C* library code, pre-distributed to nodes by out-of-band means.
> As I understand the situation now, the whitelist for types UDFs can use is 
> hardcoded in Java in 
> [UDFunction|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java#L99].
> This ticket, then, is a request for a facility that would allow that list to 
> be extended via some kind of deployment-time configuration. I realize that 
> serious security concerns immediately arise for this kind of functionality, 
> but I hope that by restricting it (only used during startup, no exposing the 
> whitelist for introspection, etc.) it could be quite practical.
> I'd like very much to assist with this ticket if it is accepted. (I believe I 
> have sufficient Java skill to do that, but no real familiarity with C*'s 
> codebase, yet. :) )






[jira] [Commented] (CASSANDRA-15009) In-JVM Testing tooling for paging

2019-02-11 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765498#comment-16765498
 ] 

Jon Meredith commented on CASSANDRA-15009:
--

I had a look at the latest state of branches, and the fix to prevent the 
executeInternal call looks good. The only minor question I had was whether 
currentPage should be explicitly initialized to null in the anonymous 
{{AbstractIterator}} returned by {{FromDistributedQuery.iterator}}? The 
implicit default of null is obviously fine, but I'm not sure what the 
conventions are for the project or if static analysis tools get upset by that 
kind of thing.

It will be interesting to see how the remaining dtests complete.

> In-JVM Testing tooling for paging
> -
>
> Key: CASSANDRA-15009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15009
> Project: Cassandra
>  Issue Type: Test
>Reporter: Alex Petrov
>Assignee: Jon Meredith
>Priority: Major
>
> Add distributed pager to in-jvm distributed tests to allow realistic pager 
> tests.






[jira] [Commented] (CASSANDRA-15009) In-JVM Testing tooling for paging

2019-02-07 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762977#comment-16762977
 ] 

Jon Meredith commented on CASSANDRA-15009:
--

Nice patch and tests, I had a couple of minor comments across versions (and I'm 
assuming the CircleCI changes were just for testing).

- RowUtil - why does the new toObjects(List, Iterator...) 
[https://github.com/apache/cassandra/compare/trunk...ifesdjeen:CASSANDRA-14922-followup-3.0]
 mixed in there with that branch.   The trunk run failed with an OOM out of 
metaspace after readWithSchemaDisagreement ran; were the extra commits an 
attempt to get that passing?

> In-JVM Testing tooling for paging
> -
>
> Key: CASSANDRA-15009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15009
> Project: Cassandra
>  Issue Type: Test
>Reporter: Alex Petrov
>Assignee: Jon Meredith
>Priority: Major
>
> Add distributed pager to in-jvm distributed tests to allow realistic pager 
> tests.






[jira] [Assigned] (CASSANDRA-15009) In-JVM Testing tooling for paging

2019-02-07 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith reassigned CASSANDRA-15009:


Assignee: Jon Meredith  (was: Alex Petrov)

> In-JVM Testing tooling for paging
> -
>
> Key: CASSANDRA-15009
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15009
> Project: Cassandra
>  Issue Type: Test
>Reporter: Alex Petrov
>Assignee: Jon Meredith
>Priority: Major
>
> Add distributed pager to in-jvm distributed tests to allow realistic pager 
> tests.






[jira] [Commented] (CASSANDRA-14989) NullPointerException when SELECTing token() on only one part of a two-part partition key

2019-01-22 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749320#comment-16749320
 ] 

Jon Meredith commented on CASSANDRA-14989:
--

Thanks for the patch.  I like the refactor to clean up FunctionResolver.get. 
The only real comment I have on it is the name maybeNativeFunction - I think 
the other functions are native functions too; the thing that's special about 
token/toJson/fromJson is that they support polymorphic types and so don't fit 
in the Candidate structure. Something like maybeSpecialFunction or 
maybePolymorphicFunction would be more descriptive.

After that, +1 from me (not that I can commit it)

> NullPointerException when SELECTing token() on only one part of a two-part 
> partition key
> 
>
> Key: CASSANDRA-14989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14989
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
> Environment: Using {{cqlsh 5.0.1}} on a Mac OS X host system with 
> Cassandra 3.11.3 running via Docker for Mac from the official 
> {{cassandra:3.11.3}} image.
>Reporter: Manuel Kießling
>Assignee: Dinesh Joshi
>Priority: Major
>
> I have the following schema:
> {code}
> CREATE TABLE query_tests.cart_snapshots (
> cart_id uuid,
> realm text,
> snapshot_id timeuuid,
> state text,
> PRIMARY KEY ((cart_id, realm), snapshot_id)
> ) WITH CLUSTERING ORDER BY (snapshot_id DESC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> {code}
> In cqlsh, I try the following query:
> {code}select token(cart_id) from cart_snapshots ;{code}
> This results in cqlsh returning {{ServerError: 
> java.lang.NullPointerException}}, and the following error in the server log:
> {code}
> DC1N1_1  | ERROR [Native-Transport-Requests-1] 2019-01-16 12:17:52,075 
> QueryMessage.java:129 - Unexpected error during query
> DC1N1_1  | java.lang.NullPointerException: null
> DC1N1_1  |   at 
> org.apache.cassandra.db.marshal.CompositeType.build(CompositeType.java:356) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.db.marshal.CompositeType.build(CompositeType.java:349) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.config.CFMetaData.serializePartitionKey(CFMetaData.java:805)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.functions.TokenFct.execute(TokenFct.java:59) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.selection.ScalarFunctionSelector.getOutput(ScalarFunctionSelector.java:61)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.selection.Selection$SelectionWithProcessing$1.getOutputRow(Selection.java:666)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.selection.Selection$ResultSetBuilder.getOutputRow(Selection.java:492)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.selection.Selection$ResultSetBuilder.newRow(Selection.java:458)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.statements.SelectStatement.processPartition(SelectStatement.java:860)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:790)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:438)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:416)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:289)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:117)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]

[jira] [Commented] (CASSANDRA-14989) NullPointerException when SELECTing token() on only one part of a two-part partition key

2019-01-17 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745218#comment-16745218
 ] 

Jon Meredith commented on CASSANDRA-14989:
--

Reproducible for me with a smaller test case. The token function expects the 
number of arguments (and probably the argument types) to match the partition 
key; however, in the failing example only a partial key is passed. 
{code:java}
cqlsh:query_tests> DROP TABLE repro14989;
cqlsh:query_tests> CREATE TABLE repro14989(pk1 uuid, pk2 text, PRIMARY KEY 
((pk1, pk2)));
cqlsh:query_tests> INSERT INTO repro14989(pk1,pk2) VALUES (uuid(),'pk2');
cqlsh:query_tests> SELECT token(pk1) FROM repro14989;
ServerError: java.lang.NullPointerException
cqlsh:query_tests> SELECT token(pk1,pk2) FROM repro14989;

 system.token(pk1, pk2)
------------------------
    7705645267149106563

(1 rows)
{code}

In the example above, this query should work
{code}
select token(cart_id, realm) from cart_snapshots ;
{code}

As a proposed fix, when preparing the query, Cassandra should check the 
arguments for {{token}} are suitable for serializing a partition key before 
executing the function.
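As a rough illustration of that kind of prepare-time check (hypothetical names, not the actual Cassandra code path):

```java
import java.util.List;

// Hypothetical sketch: before executing token(), verify that the argument
// list matches the partition key columns, so a partial key is rejected at
// prepare time with a clear error instead of failing later with an NPE.
public final class TokenArgumentCheck
{
    public static void validate(List<String> partitionKeyTypes, List<String> argumentTypes)
    {
        if (argumentTypes.size() != partitionKeyTypes.size())
            throw new IllegalArgumentException(
                String.format("token() requires all %d partition key components, got %d",
                              partitionKeyTypes.size(), argumentTypes.size()));
        for (int i = 0; i < partitionKeyTypes.size(); i++)
            if (!partitionKeyTypes.get(i).equals(argumentTypes.get(i)))
                throw new IllegalArgumentException(
                    String.format("token() argument %d must be of type %s, got %s",
                                  i, partitionKeyTypes.get(i), argumentTypes.get(i)));
    }
}
```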

> NullPointerException when SELECTing token() on only one part of a two-part 
> partition key
> 
>
> Key: CASSANDRA-14989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14989
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
> Environment: Using {{cqlsh 5.0.1}} on a Mac OS X host system with 
> Cassandra 3.11.3 running via Docker for Mac from the official 
> {{cassandra:3.11.3}} image.
>Reporter: Manuel Kießling
>Priority: Major
>
> I have the following schema:
> {code}
> CREATE TABLE query_tests.cart_snapshots (
> cart_id uuid,
> realm text,
> snapshot_id timeuuid,
> state text,
> PRIMARY KEY ((cart_id, realm), snapshot_id)
> ) WITH CLUSTERING ORDER BY (snapshot_id DESC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> {code}
> In cqlsh, I try the following query:
> {code}select token(cart_id) from cart_snapshots ;{code}
> This results in cqlsh returning {{ServerError: 
> java.lang.NullPointerException}}, and the following error in the server log:
> {code}
> DC1N1_1  | ERROR [Native-Transport-Requests-1] 2019-01-16 12:17:52,075 
> QueryMessage.java:129 - Unexpected error during query
> DC1N1_1  | java.lang.NullPointerException: null
> DC1N1_1  |   at 
> org.apache.cassandra.db.marshal.CompositeType.build(CompositeType.java:356) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.db.marshal.CompositeType.build(CompositeType.java:349) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.config.CFMetaData.serializePartitionKey(CFMetaData.java:805)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.functions.TokenFct.execute(TokenFct.java:59) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.selection.ScalarFunctionSelector.getOutput(ScalarFunctionSelector.java:61)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.selection.Selection$SelectionWithProcessing$1.getOutputRow(Selection.java:666)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.selection.Selection$ResultSetBuilder.getOutputRow(Selection.java:492)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.selection.Selection$ResultSetBuilder.newRow(Selection.java:458)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.statements.SelectStatement.processPartition(SelectStatement.java:860)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:790)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> DC1N1_1  |   at 
> org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:438)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]

[jira] [Commented] (CASSANDRA-14915) Handle ant-optional dependency

2018-12-19 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725089#comment-16725089
 ] 

Jon Meredith commented on CASSANDRA-14915:
--

Patches all look good.  Does 
org.apache.cassandra.distributed.DistributedReadWritePathTest on trunk 
experience intermittent failures? I had a quick look but couldn't see much more 
than abnormal VM exit as the reason.

> Handle ant-optional dependency
> --
>
> Key: CASSANDRA-14915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14915
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> CASSANDRA-13117 added a JUnit task which dumps threads on unit test timeout, 
> and it depends on a class in {{org.apache.tools.ant.taskdefs.optional}} which 
> seems to not always be present depending on how {{ant}} was installed. It can 
> cause this error when building;
> {code:java}
> Throws: cassandra-trunk/build.xml:1134: taskdef A class needed by class 
> org.krummas.junit.JStackJUnitTask cannot be found:
> org/apache/tools/ant/taskdefs/optional/junit/JUnitTask  using the classloader
> AntClassLoader[/.../cassandra-trunk/lib/jstackjunit-0.0.1.jar]
> {code}






[jira] [Commented] (CASSANDRA-14915) Handle ant-optional dependency

2018-11-30 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705253#comment-16705253
 ] 

Jon Meredith commented on CASSANDRA-14915:
--

Gave Ubuntu 16.04 a go - if you don't have {{ant-optional}} installed, the 
patch allows the {{ant build}} command to succeed; however, the tests still 
won't run without {{ant-optional}} installed, as that contains the JUnit 
integration.

Anyway, the patch works as advertised (lowering the test timeout still 
triggers stack traces) - however, I'm not sure why somebody would want to run 
without ant-junit available.

+1

> Handle ant-optional dependency
> --
>
> Key: CASSANDRA-14915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14915
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> CASSANDRA-13117 added a JUnit task which dumps threads on unit test timeout, 
> and it depends on a class in {{org.apache.tools.ant.taskdefs.optional}} which 
> seems to not always be present depending on how {{ant}} was installed. It can 
> cause this error when building;
> {code:java}
> Throws: cassandra-trunk/build.xml:1134: taskdef A class needed by class 
> org.krummas.junit.JStackJUnitTask cannot be found:
> org/apache/tools/ant/taskdefs/optional/junit/JUnitTask  using the classloader
> AntClassLoader[/.../cassandra-trunk/lib/jstackjunit-0.0.1.jar]
> {code}






[jira] [Commented] (CASSANDRA-14915) Handle ant-optional dependency

2018-11-30 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705226#comment-16705226
 ] 

Jon Meredith commented on CASSANDRA-14915:
--

I worked out you meant ant-optional on the Debian-derived ones - agreed, we'll 
just need to document requiring ant-junit on Fedora et al., and I will test the 
patch against a distro with ant-optional available too.

> Handle ant-optional dependency
> --
>
> Key: CASSANDRA-14915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14915
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> CASSANDRA-13117 added a JUnit task which dumps threads on unit test timeout, 
> and it depends on a class in {{org.apache.tools.ant.taskdefs.optional}} which 
> seems to not always be present depending on how {{ant}} was installed. It can 
> cause this error when building;
> {code:java}
> Throws: cassandra-trunk/build.xml:1134: taskdef A class needed by class 
> org.krummas.junit.JStackJUnitTask cannot be found:
> org/apache/tools/ant/taskdefs/optional/junit/JUnitTask  using the classloader
> AntClassLoader[/.../cassandra-trunk/lib/jstackjunit-0.0.1.jar]
> {code}






[jira] [Commented] (CASSANDRA-14915) Handle ant-optional dependency

2018-11-30 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705169#comment-16705169
 ] 

Jon Meredith commented on CASSANDRA-14915:
--

I think that class is always packaged up in the {{ant-junit}} dependency; I 
grabbed the rpm, expanded it, and checked that the class is there.  I couldn't 
find an {{ant-optional}} package that included it under the standard repos.

{code}
[jmeredith@localhost x]$ jar tvf ./usr/share/java/ant/ant-junit.jar | grep 
JUnitTask
1746 Wed Mar 01 08:56:26 MST 2017 
org/apache/tools/ant/taskdefs/optional/junit/JUnitTask$1.class
{code}

If you always need to install {{ant-junit}} on Fedora, does this patch still 
have value other than permitting the build task to succeed (and none of the 
test tasks)?

> Handle ant-optional dependency
> --
>
> Key: CASSANDRA-14915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14915
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> CASSANDRA-13117 added a JUnit task which dumps threads on unit test timeout, 
> and it depends on a class in {{org.apache.tools.ant.taskdefs.optional}} which 
> seems to not always be present depending on how {{ant}} was installed. It can 
> cause this error when building;
> {code:java}
> Throws: cassandra-trunk/build.xml:1134: taskdef A class needed by class 
> org.krummas.junit.JStackJUnitTask cannot be found:
> org/apache/tools/ant/taskdefs/optional/junit/JUnitTask  using the classloader
> AntClassLoader[/.../cassandra-trunk/lib/jstackjunit-0.0.1.jar]
> {code}






[jira] [Commented] (CASSANDRA-14915) Handle ant-optional dependency

2018-11-30 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705028#comment-16705028
 ] 

Jon Meredith commented on CASSANDRA-14915:
--

I've confirmed the patch allows you to get beyond the initial error in the 
description on Fedora 26 and complete an {{ant build}}. However, it still 
requires ant-junit to be installed to get any of the test targets to complete 
(which are needed for {{jar}}/{{package}} targets).

Is this what you were hoping to achieve, or did you also expect the unit tests 
to be able to run with just {{ant}} and {{java-1.8.0-openjdk-devel}} installed?

An alternative would be to be more explicit about which dependencies are needed 
to build on Fedora in the docs.

 

 

> Handle ant-optional dependency
> --
>
> Key: CASSANDRA-14915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14915
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> CASSANDRA-13117 added a JUnit task which dumps threads on unit test timeout, 
> and it depends on a class in {{org.apache.tools.ant.taskdefs.optional}} which 
> seems to not always be present depending on how {{ant}} was installed. It can 
> cause this error when building;
> {code:java}
> Throws: cassandra-trunk/build.xml:1134: taskdef A class needed by class 
> org.krummas.junit.JStackJUnitTask cannot be found:
> org/apache/tools/ant/taskdefs/optional/junit/JUnitTask  using the classloader
> AntClassLoader[/.../cassandra-trunk/lib/jstackjunit-0.0.1.jar]
> {code}






[jira] [Commented] (CASSANDRA-14832) Other threads can take all newly allocated BufferPool chunks before original and cause reallocation

2018-11-26 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699352#comment-16699352
 ] 

Jon Meredith commented on CASSANDRA-14832:
--

PR https://github.com/apache/cassandra/pull/288

> Other threads can take all newly allocated BufferPool chunks before original 
> and cause reallocation
> ---
>
> Key: CASSANDRA-14832
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14832
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jon Meredith
>Priority: Minor
>
> When BufferPool does not have any free Chunks to satisfy a request, the 
> calling thread allocates a new large block of memory which it breaks up into 
> chunks and adds to the free chunks queue, then pulls from the queue to 
> satisfy its own allocation.
> If enough other threads request chunks, it is possible for the queue to be 
> exhausted before the original allocating thread is able to pull off its own 
> allocation, causing the original allocator to loop and attempt to allocate 
> more memory.  This is unfair to the original caller and may cause it to block 
> on a system call to allocate more memory.
> Instead of the current behavior, allocateMoreChunks could hold back one of 
> the chunks and return it to the caller instead, so that it will call 
> allocate at most once.
>  
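The hold-back described above can be sketched with a simplified pool (illustrative only; the real BufferPool manages native memory and chunk recycling, and these names are invented):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Simplified sketch of the proposed fix: when a thread has to allocate a
// new block, it publishes all chunks but one to the shared free queue and
// keeps the last chunk for itself, so the allocating thread performs at
// most one allocation even if other threads drain the queue concurrently.
public final class ChunkPool
{
    private final ConcurrentLinkedQueue<byte[]> freeChunks = new ConcurrentLinkedQueue<>();
    private final int chunkSize;
    private final int chunksPerBlock;

    public ChunkPool(int chunkSize, int chunksPerBlock)
    {
        this.chunkSize = chunkSize;
        this.chunksPerBlock = chunksPerBlock;
    }

    public byte[] take()
    {
        byte[] chunk = freeChunks.poll();
        return chunk != null ? chunk : allocateMoreChunks();
    }

    private byte[] allocateMoreChunks()
    {
        // Publish all but one chunk; hold the last back for the caller.
        for (int i = 0; i < chunksPerBlock - 1; i++)
            freeChunks.add(new byte[chunkSize]);
        return new byte[chunkSize];
    }

    public void release(byte[] chunk)
    {
        freeChunks.add(chunk);
    }
}
```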






[jira] [Updated] (CASSANDRA-14832) Other threads can take all newly allocated BufferPool chunks before original and cause reallocation

2018-11-26 Thread Jon Meredith (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-14832:
-
Status: Patch Available  (was: Open)

> Other threads can take all newly allocated BufferPool chunks before original 
> and cause reallocation
> ---
>
> Key: CASSANDRA-14832
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14832
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jon Meredith
>Priority: Minor
>
> When BufferPool does not have any free Chunks to satisfy a request, the 
> calling thread allocates a new large block of memory which it breaks up into 
> chunks and adds to the free chunks queue, then pulls from the queue to 
> satisfy its own allocation.
> If enough other threads request chunks, it is possible for the queue to be 
> exhausted before the original allocating thread is able to pull off its own 
> allocation, causing the original allocator to loop and attempt to allocate 
> more memory.  This is unfair to the original caller and may cause it to block 
> on a system call to allocate more memory.
> Instead of the current behavior, allocateMoreChunks could hold back one of 
> the chunks and return it to the caller instead, so that it will call 
> allocate at most once.
>  






[jira] [Commented] (CASSANDRA-14806) CircleCI workflow improvements and Java 11 support

2018-10-23 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660687#comment-16660687
 ] 

Jon Meredith commented on CASSANDRA-14806:
--

Looks like we had the same ideas on the CircleCI config - I was trying to add 
targets that generated coverage information with JaCoCo in CASSANDRA-14788. 
Obviously the two changes conflict; I can look at reworking the coverage 
changes on top of this one once it has merged, or you're welcome to incorporate 
the changes into this patch if you would prefer.

Also, I'm not sure if you've had problems running the burn tests on the 
high-resource configuration. I hit problems locally running them on 8- and 
12-core machines that are not configured to use very large heaps.  
CASSANDRA-14790 makes the long buffer test pass for me; however, the long 
btree test has been a little flaky just running locally.

 

> CircleCI workflow improvements and Java 11 support
> --
>
> Key: CASSANDRA-14806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14806
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build, Testing
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Major
>
> The current CircleCI config could use some cleanup and improvements. First of 
> all, the config has been made more modular by using the new CircleCI 2.1 
> executors and command elements. Based on CASSANDRA-14713, there's now also a 
> Java 11 executor that will allow running tests under Java 11. The {{build}} 
> step will be done using Java 11 in all cases, so we can catch any regressions 
> there and also test the Java 11 multi-jar artifact during dtests, which we'd 
> also create during the release process.
> The job workflow has now also been changed to make use of the [manual job 
> approval|https://circleci.com/docs/2.0/workflows/#holding-a-workflow-for-a-manual-approval]
>  feature, which allows running dtest jobs only on request rather than 
> automatically with every commit. The Java 8 unit tests still run 
> automatically, but that could also be easily changed if needed. See this 
> [example 
> workflow|https://circleci.com/workflow-run/be25579d-3cbb-4258-9e19-b1f571873850]
>  with start_ jobs being triggers that need manual approval before the actual 
> jobs run.






[jira] [Created] (CASSANDRA-14832) Other threads can take all newly allocated BufferPool chunks before original and cause reallocation

2018-10-18 Thread Jon Meredith (JIRA)
Jon Meredith created CASSANDRA-14832:


 Summary: Other threads can take all newly allocated BufferPool 
chunks before original and cause reallocation
 Key: CASSANDRA-14832
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14832
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jon Meredith


When BufferPool does not have any free Chunks to satisfy a request, the calling 
thread allocates a new large block of memory, breaks it up into chunks, and 
adds them to the free chunks queue, then pulls from the queue to satisfy its 
own allocation.

If enough other threads request chunks, it is possible for the queue to be 
exhausted before the original allocating thread is able to pull off its own 
allocation, causing the original allocator to loop and attempt to allocate more 
memory.  This is unfair to the original caller and may cause it to block on a 
system call to allocate more memory.

Instead of the current behavior, allocateMoreChunks could hold back one of the 
chunks and return it directly to the caller, so that the caller calls 
allocate at most once.
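A minimal single-threaded sketch of the proposed change (hypothetical names and a plain queue standing in for the real GlobalPool, which uses its own Chunk type and atomics): instead of publishing every newly carved chunk to the shared queue and then racing other threads to take one back, the allocating thread holds one chunk back and returns it directly.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified model of GlobalPool; the real BufferPool code differs.
class GlobalPoolSketch
{
    static final int CHUNKS_PER_MACRO_CHUNK = 16;
    final Queue<int[]> chunks = new ConcurrentLinkedQueue<>(); // stand-in for Chunk

    // Current behaviour: publish all chunks, then race other threads for one.
    int[] getRacy()
    {
        int[] chunk = chunks.poll();
        if (chunk != null)
            return chunk;
        allocateMoreChunks(CHUNKS_PER_MACRO_CHUNK);
        return chunks.poll(); // may be null if other threads drained the queue
    }

    // Proposed behaviour: hold one chunk back so the caller allocates at most once.
    int[] getFair()
    {
        int[] chunk = chunks.poll();
        if (chunk != null)
            return chunk;
        allocateMoreChunks(CHUNKS_PER_MACRO_CHUNK - 1); // publish all but one
        return new int[64]; // the held-back chunk, returned directly to the caller
    }

    void allocateMoreChunks(int n)
    {
        for (int i = 0; i < n; i++)
            chunks.add(new int[64]); // carve the macro chunk into chunks
    }
}

public class BufferPoolSketch
{
    public static void main(String[] args)
    {
        GlobalPoolSketch pool = new GlobalPoolSketch();
        int[] mine = pool.getFair();
        // Even if other threads drain the queue now, the caller already has a chunk.
        while (pool.chunks.poll() != null) { /* stolen by other threads */ }
        if (mine == null)
            throw new AssertionError("caller should always receive a chunk");
        System.out.println("caller got chunk of " + mine.length + " slots");
    }
}
```

With the fair variant the caller never loops back into allocation, regardless of how quickly the queue is drained by other threads.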

 





[jira] [Commented] (CASSANDRA-14790) LongBufferPoolTest burn test fails assertion

2018-10-17 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653787#comment-16653787
 ] 

Jon Meredith commented on CASSANDRA-14790:
--

You're right, I was reading it wrong and I agree it's not a bug - thanks for 
looking at it.

I've already got a patch for the change; should it go on this ticket, or 
should I open a new one?

> LongBufferPoolTest burn test fails assertion
> 
>
> Key: CASSANDRA-14790
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14790
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
> Environment: Run under macOS 10.13.6, with patch (attached, but also 
> https://github.com/jonmeredith/cassandra/tree/failing-burn-test)
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Major
>  Labels: pull-request-available
> Attachments: 0001-Add-burn-testsome-target-to-build.xml.patch, 
> 0002-Initialize-before-running-LongBufferPoolTest.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The LongBufferPoolTest from the burn tests fails with an assertion error.  I 
> added a build target to run individual burn tests, and [~jasobrown] gave a 
> fix for the uninitialized test setup (attached), however the test now fails 
> on an assertion about recycling buffers.
> To reproduce (with patch applied)
> {{ant burn-testsome 
> -Dtest.name=org.apache.cassandra.utils.memory.LongBufferPoolTest 
> -Dtest.methods=testAllocate}}
> Output
> {{    [junit] Testcase: 
> testAllocate(org.apache.cassandra.utils.memory.LongBufferPoolTest): FAILED}}
> {{    [junit] null}}
> {{    [junit] junit.framework.AssertionFailedError}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Debug.check(BufferPool.java:204)}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.assertAllRecycled(BufferPool.java:181)}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:350)}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:54)}}
> All major branches from 3.0 and later have issues, however the trunk branch 
> also warns about references not being released before the reference is 
> garbage collected.
> {{[junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:224 - 
> LEAK DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a) to @623704362 was 
> not released before the reference was garbage collected}}
> {{ [junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:255 - 
> Allocate trace org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a:}}
> {{ [junit] Thread[pool-2-thread-24,5,main]}}
> {{ [junit] at java.lang.Thread.getStackTrace(Thread.java:1559)}}
> {{ [junit] at 
> org.apache.cassandra.utils.concurrent.Ref$Debug.&lt;init&gt;(Ref.java:245)}}
> {{ [junit] at 
> org.apache.cassandra.utils.concurrent.Ref$State.&lt;init&gt;(Ref.java:175)}}
> {{ [junit] at org.apache.cassandra.utils.concurrent.Ref.&lt;init&gt;(Ref.java:97)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.setAttachment(BufferPool.java:663)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:803)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:793)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$LocalPool.get(BufferPool.java:388)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.maybeTakeFromPool(BufferPool.java:143)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.takeFromPool(BufferPool.java:115)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.get(BufferPool.java:85)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$3.allocate(LongBufferPoolTest.java:296)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$3.testOne(LongBufferPoolTest.java:246)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:399)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:379)}}
> {{ [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{ [junit] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{ [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{ [junit] at java.lang.Thread.run(Thread.java:748)}}
>  
> Perhaps the environment is not being set up correctly for the tests.
>   




[jira] [Commented] (CASSANDRA-14790) LongBufferPoolTest burn test fails assertion

2018-10-17 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653734#comment-16653734
 ] 

Jon Meredith commented on CASSANDRA-14790:
--

[~benedict] I think we're reading it the same way - my argument was that it can 
cause buffers to be allocated from the heap when MEMORY_USAGE_THRESHOLD has not 
been exceeded yet. I'd describe it as a benign race rather than beneficial. The 
calling thread has to pay the price of allocating chunks that other threads 
stole, plus an extra allocation that could result in a blocking system call to 
get more memory. Instead, allocateMoreChunks could return one of the chunks to 
its caller and add one fewer chunk to the queue.

I'm not even sure it's worth changing anything, but [~djoshi3] wanted to see 
what you thought about it.

--8<--
 Here's the example I wrote up before I read your comment more carefully.

Start with no allocations from any of the thread local or buffer pools yet.

CHUNK_SIZE=64 KiB
 MACRO_CHUNK_SIZE = 1024 KiB
 MEMORY_USAGE_THRESHOLD = 16384 KiB (for the unit test)

1) T1 calls BufferPool.get(1) and ends up in GlobalPool:get. chunks.poll 
returns null so it calls allocateMoreChunks which allocates a macro chunk, 
divides it up into 16 (1024KiB / 64KiB) Chunks that are added to 
BufferPool.GlobalPool.chunks.

2) Between adding the last chunk and the 'one last attempt' to pull one in 
Chunk.get, 16 other calls to GlobalPool::get take place on other threads, 
emptying GlobalPool.chunks.

3) T1 returns from allocateMoreChunks; back in Chunk::get, chunks.poll() 
returns null, which gets passed up the call chain and triggers a call to 
BufferPool.allocate, which allocates memory outside of the pool, despite the 
current pool memory usage being ~1 MiB, less than the usage threshold; the 
request should have been satisfied by the pool.

As I said, I don't think it's really a big deal as memory allocated outside the 
pool should be freed/garbage collected just fine and the buffer pool is just an 
optimization.

It's also possible for T1 and T2 to both arrive in allocateMoreChunks with 
BufferPool.GlobalPool.chunks empty and cause harmless allocation of extra 
buffers, but it looks like it uses atomics to make sure the 
MEMORY_USAGE_THRESHOLD invariant isn't exceeded.
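The interleaving above can be reproduced deterministically in a toy model (hypothetical names, single-threaded for clarity): the "thief" callback stands in for the 16 concurrent GlobalPool::get calls of step 2 that empty the queue before T1's final poll.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy, single-threaded model of the interleaving; the real BufferPool is concurrent.
public class ChunkRaceDemo
{
    static final int CHUNKS = 16;
    static final Queue<Object> globalChunks = new ArrayDeque<>();
    static int bytesAllocatedOutsidePool = 0;

    static Object get(Runnable betweenPublishAndPoll)
    {
        Object chunk = globalChunks.poll();
        if (chunk != null)
            return chunk;
        for (int i = 0; i < CHUNKS; i++)          // step 1: allocateMoreChunks
            globalChunks.add(new byte[64 * 1024]);
        betweenPublishAndPoll.run();              // step 2: other threads run here
        chunk = globalChunks.poll();              // step 3: 'one last attempt'
        if (chunk == null)
        {
            bytesAllocatedOutsidePool += 64 * 1024; // fallback heap allocation
            chunk = new byte[64 * 1024];
        }
        return chunk;
    }

    public static void main(String[] args)
    {
        // 16 other GlobalPool::get calls empty the queue before T1's final poll,
        // so T1 allocates outside the pool even though usage is below threshold.
        Object t1Chunk = get(() -> { while (globalChunks.poll() != null) ; });
        System.out.println("outside-pool bytes: " + bytesAllocatedOutsidePool);
        // prints: outside-pool bytes: 65536
    }
}
```

Passing an empty Runnable instead of the thief makes step 3 succeed from the queue and leaves bytesAllocatedOutsidePool at zero, which is the behavior the hold-back fix would guarantee even under contention.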

