[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-19 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910709#comment-16910709
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Switched to the simpler/safer StreamingInboundHandler and clearly documented 
it's limitations.  Also found a possible issue with close() causing the helper 
I/O thread to poll forever.

[CircleCI 
run|https://circleci.com/workflow-run/e1440c87-8d12-479e-9119-8116a0a09152]

 

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-16 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909454#comment-16909454
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

I tracked the problem down to the StreamingInboundHandler shutdown code.  I 
couldn't work out why it was failing the dtests, but perhaps the code being 
called in
{{session.messageReceived}} didn't like to be thread interrupted.

I've added a commit to revert the original change, and pushed up a new one that 
re-implements the set tracking the active handlers and switched to tracking 
with a weak reference as you suggested.

Here's a clean(ish) [CircleCI 
run|https://circleci.com/workflow-run/5ed4f520-c378-4760-9e42-c000b5c1946b], 
{{test_simple_repair_order_preserving - repair_tests.repair_test.TestRepair}} 
failed, but there were flaky test comments in there too so not sure how 
reliable a test it is.

If you're happy with the change, please can you squash the two new commits into 
the merge commit when you push to origin.


> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-16 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909356#comment-16909356
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

I've broken the trunk changes up into smaller commits and the issue is with the 
changes to StreamingInboundHandler which would only affect trunk.

This commit breaks the python dtests 
https://github.com/jonmeredith/cassandra/commit/da07569f65ae0eb248488295d5b0a70a8039ee6a

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-15 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908724#comment-16908724
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

2.2-3.11 look good.  Lots of errors on trunk - unclear whether it's some bad 
cloud whether or an issue with the PR.  Rerunning and will investigate further 
tomorrow, please hold off on merging.

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-15 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908274#comment-16908274
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Alright, squashed it all down to a commit against 2.2 and merge commits up 
through against the current origin.  There's a CircleCI commit tacked on the 
end of each which needs to be dropped before merge.

 2.2 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-2.2] | 
[CircleCI|https://circleci.com/workflow-run/db122979-d47a-4d3a-a0e5-0cedef411a46]
 3.0 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-3.0] | 
[CircleCI|https://circleci.com/workflow-run/370d6e4c-6ae8-4167-967a-a88e7b906582]
 3.11 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-3.11] 
| 
[CircleCI|https://circleci.com/workflow-run/370d6e4c-6ae8-4167-967a-a88e7b906582]
 trunk | 
[Branch|https://github.com/jonmeredith/cassandra/commits/CASSANDRA-15170-trunk] 
| [CircleCI 
J8|https://circleci.com/workflow-run/eb1ad032-7b76-4aa2-a975-4ba6496bb74f] | 
[CircleCI 
J11|https://circleci.com/workflow-run/c8da2408-6caa-474c-894b-e41c029e2139]

Thanks for all the effort reviewing/rereviewing.

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-15 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908089#comment-16908089
 ] 

Benedict commented on CASSANDRA-15170:
--

Thanks.  Just a couple of tiny nits/questions:

The {{synchronizedList}} wrapper: while its use is probably unnecessary (since 
the global event executor is single threaded), it is probably clearer and more 
future-proof.  But perhaps it makes sense to only wrap the list for the time it 
is submitted as a consumer, e.g. {{synchronzedList(inboundExecutors)::add}}?

Should {{IsolatedExecutor.shutdown]} invoke {{ExecutorUtils.awaitTermination}} 
to report a {{TimeoutException}}?


> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-14 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907479#comment-16907479
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

I've updated each of the branch for the inconsistent use of 
timeouts/ExecutorUtils you pointed out, and also fixed my own nit of units -> 
unit.

The only thing I noticed while doing that was that trunk was missing a couple 
of synchronized modifiers in {{SharedExecutorPool}} on {{newExecutor}} and 
{{shutdownAndWait}} that had been added in the 3.x and below series.  Paranoia 
made me include them, but I wanted to check with you if you remembered why they 
were added and if there was a reason they were not merged up to trunk?

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-12 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905590#comment-16905590
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Pushed and updated - upgrade tests now work on 3.0, 3.x and trunk
 * fixed internode messaging to serialize/deserialize messages from the 
Cassandra magic number.
 * added missing call to shutdown() to the AbstractCluster.Wrapper.

 

That should be everything now and is good for final review, I have no further 
planned changes.

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904092#comment-16904092
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Thanks for explaining the issue with the termination, I've gone through and 
fixed the shutdowns to match trunk and pushed commits.

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904026#comment-16904026
 ] 

Benedict commented on CASSANDRA-15170:
--

Thanks, looks good.

bq. The shutdown->shutdownNow change on the InfiniteLoopExecutor was just to 
match trunk naming, there's no functional change.

It was an unrelated point that I noticed, at the places you made the 
modification, we were invoking {{Executor.awaitTermination}} instead of 
{{ExecutorUtils.awaitTermination}}, the latter of which actually reports a 
failure to terminate as an exception.  We probably have other places in which 
we are inconsistent, I simply noticed these places while looking at the changes.

It looks like there's a related bad merge to trunk for 
{{shutdownReferenceReaper}}.  The {{EXEC}} and {{STRONG_LEAK_DETECTOR}} are 
being shutdown/awaited separately?

bq.  Sadly I can't reproduce it now and have lost the heap dump showing an 
example

The problem is that there's no good reason for this to have any impact.  If the 
shutdown task exits, the thread pool will be shutdown, so there's no apparent 
reason to need to depend on the instant core thread timeout.  It's not terribly 
important, since the change should only confer a very minor difference in code 
clarity, but it may cause people to scratch their heads later.  Anyway, it's 
probably not worth further investigation.

bq. The ColumnFamilyStore.shutdownExecutorsAndWait uses the builder so it can 
add the perDiskflushExecutors

My mistake, thanks for clarifying.

bq. We could introduce the shutdown and wait, but would mean 4 variants of it, 
I don't mind it as it is at the moment.

I don't mind terribly either way, either.

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904008#comment-16904008
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Thanks for the review, and apologies for the messy history. I've just pushed 
deltas to each branch this time.

Losing the Feature enum was just a misunderstanding of change history - I was 
matching trunk instead of 3.0.  Agree the enum is nicer and have implemented 
that across releases instead.

On the IsolatedExecutor shutdown executor.  When the dtests run successfully 
there isn't a problem with shutting down, it's just when there are issues with 
the instance class loaders not being released there were several cases where 
the nodeX_shutdown thread was still around and also causing a root to the 
instance class loader.  Sadly I can't reproduce it now and have lost the heap 
dump showing an example, but I know that when I was debugging it made the heap 
dump have less active threads in it which made it easier to analyze.

On to the minor items.

* For NanoTimeToCurrentTimeMillis, I've left it alone on 2.2/3.0 as trying to 
minimize change on older versions.  3.11 and above use the 
FastScheduledTaskExecutor for it.

* The ColumnFamilyStore.shutdownExecutorsAndWait uses the builder so it can add 
the perDiskflushExecutors, so I just made the code as similar as possible. 
Happy to change if you'd really like ImmutableList.of.

* The shutdown->shutdownNow change on the InfiniteLoopExecutor was just to 
match trunk naming, there's no functional change.

* We could introduce the shutdown and wait, but would mean 4 variants of it, I 
don't mind it as it is at the moment.


> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903932#comment-16903932
 ] 

Benedict commented on CASSANDRA-15170:
--

Thanks.  It's looking good, just a few more minor(ish) items:

* {{NanoTimeToCurrentTimeMillis}}: {{updater}} should be {{final}} to ensure 
visibility, but better still would be to probably use {{InfiniteLoopExecutor}}. 
 I have a number of question marks around the pre-existing code there, though 
(like why it’s using an {{Object}} for waiting for instance), so up to you 
exactly what you do there.
* {{blockingIOThreead}} probably also needs to be {{volatile}}
* I may have lost track of changes here, but any reason why you got rid of the 
{{Feature}} enum in preference for a long?
* {{ColumnFamilyStore.shutdownExecutorsAndWait}}: use {{ImmutableList.of}}?
* Should we introduce {{ExecutorUtils.shutdownAndWait}} since we do it 
repeatedly?
* Some of the places you’ve switched {{shutdown}} to {{shutdownNow]} it might 
also be good to switch to {{ExecutorUtils.awaitTermination}} so that we can be 
informed if we haven’t shutdown/

Could you elaborate on what you saw with the single threaded executor?  It 
shouldn’t have had any impact on how long the threads were around for, and it 
might be indicative of some other problem?

For future reference, it helps for review to update existing branches with new 
commits, without rebasing, instead of creating new ones with a clean commit 
history.  We tend to rebase and clean-up just prior to commit, so that the 
reviewer can easily see what has changed between versions.  It's admittedly 
painful working with multiple branches, for all involved.

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-08 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903511#comment-16903511
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Hopefully this is all the feedback addressed.  I've also unified the API 
between all of the active branches so things should be stable again post 
internode messaging overhaul.  I ended up keeping my original IsolatedExecutor 
change to make the special shutdown executors as the 
{{Executors.newSingleThreadExecutor}} threads hang around a lot longer making 
debugging harder.

2.2 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v6-2.2]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v6-2%2E2]
 3.0 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v6-3.0]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v6-3%2E0]
 3.11 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v6-3.11]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v6-3%2E11]
 trunk | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v6-trunk]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v6-trunk]


> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-06 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901150#comment-16901150
 ] 

Benedict commented on CASSANDRA-15170:
--

Thanks, the patch looks good overall.  I've reviewed the 4.0 branch for now.

{{StreamingInboundHandler}}:  We probably shouldn’t assume we cannot leak these 
objects, particularly just to enable tests - we should probably store only weak 
references if retaining references in a {{static}} collection.  It might be 
easier overall to instead use a {{ThreadGroup}} here, and simply interrupt all 
of the threads in the group.  In fact, this might be a cleaner way to manage 
all errant threads on a per-node basis.

{{InboundSockets}}: Probably not great to {{awaitTermination}} inside the 
listener that may be executed on the {{GlobalEventExecutor}} - which is single 
threaded, and whose shutdown order we cannot guarantee.  Not actually sure the 
best solution to this problem.  Perhaps accept an {{executor}} parameter to the 
{{close}} method, so that the invoker can consciously guarantee that the 
executor can both handle committing a thread to this and also ensure the 
executor is not shutdown before its completion.  It would be nicer to return a 
{{Future}} that logically wraps {{awaitTermination}} without committing a 
{{Thread}}, but unfortunately there’s seemingly no clean way to do so; it 
should be possible to implement a {{java.util.concurrent.Future}} with only 
{{awaitTermination}}, but not an {{io.netty.util.concurrent.Future}}.  We could 
plausibly have the {{Future}} return a function that accepts a {{nanoTime}} 
deadline that invokes {{awaitTermination}}, which might be cleaner, if still 
far from optimal.
Some nits:
* SSLFactory
** unused imports (4.0)
** isOpenSslAvailable != openSslIsAvailable?
* {{Ref}}:
** Ignoring parameters (looks to be pre-existing, but propagated bug)
** Marginally cleaner to {{awaitTermination}} of both items together? 
Particularly given parameter indicating timeout that we should honour.
* IsolatedExecutor
** unused imports
** unused {{ThreadFactory}} variable


> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-08-04 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899690#comment-16899690
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

I've updated the branches and this should be ready to review.  Once you're 
happy with it we can update the commit message and squash the fixup in,
I just didn't have the heart to redo all the merging up again.

2.2 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v4-2.2]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v4-2%2E2]
 3.0 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v4-3.0]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v4-3%2E0]
 3.11 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v4-3.11]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v4-3%2E11]
 trunk | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v4-trunk]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v4-trunk]

Unit tests / in-jvm-dtest are passing on 2.2-3.11 successfully.  There's a 
failure on trunk for 
{{org.apache.cassandra.net.ConnectionTest.testCloseIfEndpointDown}} which I 
suspect is due to the growth in the powerset of connection options and 
unrelated to the in-jvm changes.

To document the discussion we had off-ticket.

{quote}
Making {{ResourceLeakTest.doTest}} to be configurable, could also later 
automatically loop through all in-jvm dtests and run them a dozen times or so 
to see if leaks are occurring. Perhaps on each loop, we could dump the threads, 
heap utilisation and files, and check they are not growing? That way the test 
can become one that actually fails if leaks are detected, and not produce heap 
dumps etc. unless it is so detected (and perhaps preferably only produce heap 
dumps if no thread leaks are detected)
{quote}

I agree, that would be nice.  I'd rather tackle that as a separate piece of 
work under a new ticket (it may make sense to do at the same time as 
CASSANDRA-15171. It's painful trying to keep all the variations of this in sync 
at the moment.

{quote}
IsolatedExecutor not using NamedThreadFactory
{quote}

I added a comment to explain, but using NamedThreadFactory was obscuring some 
exceptions while debugging as it sometimes called lways called 
FastThreadLocal.removeAll() before it was initialized and crashed (although 
perhaps with moving unloading the classloader it would not be an issue now, I 
can't remember how to reproduce).

{quote}
I'm anyway unclear why we are using `CompletableFuture` here, when we return a 
normal `Future`
{quote}

Good point, fixed up with your suggestion.


> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-23 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890996#comment-16890996
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Here's the latest - had all the updated branches passing on macOS and a 
container based on the CircleCI image.  I need to make one more pass over it 
and would like to move the gossip and streaming shutdown from trunk to 2.2 and 
merge forward.

2.2 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v3-2.2]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v3-2%2E2]
 3.0 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v3-3.0]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v3-3%2E0]
 3.11 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v3-3.11]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v3-3%2E11]
 trunk | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v3-trunk]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v3-trunk]

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-18 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888194#comment-16888194
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Still searching for issues on trunk.  The current issue I'm hitting is that the 
native libraries used by Netty for epoll support on Linux, and SSL in 
netty-tcnative hold JNI global gcroots to the instance class loader and prevent 
it from being garbage collected. Continuing to investigate.

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-14 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884770#comment-16884770
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Thanks for the review - I forgot to remove a WIP commit on trunk that made the 
numClusters unused.

Renaming the log appenders makes it easier to spot left-over logging threads 
when instance class loaders do not shut down correctly.

I've seen similar exceptions too - I think they predate this fix. I reran the 
trunk tests and am still getting issues with metaspace.  I think the internode 
Netty listeners aren't shutting down as expected - it all happens in a future 
and perhaps is not being combined correctly with other futures to either 
execute it or cause shutdown to wait until complete before closing the class 
loader.

I'll investigate further, but I don't think the shutdown hooks come into play 
when using the in-jvm dtests as the node is initialized without calling 
org.apache.cassandra.service.CassandraDaemon#setup.

 

 

I'll ping this ticket when I get to the bottom of the listener threads not 
shutting down.

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-12 Thread Alex Petrov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883755#comment-16883755
 ] 

Alex Petrov commented on CASSANDRA-15170:
-

[~jmeredithco] thank you for the patch. I have several minor nits:

  * {{numClusterNodes}} seems to be unused in {{ResourceLeakTest}}
  * I'm not 100% sure why we need changes to logging to remove instance IDs 
from some log messages and adding {{INSTANCE}} prefix to logger names.
  * We have a [shutdown 
hook|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L641],
 which should be using the instance class loader, but because we're running it 
after the instance class loader is already shut down, we get the exception [1]. 
The error message it throws is unclear, and I would probably override 
{InstanceClassLoader#close} to make it more obvious what's going on: if class 
loader is already closed, we should thrown with a message that it's been 
already shut down. In addition to this, I'd probably avoid adding a JVM 
shutdown hook, and close this explicitly. I think this was existing prior to 
this patch. 
 * On multiple runs, I've also seen the exceptions [2] and [3]. I'm not 
claiming that this patch has caused them.  
 * We're seemingly logging each log message twice right now. I think this is 
also pre-existing, and this can be resolved by using only one of the two 
console appenders.
 
[1]
{code}
java.lang.NoClassDefFoundError: 
org/apache/cassandra/utils/logging/LoggingSupportFactory
at 
org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:638)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: 
org.apache.cassandra.utils.logging.LoggingSupportFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at 
org.apache.cassandra.distributed.impl.InstanceClassLoader.loadClassInternal(InstanceClassLoader.java:95)
at 
org.apache.cassandra.distributed.impl.InstanceClassLoader.loadClass(InstanceClassLoader.java:84)
... 4 more
{code}

[2]
{code}
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
down
at 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:58)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:162)
at 
org.apache.cassandra.db.ColumnFamilyStore.waitForFlushes(ColumnFamilyStore.java:907)
at 
org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:873)
at 
org.apache.cassandra.schema.SchemaKeyspace.lambda$flush$19(SchemaKeyspace.java:348)
at 
com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407)
at 
org.apache.cassandra.schema.SchemaKeyspace.flush(SchemaKeyspace.java:348)
at 
org.apache.cassandra.schema.SchemaKeyspace.applyChanges(SchemaKeyspace.java:1282)
at org.apache.cassandra.schema.Schema.merge(Schema.java:653)
at 
org.apache.cassandra.schema.Schema.mergeAndAnnounceVersion(Schema.java:586)
at 
org.apache.cassandra.schema.MigrationTask.lambda$runMayThrow$0(MigrationTask.java:91)
at 
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:58)
at 
org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
at 
org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:885)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
{code}

[3]
{code}
SEVERE: RuntimeException while executing runnable 
org.apache.cassandra.db.ColumnFamilyStore$Flush$1@46975039 with executor 
org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor@7616fb13[Terminated,
 pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 21]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 

[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-07-09 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881425#comment-16881425
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Updated the branches with my latest efforts. [~ifesdjeen] this is ready to take 
a look at when you have some time. 

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15170) Reduce the time needed to release in-JVM dtest cluster resources after close

2019-06-18 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866992#comment-16866992
 ] 

Jon Meredith commented on CASSANDRA-15170:
--

Branches with fixes merged up through the releases.  Anything additional being 
added is a separate commit rather than in the merge commit to make it easier to 
spot.

Duplicating the process id stuff was deliberate from HeapUtils was deliberate, 
but I'd be happy to refactor out the getpid() functionality somewhere common. I 
wanted to minimize the impact on 2.2 and thought duplicating the functions was 
the simplest path.

2.2 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v2-2.2]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v2-2%2E2]
 3.0 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v2-3.0]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v2-3%2E0]
 3.11 | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v2-3.11]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v2-3%2E11]
 trunk | 
[Branch|https://github.com/jonmeredith/cassandra/commits/in-jvm-dtest-fixes-v2-trunk]
 | 
[CircleCI|https://circleci.com/gh/jonmeredith/cassandra/tree/in-jvm-dtest-fixes-v2-trunk]

> Reduce the time needed to release in-JVM dtest cluster resources after close
> 
>
> Key: CASSANDRA-15170
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15170
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> There are a few issues that slow the in-JVM dtests from reclaiming metaspace 
> once the cluster is closed.
> IsolatedExecutor issues the shutdown on a SingleExecutorThreadPool, sometimes 
> this thread was still running 10s after the dtest cluster was closed.  
> Instead, switch to a ThreadPoolExecutor with a core pool size of 0 so that 
> the thread executing the class loader close executes sooner.
> If an OutboundTcpConnection is waiting to connect() and the endpoint is not 
> answering, it has to wait for a timeout before it exits. Instead it should 
> check the isShutdown flag and terminate early if shutdown has been requested.
> In 3.0 and above, HintsCatalog.load uses java.nio.Files.list outside of a 
> try-with-resources construct and leaks a file handle for the directory.  This 
> doesn't matter for normal usage, it leaks a file handle for each dtest 
> Instance created.
> On trunk, Netty global event executor threads are still running and delay GC 
> for the instance class loader.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org