[jira] [Commented] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor

2022-01-13 Thread Guus der Kinderen (Jira)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475267#comment-17475267
 ] 

Guus der Kinderen commented on DIRMINA-1156:


Ok, thanks!

On Thu, 13 Jan 2022 at 11:58, Emmanuel Lécharny (Jira) wrote:



> Inconsistent worker / idleWorker in OrderedThreadPoolExecutor
> -------------------------------------------------------------
>
> Key: DIRMINA-1156
> URL: https://issues.apache.org/jira/browse/DIRMINA-1156
> Project: MINA
>  Issue Type: Bug
>Reporter: Guus der Kinderen
>Priority: Critical
> Attachments: DIRMINA1156Test.java, 
> OrderedThreadPoolExecutorTest.java, PriorityThreadPoolExecutorTest.java, 
> UnorderedThreadPoolExecutorTest.java
>
>
> I stumbled upon this while performing analysis of heap dumps of a JVM that 
> suffered from an issue where no user was able to establish a connection to 
> our server software anymore.
>  
> Our application uses an ExecutorFilter. The OrderedThreadPoolExecutor of the 
> affected server appears to be in an inconsistent state:
>  * idleWorkers: 2
>  * waitingSessions.count: 2293
>  * workers.map.size: 0
> What strikes me as odd is:
>  * No workers, while there are sessions waiting to be processed
>  * No workers, but a non-zero idle-worker count
> Servers that are unaffected by the issue have an idle worker count that is 
> equal to the number of workers (I assume that the snapshots were taken when 
> that server was not actively processing data).
> We are using Apache MINA 2.1.3. I have no indication of whether this problem 
> is present in other versions.
> This issue has also been discussed on the Apache MINA users mailing list, 
> starting with this message: 
> https://www.mail-archive.com/users@mina.apache.org/msg06887.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org



[jira] [Commented] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor

2022-01-13 Thread Guus der Kinderen (Jira)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475251#comment-17475251
 ] 

Guus der Kinderen commented on DIRMINA-1156:


Is there any indication of when this will make it into a release? I can use 
that to guide my own release plans. :)



[jira] [Commented] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor

2022-01-12 Thread Guus der Kinderen (Jira)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474959#comment-17474959
 ] 

Guus der Kinderen commented on DIRMINA-1156:


Thank you for the super fast help!




[jira] [Commented] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor

2022-01-12 Thread Guus der Kinderen (Jira)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474951#comment-17474951
 ] 

Guus der Kinderen commented on DIRMINA-1156:


Thank you. My last comment was a mistake. Please ignore it.

 




[jira] [Comment Edited] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor

2022-01-12 Thread Guus der Kinderen (Jira)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474686#comment-17474686
 ] 

Guus der Kinderen edited comment on DIRMINA-1156 at 1/12/22, 5:24 PM:
--

Since your last comment in this issue, we've followed up in the mailing list. 
To summarize:
{quote}I've asked a user that suffers from the problem to run with a slightly 
modified version of MINA. The modification logs (and rethrows) any Throwable in 
Worker's run(). This caused an Error to be logged! The Error was thrown by an 
IoHandler implementation. Full details are available at 
[https://igniterealtime.atlassian.net/browse/OF-2367?focusedCommentId=22738]
{quote}
Note that this problem affects _all_ ThreadPoolExecutors, not just the ordered 
one!

I have written various unit tests to illustrate the issue, which are attached 
to this issue (these are all new classes, except for 
{{PriorityThreadPoolExecutorTest.java}}, which expands on a pre-existing 
implementation).

The tests in [^PriorityThreadPoolExecutorTest.java], 
[^OrderedThreadPoolExecutorTest.java] and 
[^UnorderedThreadPoolExecutorTest.java] assert that the {{run()}} method of the 
{{Worker}} class in each {{ThreadPoolExecutor}} implementation is susceptible 
to this problem; they use an implementation that bypasses some of the usage 
patterns that would normally prevent the issue. These tests prove that the 
implementation is fragile, without immediately identifying an exploit of that 
weakness.

The tests in [^DIRMINA1156Test.java] simulate one of the real-world causes of 
the issue: a {{Session}} whose {{Handler}} throws an {{Error}} (I've also 
included tests that throw checked and unchecked {{Exception}}s, but those do 
not trigger the bug).

Removing the {{finally}} block inside the {{for (;;)}} loop of {{Worker}}, so 
that {{idleWorkers.incrementAndGet()}} executes only when no {{Throwable}} is 
thrown, makes all of these tests pass. In other words, replace:
{code:java}
try {
    if (session != null) {
        runTasks(getSessionTasksQueue(session));
    }
} finally {
    idleWorkers.incrementAndGet();
}
{code}
with this:
{code:java}
if (session != null) {
    runTasks(getSessionTasksQueue(session));
}
idleWorkers.incrementAndGet();
{code}
(a slightly different implementation is needed for {{PriorityThreadPoolExecutor}}).
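To make the failure mode concrete, here is a minimal, self-contained sketch. This is not MINA code: the class, the {{liveWorkers}} field and the {{runTasks()}} stand-in are invented for illustration; only the {{idleWorkers}} try/finally pattern mirrors the snippet above. It shows how the {{finally}}-based increment reports an idle worker even though the propagating {{Error}} is about to kill the worker thread:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IdleWorkerLeak {
    // Simplified stand-ins for the executor's bookkeeping (hypothetical names).
    static final AtomicInteger idleWorkers = new AtomicInteger(1); // one idle worker
    static int liveWorkers = 1;

    // Stand-in for runTasks(...): the session's handler throws an Error.
    static void runTasks() {
        throw new Error("simulated Error from an IoHandler");
    }

    /** Runs one worker iteration; returns the resulting idleWorkers count. */
    static int simulate() {
        try {
            idleWorkers.decrementAndGet(); // worker picks up a session
            try {
                runTasks();
            } finally {
                // Executes even while the Error propagates: the counter says
                // "idle" although this worker thread is about to die.
                idleWorkers.incrementAndGet();
            }
        } catch (Error e) {
            liveWorkers--; // the thread dies and leaves the workers set
        }
        return idleWorkers.get();
    }

    public static void main(String[] args) {
        System.out.println("idleWorkers=" + simulate() + ", liveWorkers=" + liveWorkers);
    }
}
```

Running {{main}} prints a non-zero idle count alongside zero live workers, which matches the heap-dump observations in the issue description.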




[jira] [Updated] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor

2022-01-12 Thread Guus der Kinderen (Jira)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guus der Kinderen updated DIRMINA-1156:
---
Attachment: DIRMINA1156Test.java




[jira] [Updated] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor

2022-01-12 Thread Guus der Kinderen (Jira)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guus der Kinderen updated DIRMINA-1156:
---
Attachment: PriorityThreadPoolExecutorTest.java
OrderedThreadPoolExecutorTest.java
UnorderedThreadPoolExecutorTest.java




[jira] [Created] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor

2022-01-12 Thread Guus der Kinderen (Jira)
Guus der Kinderen created DIRMINA-1156:
--

 Summary: Inconsistent worker / idleWorker in 
OrderedThreadPoolExecutor
 Key: DIRMINA-1156
 URL: https://issues.apache.org/jira/browse/DIRMINA-1156
 Project: MINA
  Issue Type: Bug
Reporter: Guus der Kinderen


I stumbled upon this while performing analysis of heap dumps of JVMs that 
suffered from an issue where no user was able to establish a connection to our 
server software anymore.
 
Our application uses an ExecutorFilter. The OrderedThreadPoolExecutor of the 
affected server seems to be in an inconsistent state:
 * idleWorkers: 2
 * waitingSessions.count: 2293
 * workers.map.size: 0

What strikes me as odd is:
 * No workers, while there are sessions waiting to be processed
 * No workers, but a non-zero idle workers count

Servers that are unaffected by the issue have an idle worker count that is 
equal to the number of workers (I assume that the snapshots were taken when 
that server was not actively processing data).

We are using Apache MINA 2.1.3. I have no indication of whether this problem 
is present in other versions.

This issue has also been discussed on the Apache MINA users mailing list, 
starting with this message: 
https://www.mail-archive.com/users@mina.apache.org/msg06887.html






[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-24 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848040#comment-16848040
 ] 

Guus der Kinderen commented on DIRMINA-1111:


The fix seems to have a positive effect in two different environments. I'd 
consider this one resolved!

> 100% CPU (epoll bug) on 2.1.x, Linux only
> -----------------------------------------
>
> Key: DIRMINA-1111
> URL: https://issues.apache.org/jira/browse/DIRMINA-1111
> Project: MINA
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Guus der Kinderen
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: image-2019-05-21-11-37-41-931.png
>
>
> We're getting 
> [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13]
>  of a bug that causes 100% CPU usage on Linux (the problem does not occur on 
> Windows). 
> This problem occurs in 2.1.2, but does _not_ occur in 2.0.21.
> Is this a regression of the epoll selector bug in DIRMINA-678?
> A stack trace of one of the spinning threads:
> {code}"NioProcessor-3" #55 prio=5 os_prio=0 tid=0x7f0408002000 nid=0x2a6a 
> runnable [0x7f0476dee000]
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>   - locked <0x0004c486b748> (a sun.nio.ch.Util$3)
>   - locked <0x0004c486b738> (a java.util.Collections$UnmodifiableSet)
>   - locked <0x0004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
>   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>   at 
> org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
>   at 
> org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
>   at 
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>Locked ownable synchronizers:
>   - <0x0004c4d03530> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker){code}
> More info is available at 
> https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844880#comment-16844880
 ] 

Guus der Kinderen commented on DIRMINA-1111:


Maybe bike-shedding, but wouldn't {{clear}} potentially cause some bytes to be 
'uncounted', even if they were processed? Even if you're not using {{mark}} 
now, at some point in the future, that might be introduced again. Checking for 
a mark, and repositioning to the mark if present, otherwise to 0, seems like a 
desirable defensive construct. 



[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844880#comment-16844880
 ] 

Guus der Kinderen edited comment on DIRMINA-1111 at 5/21/19 2:17 PM:
-

Maybe bike-shedding, but wouldn't {{clear}} potentially cause some bytes to be 
'uncounted', even if they were processed? Even if you're not using {{mark}} 
now, at some point in the future, that might be introduced again. Checking for 
a mark, and repositioning to the mark if present, otherwise to 0, seems like a 
desirable defensive construct. 


was (Author: guus.der.kinderen):
Maybe bike-shidding, but wouldn't {{clear}} potentially cause some bytes to be 
'uncounted', even if they were processed? Even if you're not using {{mark}} 
now, at some point in the future, that might be introduced again. Checking for 
a mark, and repositioning to the mark if present, otherwise to 0, seems like a 
desirable defensive construct. 



[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844834#comment-16844834
 ] 

Guus der Kinderen commented on DIRMINA-1111:


Thanks for both your efforts on this!

I still don't quite understand the fix, but am happy to accept it. {{clear}}, 
unlike {{reset}}, will prevent the {{InvalidMarkException}}.

I'm not confident that using {{slice}} instead of {{duplicate}} sufficiently 
prevents the buffer from being in an unpredictable state in 
{{clearWriteRequestQueue}}. Isn't it possible that an exception occurs after 
construction (slicing) of the WriteRequest, during the processing of that 
instance, causing it to be handled by {{clearWriteRequestQueue}}? We'd still 
be in potential trouble there.
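To make the {{slice}} / {{duplicate}} distinction being debated here concrete, a small stand-alone plain-java.nio demonstration (the class name is invented for illustration; this is not MINA code): {{duplicate()}} copies the source buffer's position, limit and mark, while a fresh {{slice()}} starts at position 0 with no mark at all, so a later {{reset()}} on the slice throws {{InvalidMarkException}}.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class SliceVsDuplicate {
    static int dupPositionAfterReset;
    static boolean sliceResetThrew;

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocate(8);
        src.position(2);
        src.mark();                       // mark at position 2
        src.position(5);

        ByteBuffer dup = src.duplicate(); // copies position, limit AND mark
        dup.reset();                      // fine: the mark came along
        dupPositionAfterReset = dup.position();

        ByteBuffer sli = src.slice();     // position 0, capacity 3, mark undefined
        try {
            sli.reset();                  // a fresh slice carries no mark
        } catch (InvalidMarkException e) {
            sliceResetThrew = true;
        }
        System.out.println("duplicate reset to " + dupPositionAfterReset
                + ", slice reset threw: " + sliceResetThrew);
    }
}
```

So a sliced write buffer cannot hit {{InvalidMarkException}} via a stale inherited mark, but equally it cannot be reset to any earlier mark, which is the trade-off under discussion.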



[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844812#comment-16844812
 ] 

Guus der Kinderen commented on DIRMINA-1111:


I don't quite understand your suggestion. Could you illustrate it with a code 
snippet please?



[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844793#comment-16844793
 ] 

Guus der Kinderen commented on DIRMINA-1111:


{{rewind()}} looks like a simpler solution than resetting-if-marked, but 
there's a small difference in that the position would always revert to 0, 
instead of the last marked position. I'm not sure if that makes any difference 
to code that depends on this.

In any case, preventing the {{InvalidMarkException}} would be achieved by both, 
and that is what seems to prevent the CPU-churning. I'd be happy with either 
solution.
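A minimal plain-java.nio illustration of the difference described above (the class name is made up; not MINA code): {{rewind()}} always returns the position to 0 and never throws, while {{reset()}} returns to the marked position and throws {{InvalidMarkException}} when no mark was ever set.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class ResetVsRewind {
    static int positionAfterReset;
    static int positionAfterRewind;
    static boolean resetWithoutMarkThrew;

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.position(3);
        buf.mark();                      // mark at position 3
        buf.position(6);

        buf.reset();                     // back to the marked position
        positionAfterReset = buf.position();

        buf.rewind();                    // always back to 0; discards the mark
        positionAfterRewind = buf.position();

        try {
            ByteBuffer fresh = ByteBuffer.allocate(8);
            fresh.reset();               // no mark was ever set
        } catch (InvalidMarkException e) {
            resetWithoutMarkThrew = true;
        }
        System.out.println("reset->" + positionAfterReset
                + " rewind->" + positionAfterRewind
                + " freshResetThrew=" + resetWithoutMarkThrew);
    }
}
```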



[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844783#comment-16844783
 ] 

Guus der Kinderen commented on DIRMINA-1111:


That would change the behavior - instead of rewinding the buffer to be re-read, 
it would effectively discard information.

I'm not sure why we're resetting in the first place here, but I'm guessing it 
serves a purpose (to later process the remainder of the buffer?). Without the 
reset, the data will likely be lost (although, as it is not effectively wiped 
from the buffer, recovering it would merely be harder).



[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844762#comment-16844762
 ] 

Guus der Kinderen edited comment on DIRMINA-1111 at 5/21/19 12:05 PM:
--

Although we're not exactly sure how, we've "resolved" the issue by modifying 
the snippet below:
{code:java|title=Original code for 
org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
buf.reset();
failedRequests.add(req);
} else {
IoFilterChain filterChain = session.getFilterChain();
filterChain.fireMessageSent(req);
}{code}
As said, this triggered {{InvalidMarkExceptions}}. After we added a simple 
{{try/catch}} around the reset, the CPU spikes go away. We have not 
investigated why this resolves the CPU spikes, but we assume that the thrown 
exception prevents keys from being consumed.

We're still suffering from a different issue (all clients unexpectedly 
reconnect periodically), but we're now working on the assumption that the CPU 
spike is a result, not the cause, of this.

Looking further into the {{clearWriteRequestQueue}} snippet, we noticed that 
it is often called from exception handlers. As a result, the state of the 
buffer that it operates on is likely unpredictable. The call to {{reset()}} 
assumes that a mark is set, but there are various scenarios where this is 
untrue:
 * the buffer could have been completely unused (a buffer fresh from the 
constructor will cause an {{InvalidMarkException}} here)
 * the buffer might have been flipped, but not yet read.

We're uncertain whether the reset is needed, but if it is, we suggest 
explicitly checking whether the mark has been set; if it has not, we don't 
believe a reset is needed.
{code:java|title=Proposed fix for 
org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
if (buf.markValue() > -1) {
buf.reset();
}
failedRequests.add(req);
} else {
IoFilterChain filterChain = session.getFilterChain();
filterChain.fireMessageSent(req);
}{code}
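Plain java.nio does not expose the mark (MINA's IoBuffer adds {{markValue()}} for exactly this reason), so the closest stand-alone approximation of the guard proposed above is a reset that falls back gracefully when no mark exists. A sketch of the defensive construct only, with invented names - not the actual MINA patch:

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class DefensiveReset {
    // Reset to the mark if one was set; otherwise leave the position
    // untouched instead of letting InvalidMarkException escape.
    static void resetIfMarked(ByteBuffer buf) {
        try {
            buf.reset();
        } catch (InvalidMarkException e) {
            // No mark was ever set on this buffer: nothing to restore.
        }
    }

    public static void main(String[] args) {
        ByteBuffer unmarked = ByteBuffer.allocate(8);
        unmarked.position(4);
        resetIfMarked(unmarked);          // no mark: position stays at 4

        ByteBuffer marked = ByteBuffer.allocate(8);
        marked.position(2);
        marked.mark();
        marked.position(7);
        resetIfMarked(marked);            // mark present: back to 2

        System.out.println("unmarked=" + unmarked.position()
                + " marked=" + marked.position());
    }
}
```

This mirrors the try/catch workaround described earlier in the comment, while the markValue() check in the proposed fix avoids relying on exceptions for control flow.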


was (Author: guus.der.kinderen):
Although we're not exactly sure how, we've "resolved" the issue by modifying 
the snippet below:
{code:java|title=Original code for 
org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
buf.reset();
failedRequests.add(req);
} else {
IoFilterChain filterChain = session.getFilterChain();
filterChain.fireMessageSent(req);
}{code}
As said, this triggered {{InvalidMarkExceptions}}. After we added a simple 
{{try/catch}} around the reset, the CPU spikes go away. We have not 
investigated why this resolves the CPU spikes, but we assume that the thrown 
exception prevents keys from being consumed.

We're still suffering from a different issue (all clients unexpectedly 
reconnect periodically), but we're now working on the assumption that the CPU 
spike is a result, not the cause, of this.

Looking further into the {{clearWriteRequestQueue}} snippet, we noticed that 
it is often called from exception handlers. As a result, the state of the 
buffer that it operates on is likely unpredictable. The call to {{reset()}} 
assumes that a mark is set, but there are various scenarios where this is 
untrue:
 * the buffer could have been completely unused (a buffer fresh from the 
constructor will cause an {{InvalidMarkException}} here)
 * the buffer might have been flipped, but not yet read.

We're uncertain whether the reset is needed, but if it is, we suggest 
explicitly checking whether the mark has been set; if it has not, we don't 
believe a reset is needed.
{code:java|title=Proposed fix for 
org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
if (buf.markValue() >= -1) {
buf.reset();
}
failedRequests.add(req);
} else {
IoFilterChain filterChain = session.getFilterChain();
filterChain.fireMessageSent(req);
}{code}


[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844762#comment-16844762
 ] 

Guus der Kinderen edited comment on DIRMINA- at 5/21/19 12:04 PM:
--

Although we're not exactly sure how, we've "resolved" the issue by modifying 
the snippet below:
{code:java|title=Original code for 
org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
buf.reset();
failedRequests.add(req);
} else {
IoFilterChain filterChain = session.getFilterChain();
filterChain.fireMessageSent(req);
}{code}
As said, this triggered {{InvalidMarkExceptions}}. After we added a simple 
{{try/catch}} around the reset, the CPU spikes go away. We have not 
investigated why this resolves the CPU spikes, but we assume that the thrown 
exception prevents keys from being consumed.

We're still suffering from a different issue (all clients unexpectedly 
reconnect periodically), but we're now working on the assumption that the CPU 
spike is a result, not the cause, of this.

Looking further into the {{clearWriteRequestQueue}} snippet, we noticed that 
this is often called from exception handlers. With that, the state of the 
buffer that it's operating on, is likely unpredictable. The call to {{reset()}} 
assumes that a mark is set, but there are various scenario's where this is 
untrue:
 * the buffer could have been completely unused (a buffer fresh from the 
constructor will cause {{InvalidMarkException}} here
 * the buffer might have been flipped, but not yet read.

We're uncertain whether the reset is needed, but if it is, we suggest 
explicitly checking whether a mark has been set. If not, then we don't believe 
a reset is needed.
{code:java|title=Proposed fix for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    // markValue() returns -1 when no mark is set,
    // so only reset() when a mark actually exists.
    if (buf.markValue() >= 0) {
        buf.reset();
    }
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}
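To illustrate the mark contract that makes the bare {{reset()}} dangerous, here is a small stand-alone sketch using plain {{java.nio.ByteBuffer}} (MINA's {{IoBuffer}} follows the same mark/reset semantics); the class and method names are ours, purely illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class MarkResetDemo {

    // reset() on a buffer that never had mark() called throws an
    // InvalidMarkException -- the situation clearWriteRequestQueue
    // can run into when invoked from an exception handler.
    static boolean freshResetThrows() {
        ByteBuffer fresh = ByteBuffer.allocate(8);
        try {
            fresh.reset();
            return false;
        } catch (InvalidMarkException e) {
            return true;
        }
    }

    // With a mark set, reset() rewinds the position to the mark.
    static int positionAfterReset() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.position(2);
        buf.mark();
        buf.position(5);
        buf.reset();
        return buf.position();
    }

    public static void main(String[] args) {
        System.out.println("fresh reset throws: " + freshResetThrows());
        System.out.println("position after reset: " + positionAfterReset());
    }
}
```

This is why a guard on the mark (or a {{try/catch}}) is needed before calling {{reset()}} on a buffer in an unknown state.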



> 100% CPU (epoll bug) on 2.1.x, Linux only
> -
>
> Key: DIRMINA-1111
> URL: https://issues.apache.org/jira/browse/DIRMINA-
> Project: MINA
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Guus der Kinderen
>Priority: Major
> Attachments: image-2019-05-21-11-37-41-931.png
>
>
> We're getting 
> [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13]
>  of a bug 


[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844762#comment-16844762
 ] 

Guus der Kinderen commented on DIRMINA-1111:


Although we're not exactly sure how, we've "resolved" the issue by modifying 
the snippet below:
{code:java|title=Original code for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    buf.reset();
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}
As mentioned, this triggered {{InvalidMarkException}}s. After we added a simple 
{{try/catch}} around the reset, the CPU spikes go away. We're still suffering 
from a different issue (all clients reconnecting periodically), but we're now 
working on the assumption that the CPU spike is a result, not the cause, of this.

Looking further into the {{clearWriteRequestQueue}} snippet, we noticed that 
it is often called from exception handlers. As a result, the state of the 
buffer that it operates on is likely unpredictable. The call to {{reset()}} 
assumes that a mark is set, but there are various scenarios where this is 
untrue:
 * the buffer could have been completely unused (a buffer fresh from the 
constructor will cause an {{InvalidMarkException}} here);
 * the buffer might have been flipped, but not yet read.

We're uncertain whether the reset is needed, but if it is, we suggest 
explicitly checking whether a mark has been set. If not, then we don't believe 
a reset is needed.
{code:java|title=Proposed fix for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    // markValue() returns -1 when no mark is set,
    // so only reset() when a mark actually exists.
    if (buf.markValue() >= 0) {
        buf.reset();
    }
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}

> 100% CPU (epoll bug) on 2.1.x, Linux only
> -
>
> Key: DIRMINA-1111
> URL: https://issues.apache.org/jira/browse/DIRMINA-
> Project: MINA
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Guus der Kinderen
>Priority: Major
> Attachments: image-2019-05-21-11-37-41-931.png
>
>
> We're getting 
> [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13]
>  of a bug that causes 100% CPU usage on Linux (the problem does not occur on 
> Windows). 
> This problem occurs in 2.1.2, but does _not_ occur in 2.0.21.
> Is this a regression of the epoll selector bug in DIRMINA-678 ?
> A stack trace of one of the spinning threads:
> {code}"NioProcessor-3" #55 prio=5 os_prio=0 tid=0x7f0408002000 nid=0x2a6a 
> runnable [0x7f0476dee000]
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>   - locked <0x0004c486b748> (a sun.nio.ch.Util$3)
>   - locked <0x0004c486b738> (a java.util.Collections$UnmodifiableSet)
>   - locked <0x0004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
>   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>   at 
> org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
>   at 
> org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
>   at 
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>Locked ownable synchronizers:
>   - <0x0004c4d03530> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker){code}
> More info is available at 
> https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844687#comment-16844687
 ] 

Guus der Kinderen edited comment on DIRMINA-1111 at 5/21/19 10:01 AM:
--

The CPU cycles that get burned are reported to be "system", not "user" cycles.

We've obtained a Call Tree graph from JProfiler, from which I've included a 
screenshot (the entire thing is 18MB). Just under the section from which I 
created this screenshot, there's another 10.2% of native CPU usage for 
{{java.lang.System.currentTimeMillis}}, called from within 
{{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}.

From what I can deduce, this looks like it iterates over this snippet a lot. 
We don't see {{"Broken connection"}} or {{"Create a new selector. Selected is 
0, delta ="}} being logged, which suggests that the workaround for the epoll / 
spinning-selector bug is not even being considered.

{code:java|title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle sessions when we get out of the select every
// second. (Note: this is a hack to avoid creating
// a dedicated thread.)
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance: the select() may have been
    // interrupted because we have had a closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // We are hit by the nasty epoll spinning bug.
        // There is a race condition which causes a closing file
        // descriptor not to be considered as an available selected
        // channel, yet it stopped the select. The next call to
        // select() will exit immediately for the same reason, and
        // do so forever, consuming 100% CPU.
        // We have to destroy the selector, and register all the
        // sockets on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}
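For reference, the selector-replacement workaround that {{registerNewSelector()}} performs can be sketched with plain NIO. This is a simplified illustration, not MINA's actual implementation (the {{rebuild}} and {{demoKeyCount}} names are ours, and the real code also migrates per-session state):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class SelectorRebuildDemo {

    // Open a fresh selector and move every registration (channel,
    // interest ops, attachment) over from the old, possibly broken one.
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            Object attachment = key.attachment();
            int ops = key.interestOps();
            key.cancel();
            key.channel().register(fresh, ops, attachment);
        }
        old.close();
        return fresh;
    }

    // Register one listening channel, rebuild, and report how many
    // registrations survived the move.
    static int demoKeyCount() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress(0));
        server.register(selector, SelectionKey.OP_ACCEPT);

        selector = rebuild(selector);
        int keys = selector.keys().size();
        server.close();
        selector.close();
        return keys;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("keys after rebuild: " + demoKeyCount());
    }
}
```

The point of the {{nbTries}}/{{delta}} bookkeeping above is only to decide *when* to trigger such a rebuild: a {{select()}} that returns 0 almost immediately, repeatedly, without a wakeup, is taken as the epoll-spinning signature.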

!image-2019-05-21-11-37-41-931.png!




[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844690#comment-16844690
 ] 

Guus der Kinderen commented on DIRMINA-1111:


{quote}Do you have a link to the OpenFire issue ?{quote}

No, this is being reported in a non-public project that uses Openfire.






[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844687#comment-16844687
 ] 

Guus der Kinderen commented on DIRMINA-1111:


The CPU cycles that get burned are reported to be "system", not "user" cycles.

We've obtained a Call Tree graph from JProfiler, from which I've included a 
screenshot (the entire thing is 18MB). Just under the section from which I 
created this screenshot, there's another 10.2% of native CPU usage for 
{{java.lang.System.currentTimeMillis}}, called from within 
{{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}.

From what I can deduce, this looks like it iterates over this snippet a lot. 
We don't see {{"Broken connection"}} or {{"Create a new selector. Selected is 
0, delta ="}} being logged, which suggests that the workaround for the epoll / 
spinning-selector bug is not even being considered.

{code:java|title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle sessions when we get out of the select every
// second. (Note: this is a hack to avoid creating
// a dedicated thread.)
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance: the select() may have been
    // interrupted because we have had a closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // We are hit by the nasty epoll spinning bug.
        // There is a race condition which causes a closing file
        // descriptor not to be considered as an available selected
        // channel, yet it stopped the select. The next call to
        // select() will exit immediately for the same reason, and
        // do so forever, consuming 100% CPU.
        // We have to destroy the selector, and register all the
        // sockets on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}

!image-2019-05-21-11-37-41-931.png!






[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844687#comment-16844687
 ] 

Guus der Kinderen edited comment on DIRMINA-1111 at 5/21/19 9:58 AM:
-

The CPU cycles that get burned are reported to be "system", not "user" cycles.

We've obtained a Call Tree graph from JProfiler, from which I've included a 
screenshot (the entire thing is 18MB). Just under the section from which I 
created this screenshot, there's another 10.2% of native CPU usage for 
{{java.lang.System.currentTimeMillis}}, called from within 
{{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}.

From what I can deduce, this looks like it iterates over this snippet a lot. 
We don't see {{"Broken connection"}} or {{"Create a new selector. Selected is 
0, delta ="}} being logged, which suggests that the workaround for the epoll / 
spinning-selector bug is not even being considered.

{code:java|title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle sessions when we get out of the select every
// second. (Note: this is a hack to avoid creating
// a dedicated thread.)
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance: the select() may have been
    // interrupted because we have had a closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // We are hit by the nasty epoll spinning bug.
        // There is a race condition which causes a closing file
        // descriptor not to be considered as an available selected
        // channel, yet it stopped the select. The next call to
        // select() will exit immediately for the same reason, and
        // do so forever, consuming 100% CPU.
        // We have to destroy the selector, and register all the
        // sockets on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}

!image-2019-05-21-11-37-41-931.png!





[jira] [Updated] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guus der Kinderen updated DIRMINA-:
---
Attachment: image-2019-05-21-11-37-41-931.png

> 100% CPU (epoll bug) on 2.1.x, Linux only
> -
>
> Key: DIRMINA-
> URL: https://issues.apache.org/jira/browse/DIRMINA-
> Project: MINA
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Guus der Kinderen
>Priority: Major
> Attachments: image-2019-05-21-11-37-41-931.png
>
>
> We're getting 
> [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13]
> of a bug that causes 100% CPU usage on Linux (the problem does not occur on 
> Windows).
> This problem occurs in 2.1.2, but does _not_ occur in 2.0.21.
> Is this a regression of the epoll selector bug in DIRMINA-678?
> A stack trace of one of the spinning threads:
> {code}"NioProcessor-3" #55 prio=5 os_prio=0 tid=0x7f0408002000 nid=0x2a6a 
> runnable [0x7f0476dee000]
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>   - locked <0x0004c486b748> (a sun.nio.ch.Util$3)
>   - locked <0x0004c486b738> (a java.util.Collections$UnmodifiableSet)
>   - locked <0x0004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
>   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>   at 
> org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
>   at 
> org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
>   at 
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>Locked ownable synchronizers:
>   - <0x0004c4d03530> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker){code}
> More info is available at 
> https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844639#comment-16844639
 ] 

Guus der Kinderen edited comment on DIRMINA- at 5/21/19 8:50 AM:
-

The report claims that the CPU increase is combined with InvalidMarkException 
instances being thrown by MINA 2.1.2's implementation of 
{{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}}

Specifically, the {{InvalidMarkException}} is thrown when {{reset}} is called 
in this snippet:
{code:java}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    buf.reset();
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}
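For context on why that {{reset()}} can throw: {{ByteBuffer.reset()}} repositions the buffer to its mark, and throws {{InvalidMarkException}} whenever no mark is set, e.g. when {{mark()}} was never called, or when the mark was discarded by {{clear()}}, {{rewind()}} or {{flip()}}. A minimal, self-contained demonstration (the {{ResetDemo}} class name is invented for this sketch):

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

// Demonstrates when ByteBuffer.reset() throws: reset() rewinds the buffer
// to its mark, and throws InvalidMarkException when the mark is undefined.
public final class ResetDemo {
    /** Returns true when reset() throws InvalidMarkException on this buffer. */
    public static boolean resetThrows(ByteBuffer buf) {
        try {
            buf.reset();
            return false;
        } catch (InvalidMarkException e) {
            return true;
        }
    }
}
```

So if MINA's bookkeeping ever reaches this code path with a buffer whose mark was discarded (or never set), the exception is unavoidable.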


was (Author: guus.der.kinderen):
The report claims that the CPU increase is combined with InvalidMarkException 
instances being thrown by MINA 2.1.2's implementation of 
{{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}}

Specifically, when {{reset}} is called in this snippet:
{code:java}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    buf.reset();
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}



[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844639#comment-16844639
 ] 

Guus der Kinderen edited comment on DIRMINA- at 5/21/19 8:49 AM:
-

The report claims that the CPU increase is combined with InvalidMarkException 
instances being thrown by MINA 2.1.2's implementation of 
{{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}}

Specifically, when {{reset}} is called in this snippet:
{code:java}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
buf.reset();
failedRequests.add(req);
} else {
IoFilterChain filterChain = session.getFilterChain();
filterChain.fireMessageSent(req);
}{code}


was (Author: guus.der.kinderen):
The report claims that the CPU increase is combined with InvalidMarkException 
instances being thrown by MINA 2.1.2's implementation of 
{{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}}



[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844639#comment-16844639
 ] 

Guus der Kinderen commented on DIRMINA-:


The report claims that the CPU increase is combined with InvalidMarkException 
instances being thrown by MINA 2.1.2's implementation of 
{{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}}



[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844614#comment-16844614
 ] 

Guus der Kinderen commented on DIRMINA-:


For what it's worth, I've received another report (unrelated to the first one) 
that appears to be related to this issue. This new environment recently 
upgraded to MINA 2.1.2 too. I'm working on obtaining the profiling information.



[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844606#comment-16844606
 ] 

Guus der Kinderen commented on DIRMINA-1107:


We've replaced our initial fix with what Jonathan provided. His fix appears to 
prevent the original issue from occurring. We're happy to close this issue with 
that fix! Thanks!

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Guus der Kinderen
>Priority: Major
> Fix For: 2.1.3
>
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally 
> replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of 
> events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
> 
> void flushScheduledEvents() {
>     scheduledEvents.incrementAndGet();
> 
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
> 
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (scheduledEvents.decrementAndGet() > 0);
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }{code}
> We have observed occasions where the value of {{scheduledEvents}} becomes 
> negative, while at the same time events in {{filterWriteEventQueue}} go 
> unprocessed.
> We suspect that this is triggered by a race: the first thread decrements the 
> counter after a second thread has incremented it, but before that second 
> thread attempts to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to 
> zero, and release the lock, after which the second thread acquires 
> the lock successfully. Now, the second thread processes any elements in 
> {{filterWriteEventQueue}}, and then processes any elements in 
> {{messageReceivedEventQueue}}. If in between these two checks yet another 
> thread adds a new element to {{filterWriteEventQueue}}, this element can go 
> unprocessed (as the second thread does not loop, since the counter is zero or 
> negative, and the third thread can fail to acquire the lock).
> It's a seemingly unlikely scenario, but we are observing the behavior when 
> our systems are under high load.
> We've applied a code change after which this problem is no longer observed. 
> We've removed the counter, and check on the size of the queues instead:
> {code:java}
> void flushScheduledEvents() {
>     if (sslLock.tryLock()) {
>         try {
>             do {
>                 while ((event = filterWriteEventQueue.poll()) != null) {
>                     // ...
>                 }
> 
>                 while ((event = messageReceivedEventQueue.poll()) != null) {
>                     // ...
>                 }
>             } while (!filterWriteEventQueue.isEmpty() || 
>                      !messageReceivedEventQueue.isEmpty());
>         } finally {
>             sslLock.unlock();
>         }
>     }
> }{code}
> This code change, as illustrated above, does introduce a new potential 
> problem. Theoretically, an event could be added to the queues and 
> {{flushScheduledEvents}} called (with {{sslLock.tryLock()}} returning 
> {{false}}) exactly after another thread has finished the {{while}} loop, 
> but before that thread releases the lock. This again would cause events 
> to go unprocessed.
> We've not observed this problem in the wild yet, but we're uncomfortable 
> applying this change as-is.
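The queue-emptiness variant described in that report can be sketched as a self-contained class. This is an illustrative simplification, not MINA's {{SslHandler}}: events are plain {{Runnable}}s and the class name {{EventFlusher}} is invented; only the lock/drain/re-check shape matches the description.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the queue-emptiness variant of flushScheduledEvents described
// in the report above (simplified: events are Runnables).
public final class EventFlusher {
    final ConcurrentLinkedQueue<Runnable> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    final ConcurrentLinkedQueue<Runnable> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    private final ReentrantLock sslLock = new ReentrantLock();

    void flushScheduledEvents() {
        if (sslLock.tryLock()) {
            try {
                Runnable event;
                do {
                    while ((event = filterWriteEventQueue.poll()) != null) {
                        event.run();
                    }
                    while ((event = messageReceivedEventQueue.poll()) != null) {
                        event.run();
                    }
                    // Re-check both queues: a racing producer may have
                    // enqueued between the two drains while we hold the lock.
                } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty());
            } finally {
                sslLock.unlock();
            }
        }
        // Residual window (as the report notes): a producer that enqueues
        // after the while-condition check but before unlock(), and whose own
        // tryLock() fails, leaves its event queued until the next flush.
    }
}
```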





[jira] [Updated] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only

2019-05-17 Thread Guus der Kinderen (JIRA)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guus der Kinderen updated DIRMINA-:
---
Summary: 100% CPU (epoll bug) on 2.1.x, Linux only  (was: 100% CPU (epoll 
bug) on 2.1.x)



[jira] [Created] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x

2019-05-17 Thread Guus der Kinderen (JIRA)
Guus der Kinderen created DIRMINA-:
--

 Summary: 100% CPU (epoll bug) on 2.1.x
 Key: DIRMINA-
 URL: https://issues.apache.org/jira/browse/DIRMINA-
 Project: MINA
  Issue Type: Improvement
Affects Versions: 2.1.2
Reporter: Guus der Kinderen


We're getting 
[reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13] 
of a bug that causes 100% CPU usage on Linux (the problem does not occur on 
Windows).

This problem occurs in 2.1.2, but does _not_ occur in 2.0.21.

Is this a regression of the epoll selector bug in DIRMINA-678?

A stack trace of one of the spinning threads:

{code}"NioProcessor-3" #55 prio=5 os_prio=0 tid=0x7f0408002000 nid=0x2a6a 
runnable [0x7f0476dee000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0004c486b748> (a sun.nio.ch.Util$3)
- locked <0x0004c486b738> (a java.util.Collections$UnmodifiableSet)
- locked <0x0004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at 
org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
at 
org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
at 
org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
- <0x0004c4d03530> (a 
java.util.concurrent.ThreadPoolExecutor$Worker){code}

More info is available at 
https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13





[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-16 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841103#comment-16841103
 ] 

Guus der Kinderen commented on DIRMINA-1107:


\o/

For what it's worth: since I wrote that comment, we have successfully applied 
the solution that I've outlined in 
https://issues.apache.org/jira/browse/DIRMINA-1107?focusedCommentId=16838709=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16838709
 



[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-16 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841084#comment-16841084
 ] 

Guus der Kinderen commented on DIRMINA-1107:


I'd love to see a solution for the 2.1 branch, even if that's a patch of the 
existing, sub-par implementation. We're currently suffering from issues (which 
prompted me to create this one), which we'd like to fix before/without 
upgrading to a new major release of MINA.



[jira] [Created] (DIRMINA-1110) Ordered Executors concurrency

2019-05-15 Thread Guus der Kinderen (JIRA)
Guus der Kinderen created DIRMINA-1110:
--

 Summary: Ordered Executors concurrency
 Key: DIRMINA-1110
 URL: https://issues.apache.org/jira/browse/DIRMINA-1110
 Project: MINA
  Issue Type: Improvement
Affects Versions: 2.1.2
Reporter: Guus der Kinderen
 Attachments: mina-ordered-executors.patch

MINA contains two ThreadPoolExecutor implementations that maintain the order 
of IoEvents per session:
* OrderedThreadPoolExecutor
* PriorityThreadPoolExecutor

This issue applies to both.

A private class called {{SessionTasksQueue}} (confusingly, it has a different 
name in each implementation, an undesired side effect of the code duplication) 
is used to represent the ordered collection of events to be processed for each 
individual session. It also contains a boolean that represents the state of the 
queue prior to the addition of a task: 'true' if the collection was empty 
("processing complete"), otherwise 'false'. This queue is stored as an 
attribute on the session.

An {{IoEvent}} is _scheduled_ for execution by being passed to the {{execute}} 
method. "Scheduling" an {{IoEvent}} involves identifying the session that it 
belongs to, and placing it in that session's {{SessionTasksQueue}}. Finally, 
the session itself is, conditionally (more on this later), placed in a queue 
named {{waitingSessions}}.

The {{IoEvent}} execution itself is implemented by {{Runnable}} implementations 
called {{Worker}}. These workers take their workload from the 
{{waitingSessions}} queue, doing blocking polls.

The placing of the Session on the {{waitingSessions}} queue depends on the 
state of the {{SessionTasksQueue}}. If it was empty ("processingComplete"), the 
session is placed on the {{waitingSessions}} queue. There is a comment that 
describes this as follows:

{quote}As the tasksQueue was empty, the task has been executed immediately, so 
we can move the session to the queue of sessions waiting for completion.{quote}

As an aside: I believe this comment to be misleading: there's no guarantee that 
the task has actually been executed immediately, as workers might still be busy 
on other threads. On top of that, as both production into and consumption from 
that queue are guarded by the same mutex, I think it's quite impossible that 
the task has already been executed. A more accurate comment would be:

{quote}Processing of the tasks queue of this session is currently not scheduled 
or underway. As new tasks have now been added, the session needs to be offered 
for processing.{quote}

The determination of whether the session needs to be offered up for processing 
is based on an evaluation of the state of the {{SessionTasksQueue}} that 
happens under a mutex. The actual offer of the session for processing (adding 
the session to the {{waitingSessions}} queue) is not. We believe, but have not 
been able to conclusively prove, that this can lead to concurrency issues, 
where a session might not be queued even though it has tasks to be executed. 
Nonetheless, we suggest moving the addition of the session to 
{{waitingSessions}} into the guarded code block. At the very least, this 
reduces code complexity.
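A minimal sketch of the suggested change (the names are illustrative stand-ins, not MINA's actual code): with the offer to {{waitingSessions}} performed inside the guarded block, the emptiness check and the offer become atomic:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class Main {
    // Simplified stand-in for the per-session task queue described above.
    static class SessionTasksQueue {
        final Queue<Runnable> tasksQueue = new ArrayDeque<>();
        boolean processingCompleted = true; // no processing scheduled or underway
    }

    static final BlockingQueue<SessionTasksQueue> waitingSessions = new LinkedBlockingQueue<>();

    static void execute(SessionTasksQueue sessionQueue, Runnable event) {
        synchronized (sessionQueue.tasksQueue) {
            sessionQueue.tasksQueue.offer(event);
            // With the offer inside the guarded block, the "was processing
            // complete?" check and the addition to waitingSessions are atomic.
            if (sessionQueue.processingCompleted) {
                sessionQueue.processingCompleted = false;
                waitingSessions.offer(sessionQueue);
            }
        }
    }

    public static void main(String[] args) {
        SessionTasksQueue session = new SessionTasksQueue();
        execute(session, () -> {});
        execute(session, () -> {});

        // The session is offered exactly once, with both events pending.
        if (waitingSessions.size() != 1) throw new AssertionError();
        if (session.tasksQueue.size() != 2) throw new AssertionError();
        System.out.println("session offered once for two queued events");
    }
}
```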

The patch attached to this issue applies this change. It also changes the name 
of the Session tasks queue in {{PriorityThreadPoolExecutor}}, to match the name 
used in {{OrderedThreadPoolExecutor}}.







[jira] [Commented] (DIRMINA-1108) ThreadPoolExecutors should terminate faster

2019-05-15 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840436#comment-16840436
 ] 

Guus der Kinderen commented on DIRMINA-1108:


{{shutdownNow}} would attempt to stop all executing tasks (at least, that's 
what its contract says - I'm unsure whether the MINA implementations actually 
behave that way). That's not what I want for the particular use case I had in 
mind.

Ideally, termination should be faster, which would be a minor improvement.
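For reference, a generic illustration of the graceful pattern implied here, using a plain {{java.util.concurrent}} executor rather than MINA's: {{shutdown()}} lets in-flight and queued tasks complete, and {{awaitTermination}} reports when everything has finished.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.submit(() -> System.out.println("task ran to completion"));

        // shutdown() lets running and queued tasks finish; shutdownNow()
        // would instead interrupt running tasks and drain the queue.
        executor.shutdown();

        try {
            // awaitTermination blocks until all tasks have completed or the
            // timeout elapses; it returns true only in the former case.
            boolean terminated = executor.awaitTermination(10, TimeUnit.SECONDS);
            if (!terminated) {
                throw new AssertionError("executor did not terminate in time");
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        System.out.println("terminated cleanly");
    }
}
```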

> ThreadPoolExecutors should terminate faster
> ---
>
> Key: DIRMINA-1108
> URL: https://issues.apache.org/jira/browse/DIRMINA-1108
> Project: MINA
>  Issue Type: Improvement
>Affects Versions: 2.1.2
>Reporter: Guus der Kinderen
>Priority: Minor
>
> I've observed that the termination of 
> {{org.apache.mina.filter.executor.PriorityThreadPoolExecutor}} takes another 
> 30 seconds after the last task was executed. More specifically, it takes 30 
> seconds "too long" before 
> {{org.apache.mina.filter.executor.PriorityThreadPoolExecutor#awaitTermination}}
>  returns {{true}}.
> It is likely that this issue also applies to 
> {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}}, and maybe 
> {{org.apache.mina.filter.executor.UnorderedThreadPoolExecutor}}, as a lot of 
> code is shared between these implementations. I did not explicitly verify 
> these implementations though.





[jira] [Created] (DIRMINA-1109) Improve PriorityThreadPoolExecutor unit test

2019-05-15 Thread Guus der Kinderen (JIRA)
Guus der Kinderen created DIRMINA-1109:
--

 Summary: Improve PriorityThreadPoolExecutor unit test
 Key: DIRMINA-1109
 URL: https://issues.apache.org/jira/browse/DIRMINA-1109
 Project: MINA
  Issue Type: Improvement
Reporter: Guus der Kinderen
 Attachments: PriorityThreadPoolExecutorTest.java

The original test was authored by me.

The test should be improved to:
 * include a non-single-thread executor thread pool.
 * take into account that once a session's task queue has been picked up by a 
worker thread, any 'prioritized' tasks for that session still have to wait for 
the work that is already underway to finish (the original code does not account 
for this, which is why it only works with a single thread).
 * take considerably less time to execute (currently runs for ~34 seconds - 
see DIRMINA-1108).
 * be self-validating.

Also, much of the formatting appears to have been lost when my original code 
was merged. That could be improved upon too.

As the formatting changes affected ~80% of all lines, I've not bothered to 
create a diff of my work. Instead, I'm supplying the Java file in its entirety 
as an attachment to this JIRA issue.





[jira] [Created] (DIRMINA-1108) ThreadPoolExecutors should terminate faster

2019-05-15 Thread Guus der Kinderen (JIRA)
Guus der Kinderen created DIRMINA-1108:
--

 Summary: ThreadPoolExecutors should terminate faster
 Key: DIRMINA-1108
 URL: https://issues.apache.org/jira/browse/DIRMINA-1108
 Project: MINA
  Issue Type: Improvement
Affects Versions: 2.1.2
Reporter: Guus der Kinderen


I've observed that the termination of 
{{org.apache.mina.filter.executor.PriorityThreadPoolExecutor}} takes another 30 
seconds after the last task was executed. More specifically, it takes 30 
seconds "too long" before 
{{org.apache.mina.filter.executor.PriorityThreadPoolExecutor#awaitTermination}} 
returns {{true}}.

It is likely that this issue also applies to 
{{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}}, and maybe 
{{org.apache.mina.filter.executor.UnorderedThreadPoolExecutor}}, as a lot of 
code is shared between these implementations. I did not explicitly verify these 
implementations though.
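One plausible mechanism for a fixed lag like this (an assumption on my part, not verified against the MINA sources) is a worker that blocks on its task queue with a long poll timeout and only re-checks the shutdown state once the poll returns. A scaled-down sketch with a 500 ms timeout in place of 30 seconds:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class Main {
    static volatile boolean shutdown = false;

    public static void main(String[] args) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        CountDownLatch workerStarted = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            workerStarted.countDown();
            try {
                while (!shutdown) {
                    // The worker only notices 'shutdown' after the poll times
                    // out, so termination lags by up to the poll timeout.
                    Runnable task = queue.poll(500, TimeUnit.MILLISECONDS);
                    if (task != null) {
                        task.run();
                    }
                }
            } catch (InterruptedException ignored) {
            }
        });
        worker.start();

        try {
            workerStarted.await();
            Thread.sleep(100);       // let the worker enter its blocking poll

            long start = System.nanoTime();
            shutdown = true;         // request termination...
            worker.join();           // ...but the worker exits only after poll() returns
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

            if (elapsedMillis < 100) {
                throw new AssertionError("expected a poll-timeout lag");
            }
            System.out.println("termination lagged by ~" + elapsedMillis + " ms");
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```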





[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-14 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839151#comment-16839151
 ] 

Guus der Kinderen commented on DIRMINA-1107:


[~elecharny] wrote
{quote}The {{flushScheduledEvents()}} method, which is called by the 
{{startTls}}, {{stopTls}}, {{messageReceived}}, {{filterWrite}}, 
{{filterClose}} and {{initiateHandshake}} methods, should first propagate the 
received messages - because it's synchronous - and then process all the 
messages to be written, in a protected section.{quote}

I do not understand what you're trying to say. I don't think we can predict if 
the thread calling {{flushScheduledEvents}} is the single thread that processes 
incoming messages, or if it's a thread (potentially from a thread pool 
associated to an Executor) that is processing outgoing messages. Even if the 
outgoing messages would be guaranteed to be processed by a single thread, we'd 
still have to guard against the inbound and outbound thread calling 
{{flushScheduledEvents}} at the same time, right?

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Guus der Kinderen
>Priority: Major
> Fix For: 2.1.3
>
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally 
> replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of 
> events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
> void flushScheduledEvents() {
> scheduledEvents.incrementAndGet();
> if (sslLock.tryLock()) {
> try {
> do {
> while ((event = filterWriteEventQueue.poll()) != null) {
> // ...
> }
> 
> while ((event = messageReceivedEventQueue.poll()) != null){
> // ...
> }
> } while (scheduledEvents.decrementAndGet() > 0);
> } finally {
> sslLock.unlock();
> }
> }
> }{code}
> We have observed occasions where the value of {{scheduledEvents}} becomes 
> negative, while at the same time events in {{filterWriteEventQueue}} go 
> unprocessed.
> We suspect that this is triggered by a race in which the first thread 
> decrements the counter after a second thread has incremented it, but before 
> that second thread attempts to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to 
> zero, and release the lock, after which the second thread acquires 
> the lock successfully. Now, the second thread processes any elements in 
> {{filterWriteEventQueue}}, and then processes any elements in 
> {{messageReceivedEventQueue}}. If in between these two checks yet another 
> thread adds a new element to {{filterWriteEventQueue}}, this element can go 
> unprocessed (as the second thread does not loop, since the counter is zero or 
> negative, and the third thread can fail to acquire the lock).
> It's a seemingly unlikely scenario, but we are observing the behavior when 
> our systems are under high load.
> We've applied a code change after which this problem is no longer observed. 
> We've removed the counter, and check on the size of the queues instead:
> {code:java}
> void flushScheduledEvents() {
> if (sslLock.tryLock()) {
> try {
> do {
> while ((event = filterWriteEventQueue.poll()) != null) {
> // ...
> }
> 
> while ((event = messageReceivedEventQueue.poll()) != null){
> // ...
> }
> } while (!filterWriteEventQueue.isEmpty() || 
> !messageReceivedEventQueue.isEmpty());
> } finally {
> sslLock.unlock();
> }
> }
> }{code}
> This code change, as illustrated above, does introduce a new potential 
> problem. Theoretically, an event could be added to the queues and 
> {{flushScheduledEvents}} called, with {{sslLock.tryLock()}} returning 
> {{false}}, exactly after another thread has finished the {{while}} loop but 
> before releasing the lock. This again would cause events to go unprocessed.
> We've not observed this problem in the wild yet, but we're uncomfortable 
> applying this change as-is.





[jira] [Comment Edited] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-13 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838709#comment-16838709
 ] 

Guus der Kinderen edited comment on DIRMINA-1107 at 5/13/19 5:03 PM:
-

Thanks for all the feedback!

For what it's worth: we're currently experimenting with a code change that 
intends to:
* Ensure that queues are emptied at least as often as {{flushScheduledEvents}} 
is invoked (which should prevent events from remaining in the queue 
indefinitely after a flush has been requested - we can't be sure that another 
flush will be requested in a timely manner).
* Not block any threads while one thread is operating on the filters (in other 
words: continue to allow for the 'tryLock' if statement, to prevent worker 
threads from being blocked).

We are attempting to do this by introducing a lock that is released, 
atomically, only once it has been "used" (unlocked) as many times as its 
acquisition has been attempted.

Our work-in-progress code:

{code:java}
private class CountReentrantLock {
private int i = 0;
private final ReentrantLock lock = new ReentrantLock();

public synchronized boolean tryAcquireLock() {
i++;
return lock.tryLock();
}

public synchronized boolean tryUnlock() {
i--;
if(i<=0) {
i=0;
lock.unlock();
return true;
}
return false;
}

public synchronized void forceUnlock() {
i = 0;
lock.unlock();
}
}
{code}

This then could be used like this:

{code:java}
void flushScheduledEvents() {
if (sslLock.tryAcquireLock()) {
try {
do {
while ((event = filterWriteEventQueue.poll()) != null) {
// ...
}

while ((event = messageReceivedEventQueue.poll()) != null){
// ...
}
} while (!filterWriteEventQueue.isEmpty() || 
!messageReceivedEventQueue.isEmpty() || !sslLock.tryUnlock());
} catch( Throwable t) {
sslLock.forceUnlock();
}
}
}{code}

We've not tested this code yet. A concern that we haven't thought through yet 
is re-entry of a thread that already owns the lock.
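That re-entry concern can be made concrete: {{ReentrantLock.tryLock()}} succeeds for a thread that already holds the lock, so a thread that calls {{tryAcquireLock}} twice and then balances it with {{tryUnlock}} calls leaves the underlying lock held. A self-contained check (it copies the work-in-progress sketch above and adds an {{isHeld()}} probe, which is purely an addition for this demonstration):

```java
import java.util.concurrent.locks.ReentrantLock;

public class Main {
    // Copy of the work-in-progress CountReentrantLock, plus an isHeld() probe.
    static class CountReentrantLock {
        private int i = 0;
        private final ReentrantLock lock = new ReentrantLock();

        public synchronized boolean tryAcquireLock() {
            i++;
            return lock.tryLock();
        }

        public synchronized boolean tryUnlock() {
            i--;
            if (i <= 0) {
                i = 0;
                lock.unlock();
                return true;
            }
            return false;
        }

        public boolean isHeld() {
            return lock.isLocked();
        }
    }

    public static void main(String[] args) {
        CountReentrantLock sslLock = new CountReentrantLock();

        // Same thread acquires twice: tryLock() re-enters and succeeds both times.
        if (!sslLock.tryAcquireLock()) throw new AssertionError();
        if (!sslLock.tryAcquireLock()) throw new AssertionError();

        // Two matching tryUnlock() calls: the first only decrements the counter,
        // the second calls unlock() once, leaving a hold count of 1.
        if (sslLock.tryUnlock()) throw new AssertionError();   // i: 2 -> 1
        if (!sslLock.tryUnlock()) throw new AssertionError();  // i: 1 -> 0, unlock once

        // The lock is still held: re-entry breaks the counting scheme.
        if (!sslLock.isHeld()) throw new AssertionError();
        System.out.println("lock still held after balanced tryUnlock calls");
    }
}
```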


was (Author: guus.der.kinderen):
Thanks for all the feedback!

For what it's worth: we're currently experimenting with a code change that 
intends to:
* Ensure that queues are emptied at least as often as {{flushScheduledEvents}} 
is invoked (which should prevent events from remaining in the queue 
indefinitely after a flush has been requested - we can't be sure that another 
flush will be requested in a timely manner).
* Not block any threads (continue to allow for the 'tryLock' if statement, to 
prevent worker threads from being blocked).

We are attempting to do this by introducing a lock that is released, 
atomically, only once it has been "used" (unlocked) as many times as its 
acquisition has been attempted.

Our work-in-progress code:

{code:java}
private class CountReentrantLock {
private int i = 0;
private final ReentrantLock lock = new ReentrantLock();

public synchronized boolean tryAcquireLock() {
i++;
return lock.tryLock();
}

public synchronized boolean tryUnlock() {
i--;
if(i<=0) {
i=0;
lock.unlock();
return true;
}
return false;
}

public synchronized void forceUnlock() {
i = 0;
lock.unlock();
}
}
{code}

This then could be used like this:

{code:java}
void flushScheduledEvents() {
if (sslLock.tryAcquireLock()) {
try {
do {
while ((event = filterWriteEventQueue.poll()) != null) {
// ...
}

while ((event = messageReceivedEventQueue.poll()) != null){
// ...
}
} while (!filterWriteEventQueue.isEmpty() || 
!messageReceivedEventQueue.isEmpty() || !sslLock.tryUnlock());
} catch( Throwable t) {
sslLock.forceUnlock();
}
}
}{code}

We've not tested this code yet. A concern that we haven't thought through yet 
is re-entry of a thread that already owns the lock.

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Guus der Kinderen
>Priority: Major
> Fix For: 2.1.3
>
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally 
> replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of 
> events to be processed. A simplified version of the code is included below.
> 

[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-13 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838709#comment-16838709
 ] 

Guus der Kinderen commented on DIRMINA-1107:


Thanks for all the feedback!

For what it's worth: we're currently experimenting with a code change that 
intends to:
* Ensure that queues are emptied at least as often as {{flushScheduledEvents}} 
is invoked (which should prevent events from remaining in the queue 
indefinitely after a flush has been requested - we can't be sure that another 
flush will be requested in a timely manner).
* Not block any threads (continue to allow for the 'tryLock' if statement, to 
prevent worker threads from being blocked).

We are attempting to do this by introducing a lock that is released, 
atomically, only once it has been "used" (unlocked) as many times as its 
acquisition has been attempted.

Our work-in-progress code:

{code:java}
private class CountReentrantLock {
private int i = 0;
private final ReentrantLock lock = new ReentrantLock();

public synchronized boolean tryAcquireLock() {
i++;
return lock.tryLock();
}

public synchronized boolean tryUnlock() {
i--;
if(i<=0) {
i=0;
lock.unlock();
return true;
}
return false;
}

public synchronized void forceUnlock() {
i = 0;
lock.unlock();
}
}
{code}

This then could be used like this:

{code:java}
void flushScheduledEvents() {
if (sslLock.tryAcquireLock()) {
try {
do {
while ((event = filterWriteEventQueue.poll()) != null) {
// ...
}

while ((event = messageReceivedEventQueue.poll()) != null){
// ...
}
} while (!filterWriteEventQueue.isEmpty() || 
!messageReceivedEventQueue.isEmpty() || !sslLock.tryUnlock());
} catch( Throwable t) {
sslLock.forceUnlock();
}
}
}{code}

We've not tested this code yet. A concern that we haven't thought through yet 
is re-entry of a thread that already owns the lock.

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Guus der Kinderen
>Priority: Major
> Fix For: 2.1.3
>
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally 
> replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of 
> events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
> void flushScheduledEvents() {
> scheduledEvents.incrementAndGet();
> if (sslLock.tryLock()) {
> try {
> do {
> while ((event = filterWriteEventQueue.poll()) != null) {
> // ...
> }
> 
> while ((event = messageReceivedEventQueue.poll()) != null){
> // ...
> }
> } while (scheduledEvents.decrementAndGet() > 0);
> } finally {
> sslLock.unlock();
> }
> }
> }{code}
> We have observed occasions where the value of {{scheduledEvents}} becomes 
> negative, while at the same time events in {{filterWriteEventQueue}} go 
> unprocessed.
> We suspect that this is triggered by a race in which the first thread 
> decrements the counter after a second thread has incremented it, but before 
> that second thread attempts to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to 
> zero, and release the lock, after which the second thread acquires 
> the lock successfully. Now, the second thread processes any elements in 
> {{filterWriteEventQueue}}, and then processes any elements in 
> {{messageReceivedEventQueue}}. If in between these two checks yet another 
> thread adds a new element to {{filterWriteEventQueue}}, this element can go 
> unprocessed (as the second thread does not loop, since the counter is zero or 
> negative, and the third thread can fail to acquire the lock).
> It's a seemingly unlikely scenario, but we are observing the behavior when 
> our systems are under high load.
> We've applied a code change after which this problem is no longer observed. 
> We've removed the counter, and check on the size of the queues instead:
> {code:java}
> void flushScheduledEvents() {
> if (sslLock.tryLock()) {
> try {
> do {
> while ((event = filterWriteEventQueue.poll()) != null) {
> // 

[jira] [Comment Edited] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-13 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838662#comment-16838662
 ] 

Guus der Kinderen edited comment on DIRMINA-1107 at 5/13/19 4:15 PM:
-

We've found the issue in version 2.1.2, but didn't attempt to test against 
other versions.

Placing the counter inside the protected code block would open the possibility 
for an invocation of {{flushScheduledEvents}} that does nothing (fails to 
acquire the lock), while there _are_ events in the queue. This would occur if 
the second thread tries to acquire the lock just after the first thread has 
exited the outer 'do/while' loop, but before it has released the lock in the 
'finally' block.


was (Author: guus.der.kinderen):
We've found the issue in version 2.1.2.

Placing the counter inside the protected code block would open the possibility 
for an invocation of {{flushScheduledEvents}} that does nothing (fails to 
acquire the lock), while there _are_ events in the queue. This would occur if 
the second thread tries to acquire the lock just after the first thread has 
exited the outer 'do/while' loop, but before it has released the lock in the 
'finally' block.

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
>  Issue Type: Bug
>Reporter: Guus der Kinderen
>Priority: Major
> Fix For: 2.1.3, 2.0.23
>
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally 
> replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of 
> events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
> void flushScheduledEvents() {
> scheduledEvents.incrementAndGet();
> if (sslLock.tryLock()) {
> try {
> do {
> while ((event = filterWriteEventQueue.poll()) != null) {
> // ...
> }
> 
> while ((event = messageReceivedEventQueue.poll()) != null){
> // ...
> }
> } while (scheduledEvents.decrementAndGet() > 0);
> } finally {
> sslLock.unlock();
> }
> }
> }{code}
> We have observed occasions where the value of {{scheduledEvents}} becomes 
> negative, while at the same time events in {{filterWriteEventQueue}} go 
> unprocessed.
> We suspect that this is triggered by a race in which the first thread 
> decrements the counter after a second thread has incremented it, but before 
> that second thread attempts to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to 
> zero, and release the lock, after which the second thread acquires 
> the lock successfully. Now, the second thread processes any elements in 
> {{filterWriteEventQueue}}, and then processes any elements in 
> {{messageReceivedEventQueue}}. If in between these two checks yet another 
> thread adds a new element to {{filterWriteEventQueue}}, this element can go 
> unprocessed (as the second thread does not loop, since the counter is zero or 
> negative, and the third thread can fail to acquire the lock).
> It's a seemingly unlikely scenario, but we are observing the behavior when 
> our systems are under high load.
> We've applied a code change after which this problem is no longer observed. 
> We've removed the counter, and check on the size of the queues instead:
> {code:java}
> void flushScheduledEvents() {
> if (sslLock.tryLock()) {
> try {
> do {
> while ((event = filterWriteEventQueue.poll()) != null) {
> // ...
> }
> 
> while ((event = messageReceivedEventQueue.poll()) != null){
> // ...
> }
> } while (!filterWriteEventQueue.isEmpty() || 
> !messageReceivedEventQueue.isEmpty());
> } finally {
> sslLock.unlock();
> }
> }
> }{code}
> This code change, as illustrated above, does introduce a new potential 
> problem. Theoretically, an event could be added to the queues and 
> {{flushScheduledEvents}} called, with {{sslLock.tryLock()}} returning 
> {{false}}, exactly after another thread has finished the {{while}} loop but 
> before releasing the lock. This again would cause events to go unprocessed.
> We've not observed this problem in the wild yet, but we're uncomfortable 
> applying this change as-is.





[jira] [Comment Edited] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-13 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838662#comment-16838662
 ] 

Guus der Kinderen edited comment on DIRMINA-1107 at 5/13/19 4:13 PM:
-

We've found the issue in version 2.1.2.

Placing the counter inside the protected code block would open the possibility 
for an invocation of {{flushScheduledEvents}} that does nothing (fails to 
acquire the lock), while there _are_ events in the queue. This would occur if 
the second thread tries to acquire the lock just after the first thread has 
exited the outer 'do/while' loop, but before it has released the lock in the 
'finally' block.


was (Author: guus.der.kinderen):
We've found the issue in version 2.1.2.

Placing the counter inside the protected code block would open the possibility 
for an invocation of {{flushScheduledEvents}} that does nothing (fails to 
acquire the lock), while there _are_ events in the queue. This would occur if 
the second thread tries to acquire the lock just after the first thread has 
exited the outer 'do/while' loop, but before it has released the lock in the 
'finally' block.

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
>  Issue Type: Bug
>Reporter: Guus der Kinderen
>Priority: Major
> Fix For: 2.1.3, 2.0.23
>
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally 
> replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of 
> events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
> void flushScheduledEvents() {
> scheduledEvents.incrementAndGet();
> if (sslLock.tryLock()) {
> try {
> do {
> while ((event = filterWriteEventQueue.poll()) != null) {
> // ...
> }
> 
> while ((event = messageReceivedEventQueue.poll()) != null){
> // ...
> }
> } while (scheduledEvents.decrementAndGet() > 0);
> } finally {
> sslLock.unlock();
> }
> }
> }{code}
> We have observed occasions where the value of {{scheduledEvents}} becomes 
> negative, while at the same time events in {{filterWriteEventQueue}} go 
> unprocessed.
> We suspect that this is triggered by a race in which the first thread 
> decrements the counter after a second thread has incremented it, but before 
> that second thread attempts to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to 
> zero, and release the lock, after which the second thread acquires 
> the lock successfully. Now, the second thread processes any elements in 
> {{filterWriteEventQueue}}, and then processes any elements in 
> {{messageReceivedEventQueue}}. If in between these two checks yet another 
> thread adds a new element to {{filterWriteEventQueue}}, this element can go 
> unprocessed (as the second thread does not loop, since the counter is zero or 
> negative, and the third thread can fail to acquire the lock).
> It's a seemingly unlikely scenario, but we are observing the behavior when 
> our systems are under high load.
> We've applied a code change after which this problem is no longer observed. 
> We've removed the counter, and check on the size of the queues instead:
> {code:java}
> void flushScheduledEvents() {
> if (sslLock.tryLock()) {
> try {
> do {
> while ((event = filterWriteEventQueue.poll()) != null) {
> // ...
> }
> 
> while ((event = messageReceivedEventQueue.poll()) != null){
> // ...
> }
> } while (!filterWriteEventQueue.isEmpty() || 
> !messageReceivedEventQueue.isEmpty());
> } finally {
> sslLock.unlock();
> }
> }
> }{code}
> This code change, as illustrated above, does introduce a new potential 
> problem. Theoretically, an event could be added to the queues and 
> {{flushScheduledEvents}} called, with {{sslLock.tryLock()}} returning 
> {{false}}, exactly after another thread has finished the {{while}} loop but 
> before releasing the lock. This again would cause events to go unprocessed.
> We've not observed this problem in the wild yet, but we're uncomfortable 
> applying this change as-is.





[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-13 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838662#comment-16838662
 ] 

Guus der Kinderen commented on DIRMINA-1107:


We've found the issue in version 2.1.2.

Placing the counter inside the protected code block would open the possibility 
for an invocation of {{flushScheduledEvents}} that does nothing (fails to 
acquire the lock), while there _are_ events in the queue. This would occur if 
the second thread tries to acquire the lock just after the first thread has 
exited the outer 'do/while' loop, but before it has released the lock in the 
'finally' block.

> SslHandler flushScheduledEvents race condition, redux
> -
>
> Key: DIRMINA-1107
> URL: https://issues.apache.org/jira/browse/DIRMINA-1107
> Project: MINA
>  Issue Type: Bug
>Reporter: Guus der Kinderen
>Priority: Major
> Fix For: 2.1.3, 2.0.23
>
>
> DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally 
> replaces it with another multithreading issue.
> The fix for DIRMINA-1019 introduces a counter that contains the number of 
> events to be processed. A simplified version of the code is included below.
> {code:java}
> private final AtomicInteger scheduledEvents = new AtomicInteger(0);
> void flushScheduledEvents() {
> scheduledEvents.incrementAndGet();
> if (sslLock.tryLock()) {
> try {
> do {
> while ((event = filterWriteEventQueue.poll()) != null) {
> // ...
> }
> 
> while ((event = messageReceivedEventQueue.poll()) != null){
> // ...
> }
> } while (scheduledEvents.decrementAndGet() > 0);
> } finally {
> sslLock.unlock();
> }
> }
> }{code}
> We have observed occasions where the value of {{scheduledEvents}} becomes 
> negative, while at the same time events in {{filterWriteEventQueue}} go 
> unprocessed.
> We suspect that this is triggered by a race in which the first thread 
> decrements the counter after a second thread has incremented it, but before 
> that second thread attempts to acquire the lock.
> This allows the first thread to empty the queues, decrement the counter to 
> zero, and release the lock, after which the second thread acquires 
> the lock successfully. Now, the second thread processes any elements in 
> {{filterWriteEventQueue}}, and then processes any elements in 
> {{messageReceivedEventQueue}}. If in between these two checks yet another 
> thread adds a new element to {{filterWriteEventQueue}}, this element can go 
> unprocessed (as the second thread does not loop, since the counter is zero or 
> negative, and the third thread can fail to acquire the lock).
> It's a seemingly unlikely scenario, but we are observing the behavior when 
> our systems are under high load.
> We've applied a code change after which this problem is no longer observed. 
> We've removed the counter, and check on the size of the queues instead:
> {code:java}
> void flushScheduledEvents() {
> if (sslLock.tryLock()) {
> try {
> do {
> while ((event = filterWriteEventQueue.poll()) != null) {
> // ...
> }
> 
> while ((event = messageReceivedEventQueue.poll()) != null){
> // ...
> }
> } while (!filterWriteEventQueue.isEmpty() || 
> !messageReceivedEventQueue.isEmpty());
> } finally {
> sslLock.unlock();
> }
> }
> }{code}
> This code change, as illustrated above, does introduce a new potential 
> problem. Theoretically, an event could be added to the queues and 
> {{flushScheduledEvents}} be called with {{sslLock.tryLock()}} returning 
> {{false}}, exactly after another thread has just finished the {{while}} 
> loop but before it releases the lock. This again would cause events to go 
> unprocessed.
> We've not observed this problem in the wild yet, but we're uncomfortable 
> applying this change as-is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux

2019-05-13 Thread Guus der Kinderen (JIRA)
Guus der Kinderen created DIRMINA-1107:
--

 Summary: SslHandler flushScheduledEvents race condition, redux
 Key: DIRMINA-1107
 URL: https://issues.apache.org/jira/browse/DIRMINA-1107
 Project: MINA
  Issue Type: Bug
Reporter: Guus der Kinderen


DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally 
replaces it with another multithreading issue.

The fix for DIRMINA-1019 introduces a counter that contains the number of 
events to be processed. A simplified version of the code is included below.
{code:java}
private final AtomicInteger scheduledEvents = new AtomicInteger(0);

void flushScheduledEvents() {
scheduledEvents.incrementAndGet();

if (sslLock.tryLock()) {
try {
do {
while ((event = filterWriteEventQueue.poll()) != null) {
// ...
}

while ((event = messageReceivedEventQueue.poll()) != null){
// ...
}
} while (scheduledEvents.decrementAndGet() > 0);
} finally {
sslLock.unlock();
}
}
}{code}
We have observed occasions where the value of {{scheduledEvents}} becomes 
negative, while at the same time events in {{filterWriteEventQueue}} go 
unprocessed.

We suspect that this is triggered by the first thread decrementing the counter 
after a second thread has incremented it, but before that second thread 
attempts to acquire the lock.

This allows the first thread to empty the queues, decrement the counter to 
zero, and release the lock, after which the second thread acquires the lock 
successfully. Now, the second thread processes any elements in 
{{filterWriteEventQueue}}, and then any elements in 
{{messageReceivedEventQueue}}. If, in between these two checks, yet another 
thread adds a new element to {{filterWriteEventQueue}}, that element can go 
unprocessed (as the second thread does not loop, since the counter is zero or 
negative, and the third thread can fail to acquire the lock).

It's a seemingly unlikely scenario, but we are observing the behavior when our 
systems are under high load.

We've applied a code change after which this problem is no longer observed. 
We've removed the counter, and check on the size of the queues instead:

{code:java}
void flushScheduledEvents() {
if (sslLock.tryLock()) {
try {
do {
while ((event = filterWriteEventQueue.poll()) != null) {
// ...
}

while ((event = messageReceivedEventQueue.poll()) != null){
// ...
}
} while (!filterWriteEventQueue.isEmpty() || 
!messageReceivedEventQueue.isEmpty());
} finally {
sslLock.unlock();
}
}
}{code}

This code change, as illustrated above, does introduce a new potential problem. 
Theoretically, an event could be added to the queues and 
{{flushScheduledEvents}} be called with {{sslLock.tryLock()}} returning 
{{false}}, exactly after another thread has just finished the {{while}} loop 
but before it releases the lock. This again would cause events to go 
unprocessed.

We've not observed this problem in the wild yet, but we're uncomfortable 
applying this change as-is.
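For illustration, the residual window can be closed by re-checking the queues *after* releasing the lock and looping back: an event enqueued by a thread whose tryLock() failed is then guaranteed to be seen by the lock holder before it gives up. This is only a hedged sketch under that assumption (queue and lock names mirror the snippets above; event handling is stubbed out with a counter), not the fix that was eventually applied in MINA:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: drain-and-recheck loop. After unlock(), the holder re-checks the
// queues; an event enqueued before a failed tryLock() is visible to that
// re-check, so it cannot be stranded.
class FlushSketch {
    final ReentrantLock sslLock = new ReentrantLock();
    final Queue<Object> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    final Queue<Object> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    int processed; // stub for the elided "// ..." event handling, guarded by sslLock

    void flushScheduledEvents() {
        while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty()) {
            if (!sslLock.tryLock()) {
                // Another thread is draining; it re-checks the queues after
                // unlocking and will pick up whatever we just enqueued.
                return;
            }
            try {
                Object event;
                while ((event = filterWriteEventQueue.poll()) != null) {
                    processed++; // process filter-write event
                }
                while ((event = messageReceivedEventQueue.poll()) != null) {
                    processed++; // process message-received event
                }
            } finally {
                sslLock.unlock();
            }
            // Loop back: catch events added between the last poll and unlock().
        }
    }
}
```
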



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DIRMINA-1101) InvalidMarkException on session.write when using CompressionFilter.

2019-03-21 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798213#comment-16798213
 ] 

Guus der Kinderen commented on DIRMINA-1101:


This issue is also being discussed in the mailing list thread that starts here: 
[http://mail-archives.apache.org/mod_mbox/mina-dev/201903.mbox/%3CCA+GjePVToJ1hzFDTy50BwpEY1AFjPea+ETKe=leoohbwfre...@mail.gmail.com%3E]

> InvalidMarkException on session.write when using CompressionFilter.
> ---
>
> Key: DIRMINA-1101
> URL: https://issues.apache.org/jira/browse/DIRMINA-1101
> Project: MINA
>  Issue Type: Bug
>Affects Versions: 2.0.20
>Reporter: Jörg Michelberger
>Priority: Major
>
> I updated from MINA 2.0.7 to 2.0.20 and am a user of CompressionFilter. 
> Writing messages fails with an InvalidMarkException.
> A reproducible test:
>  * Copy MINA Core Test org.apache.mina.core.service.AbstractIoServiceTest to 
> MINA Compression Filter org.apache.mina.filter.compression Test Packages 
> package.
>  * Adapt package statement to org.apache.mina.filter.compression.
>  * Add Compression to acceptor and connector
>  *     acceptor.getFilterChain().addLast("compression", new 
> CompressionFilter());
>      acceptor.getFilterChain().addLast("logger", new LoggingFilter());
>      acceptor.getFilterChain().addLast("codec",
>      new ProtocolCodecFilter(new 
> TextLineCodecFactory(StandardCharsets.UTF_8)));
>  *     connector.getFilterChain().addLast("compression", new 
> CompressionFilter());
>      connector.getFilterChain().addLast("logger", new LoggingFilter());
>      connector.getFilterChain().addLast("codec",
>      new ProtocolCodecFilter(new 
> TextLineCodecFactory(StandardCharsets.UTF_8)));
>  * Set a Breakpoint to java.nio.Buffer.reset() method, where the 
> InvalidMarkException is thrown.
>  * Run Debug Testfile on 
> org.apache.mina.filter.compression.AbstractIoServiceTest 
> After the Exception, the session is immediately scheduled for disconnect.
> It seems that there is a discrepancy between the mark() and reset() calls on 
> the underlying Buffer. In the case of compression, a Buffer with the 
> compressed content is created and wrapped with the original Buffer in a 
> FilteredWriteRequest, because CompressionFilter is a WriteRequestFilter. This 
> happens in WriteRequestFilter.filterWrite().
> In DefaultIoFilterChain$HeadFilter.filterWrite() the mark() call is then 
> made, on the compressed Buffer.
> In AbstractPollingIoProcessor.writeBuffer() the reset() call is made, in 
> this case on the original Buffer, leading to the Exception.
> It seems that the change of 2016-02-16
> SHA-1: 44b58469f84ce991074cdc187b1c1f23b94cf445
> * Don't try to reset a message when it's not an IoBuffer
> which reassigns {{buf}} before reset() is called, broke it. The {{buf}} from 
> before the reassignment looks like the right one to reset() in this case.
>  
> {{java.nio.InvalidMarkException}}
>  {{    at java.nio.Buffer.reset(Buffer.java:306)}}
>  {{    at 
> org.apache.mina.core.buffer.AbstractIoBuffer.reset(AbstractIoBuffer.java:425)}}
>  {{    at 
> org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.writeBuffer(AbstractPollingIoProcessor.java:1131)}}
>  {{    at 
> org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.flushNow(AbstractPollingIoProcessor.java:994)}}
>  {{    at 
> org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.flush(AbstractPollingIoProcessor.java:921)}}
>  {{    at 
> org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:688)}}
>  {{    at 
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)}}
>  {{    at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
>  {{    at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
>  {{    at java.lang.Thread.run(Thread.java:748)}}
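The mismatch described above can be reproduced in isolation with plain java.nio buffers (an illustrative stand-in, not MINA code): reset() throws InvalidMarkException when invoked on a buffer whose own mark was never set, which is what happens when mark() lands on the compressed buffer while reset() is invoked on the original one.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

// Illustration only: mark() one buffer, reset() another. The second buffer
// carries no mark, so reset() throws InvalidMarkException -- the same
// mismatch as mark() on the compressed buffer vs. reset() on the original.
public class MarkResetMismatch {
    public static boolean mismatchThrows() {
        ByteBuffer compressed = ByteBuffer.allocate(16); // stand-in: compressed content
        ByteBuffer original = ByteBuffer.allocate(16);   // stand-in: original message

        compressed.mark(); // HeadFilter.filterWrite() marks the compressed buffer
        try {
            original.reset(); // writeBuffer() resets the original buffer instead
            return false;
        } catch (InvalidMarkException e) {
            return true; // mark and reset were applied to different buffers
        }
    }
}
```
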





[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized

2018-07-30 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562094#comment-16562094
 ] 

Guus der Kinderen commented on DIRMINA-1078:


Awesome, thanks!




> OrderedThreadPoolExecutor should allow sessions to be prioritized
> -
>
> Key: DIRMINA-1078
> URL: https://issues.apache.org/jira/browse/DIRMINA-1078
> Project: MINA
>  Issue Type: New Feature
>  Components: Core
>Reporter: Guus der Kinderen
>Assignee: Jonathan Valliere
>Priority: Minor
> Attachments: 20180216.patch, 20180706.patch
>
>
> The functionality provided in {{OrderedThreadPoolExecutor}} should be 
> augmented to, optionally, allow for assignment of priority to certain 
> sessions over others.
> We've introduced this functionality after observing issues in a deployment 
> where system resources were being starved by the sheer number of sessions 
> attempting to perform TLS. Without the class introduced by this commit, 
> events for any session could eventually fail (time out), causing the session 
> to fail. If that session happened to be one that had already established 
> TLS, the resources spent on establishing TLS are wasted. The negative effect 
> is amplified by the fact that a typical client in such a situation would 
> attempt to reconnect, which further adds to the load of the already-starved 
> system.
> With the modifications introduced by the patch provided in this issue, 
> priority can be given to sessions that have already established TLS. This 
> dramatically reduces the issue described above, as the first sessions to 
> fail will be those that are still negotiating TLS. Using a specialized 
> {{Comparator}}, one can even prioritize between these, causing sessions for 
> which the least effort has been performed to fail before sessions that are 
> closer to TLS completion.
> The patch intends to add this feature as optional functionality to the 
> existing implementation, with little side effects to the existing, default 
> behavior.
> The implementation provided here was initially based on a copy of the 
> implementation of {{OrderedThreadPoolExecutor}} that introduced a 
> considerable amount of code duplication. For illustrative purposes, the line 
> of commits leading from that initial commit to the patch attached to this 
> JIRA issue can be found at 
> [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7]
>  and later commits.
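The kind of specialized {{Comparator}} mentioned above could look like the following sketch. The {{Session}} interface and {{isTlsEstablished()}} are hypothetical stand-ins for illustration; the actual patch works against MINA's own session types:

```java
import java.util.Comparator;

// Illustrative comparator: sessions that have already completed TLS sort
// first, so under load the still-negotiating sessions are shed instead.
// Session and isTlsEstablished() are hypothetical stand-ins, not MINA API.
public class TlsPriority {
    public interface Session {
        boolean isTlsEstablished();
    }

    // false (= TLS established) sorts before true, i.e. higher priority.
    public static final Comparator<Session> PREFER_ESTABLISHED =
            Comparator.comparing((Session s) -> !s.isTlsEstablished());
}
```

With such a comparator, the sessions in which the least handshake effort has been invested are the first to fail when the pool is saturated.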





[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized

2018-07-25 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555608#comment-16555608
 ] 

Guus der Kinderen commented on DIRMINA-1078:


Thanks! Please note that for Java 10, the changes applied to resolve 
DIRMINA-1088 should also be applied to this new class.

> OrderedThreadPoolExecutor should allow sessions to be prioritized
> -
>
> Key: DIRMINA-1078
> URL: https://issues.apache.org/jira/browse/DIRMINA-1078
> Project: MINA
>  Issue Type: New Feature
>  Components: Core
>Reporter: Guus der Kinderen
>Assignee: Jonathan Valliere
>Priority: Minor
> Attachments: 20180216.patch, 20180706.patch
>
>
> The functionality provided in {{OrderedThreadPoolExecutor}} should be 
> augmented to, optionally, allow for assignment of priority to certain 
> sessions over others.
> We've introduced this functionality after observing issues in a deployment 
> where system resources were being starved by the sheer number of sessions 
> attempting to perform TLS. Without the class introduced by this commit, 
> events for any session could eventually fail (time out), causing the session 
> to fail. If that session happened to be one that had already established 
> TLS, the resources spent on establishing TLS are wasted. The negative effect 
> is amplified by the fact that a typical client in such a situation would 
> attempt to reconnect, which further adds to the load of the already-starved 
> system.
> With the modifications introduced by the patch provided in this issue, 
> priority can be given to sessions that have already established TLS. This 
> dramatically reduces the issue described above, as the first sessions to 
> fail will be those that are still negotiating TLS. Using a specialized 
> {{Comparator}}, one can even prioritize between these, causing sessions for 
> which the least effort has been performed to fail before sessions that are 
> closer to TLS completion.
> The patch intends to add this feature as optional functionality to the 
> existing implementation, with little side effects to the existing, default 
> behavior.
> The implementation provided here was initially based on a copy of the 
> implementation of {{OrderedThreadPoolExecutor}} that introduced a 
> considerable amount of code duplication. For illustrative purposes, the line 
> of commits leading from that initial commit to the patch attached to this 
> JIRA issue can be found at 
> [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7]
>  and later commits.





[jira] [Commented] (DIRMINA-1088) OrderedThreadPool implementation should be compatible with Java 10

2018-07-25 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555606#comment-16555606
 ] 

Guus der Kinderen commented on DIRMINA-1088:


[~elecharny], would you mind applying the same change to 
https://git1-us-west.apache.org/repos/asf?p=mina.git;a=blob;f=mina-core/src/main/java/org/apache/mina/filter/executor/PriorityThreadPoolExecutor.java;h=dd6b6e140f2ed953efbb291a0410a7a37f691aa0;hb=56ca189e1b2de1b6d9ab3635dd8f331f13762009
 which got introduced as part of DIRMINA-1078?

> OrderedThreadPool implementation should be compatible with Java 10
> --
>
> Key: DIRMINA-1088
> URL: https://issues.apache.org/jira/browse/DIRMINA-1088
> Project: MINA
>  Issue Type: Bug
>  Components: Core
>Reporter: Guus der Kinderen
>Priority: Major
> Attachments: orderedthreadpool.patch
>
>
> {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from 
> {{java.util.concurrent.ThreadPoolExecutor}}
> OrderedThreadPoolExecutor, in its constructor, calls these two methods from 
> its parent to determine pool sizing:
> {code:java}
>  super.setCorePoolSize(corePoolSize);
>  super.setMaximumPoolSize(maximumPoolSize);{code}
> This works fine up until Java 8. In Java 10 (possibly 9 - I did not check), 
> an additional input validation was added to 
> {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}}
> ThreadPoolExecutor Java 8:
> {code:java}
> public void setCorePoolSize(int corePoolSize) {
> if (corePoolSize < 0)
> throw new IllegalArgumentException();
> public void setMaximumPoolSize(int maximumPoolSize) {
> if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> {code}
> ThreadPoolExecutor Java 10:
> {code:java}
> public void setCorePoolSize(int corePoolSize) {
> if (corePoolSize < 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> public void setMaximumPoolSize(int maximumPoolSize) {
> if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> {code}
> As a result, the first line of this part of the constructor of 
> OrderedThreadPoolExecutor now throws an IllegalArgumentException.
> {code:java}
>  super.setCorePoolSize(corePoolSize);
>  super.setMaximumPoolSize(maximumPoolSize);{code}





[jira] [Updated] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized

2018-07-06 Thread Guus der Kinderen (JIRA)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guus der Kinderen updated DIRMINA-1078:
---
Attachment: 20180706.patch

> OrderedThreadPoolExecutor should allow sessions to be prioritized
> -
>
> Key: DIRMINA-1078
> URL: https://issues.apache.org/jira/browse/DIRMINA-1078
> Project: MINA
>  Issue Type: New Feature
>  Components: Core
>Reporter: Guus der Kinderen
>Assignee: Jonathan Valliere
>Priority: Minor
> Attachments: 20180216.patch, 20180706.patch
>
>
> The functionality provided in {{OrderedThreadPoolExecutor}} should be 
> augmented to, optionally, allow for assignment of priority to certain 
> sessions over others.
> We've introduced this functionality after observing issues in a deployment 
> where system resources were being starved by the sheer number of sessions 
> attempting to perform TLS. Without the class introduced by this commit, 
> events for any session could eventually fail (time out), causing the session 
> to fail. If that session happened to be one that had already established 
> TLS, the resources spent on establishing TLS are wasted. The negative effect 
> is amplified by the fact that a typical client in such a situation would 
> attempt to reconnect, which further adds to the load of the already-starved 
> system.
> With the modifications introduced by the patch provided in this issue, 
> priority can be given to sessions that have already established TLS. This 
> dramatically reduces the issue described above, as the first sessions to 
> fail will be those that are still negotiating TLS. Using a specialized 
> {{Comparator}}, one can even prioritize between these, causing sessions for 
> which the least effort has been performed to fail before sessions that are 
> closer to TLS completion.
> The patch intends to add this feature as optional functionality to the 
> existing implementation, with little side effects to the existing, default 
> behavior.
> The implementation provided here was initially based on a copy of the 
> implementation of {{OrderedThreadPoolExecutor}} that introduced a 
> considerable amount of code duplication. For illustrative purposes, the line 
> of commits leading from that initial commit to the patch attached to this 
> JIRA issue can be found at 
> [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7]
>  and later commits.





[jira] [Commented] (DIRMINA-1088) OrderedThreadPool implementation should be compatible with Java 10

2018-06-29 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527732#comment-16527732
 ] 

Guus der Kinderen commented on DIRMINA-1088:


I think the problem is resolved by setting Max before Core. That should have 
negligible side-effects otherwise. I've attached a patch to this issue. 
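The Max-before-Core ordering can be sketched as follows (a minimal illustration, not the attached patch): because Java 9+ validates {{core <= max}} inside {{setCorePoolSize}}, raising the maximum first keeps both calls valid when growing a pool.

```java
import java.util.concurrent.ThreadPoolExecutor;

// Sketch of the Max-before-Core ordering. Assumes the new maximum is not
// smaller than the current core size (true for the constructor scenario,
// where the pool starts small and is being grown).
public class PoolSizeOrdering {
    public static void resize(ThreadPoolExecutor pool, int core, int max) {
        pool.setMaximumPoolSize(max); // raise the ceiling first...
        pool.setCorePoolSize(core);   // ...so core <= max holds when validated
    }
}
```
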

> OrderedThreadPool implementation should be compatible with Java 10
> --
>
> Key: DIRMINA-1088
> URL: https://issues.apache.org/jira/browse/DIRMINA-1088
> Project: MINA
>  Issue Type: Bug
>  Components: Core
>Reporter: Guus der Kinderen
>Priority: Major
> Attachments: orderedthreadpool.patch
>
>
> {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from 
> {{java.util.concurrent.ThreadPoolExecutor}}
> OrderedThreadPoolExecutor, in its constructor, calls these two methods from 
> its parent to determine pool sizing:
> {code:java}
>  super.setCorePoolSize(corePoolSize);
>  super.setMaximumPoolSize(maximumPoolSize);{code}
> This works fine up until Java 8. In Java 10 (possibly 9 - I did not check), 
> an additional input validation was added to 
> {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}}
> ThreadPoolExecutor Java 8:
> {code:java}
> public void setCorePoolSize(int corePoolSize) {
> if (corePoolSize < 0)
> throw new IllegalArgumentException();
> public void setMaximumPoolSize(int maximumPoolSize) {
> if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> {code}
> ThreadPoolExecutor Java 10:
> {code:java}
> public void setCorePoolSize(int corePoolSize) {
> if (corePoolSize < 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> public void setMaximumPoolSize(int maximumPoolSize) {
> if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> {code}
> As a result, the first line of this part of the constructor of 
> OrderedThreadPoolExecutor now throws an IllegalArgumentException.
> {code:java}
>  super.setCorePoolSize(corePoolSize);
>  super.setMaximumPoolSize(maximumPoolSize);{code}





[jira] [Updated] (DIRMINA-1088) OrderedThreadPool implementation should be compatible with Java 10

2018-06-29 Thread Guus der Kinderen (JIRA)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guus der Kinderen updated DIRMINA-1088:
---
Attachment: orderedthreadpool.patch

> OrderedThreadPool implementation should be compatible with Java 10
> --
>
> Key: DIRMINA-1088
> URL: https://issues.apache.org/jira/browse/DIRMINA-1088
> Project: MINA
>  Issue Type: Bug
>  Components: Core
>Reporter: Guus der Kinderen
>Priority: Major
> Attachments: orderedthreadpool.patch
>
>
> {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from 
> {{java.util.concurrent.ThreadPoolExecutor}}
> OrderedThreadPoolExecutor, in its constructor, calls these two methods from 
> its parent to determine pool sizing:
> {code:java}
>  super.setCorePoolSize(corePoolSize);
>  super.setMaximumPoolSize(maximumPoolSize);{code}
> This works fine up until Java 8. In Java 10 (possibly 9 - I did not check), 
> an additional input validation was added to 
> {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}}
> ThreadPoolExecutor Java 8:
> {code:java}
> public void setCorePoolSize(int corePoolSize) {
> if (corePoolSize < 0)
> throw new IllegalArgumentException();
> public void setMaximumPoolSize(int maximumPoolSize) {
> if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> {code}
> ThreadPoolExecutor Java 10:
> {code:java}
> public void setCorePoolSize(int corePoolSize) {
> if (corePoolSize < 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> public void setMaximumPoolSize(int maximumPoolSize) {
> if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> {code}
> As a result, the first line of this part of the constructor of 
> OrderedThreadPoolExecutor now throws an IllegalArgumentException.
> {code:java}
>  super.setCorePoolSize(corePoolSize);
>  super.setMaximumPoolSize(maximumPoolSize);{code}





[jira] [Updated] (DIRMINA-1088) OrderedThreadPool implementation should be compatible with Java 10

2018-06-29 Thread Guus der Kinderen (JIRA)


 [ 
https://issues.apache.org/jira/browse/DIRMINA-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guus der Kinderen updated DIRMINA-1088:
---
Description: 
{{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from 
{{java.util.concurrent.ThreadPoolExecutor}}

OrderedThreadPoolExecutor, in its constructor, calls these two methods from its 
parent to determine pool sizing:

{code:java}
 super.setCorePoolSize(corePoolSize);
 super.setMaximumPoolSize(maximumPoolSize);{code}

This works fine up until Java 8. In Java 10 (possibly 9 - I did not check), an 
additional input validation was added to 
{{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}}

ThreadPoolExecutor Java 8:
{code:java}
public void setCorePoolSize(int corePoolSize) {
if (corePoolSize < 0)
throw new IllegalArgumentException();

public void setMaximumPoolSize(int maximumPoolSize) {
if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
throw new IllegalArgumentException();
{code}
ThreadPoolExecutor Java 10:
{code:java}
public void setCorePoolSize(int corePoolSize) {
if (corePoolSize < 0 || maximumPoolSize < corePoolSize)
throw new IllegalArgumentException();


public void setMaximumPoolSize(int maximumPoolSize) {
if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
throw new IllegalArgumentException();
{code}

As a result, the first line of this part of the constructor of 
OrderedThreadPoolExecutor now throws an IllegalArgumentException.

{code:java}
 super.setCorePoolSize(corePoolSize);
 super.setMaximumPoolSize(maximumPoolSize);{code}


  was:
{{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from 
{{java.util.concurrent.ThreadPoolExecutor}}

OrderedThreadPool, in its constructor, calls these two methods from its parent 
to determine pool sizing:

{code:java}
 super.setCorePoolSize(corePoolSize);
 super.setMaximumPoolSize(maximumPoolSize);{code}

This works fine up until Java 8. In Java 10 (possibly 9 - I did not check), an 
additional input validation was added to 
{{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}}

ThreadPoolExecutor Java 8:
{code:java}
public void setCorePoolSize(int corePoolSize) {
if (corePoolSize < 0)
throw new IllegalArgumentException();

public void setMaximumPoolSize(int maximumPoolSize) {
if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
throw new IllegalArgumentException();
{code}
ThreadPoolExecutor Java 10:
{code:java}
public void setCorePoolSize(int corePoolSize) {
if (corePoolSize < 0 || maximumPoolSize < corePoolSize)
throw new IllegalArgumentException();


public void setMaximumPoolSize(int maximumPoolSize) {
if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
throw new IllegalArgumentException();
{code}

As a result, the first line of this part of the constructor of 
OrderedThreadPool now throws an IllegalArgumentException.

{code:java}
 super.setCorePoolSize(corePoolSize);
 super.setMaximumPoolSize(maximumPoolSize);{code}



> OrderedThreadPool implementation should be compatible with Java 10
> --
>
> Key: DIRMINA-1088
> URL: https://issues.apache.org/jira/browse/DIRMINA-1088
> Project: MINA
>  Issue Type: Bug
>  Components: Core
>Reporter: Guus der Kinderen
>Priority: Major
>
> {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from 
> {{java.util.concurrent.ThreadPoolExecutor}}
> OrderedThreadPoolExecutor, in its constructor, calls these two methods from 
> its parent to determine pool sizing:
> {code:java}
>  super.setCorePoolSize(corePoolSize);
>  super.setMaximumPoolSize(maximumPoolSize);{code}
> This works fine up until Java 8. In Java 10 (possibly 9 - I did not check), 
> an additional input validation was added to 
> {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}}
> ThreadPoolExecutor Java 8:
> {code:java}
> public void setCorePoolSize(int corePoolSize) {
> if (corePoolSize < 0)
> throw new IllegalArgumentException();
> public void setMaximumPoolSize(int maximumPoolSize) {
> if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> {code}
> ThreadPoolExecutor Java 10:
> {code:java}
> public void setCorePoolSize(int corePoolSize) {
> if (corePoolSize < 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> public void setMaximumPoolSize(int maximumPoolSize) {
> if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
> throw new IllegalArgumentException();
> {code}
> As a result, the first line of this part of the constructor of 
> OrderedThreadPoolExecutor now throws an IllegalArgumentException.
> {code:java}
>  super.setCorePoolSize(corePoolSize);
>  super.setMaximumPoolSize(maximumPoolSize);{code}

[jira] [Created] (DIRMINA-1088) OrderedThreadPool implementation should be compatible with Java 10

2018-06-29 Thread Guus der Kinderen (JIRA)
Guus der Kinderen created DIRMINA-1088:
--

 Summary: OrderedThreadPool implementation should be compatible 
with Java 10
 Key: DIRMINA-1088
 URL: https://issues.apache.org/jira/browse/DIRMINA-1088
 Project: MINA
  Issue Type: Bug
  Components: Core
Reporter: Guus der Kinderen


{{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from 
{{java.util.concurrent.ThreadPoolExecutor}}

OrderedThreadPool, in its constructor, calls these two methods from its parent 
to determine pool sizing:

{code:java}
 super.setCorePoolSize(corePoolSize);
 super.setMaximumPoolSize(maximumPoolSize);{code}

This works fine up until Java 8. In Java 10 (possibly 9 - I did not check), an 
additional input validation was added to 
{{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}}

ThreadPoolExecutor Java 8:
{code:java}
public void setCorePoolSize(int corePoolSize) {
if (corePoolSize < 0)
throw new IllegalArgumentException();

public void setMaximumPoolSize(int maximumPoolSize) {
if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
throw new IllegalArgumentException();
{code}
ThreadPoolExecutor Java 10:
{code:java}
public void setCorePoolSize(int corePoolSize) {
if (corePoolSize < 0 || maximumPoolSize < corePoolSize)
throw new IllegalArgumentException();


public void setMaximumPoolSize(int maximumPoolSize) {
if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize)
throw new IllegalArgumentException();
{code}

As a result, the first line of this part of the constructor of 
OrderedThreadPool now throws an IllegalArgumentException.

{code:java}
 super.setCorePoolSize(corePoolSize);
 super.setMaximumPoolSize(maximumPoolSize);{code}






[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized

2018-06-29 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527330#comment-16527330
 ] 

Guus der Kinderen commented on DIRMINA-1078:


No worries, thanks for the update.

> OrderedThreadPoolExecutor should allow sessions to be prioritized
> -
>
> Key: DIRMINA-1078
> URL: https://issues.apache.org/jira/browse/DIRMINA-1078
> Project: MINA
>  Issue Type: New Feature
>  Components: Core
>Reporter: Guus der Kinderen
>Priority: Minor
> Attachments: 20180216.patch
>
>
> The functionality provided in {{OrderedThreadPoolExecutor}} should be 
> augmented to, optionally, allow for assignment of priority to certain 
> sessions over others.
> We've introduced this functionality after observing issues in a deployment 
> where system resources were being starved by the sheer number of sessions 
> attempting to perform TLS. Without the class introduced by this commit, 
> events for any session could eventually fail (time out), causing the session 
> to fail. If that session happened to be one that had already established 
> TLS, the resources spent on establishing TLS are wasted. The negative effect 
> is amplified by the fact that a typical client in such a situation would 
> attempt to reconnect, which further adds to the load of the already-starved 
> system.
> With the modifications introduced by the patch provided in this issue, 
> priority can be given to sessions that have already established TLS. This 
> dramatically reduces the issue described above, as the first sessions to 
> fail will be those that are still negotiating TLS. Using a specialized 
> {{Comparator}}, one can even prioritize between these, causing sessions for 
> which the least effort has been performed to fail before sessions that are 
> closer to TLS completion.
> The patch intends to add this feature as optional functionality to the 
> existing implementation, with little side effects to the existing, default 
> behavior.
> The implementation provided here was initially based on a copy of the 
> implementation of {{OrderedThreadPoolExecutor}} that introduced a 
> considerable amount of code duplication. For illustrative purposes, the line 
> of commits leading from that initial commit to the patch attached to this 
> JIRA issue can be found at 
> [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7]
>  and later commits.
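The prioritisation the description outlines can be sketched with a plain {{PriorityBlockingQueue}} and a {{Comparator}} that sorts TLS-established sessions first. The {{SessionTask}} type and its {{tlsEstablished}} flag are hypothetical stand-ins for MINA's per-session events, not the attached patch:

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class TlsPriorityExample {
    // Hypothetical stand-in for a per-session unit of work.
    static final class SessionTask {
        final String name;
        final boolean tlsEstablished;
        SessionTask(String name, boolean tlsEstablished) {
            this.name = name;
            this.tlsEstablished = tlsEstablished;
        }
    }

    // Sessions that already completed TLS sort first, so under load the
    // tasks that starve are those still negotiating TLS.
    static final Comparator<SessionTask> TLS_FIRST =
            Comparator.comparing((SessionTask t) -> !t.tlsEstablished);

    public static String firstScheduled() {
        PriorityBlockingQueue<SessionTask> queue =
                new PriorityBlockingQueue<>(11, TLS_FIRST);
        queue.add(new SessionTask("handshaking", false));
        queue.add(new SessionTask("established", true));
        return queue.poll().name;
    }

    public static void main(String[] args) {
        System.out.println(firstScheduled()); // prints "established"
    }
}
```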





[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized

2018-06-04 Thread Guus der Kinderen (JIRA)


[ 
https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500061#comment-16500061
 ] 

Guus der Kinderen commented on DIRMINA-1078:


[~johnnyv], [~elecharny], is the current state of the patches that I provided 
to your liking?






[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized

2018-04-20 Thread Guus der Kinderen (JIRA)

[ 
https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445808#comment-16445808
 ] 

Guus der Kinderen commented on DIRMINA-1078:


Apologies for the long wait, gentlemen. I have now:
 * separated my implementation from the existing implementation, creating a new 
class
 * added a couple of unit tests.

This effectively created two new files, without changes to other files.

The patch can be downloaded here: 
https://github.com/guusdk/mina/compare/2.0...feature/prioritised-ordered-thread-pool-executor_standalone.patch






[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized

2018-02-19 Thread Guus der Kinderen (JIRA)

[ 
https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369191#comment-16369191
 ] 

Guus der Kinderen commented on DIRMINA-1078:


I'm sorry, I don't understand the request. Change the name to what?






[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized

2018-02-19 Thread Guus der Kinderen (JIRA)

[ 
https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368890#comment-16368890
 ] 

Guus der Kinderen commented on DIRMINA-1078:


Thanks for the feedback guys! I initially had this as a separate {{Executor}}, 
exactly for the reasons that you mention. You can see this variant of the code 
here: 
https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7 
(note that I fixed a bug later on - don't blindly use this version!)

There was a _lot_ of code duplication going on. I thought about creating an 
abstract base class from which this implementation, but also 
{{OrderedThreadPoolExecutor}}, would extend - but that proved too much of a 
hassle.

With the patch that I introduced, I believe I was able to add the new 
functionality to the existing implementation of {{OrderedThreadPoolExecutor}}, 
as a completely _optional_, _not enabled by default_ behavior, that introduces 
next to _no behavioral changes_ if unused. Given this, I would personally 
prefer my addition to remain in {{OrderedThreadPoolExecutor}} instead of 
introducing a different {{ThreadPool}} implementation, but that's not a hill 
I'm willing to die on.

[~johnnyv], the {{FIFOEntry}} implementation is used as a tie-breaker, in case 
the {{Comparator}} that is provided does not distinguish between the sessions. 
It arguably can be left out, but I'd prefer to keep it in: it allows one to use 
relatively simple {{Comparator}} implementations, while still being ensured of 
defined behavior for those elements that are not covered by the {{Comparator}}. 
Again, not a hill I'm willing to die on though.
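The tie-breaking idea can be sketched as follows, modeled on the {{FIFOEntry}} example in the {{PriorityBlockingQueue}} javadoc; the names here are illustrative, not necessarily those in the patch:

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class FifoTieBreaker {
    // Wraps an element with a monotonically increasing sequence number so
    // that elements the main comparison considers equal dequeue in FIFO order.
    static final class FIFOEntry<E extends Comparable<? super E>>
            implements Comparable<FIFOEntry<E>> {
        private static final AtomicLong SEQ = new AtomicLong();
        private final long seqNum = SEQ.getAndIncrement();
        final E entry;
        FIFOEntry(E entry) { this.entry = entry; }
        @Override
        public int compareTo(FIFOEntry<E> other) {
            int res = entry.compareTo(other.entry);
            // Equal priority: fall back to insertion order.
            return res != 0 ? res : Long.compare(seqNum, other.seqNum);
        }
    }

    public static String firstOfEqualPriority() {
        PriorityBlockingQueue<FIFOEntry<String>> queue = new PriorityBlockingQueue<>();
        queue.add(new FIFOEntry<>("same"));          // inserted first
        FIFOEntry<String> second = new FIFOEntry<>("same");
        queue.add(second);                           // inserted second
        return queue.poll() == second ? "second" : "first";
    }

    public static void main(String[] args) {
        System.out.println(firstOfEqualPriority()); // prints "first"
    }
}
```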

[~elecharny], if you're willing to transform my patch into a separate 
{{Executor}} (in case my arguments above were not compelling), please, be my 
guest. If anything, it'd save me time, and having a second set of eyes looking 
at this code will not hurt either.






[jira] [Updated] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized

2018-02-16 Thread Guus der Kinderen (JIRA)

 [ 
https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guus der Kinderen updated DIRMINA-1078:
---
Attachment: 20180216.patch






[jira] [Created] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized

2018-02-16 Thread Guus der Kinderen (JIRA)
Guus der Kinderen created DIRMINA-1078:
--

 Summary: OrderedThreadPoolExecutor should allow sessions to be 
prioritized
 Key: DIRMINA-1078
 URL: https://issues.apache.org/jira/browse/DIRMINA-1078
 Project: MINA
  Issue Type: New Feature
  Components: Core
Reporter: Guus der Kinderen


The functionality provided in {{OrderedThreadPoolExecutor}} should be augmented 
to, optionally, allow priority to be assigned to certain sessions over others.

We introduced this functionality after observing issues in a deployment where 
system resources were being starved by the sheer number of sessions that 
attempted to perform TLS negotiation. Without the class introduced by this 
commit, events for any session could eventually time out, causing the session 
to fail. If that session happened to have already established TLS, the 
resources that were spent on establishing TLS are wasted. The negative effect 
is amplified by the fact that a typical client in such a situation would 
attempt to reconnect, which further adds to the load of the system already 
being starved.

With the modifications introduced by the patch provided in this issue, priority 
can be given to sessions that have already established TLS. This dramatically 
reduces the issue described above, as the first sessions to fail will be those 
that are still negotiating TLS. Using a specialized {{Comparator}}, one can 
even prioritize between these, causing sessions on which the least effort has 
been spent to fail before sessions that are closer to completing TLS.

The patch intends to add this feature as optional functionality to the existing 
implementation, with few side effects on the existing, default behavior.

The implementation provided here was initially based on a copy of the 
implementation of {{OrderedThreadPoolExecutor}}, which introduced a 
considerable amount of code duplication. For illustrative purposes, the line of 
commits leading from that initial commit to the patch attached to this JIRA 
issue can be found at 
[https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7]
 and later commits.





[jira] [Updated] (DIRMINA-1072) SslFilter does not account for SSLEngine runtime exceptions

2017-10-07 Thread Guus der Kinderen (JIRA)

 [ 
https://issues.apache.org/jira/browse/DIRMINA-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guus der Kinderen updated DIRMINA-1072:
---
Attachment: sslengine-exception-with-destroy.patch

Ah, I missed that. I've modified the patch, and attached the modification as 
[^sslengine-exception-with-destroy.patch].

This modification will, upon detecting the failed state, destroy the handler. 
With outbound done, it's unlikely to be useful anymore.

> SslFilter does not account for SSLEngine runtime exceptions
> ---
>
> Key: DIRMINA-1072
> URL: https://issues.apache.org/jira/browse/DIRMINA-1072
> Project: MINA
>  Issue Type: Bug
>  Components: SSL
>Affects Versions: 2.0.16
>Reporter: Guus der Kinderen
> Attachments: sslengine-exception.patch, 
> sslengine-exception-with-destroy.patch
>
>
> Mina's {{SslFilter}} wraps Mina's {{SslHandler}}, which itself wraps Java's 
> {{SSLEngine}}.
> {{SslFilter}} does not catch runtime exceptions that are thrown by 
> {{SSLEngine}} - I am unsure if this is by design.
> Ideally, we'd prevent the engine from getting into a state where it can 
> throw such exceptions, but I'm not sure if that's completely feasible.
> Nonetheless, I'm providing here an improvement that prevents at least one 
> occurrence of an unchecked exception from being thrown (instead, my patch 
> preemptively throws an {{SSLException}} that is then caught by the exception 
> handling that's already in place).
> An alternative to this fix could be an additional catch block that handles 
> unchecked exceptions.
> The scenario that causes the unchecked exception caught by this patch is 
> this:
> * client connects, causing an SslFilter to be initialized, which causes the 
> SSLEngine to begin its handshake
> * server shuts down the input (for instance, for inactivity, or as a 
> side-effect of resource starvation)
> * client sends data
> The corresponding stack trace starts with this:
> {code}java.lang.IllegalStateException: Internal error
> at 
> sun.security.ssl.SSLEngineImpl.initHandshaker(SSLEngineImpl.java:470)
> at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1007)
> at 
> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
> at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
> at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624){code}
> Inspiration for this fix was obtained from the Jetty project, notably, this 
> change: https://github.com/eclipse/jetty.project/issues/1228
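The "additional catch block" alternative mentioned in the description - converting unchecked {{SSLEngine}} failures into a checked {{SSLException}} so the existing exception handling catches them - can be sketched as follows. This is a standalone illustration; the {{UnwrapCall}} interface and names are hypothetical, not the actual {{SslHandler}} code:

```java
import javax.net.ssl.SSLException;

public class UnwrapGuard {
    // Hypothetical stand-in for the call into SSLEngine.unwrap(...).
    interface UnwrapCall {
        void unwrap() throws SSLException;
    }

    // Funnels unchecked engine failures into the SSLException handling
    // that the filter already has in place.
    public static void guardedUnwrap(UnwrapCall call) throws SSLException {
        try {
            call.unwrap();
        } catch (RuntimeException e) {
            throw new SSLException("Unexpected SSLEngine failure", e);
        }
    }

    public static String classify() {
        try {
            // Simulate the "Internal error" IllegalStateException from the
            // stack trace above.
            guardedUnwrap(() -> { throw new IllegalStateException("Internal error"); });
            return "none";
        } catch (SSLException e) {
            return "ssl";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify()); // prints "ssl"
    }
}
```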



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DIRMINA-1072) SslFilter does not account for SSLEngine runtime exceptions

2017-10-06 Thread Guus der Kinderen (JIRA)
Guus der Kinderen created DIRMINA-1072:
--

 Summary: SslFilter does not account for SSLEngine runtime 
exceptions
 Key: DIRMINA-1072
 URL: https://issues.apache.org/jira/browse/DIRMINA-1072
 Project: MINA
  Issue Type: Bug
  Components: SSL
Affects Versions: 2.0.16
Reporter: Guus der Kinderen
 Attachments: sslengine-exception.patch

Mina's {{SslFilter}} wraps Mina's {{SslHandler}}, which itself wraps Java's 
{{SSLEngine}}.

{{SslFilter}} does not catch runtime exceptions that are thrown by 
{{SSLEngine}} - I am unsure if this is by design.

Ideally, we'd prevent the engine from getting into a state where it can throw 
such exceptions, but I'm not sure if that's completely feasible.

Nonetheless, I'm providing here an improvement that prevents at least one 
occurrence of an unchecked exception from being thrown (instead, my patch 
preemptively throws an {{SSLException}} that is then caught by the exception 
handling that's already in place).

An alternative to this fix could be an additional catch block that handles 
unchecked exceptions.

The scenario that causes the unchecked exception caught by this patch is this:
* client connects, causing an SslFilter to be initialized, which causes the 
SSLEngine to begin its handshake
* server shuts down the input (for instance, for inactivity, or as a 
side-effect of resource starvation)
* client sends data

The corresponding stack trace starts with this:
{code}java.lang.IllegalStateException: Internal error
at sun.security.ssl.SSLEngineImpl.initHandshaker(SSLEngineImpl.java:470)
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1007)
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624){code}

Inspiration for this fix was obtained from the Jetty project, notably, this 
change: https://github.com/eclipse/jetty.project/issues/1228


