[jira] [Commented] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475267#comment-17475267 ] Guus der Kinderen commented on DIRMINA-1156:

Ok, thanks!

On Thu, 13 Jan 2022 at 11:58, Emmanuel Lécharny (Jira) wrote:

> Inconsistent worker / idleWorker in OrderedThreadPoolExecutor
>
> Key: DIRMINA-1156
> URL: https://issues.apache.org/jira/browse/DIRMINA-1156
> Project: MINA
> Issue Type: Bug
> Reporter: Guus der Kinderen
> Priority: Critical
> Attachments: DIRMINA1156Test.java, OrderedThreadPoolExecutorTest.java, PriorityThreadPoolExecutorTest.java, UnorderedThreadPoolExecutorTest.java
>
> I stumbled upon this while analysing heap dumps of a JVM that suffered from an issue where no user was able to establish a connection to our server software anymore.
>
> Our application uses an ExecutorFilter. The OrderedThreadPoolExecutor of the affected server appears to be in an inconsistent state:
> * idleWorkers: 2
> * waitingSessions.count: 2293
> * workers.map.size: 0
>
> What strikes me as odd is:
> * no workers, while there are sessions waiting to be processed
> * no workers, but a non-zero idle-worker count
>
> Servers that are unaffected by the issue have an idle-worker count that is equal to the number of workers (I assume those snapshots were taken while the server was not actively processing data).
>
> We are using Apache MINA 2.1.3. I have no indication whether this problem is present in other versions.
>
> This issue has also been discussed on the Apache MINA users mailing list, starting with this message: https://www.mail-archive.com/users@mina.apache.org/msg06887.html

--
This message was sent by Atlassian Jira (v8.20.1#820001)
To unsubscribe, e-mail: dev-unsubscr...@mina.apache.org
For additional commands, e-mail: dev-h...@mina.apache.org
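The reported symptom (a pool with no live workers that still claims idle ones) can be reproduced in miniature. The sketch below is a hypothetical simplification, not MINA code: all names ({{IdleLeakSketch}}, the bare {{idleWorkers}} counter) are invented for illustration. It shows how a worker loop that increments an idle counter in a {{finally}} block keeps the counter raised even when an {{Error}} escapes and kills the thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class IdleLeakSketch {
    static final AtomicInteger idleWorkers = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch died = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            for (;;) {
                idleWorkers.decrementAndGet(); // worker leaves the idle pool to take work
                try {
                    // Stand-in for a session task whose handler misbehaves:
                    throw new Error("thrown by a handler");
                } finally {
                    // The pattern under discussion: this runs even when an
                    // Error escapes, so the idle count goes up although the
                    // thread is about to die.
                    idleWorkers.incrementAndGet();
                }
            }
        });
        worker.setUncaughtExceptionHandler((t, e) -> died.countDown());
        idleWorkers.incrementAndGet(); // worker starts out idle
        worker.start();
        died.await();
        // The thread is gone, yet the counter still claims one idle worker:
        System.out.println("workers alive: 0, idleWorkers: " + idleWorkers.get());
    }
}
```

This mirrors the heap-dump observation above: zero live workers alongside a positive idleWorkers value.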
[jira] [Commented] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475251#comment-17475251 ] Guus der Kinderen commented on DIRMINA-1156:

Is there any idea of when this will make it into a release? I can use that to guide my own release plans. :)
[jira] [Commented] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474959#comment-17474959 ] Guus der Kinderen commented on DIRMINA-1156:

Thank you for the super fast help!
[jira] [Commented] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474951#comment-17474951 ] Guus der Kinderen commented on DIRMINA-1156:

Thank you. My last comment was a mistake. Please ignore it.
[jira] [Comment Edited] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474686#comment-17474686 ] Guus der Kinderen edited comment on DIRMINA-1156 at 1/12/22, 5:24 PM:

Since your last comment in this issue, we've followed up on the mailing list. To summarize:

{quote}I've asked a user who suffers from the problem to run with a slightly modified version of MINA. The modification logs (and rethrows) any Throwable in Worker's run(). This caused an Error to be logged! The Error was thrown by an IoHandler implementation. Full details are available at [https://igniterealtime.atlassian.net/browse/OF-2367?focusedCommentId=22738]{quote}

Note that this problem affects _all_ ThreadPoolExecutors, not just the ordered one!

I have written various unit tests to illustrate the issue, which are attached to this issue (these are all new classes, except for {{PriorityThreadPoolExecutorTest.java}}, which expands on a pre-existing implementation).

The tests in [^PriorityThreadPoolExecutorTest.java], [^OrderedThreadPoolExecutorTest.java] and [^UnorderedThreadPoolExecutorTest.java] assert that the {{run()}} method of the {{Worker}} class in each {{ThreadPoolExecutor}} implementation is susceptible to this problem, but they do so through an implementation that bypasses some of the usage patterns that would normally prevent the issue. These tests prove that the implementation is weak, without immediately identifying an exploit of that weakness.

The tests in [^DIRMINA1156Test.java] simulate one of the real-world causes of the issue: a {{Session}} with a {{Handler}} that throws an {{Error}} (I've included tests that throw checked and unchecked {{Exception}}s, but those do not trigger the bug).

Removing the {{finally}} block inside the {{for (;;)}} loop of {{Worker}}, and executing {{idleWorkers.incrementAndGet();}} only when no {{Throwable}} is thrown, makes all of these tests go green. In other words, replace:

{code:java}
try {
    if (session != null) {
        runTasks(getSessionTasksQueue(session));
    }
} finally {
    idleWorkers.incrementAndGet();
}
{code}

with this:

{code:java}
if (session != null) {
    runTasks(getSessionTasksQueue(session));
}
idleWorkers.incrementAndGet();
{code}

(a slightly different implementation is needed for {{PriorityThreadPoolExecutor}}).
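The effect of the proposed shape can be checked in isolation. The sketch below is hypothetical and heavily simplified (the {{runTasks()}} stand-in and class name are invented; the real Worker does much more): with the increment placed after the task instead of in a {{finally}} block, an escaping {{Throwable}} skips the increment, so the idle count stays consistent for a worker that dies.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FixedAccountingSketch {
    static final AtomicInteger idleWorkers = new AtomicInteger();

    // Hypothetical stand-in for runTasks(getSessionTasksQueue(session))
    // whose handler throws an Error:
    static void runTasks() {
        throw new Error("handler failure");
    }

    public static void main(String[] args) {
        idleWorkers.incrementAndGet(); // worker starts out idle
        idleWorkers.decrementAndGet(); // worker picks up a session
        try {
            // Proposed shape: no finally, so the increment below is skipped
            // whenever runTasks() lets a Throwable escape.
            runTasks();
            idleWorkers.incrementAndGet();
        } catch (Error e) {
            // In the real executor the worker thread would die here; the
            // idle count correctly remains 0 for the dead worker.
        }
        System.out.println("idleWorkers: " + idleWorkers.get());
    }
}
```

Under this shape a dead worker no longer contributes to idleWorkers, matching the invariant the attached tests assert.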
[jira] [Updated] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guus der Kinderen updated DIRMINA-1156:

Attachment: DIRMINA1156Test.java
[jira] [Updated] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/DIRMINA-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guus der Kinderen updated DIRMINA-1156: --- Attachment: PriorityThreadPoolExecutorTest.java OrderedThreadPoolExecutorTest.java UnorderedThreadPoolExecutorTest.java
[jira] [Created] (DIRMINA-1156) Inconsistent worker / idleWorker in OrderedThreadPoolExecutor
Guus der Kinderen created DIRMINA-1156: -- Summary: Inconsistent worker / idleWorker in OrderedThreadPoolExecutor Key: DIRMINA-1156 URL: https://issues.apache.org/jira/browse/DIRMINA-1156 Project: MINA Issue Type: Bug Reporter: Guus der Kinderen
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848040#comment-16848040 ] Guus der Kinderen commented on DIRMINA-1111: The fix seems to have a positive effect in two different environments. I'd consider this one resolved!

> 100% CPU (epoll bug) on 2.1.x, Linux only
> -
>
> Key: DIRMINA-1111
> URL: https://issues.apache.org/jira/browse/DIRMINA-1111
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.1.2
> Reporter: Guus der Kinderen
> Priority: Major
> Fix For: 2.1.3
>
> Attachments: image-2019-05-21-11-37-41-931.png
>
> We're getting [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13] of a bug that causes 100% CPU usage on Linux (the problem does not occur on Windows).
> This problem occurs in 2.1.2, but does _not_ occur in 2.0.21.
> Is this a regression of the epoll selector bug in DIRMINA-678?
> A stack trace of one of the spinning threads:
> {code}"NioProcessor-3" #55 prio=5 os_prio=0 tid=0x7f0408002000 nid=0x2a6a runnable [0x7f0476dee000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>         - locked <0x0004c486b748> (a sun.nio.ch.Util$3)
>         - locked <0x0004c486b738> (a java.util.Collections$UnmodifiableSet)
>         - locked <0x0004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>         at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
>         at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
>         at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x0004c4d03530> (a java.util.concurrent.ThreadPoolExecutor$Worker){code}
> More info is available at https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844880#comment-16844880 ] Guus der Kinderen commented on DIRMINA-1111: Maybe bike-shedding, but wouldn't {{clear}} potentially cause some bytes to be 'uncounted', even if they were processed? Even if you're not using {{mark}} now, at some point in the future, that might be introduced again. Checking for a mark, and repositioning to the mark if present, otherwise to 0, seems like a desirable defensive construct.
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844834#comment-16844834 ] Guus der Kinderen commented on DIRMINA-1111: Thanks for both your efforts on this! I still don't quite understand the fix, but am happy to accept it. {{Clear}}, unlike {{reset}}, will prevent the {{InvalidMarkException}}. I'm not confident that using {{slice}} instead of {{duplicate}} sufficiently prevents the buffer from being in an unpredictable state in {{clearWriteRequestQueue}}. Isn't it possible that an exception occurs after construction (slicing) of the WriteRequest, during the processing of that instance, causing it to be handled by {{clearWriteRequestQueue}}? We'd still be in potential trouble there.
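The {{clear}}/{{reset}}/{{mark}} behavior being debated here is plain java.nio.ByteBuffer semantics, which a short standalone sketch (not MINA code) can demonstrate: {{reset()}} throws {{InvalidMarkException}} when no mark was ever set, while {{rewind()}} and {{clear()}} never throw because they simply discard the mark.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

// Demonstrates the ByteBuffer mark semantics discussed in this thread.
public class BufferMarkDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1).put((byte) 2).put((byte) 3);

        boolean threw = false;
        try {
            buf.reset(); // fresh buffer: no mark has ever been set
        } catch (InvalidMarkException e) {
            threw = true;
        }
        System.out.println("reset() without mark threw: " + threw); // true

        buf.position(1).mark();  // set the mark at position 1
        buf.position(3);
        buf.reset();             // jumps back to the mark
        System.out.println("position after reset(): " + buf.position()); // 1

        buf.rewind();            // position 0, mark discarded, no exception
        System.out.println("position after rewind(): " + buf.position()); // 0
    }
}
```

This is why a buffer that reaches {{clearWriteRequestQueue}} "fresh from the constructor" blows up on {{reset()}}: the mark check (or a substitute like {{rewind()}}) has to happen before the reset.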
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844812#comment-16844812 ] Guus der Kinderen commented on DIRMINA-1111: I don't quite understand your suggestion. Could you illustrate it with a code snippet, please?
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844793#comment-16844793 ] Guus der Kinderen commented on DIRMINA-1111: {{rewind()}} looks like a simpler solution than resetting-if-marked, but there's a small difference: the position would always revert to 0, instead of to the last marked position. I'm not sure if that makes any difference to code that depends on this. In any case, both would prevent the {{InvalidMarkException}}, and that is what seems to stop the CPU churning. I'd be happy with either solution.
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844783#comment-16844783 ] Guus der Kinderen commented on DIRMINA-1111: That would change the behavior: instead of rewinding the buffer to be re-read, it would effectively discard information. I'm not sure why we're resetting in the first place here, but I'm guessing it serves a purpose (to later process the remainder of the buffer?). With that, the data will likely be lost (although it's not actually wiped from the buffer, recovering it would be harder).
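The data-loss concern above can be made concrete with a small standalone sketch (plain java.nio, not MINA code): {{rewind()}} keeps the limit, so the buffered bytes can be re-read, while {{clear()}} resets the limit to the capacity, discarding the read window over the data.

```java
import java.nio.ByteBuffer;

// Demonstrates why clear() "discards information" where rewind() does not:
// clear() throws away the limit that marked the end of the readable data.
public class RewindVsClear {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 'a').put((byte) 'b').put((byte) 'c');
        buf.flip();                  // limit = 3: three readable bytes
        buf.get();                   // consume one byte

        ByteBuffer r = buf.duplicate();
        r.rewind();                  // position 0, limit still 3
        System.out.println("after rewind(), remaining = " + r.remaining()); // 3

        ByteBuffer c = buf.duplicate();
        c.clear();                   // position 0, limit back to capacity 8
        System.out.println("after clear(), remaining = " + c.remaining());  // 8
    }
}
```

After {{clear()}}, the three buffered bytes are still physically in the array, but nothing records that only three of the eight slots held valid data, which matches the "not wiped, but harder to recover" description above.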
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844762#comment-16844762 ] Guus der Kinderen edited comment on DIRMINA-1111 at 5/21/19 12:05 PM: -- Although we're not exactly sure how, we've "resolved" the issue by modifying the snippet below:

{code:java|title=Original code for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    buf.reset();
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}
{code}

As said, this triggered {{InvalidMarkException}}s. After we added a simple {{try/catch}} around the reset, the CPU spikes go away. We have not investigated why this resolves the CPU spikes, but we assume that the thrown exception prevents keys from being consumed. We're still suffering from a different issue (all clients unexpectedly reconnect periodically), but we're now working on the assumption that the CPU spike is a result, not the cause, of this.

Looking further into the {{clearWriteRequestQueue}} snippet, we noticed that it is often called from exception handlers. With that, the state of the buffer that it operates on is likely unpredictable. The call to {{reset()}} assumes that a mark is set, but there are various scenarios where this is untrue:
* the buffer could be completely unused (a buffer fresh from the constructor will cause an {{InvalidMarkException}} here)
* the buffer might have been flipped, but not yet read

We're uncertain if the reset is needed, but if it is, we suggest explicitly checking whether the mark has been set. If not, then we don't believe a reset is needed.

{code:java|title=Proposed fix for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    if (buf.markValue() > -1) {
        buf.reset();
    }
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}
{code}
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844762#comment-16844762 ] Guus der Kinderen edited comment on DIRMINA- at 5/21/19 12:03 PM:
--

Although we're not exactly sure how, we've "resolved" the issue by modifying the snippet below:

{code:java|title=Original code for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    buf.reset();
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}

As said, this triggered {{InvalidMarkException}}s. After we added a simple {{try/catch}} around the reset, the CPU spikes go away. We have not investigated why this resolves the CPU spikes, but we assume that the thrown exception prevents keys from being consumed. We're still suffering from a different issue (all clients unexpectedly reconnect periodically), but we're now working on the assumption that the CPU spike is a result, not the cause, of this.

Looking further into the {{clearWriteRequestQueue}} snippet, we noticed that it is often called from exception handlers. As a result, the state of the buffer it operates on is likely unpredictable. The call to {{reset()}} assumes that a mark is set, but there are various scenarios where this is untrue:
* the buffer could have been completely unused (a buffer fresh from the constructor will cause an {{InvalidMarkException}} here)
* the buffer might have been flipped, but not yet read

We're uncertain whether the reset is needed, but if it is, we suggest explicitly checking whether the mark has been set. If it has not, we don't believe a reset is needed.

{code:java|title=Proposed fix for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    if (buf.markValue() >= 0) {
        buf.reset();
    }

    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}

was (Author: guus.der.kinderen):
Although we're not exactly sure how, we've "resolved" the issue by modifying the snippet below:

{code:java|title=Original code for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    buf.reset();
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}

As said, this triggered {{InvalidMarkException}}s. After we added a simple {{try/catch}} around the reset, the CPU spikes go away. We're still suffering from a different issue (all clients reconnecting periodically), but we're now working on the assumption that the CPU spike is a result, not the cause, of this.

Looking further into the {{clearWriteRequestQueue}} snippet, we noticed that it is often called from exception handlers. As a result, the state of the buffer it operates on is likely unpredictable. The call to {{reset()}} assumes that a mark is set, but there are various scenarios where this is untrue:
* the buffer could have been completely unused (a buffer fresh from the constructor will cause an {{InvalidMarkException}} here)
* the buffer might have been flipped, but not yet read

We're uncertain whether the reset is needed, but if it is, we suggest explicitly checking whether the mark has been set. If it has not, we don't believe a reset is needed.

{code:java|title=Proposed fix for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    if (buf.markValue() >= 0) {
        buf.reset();
    }

    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}
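The two failure scenarios listed in the comment (a fresh buffer, and a flipped-but-unread buffer) are easy to demonstrate with a plain {{java.nio.ByteBuffer}}. Note that {{markValue()}} belongs to MINA's {{IoBuffer}}, not to {{java.nio}}, so this sketch instead uses the {{try/catch}} workaround described above; the class and method names here are illustrative, not MINA's.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class GuardedReset {

    // The try/catch workaround: attempt the reset, and treat a missing
    // mark as "nothing to reset". Returns true only if reset() succeeded.
    static boolean resetIfMarked(ByteBuffer buf) {
        try {
            buf.reset();
            return true;
        } catch (InvalidMarkException e) {
            // No mark is set: fresh buffer, or mark discarded by flip()/clear().
            return false;
        }
    }

    public static void main(String[] args) {
        // Scenario 1: a buffer fresh from the constructor has no mark.
        ByteBuffer fresh = ByteBuffer.allocate(16);
        System.out.println("fresh: " + resetIfMarked(fresh));      // false

        // Scenario 2: flip() discards any mark, so a flipped-but-unread
        // buffer cannot be reset either.
        ByteBuffer flipped = ByteBuffer.allocate(16);
        flipped.put((byte) 1);
        flipped.flip();
        System.out.println("flipped: " + resetIfMarked(flipped));  // false

        // Only a buffer that was explicitly marked can be reset.
        ByteBuffer marked = ByteBuffer.allocate(16);
        marked.put((byte) 1).put((byte) 2);
        marked.position(0);
        marked.mark();
        marked.position(2);
        System.out.println("marked: " + resetIfMarked(marked));    // true
        System.out.println("position: " + marked.position());      // 0
    }
}
```

The {{markValue() >= 0}} check proposed in the comment achieves the same effect without the exception, since MINA's {{IoBuffer.markValue()}} returns -1 when no mark is set.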
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844762#comment-16844762 ] Guus der Kinderen commented on DIRMINA-:

Although we're not exactly sure how, we've "resolved" the issue by modifying the snippet below:

{code:java|title=Original code for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    buf.reset();
    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}

As said, this triggered {{InvalidMarkException}}s. After we added a simple {{try/catch}} around the reset, the CPU spikes go away. We're still suffering from a different issue (all clients reconnecting periodically), but we're now working on the assumption that the CPU spike is a result, not the cause, of this.

Looking further into the {{clearWriteRequestQueue}} snippet, we noticed that it is often called from exception handlers. As a result, the state of the buffer it operates on is likely unpredictable. The call to {{reset()}} assumes that a mark is set, but there are various scenarios where this is untrue:
* the buffer could have been completely unused (a buffer fresh from the constructor will cause an {{InvalidMarkException}} here)
* the buffer might have been flipped, but not yet read

We're uncertain whether the reset is needed, but if it is, we suggest explicitly checking whether the mark has been set. If it has not, we don't believe a reset is needed.

{code:java|title=Proposed fix for org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}
// The first unwritten empty buffer must be
// forwarded to the filter chain.
if (buf.hasRemaining()) {
    if (buf.markValue() >= 0) {
        buf.reset();
    }

    failedRequests.add(req);
} else {
    IoFilterChain filterChain = session.getFilterChain();
    filterChain.fireMessageSent(req);
}{code}

> 100% CPU (epoll bug) on 2.1.x, Linux only
> -----------------------------------------
>
>                 Key: DIRMINA-
>                 URL: https://issues.apache.org/jira/browse/DIRMINA-
>             Project: MINA
>          Issue Type: Bug
>    Affects Versions: 2.1.2
>            Reporter: Guus der Kinderen
>            Priority: Major
>         Attachments: image-2019-05-21-11-37-41-931.png
>
> We're getting reports [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13] of a bug that causes 100% CPU usage on Linux (the problem does not occur on Windows).
> This problem occurs in 2.1.2 but does _not_ occur in 2.0.21.
> Is this a regression of the epoll selector bug in DIRMINA-678?
> A stack trace of one of the spinning threads:
> {code}
> "NioProcessor-3" #55 prio=5 os_prio=0 tid=0x7f0408002000 nid=0x2a6a runnable [0x7f0476dee000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>         - locked <0x0004c486b748> (a sun.nio.ch.Util$3)
>         - locked <0x0004c486b738> (a java.util.Collections$UnmodifiableSet)
>         - locked <0x0004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>         at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
>         at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
>         at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x0004c4d03530> (a java.util.concurrent.ThreadPoolExecutor$Worker){code}
> More info is available at https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844687#comment-16844687 ] Guus der Kinderen edited comment on DIRMINA- at 5/21/19 10:01 AM:
--

The CPU cycles that get burned are reported to be "system", not "user", cycles. We've obtained a Call Tree graph from JProfiler, from which I included a screenshot (the entire thing is 18MB). Just under the section from which I created this screenshot, there's another 10.2% native CPU usage for {{java.lang.System.currentTimeMillis}} that's called from within {{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}.

From what I can deduce, it looks like this snippet is iterated over a lot. We don't see {{"Broken connection"}} or {{"Create a new selector. Selected is 0, delta ="}} being logged, which suggests that the workaround for the EPOLL / spinning selector bug is not even being considered.

{code:java|title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle session when we get out of the select every
// second. (note : this is a hack to avoid creating
// a dedicated thread).
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance : the select() may have been
    // interrupted because we have had an closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // Ok, we are hit by the nasty epoll spinning.
        // Basically, there is a race condition
        // which causes a closing file descriptor not to be
        // considered as available as a selected channel,
        // but it stopped the select. The next time we will
        // call select(), it will exit immediately for the
        // same reason, and do so forever, consuming 100% CPU.
        // We have to destroy the selector, and
        // register all the socket on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}
!image-2019-05-21-11-37-41-931.png!

was (Author: guus.der.kinderen):
The CPU cycles that get burned are reported to be "system", not "user", cycles. We've obtained a Call Tree graph from JProfiler, from which I included a screenshot (the entire thing is 18MB). Just under the section from which I created this screenshot, there's another 10.2% native CPU usage for {{java.lang.System.currentTimeMillis}} that's called from within {{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}.

From what I can deduce, it looks like this snippet is iterated over a lot. We don't see {{Broken connection}} or {{Create a new selector. Selected is 0, delta =}} being logged, which suggests that the workaround for the EPOLL / spinning selector bug is not even being considered.

{code:java|title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle session when we get out of the select every
// second. (note : this is a hack to avoid creating
// a dedicated thread).
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance : the select() may have been
    // interrupted because we have had an closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // Ok, we are hit by the nasty epoll spinning.
        // Basically, there is a race condition
        // which causes a closing file descriptor not to be
        // considered as available as a selected channel,
        // but it stopped the select. The next time we will
        // call select(), it will exit immediately for the
        // same reason, and do so forever, consuming 100% CPU.
        // We have to destroy the selector, and
        // register all the socket on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}
!image-2019-05-21-11-37-41-931.png!
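The {{nbTries}} countdown in the snippet above can be exercised in isolation from NIO entirely. The sketch below models one loop iteration as a small state machine (the class name, field names, and constants are ours, not MINA's): ten consecutive suspicious selects (returned 0 immediately, without a wakeup) exhaust the budget and trigger a selector rebuild, while any normal select refills it.

```java
// Models the nbTries countdown from AbstractPollingIoProcessor.Processor#run.
// Names and the initial budget are illustrative assumptions, not MINA's API.
public class SpinDetector {
    static final int BUDGET = 10;    // matches "nbTries = 10" in the snippet
    static final long FAST_MS = 100; // matches "(delta < 100)"

    private int nbTries = BUDGET;
    private int rebuilds = 0;

    /** Feed one select() outcome; returns true if a rebuild was triggered. */
    boolean onSelect(int selected, long deltaMs, boolean wakeupCalled) {
        if (!wakeupCalled && selected == 0 && deltaMs < FAST_MS) {
            if (nbTries == 0) {
                rebuilds++;          // stands in for registerNewSelector()
                nbTries = BUDGET;
                return true;
            }
            nbTries--;
        } else {
            nbTries = BUDGET;        // normal select: refill the budget
        }
        return false;
    }

    int rebuilds() { return rebuilds; }

    public static void main(String[] args) {
        SpinDetector d = new SpinDetector();
        // Eleven immediate zero-selects in a row: ten decrements, then a rebuild.
        for (int i = 0; i < 11; i++) {
            d.onSelect(0, 1, false);
        }
        System.out.println("rebuilds = " + d.rebuilds()); // 1
    }
}
```

This also illustrates the reporter's observation: if the spin never satisfies the {{delta < 100}} / {{selected == 0}} guard (or a wakeup flag is consumed each pass), the budget keeps refilling and the rebuild path is never taken.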
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844690#comment-16844690 ] Guus der Kinderen commented on DIRMINA-:

{quote}Do you have a link to the OpenFire issue?{quote}
No, this is being reported in a non-public project that uses Openfire.
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844687#comment-16844687 ] Guus der Kinderen commented on DIRMINA-:

The CPU cycles that get burned are reported to be "system", not "user", cycles. We've obtained a Call Tree graph from JProfiler, from which I included a screenshot (the entire thing is 18MB). Just under the section from which I created this screenshot, there's another 10.2% native CPU usage for {{java.lang.System.currentTimeMillis}} that's called from within {{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}.

From what I can deduce, it looks like this snippet is iterated over a lot. We don't see {{Broken connection}} or {{Create a new selector. Selected is 0, delta =}} being logged, which suggests that the workaround for the EPOLL / spinning selector bug is not even being considered.

{code:java|title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle session when we get out of the select every
// second. (note : this is a hack to avoid creating
// a dedicated thread).
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance : the select() may have been
    // interrupted because we have had an closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // Ok, we are hit by the nasty epoll spinning.
        // Basically, there is a race condition
        // which causes a closing file descriptor not to be
        // considered as available as a selected channel,
        // but it stopped the select. The next time we will
        // call select(), it will exit immediately for the
        // same reason, and do so forever, consuming 100% CPU.
        // We have to destroy the selector, and
        // register all the socket on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}
!image-2019-05-21-11-37-41-931.png!
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844687#comment-16844687 ] Guus der Kinderen edited comment on DIRMINA- at 5/21/19 9:58 AM:
-

The CPU cycles that get burned are reported to be "system", not "user", cycles. We've obtained a Call Tree graph from JProfiler, from which I included a screenshot (the entire thing is 18MB). Just under the section from which I created this screenshot, there's another 10.2% native CPU usage for {{java.lang.System.currentTimeMillis}} that's called from within {{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}.

From what I can deduce, it looks like this snippet is iterated over a lot. We don't see {{Broken connection}} or {{Create a new selector. Selected is 0, delta =}} being logged, which suggests that the workaround for the EPOLL / spinning selector bug is not even being considered.

{code:java|title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle session when we get out of the select every
// second. (note : this is a hack to avoid creating
// a dedicated thread).
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance : the select() may have been
    // interrupted because we have had an closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // Ok, we are hit by the nasty epoll spinning.
        // Basically, there is a race condition
        // which causes a closing file descriptor not to be
        // considered as available as a selected channel,
        // but it stopped the select. The next time we will
        // call select(), it will exit immediately for the
        // same reason, and do so forever, consuming 100% CPU.
        // We have to destroy the selector, and
        // register all the socket on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}
!image-2019-05-21-11-37-41-931.png!

was (Author: guus.der.kinderen):
The CPU cycles that get burned are reported to be "system", not "user", cycles. We've obtained a Call Tree graph from JProfiler, from which I included a screenshot (the entire thing is 18MB). Just under the section from which I created this screenshot, there's another 10.2% native CPU usage for {{java.lang.System.currentTimeMillis}} that's called from within {{org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.process}}.

From what I can deduce, it looks like this snippet is iterated over a lot. We don't see {{Broken connection}} or {{Create a new selector. Selected is 0, delta =}} being logged, which suggests that the workaround for the EPOLL / spinning selector bug is not even being considered.

{code:java|title=org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#run}
/* (...snip...) */
// This select has a timeout so that we can manage
// idle session when we get out of the select every
// second. (note : this is a hack to avoid creating
// a dedicated thread).
long t0 = System.currentTimeMillis();
int selected = select(SELECT_TIMEOUT);
long t1 = System.currentTimeMillis();
long delta = t1 - t0;

if (!wakeupCalled.getAndSet(false) && (selected == 0) && (delta < 100)) {
    // Last chance : the select() may have been
    // interrupted because we have had an closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");
    } else {
        // Ok, we are hit by the nasty epoll spinning.
        // Basically, there is a race condition
        // which causes a closing file descriptor not to be
        // considered as available as a selected channel,
        // but it stopped the select. The next time we will
        // call select(), it will exit immediately for the
        // same reason, and do so forever, consuming 100% CPU.
        // We have to destroy the selector, and
        // register all the socket on a new one.
        if (nbTries == 0) {
            LOG.warn("Create a new selector. Selected is 0, delta = " + delta);
            registerNewSelector();
            nbTries = 10;
        } else {
            nbTries--;
        }
    }
} else {
    nbTries = 10;
}
/* (...snip...) */
{code}
!image-2019-05-21-11-37-41-931.png|thumbnail!
[jira] [Updated] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guus der Kinderen updated DIRMINA-:
---
    Attachment: image-2019-05-21-11-37-41-931.png
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844639#comment-16844639 ] Guus der Kinderen edited comment on DIRMINA- at 5/21/19 8:50 AM: - The report claims that the CPU increase is combined with InvalidMarkException instances being thrown by MINA 2.1.2's implementation of {{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}} Specifically the {{InvalidMarkException}} is thrown when {{reset}} is called in this snippet: {code:java} // The first unwritten empty buffer must be // forwarded to the filter chain. if (buf.hasRemaining()) { buf.reset(); failedRequests.add(req); } else { IoFilterChain filterChain = session.getFilterChain(); filterChain.fireMessageSent(req); }{code} was (Author: guus.der.kinderen): The report claims that the CPU increase is combined with InvalidMarkException instances being thrown by MINA 2.1.2's implementation of {{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}} Specifically, when {{reset}} is called in this snippet: {code:java} // The first unwritten empty buffer must be // forwarded to the filter chain. if (buf.hasRemaining()) { buf.reset(); failedRequests.add(req); } else { IoFilterChain filterChain = session.getFilterChain(); filterChain.fireMessageSent(req); }{code} > 100% CPU (epoll bug) on 2.1.x, Linux only > - > > Key: DIRMINA- > URL: https://issues.apache.org/jira/browse/DIRMINA- > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > > We're getting reports > [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13] > of a bug that causes 100% CPU usage on Linux (the problem does not occur on > Windows). > This problem occurs in 2.1.2. but does _not_ occur in 2.0.21. > Is this a regression of the epoll selector bug in DIRMINA-678 ? 
> A stack trace of one of the spinning threads: > {code}"NioProcessor-3" #55 prio=5 os_prio=0 tid=0x7f0408002000 nid=0x2a6a > runnable [0x7f0476dee000] >java.lang.Thread.State: RUNNABLE > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > - locked <0x0004c486b748> (a sun.nio.ch.Util$3) > - locked <0x0004c486b738> (a java.util.Collections$UnmodifiableSet) > - locked <0x0004c420ccb0> (a sun.nio.ch.EPollSelectorImpl) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at > org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112) > at > org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616) > at > org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) >Locked ownable synchronizers: > - <0x0004c4d03530> (a > java.util.concurrent.ThreadPoolExecutor$Worker){code} > More info is available at > https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
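The failure mode described in the comment above can be reproduced in isolation: {{java.nio.ByteBuffer#reset}} throws {{InvalidMarkException}} whenever it is invoked before {{mark()}} has been set (or after the mark was discarded, e.g. by {{rewind()}} or {{flip()}}). A minimal standalone sketch, not MINA code:

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class ResetWithoutMark {
    // Returns true if reset() on an unmarked buffer throws InvalidMarkException.
    public static boolean resetThrowsWithoutMark() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1); // data written, but mark() was never called
        try {
            buf.reset(); // per the ByteBuffer contract, reset() requires a prior mark()
            return false;
        } catch (InvalidMarkException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("InvalidMarkException thrown: " + resetThrowsWithoutMark());
    }
}
```

In the MINA snippet quoted above, this suggests {{buf}} reached {{clearWriteRequestQueue}} without a valid mark, i.e. some code path skipped (or invalidated) the corresponding {{mark()}} call.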
[jira] [Comment Edited] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844639#comment-16844639 ] Guus der Kinderen edited comment on DIRMINA-1111 at 5/21/19 8:49 AM: - The report claims that the CPU increase coincides with InvalidMarkException instances being thrown by MINA 2.1.2's implementation of {{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}}. Specifically, when {{reset}} is called in this snippet: {code:java} // The first unwritten empty buffer must be // forwarded to the filter chain. if (buf.hasRemaining()) { buf.reset(); failedRequests.add(req); } else { IoFilterChain filterChain = session.getFilterChain(); filterChain.fireMessageSent(req); }{code} was (Author: guus.der.kinderen): The report claims that the CPU increase coincides with InvalidMarkException instances being thrown by MINA 2.1.2's implementation of {{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844639#comment-16844639 ] Guus der Kinderen commented on DIRMINA-1111: The report claims that the CPU increase coincides with InvalidMarkException instances being thrown by MINA 2.1.2's implementation of {{org.apache.mina.core.polling.AbstractPollingIoProcessor.Processor#clearWriteRequestQueue}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844614#comment-16844614 ] Guus der Kinderen commented on DIRMINA-1111: For what it's worth, I've received another report (unrelated to the first one) that appears to be related to this issue. This new environment recently upgraded to MINA 2.1.2 too. I'm working on obtaining the profiling information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844606#comment-16844606 ] Guus der Kinderen commented on DIRMINA-1107: We've replaced our initial fix with what Jonathan provided. His fix appears to prevent the original issue from occurring. We're happy to close this issue with that fix! Thanks! > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. > The fix for DIRMINA-1019 introduces a counter that contains the number of > events to be processed. A simplified version of the code is included below. > {code:java} > private final AtomicInteger scheduledEvents = new AtomicInteger(0); > void flushScheduledEvents() { > scheduledEvents.incrementAndGet(); > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (scheduledEvents.decrementAndGet() > 0); > } finally { > sslLock.unlock(); > } > } > }{code} > We have observed occasions where the value of {{scheduledEvents}} becomes > negative, while at the same time events in {{filterWriteEventQueue}} go > unprocessed. > We suspect that this issue is triggered by a concurrency issue caused by the > first thread decrementing the counter after a second thread incremented it, > but before it attempted to acquire the lock. > This allows the first thread to empty the queues, decrement the > counter to zero, and release the lock, after which the second thread acquires > the lock successfully. 
Now, the second thread processes any elements in > {{filterWriteEventQueue}}, and then processes any elements in > {{messageReceivedEventQueue}}. If in between these two checks yet another > thread adds a new element to {{filterWriteEventQueue}}, this element can go > unprocessed (as the second thread does not loop, since the counter is zero or > negative, and the third thread can fail to acquire the lock). > It's a seemingly unlikely scenario, but we are observing the behavior when > our systems are under high load. > We've applied a code change after which this problem is no longer observed. > We've removed the counter, and check whether the queues are empty instead: > {code:java} > void flushScheduledEvents() { > if (sslLock.tryLock()) { > try { > do { > while ((event = filterWriteEventQueue.poll()) != null) { > // ... > } > > while ((event = messageReceivedEventQueue.poll()) != null){ > // ... > } > } while (!filterWriteEventQueue.isEmpty() || > !messageReceivedEventQueue.isEmpty()); > } finally { > sslLock.unlock(); > } > } > }{code} > This code change, as illustrated above, does introduce a new potential > problem. Theoretically, an event could be added to the queues and > {{flushScheduledEvents}} be called with {{sslLock.tryLock()}} returning > {{false}}, exactly after another thread has finished the > {{while}} loop but before it releases the lock. This again would cause events > to go unprocessed. > We've not observed this problem in the wild yet, but we're uncomfortable > applying this change as-is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
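One way to close both windows described in the issue is to drop the counter entirely and re-check the queues after releasing the lock, looping until they are observed empty while the lock is not held: a thread whose {{tryLock()}} fails can safely return, because the current lock holder will re-check after unlocking. The following is only a sketch of that idea with simplified types (plain {{Runnable}} events), not MINA's actual fix:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

public class FlushSketch {
    private final ReentrantLock sslLock = new ReentrantLock();
    private final Queue<Runnable> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    private final Queue<Runnable> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();

    public void schedule(Runnable event) {
        filterWriteEventQueue.offer(event);
        flushScheduledEvents();
    }

    public void flushScheduledEvents() {
        // Loop until the queues are observed empty while NOT holding the lock.
        while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty()) {
            if (!sslLock.tryLock()) {
                // The lock holder re-checks the queues after unlocking,
                // so an event enqueued just now is not stranded.
                return;
            }
            try {
                Runnable event;
                while ((event = filterWriteEventQueue.poll()) != null) {
                    event.run();
                }
                while ((event = messageReceivedEventQueue.poll()) != null) {
                    event.run();
                }
            } finally {
                sslLock.unlock();
            }
            // Fall through: the outer while re-checks the queues after unlock.
        }
    }
}
```

The key property is that every enqueue is followed by a {{flushScheduledEvents}} call, and {{tryLock()}} can only fail while some thread holds the lock, and that thread always performs one more emptiness check after unlocking.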
[jira] [Updated] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x, Linux only
[ https://issues.apache.org/jira/browse/DIRMINA-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guus der Kinderen updated DIRMINA-1111: --- Summary: 100% CPU (epoll bug) on 2.1.x, Linux only (was: 100% CPU (epoll bug) on 2.1.x) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DIRMINA-1111) 100% CPU (epoll bug) on 2.1.x
Guus der Kinderen created DIRMINA-1111: -- Summary: 100% CPU (epoll bug) on 2.1.x Key: DIRMINA-1111 URL: https://issues.apache.org/jira/browse/DIRMINA-1111 Project: MINA Issue Type: Improvement Affects Versions: 2.1.2 Reporter: Guus der Kinderen We're getting [reports|https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13] of a bug that causes 100% CPU usage on Linux (the problem does not occur on Windows). This problem occurs in 2.1.2, but does _not_ occur in 2.0.21. Is this a regression of the epoll selector bug in DIRMINA-678? More info is available at https://discourse.igniterealtime.org/t/openfire-4-3-2-cpu-goes-at-100-after-a-few-hours-on-linux/85119/13 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841103#comment-16841103 ] Guus der Kinderen commented on DIRMINA-1107: \o/ For what it's worth: since I wrote that comment, we have successfully applied the solution that I've outlined in https://issues.apache.org/jira/browse/DIRMINA-1107?focusedCommentId=16838709=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16838709 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841084#comment-16841084 ] Guus der Kinderen commented on DIRMINA-1107: I'd love to see a solution for the 2.1 branch, even if that's a patch of the existing, sub-par implementation. We're currently suffering from issues (which prompted me to create this issue), which we'd like to fix before/without upgrading to a new major release of MINA. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DIRMINA-1110) Ordered Executors concurrency
Guus der Kinderen created DIRMINA-1110: -- Summary: Ordered Executors concurrency Key: DIRMINA-1110 URL: https://issues.apache.org/jira/browse/DIRMINA-1110 Project: MINA Issue Type: Improvement Affects Versions: 2.1.2 Reporter: Guus der Kinderen Attachments: mina-ordered-executors.patch MINA contains two ThreadPoolExecutor implementations that maintain the order of IoEvents per session: * OrderedThreadPoolExecutor * PriorityThreadPoolExecutor This issue applies to both. A private class called {{SessionTasksQueue}} (confusingly, it uses different names in the two implementations, an undesired side-effect of code duplication) is used to represent the ordered collection of events to be processed for each individual session. It also contains a boolean that represents the state of the queue prior to addition of the task: 'true' if the collection was empty ("processing complete"), otherwise 'false'. This queue is stored as an attribute on the session. An {{IoEvent}} is _scheduled_ for execution by being passed to the {{execute}} method. "Scheduling" an {{IoEvent}} involves identifying the session that it belongs to, and placing it in its {{SessionTasksQueue}}. Finally, the session itself is conditionally (more on this later) placed in a queue named {{waitingSessions}}. The {{IoEvent}} execution itself is implemented by {{Runnable}} implementations called {{Worker}}. These workers take their workload from the {{waitingSessions}} queue, doing blocking polls. The placing of the session on the {{waitingSessions}} queue depends on the state of the {{SessionTasksQueue}}. If it was empty ("processingComplete"), the session is placed on the {{waitingSessions}} queue. 
There is a comment that describes this as follows: {quote}As the tasksQueue was empty, the task has been executed immediately, so we can move the session to the queue of sessions waiting for completion.{quote} As an aside: I believe this comment to be misleading: there's no guarantee that the task has actually been executed immediately, as workers might still be working on other threads. On top of that, as both the production on, and consumption of, that queue are guarded by the same mutex, I think it's quite impossible that the task has already been executed. A more proper comment would be: {quote}Processing of the tasks queue of this session is currently not scheduled or underway. As new tasks have now been added, the session needs to be offered for processing.{quote} The determination of whether the session needs to be offered up for processing is based on an evaluation of the state of the {{sessionTasksQueue}} that happens under a mutex. The actual offer of the session for processing (adding the session to the {{waitingSessions}} queue) is not. We believe, but have not been able to conclusively prove, that this can lead to concurrency issues, where a session might not be queued even though it has tasks to be executed. Nonetheless, we suggest moving the addition of the session to {{waitingSessions}} into the guarded code block. At the very least, this reduces code complexity. The patch attached to this issue applies this change. It also changes the name of the session tasks queue in {{PriorityThreadPoolExecutor}} to match the name used in {{OrderedThreadPoolExecutor}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
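The change proposed in the issue above (evaluating the emptiness of the session's task queue and offering the session to {{waitingSessions}} under the same mutex) can be sketched as follows. All names and types here are simplified stand-ins for MINA's internals, not the actual patch:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

public class OrderedExecutorSketch {
    // Simplified stand-in for MINA's per-session task queue.
    public static class Session {
        final Queue<Runnable> tasksQueue = new ArrayDeque<>();
    }

    private final LinkedBlockingQueue<Session> waitingSessions = new LinkedBlockingQueue<>();

    public void execute(Session session, Runnable event) {
        synchronized (session.tasksQueue) {
            // Offer the session only if its queue was empty (no worker is
            // scheduled for it yet). Performing the offer inside the same
            // guarded block removes the unguarded window the issue describes.
            boolean offerSession = session.tasksQueue.isEmpty();
            session.tasksQueue.offer(event);
            if (offerSession) {
                waitingSessions.offer(session);
            }
        }
    }

    // Worker side: take one session and drain its task queue in order.
    public void runWorkerOnce() throws InterruptedException {
        Session session = waitingSessions.take();
        while (true) {
            Runnable task;
            synchronized (session.tasksQueue) {
                task = session.tasksQueue.poll();
                if (task == null) {
                    break; // queue drained; session is no longer scheduled
                }
            }
            task.run(); // run outside the mutex, preserving per-session order
        }
    }
}
```

Note that tasks still execute outside the mutex; only the queue inspection and the session offer are guarded together.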
[jira] [Commented] (DIRMINA-1108) ThreadPoolExecutors should terminate faster
[ https://issues.apache.org/jira/browse/DIRMINA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840436#comment-16840436 ] Guus der Kinderen commented on DIRMINA-1108: {{shutdownNow}} would attempt to stop all executing tasks (at least, that's what its contract says - unsure if the MINA implementation actually behaves that way). That's not what I want for the particular use-case I had in mind. Ideally, termination should be faster, which would be a minor improvement. > ThreadPoolExecutors should terminate faster > --- > > Key: DIRMINA-1108 > URL: https://issues.apache.org/jira/browse/DIRMINA-1108 > Project: MINA > Issue Type: Improvement >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Minor > > I've observed that the termination of > {{org.apache.mina.filter.executor.PriorityThreadPoolExecutor}} takes another > 30 seconds after the last task was executed. More specifically, it takes 30 > seconds "too long" before > {{org.apache.mina.filter.executor.PriorityThreadPoolExecutor#awaitTermination}} > returns {{true}}. > It is likely that this issue also applies to > {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}}, and maybe > {{org.apache.mina.filter.executor.UnorderedThreadPoolExecutor}}, as a lot of > code is shared between these implementations. I did not explicitly verify > these implementations though. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
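For reference, the distinction drawn in the comment above is the standard {{java.util.concurrent}} contract: {{shutdown()}} rejects new tasks but lets already-submitted work finish, while {{shutdownNow()}} additionally interrupts running tasks and returns the ones that never started. A small sketch using a plain JDK executor (not MINA's implementations):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownDemo {
    // Graceful shutdown: submitted tasks complete, then the pool terminates.
    // shutdownNow() would instead interrupt running tasks as well.
    public static boolean gracefulShutdown() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { /* short task */ });
        pool.shutdown(); // stop accepting new tasks; queued tasks still run
        return pool.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("terminated: " + gracefulShutdown());
    }
}
```

With the plain JDK pool, {{awaitTermination}} returns essentially immediately once the work is done, which is the behavior the issue asks MINA's executors to match.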
[jira] [Created] (DIRMINA-1109) Improve PriorityThreadPoolExecutor unit test
Guus der Kinderen created DIRMINA-1109: -- Summary: Improve PriorityThreadPoolExecutor unit test Key: DIRMINA-1109 URL: https://issues.apache.org/jira/browse/DIRMINA-1109 Project: MINA Issue Type: Improvement Reporter: Guus der Kinderen Attachments: PriorityThreadPoolExecutorTest.java The original test was authored by me. The test should be improved to: * include a thread pool with more than one thread. * take into account that once a task queue for a session has already been picked up by a worker thread, any 'prioritized' session tasks will still have to wait for the work that's already underway to be finished (the original code does not allow for this, hence it only works with a single thread). * take considerably less time to execute (it currently runs for ~34 seconds - see DIRMINA-1108). * be self-validating. Also, much of the formatting appears to have been lost when my original code was merged. That could be improved upon too. As the formatting changes affected ~80% of all lines, I've not bothered to create a diff of my work. Instead, I'm supplying the Java file in its entirety in an attachment to this JIRA issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
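A self-validating, fast test along the lines suggested above would synchronize on a latch rather than sleep for fixed intervals. A minimal sketch with a plain JDK single-thread executor (illustrative only; the attached file is the authoritative version):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SelfValidatingOrderTest {
    // Verifies that two tasks submitted to a single-threaded executor run in
    // submission order, using a latch instead of fixed sleeps, so the test
    // finishes as soon as the work does.
    public static boolean ranInOrder() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        StringBuilder order = new StringBuilder(); // only touched by the single worker thread
        CountDownLatch done = new CountDownLatch(2);
        pool.submit(() -> { order.append("a"); done.countDown(); });
        pool.submit(() -> { order.append("b"); done.countDown(); });
        boolean finished = done.await(5, TimeUnit.SECONDS); // await, don't sleep
        pool.shutdown();
        return finished && order.toString().equals("ab");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("in order: " + ranInOrder());
    }
}
```

The latch's {{countDown}}/{{await}} pair also gives the happens-before edge that makes reading {{order}} from the main thread safe.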
[jira] [Created] (DIRMINA-1108) ThreadPoolExecutors should terminate faster
Guus der Kinderen created DIRMINA-1108: -- Summary: ThreadPoolExecutors should terminate faster Key: DIRMINA-1108 URL: https://issues.apache.org/jira/browse/DIRMINA-1108 Project: MINA Issue Type: Improvement Affects Versions: 2.1.2 Reporter: Guus der Kinderen I've observed that the termination of {{org.apache.mina.filter.executor.PriorityThreadPoolExecutor}} takes another 30 seconds after the last task was executed. More specifically, it takes 30 seconds "too long" before {{org.apache.mina.filter.executor.PriorityThreadPoolExecutor#awaitTermination}} returns {{true}}. It is likely that this issue also applies to {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}}, and maybe {{org.apache.mina.filter.executor.UnorderedThreadPoolExecutor}}, as a lot of code is shared between these implementations. I did not explicitly verify these implementations though. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839151#comment-16839151 ] Guus der Kinderen commented on DIRMINA-1107: [~elecharny] wrote {quote}The {{flushScheduledEvents()}} method, which is called by the {{startTls}}, {{stopTls}}, {{messageReceived}}, {{filterWrite}}, {{filterClose}} and {{initiateHandshake}} methods, should first propagate the received messages - because it's synchronous - and then process all the messages to be written, in a protected section.{quote} I do not understand what you're trying to say. I don't think we can predict whether the thread calling {{flushScheduledEvents}} is the single thread that processes incoming messages, or a thread (potentially from a thread pool associated with an Executor) that is processing outgoing messages. Even if the outgoing messages were guaranteed to be processed by a single thread, we'd still have to guard against the inbound and outbound threads calling {{flushScheduledEvents}} at the same time, right?
> SslHandler flushScheduledEvents race condition, redux
[jira] [Comment Edited] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838709#comment-16838709 ] Guus der Kinderen edited comment on DIRMINA-1107 at 5/13/19 5:03 PM: - Thanks for all the feedback! For what it's worth: we're currently experimenting with a code change that intends to: * Ensure that queues are emptied at least as often as {{flushScheduledEvents}} is invoked (which should prevent that events remain in the queue indefinitely after the flush has been requested - we can't be sure that another flush is requested in a timely manner). * Not block any threads while one thread is operating on the filters (in other words: continue to allow for the 'tryLock' if statement, to prevent worker threads from being blocked). We are attempting to do this by introducing a lock that is released, atomically, only after it has been "used" as often as the lock has attempted to be acquired. Our work-in-progress code: {code:java}private class CountReentrantLock { private int i = 0; private final ReentrantLock lock = new ReentrantLock(); public synchronized boolean tryAcquireLock() { i++; return lock.tryLock(); } public synchronized boolean tryUnlock() { i--; if(i<=0) { i=0; lock.unlock(); return true; } return false; } public synchronized void forceUnlock() { i = 0; lock.unlock(); } } {code} This then could be used like this: {code:java} void flushScheduledEvents() { if (sslLock.tryAcquireLock()) { try { do { while ((event = filterWriteEventQueue.poll()) != null) { // ... } while ((event = messageReceivedEventQueue.poll()) != null){ // ... } } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty() || !sslLock.tryUnlock()); } catch( Throwable t) { sslLock.forceUnlock(); } } }{code} We've not tested this code yet. A concern that we haven't thought through yet is re-entry of a thread that already owns the lock. was (Author: guus.der.kinderen): Thanks for all the feedback! 
For what it's worth: we're currently experimenting with a code change that intends to: * Ensure that queues are emptied at least as often as {{flushScheduledEvents}} is invoked (which should prevent that events remain in the queue indefinitely after the flush has been requested - we can't be sure that another flush is requested in a timely manner). * Not block any threads (continue to allow for the 'tryLock' if statement, to prevent worker threads from being blocked). We are attempting to do this by introducing a lock that is released, atomically, only after it has been "used" as often as the lock has attempted to be acquired. Our work-in-progress code: {code:java}private class CountReentrantLock { private int i = 0; private final ReentrantLock lock = new ReentrantLock(); public synchronized boolean tryAcquireLock() { i++; return lock.tryLock(); } public synchronized boolean tryUnlock() { i--; if(i<=0) { i=0; lock.unlock(); return true; } return false; } public synchronized void forceUnlock() { i = 0; lock.unlock(); } } {code} This then could be used like this: {code:java} void flushScheduledEvents() { if (sslLock.tryAcquireLock()) { try { do { while ((event = filterWriteEventQueue.poll()) != null) { // ... } while ((event = messageReceivedEventQueue.poll()) != null){ // ... } } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty() || !sslLock.tryUnlock()); } catch( Throwable t) { sslLock.forceUnlock(); } } }{code} We've not tested this code yet. A concern that we haven't thought through yet is re-entry of a thread that already owns the lock. > SslHandler flushScheduledEvents race condition, redux > - > > Key: DIRMINA-1107 > URL: https://issues.apache.org/jira/browse/DIRMINA-1107 > Project: MINA > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Guus der Kinderen >Priority: Major > Fix For: 2.1.3 > > > DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally > replaces it with another multithreading issue. 
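To sanity-check the intended semantics of the work-in-progress lock, here is a condensed, runnable version of the {{CountReentrantLock}} sketch (the counter field is renamed to {{count}} for clarity, and the {{forceUnlock}} path is omitted). A second thread that fails to take the lock still raises the count, obliging the owner to call {{tryUnlock()}} one extra time before the lock is really released.

```java
import java.util.concurrent.locks.ReentrantLock;

public class CountReentrantLockDemo {
    // Condensed copy of the work-in-progress class from the comment above.
    static class CountReentrantLock {
        private int count = 0;
        private final ReentrantLock lock = new ReentrantLock();

        public synchronized boolean tryAcquireLock() {
            count++;
            return lock.tryLock();
        }

        public synchronized boolean tryUnlock() {
            count--;
            if (count <= 0) {
                count = 0;
                lock.unlock();
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        CountReentrantLock sslLock = new CountReentrantLock();
        if (!sslLock.tryAcquireLock()) throw new AssertionError("owner should acquire");

        // Contending thread: fails to take the lock, but bumps the count.
        final boolean[] acquired = new boolean[1];
        Thread other = new Thread(() -> acquired[0] = sslLock.tryAcquireLock());
        other.start();
        other.join();
        if (acquired[0]) throw new AssertionError("contended acquire should fail");

        // Count is now 2, so the first tryUnlock() must not release the lock yet.
        System.out.println("first tryUnlock released: " + sslLock.tryUnlock());
        System.out.println("second tryUnlock released: " + sslLock.tryUnlock());
    }
}
```

As the comment notes, re-entry by the owning thread is not handled: a reentrant {{tryLock()}} would succeed and raise the JDK lock's hold count, which this wrapper does not track.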
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838709#comment-16838709 ] Guus der Kinderen commented on DIRMINA-1107: Thanks for all the feedback! For what it's worth: we're currently experimenting with a code change that intends to:
* Ensure that queues are emptied at least as often as {{flushScheduledEvents}} is invoked (which should prevent events from remaining in the queue indefinitely after the flush has been requested - we can't be sure that another flush is requested in a timely manner).
* Not block any threads (continue to allow for the 'tryLock' if statement, to prevent worker threads from being blocked).

We are attempting to do this by introducing a lock that is released, atomically, only after it has been "used" as often as an attempt has been made to acquire it. Our work-in-progress code:

{code:java}
private class CountReentrantLock {
    private int i = 0;
    private final ReentrantLock lock = new ReentrantLock();

    public synchronized boolean tryAcquireLock() {
        i++;
        return lock.tryLock();
    }

    public synchronized boolean tryUnlock() {
        i--;
        if (i <= 0) {
            i = 0;
            lock.unlock();
            return true;
        }
        return false;
    }

    public synchronized void forceUnlock() {
        i = 0;
        lock.unlock();
    }
}
{code}

This then could be used like this:

{code:java}
void flushScheduledEvents() {
    if (sslLock.tryAcquireLock()) {
        try {
            do {
                while ((event = filterWriteEventQueue.poll()) != null) {
                    // ...
                }
                while ((event = messageReceivedEventQueue.poll()) != null) {
                    // ...
                }
            } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty() || !sslLock.tryUnlock());
        } catch (Throwable t) {
            sslLock.forceUnlock();
        }
    }
}
{code}

We've not tested this code yet. A concern that we haven't thought through yet is re-entry by a thread that already owns the lock.
> SslHandler flushScheduledEvents race condition, redux
[jira] [Comment Edited] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838662#comment-16838662 ] Guus der Kinderen edited comment on DIRMINA-1107 at 5/13/19 4:15 PM: - We've found the issue in version 2.1.2, but didn't attempt to test against other versions. Placing the counter inside the protected code block would open the possibility for an invocation of {{flushScheduledEvents}} that does nothing (fails to acquire the lock), while there _are_ events in the queue. This would occur if the second thread tries to acquire the lock just after the first thread has exited the outer 'do/while' loop, but before it has released the lock in the 'finally' block. was (Author: guus.der.kinderen): We've found the issue in version 2.1.2. Placing the counter inside the protected code block would open the possibility for an invocation of {{flushScheduledEvents}} that does nothing (fails to acquire the lock), while there _are_ events in the queue. This would occur if the first thread just exited the outer 'do/while' loop, but before it released the lock in the 'finally' block, when the second thread tries to acquire the lock.
> SslHandler flushScheduledEvents race condition, redux
[jira] [Comment Edited] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838662#comment-16838662 ] Guus der Kinderen edited comment on DIRMINA-1107 at 5/13/19 4:13 PM: - We've found the issue in version 2.1.2. Placing the counter inside the protected code block would open the possibility for an invocation of {{flushScheduledEvents}} that does nothing (fails to acquire the lock), while there _are_ events in the queue. This would occur if the first thread just exited the outer 'do/while' loop, but before it released the lock in the 'finally' block, when the second thread tries to acquire the lock. was (Author: guus.der.kinderen): We've found the issue in version 2.1.2. Placing the counter inside the protected code block would open the possibility for an invocation of {{flushScheudledEvents}} that does nothing (fails to acquire the lock), while there _are_ events in the queue. This would occur if the first thread just exited the outer 'do/while' loop, but before it released the lock in the 'finally' block, when the second thread tries to acquire the lock.
> SslHandler flushScheduledEvents race condition, redux
[jira] [Commented] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
[ https://issues.apache.org/jira/browse/DIRMINA-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838662#comment-16838662 ] Guus der Kinderen commented on DIRMINA-1107: We've found the issue in version 2.1.2. Placing the counter inside the protected code block would open the possibility for an invocation of {{flushScheduledEvents}} that does nothing (fails to acquire the lock), while there _are_ events in the queue. This would occur if the first thread just exited the outer 'do/while' loop, but before it released the lock in the 'finally' block, when the second thread tries to acquire the lock.
> SslHandler flushScheduledEvents race condition, redux
[jira] [Created] (DIRMINA-1107) SslHandler flushScheduledEvents race condition, redux
Guus der Kinderen created DIRMINA-1107: -- Summary: SslHandler flushScheduledEvents race condition, redux Key: DIRMINA-1107 URL: https://issues.apache.org/jira/browse/DIRMINA-1107 Project: MINA Issue Type: Bug Reporter: Guus der Kinderen DIRMINA-1019 addresses a race condition in SslHandler, but unintentionally replaces it with another multithreading issue. The fix for DIRMINA-1019 introduces a counter that contains the number of events to be processed. A simplified version of the code is included below.

{code:java}
private final AtomicInteger scheduledEvents = new AtomicInteger(0);

void flushScheduledEvents() {
    scheduledEvents.incrementAndGet();
    if (sslLock.tryLock()) {
        try {
            do {
                while ((event = filterWriteEventQueue.poll()) != null) {
                    // ...
                }
                while ((event = messageReceivedEventQueue.poll()) != null) {
                    // ...
                }
            } while (scheduledEvents.decrementAndGet() > 0);
        } finally {
            sslLock.unlock();
        }
    }
}
{code}

We have observed occasions where the value of {{scheduledEvents}} becomes negative, while at the same time events in {{filterWriteEventQueue}} go unprocessed. We suspect that this is triggered by a concurrency issue: the first thread decrements the counter after a second thread has incremented it, but before that second thread attempts to acquire the lock. This allows the first thread to empty the queues, decrement the counter to zero and release the lock, after which the second thread acquires the lock successfully. Now, the second thread processes any elements in {{filterWriteEventQueue}}, and then processes any elements in {{messageReceivedEventQueue}}. If in between these two checks yet another thread adds a new element to {{filterWriteEventQueue}}, this element can go unprocessed (as the second thread does not loop, since the counter is zero or negative, and the third thread can fail to acquire the lock). It's a seemingly unlikely scenario, but we are observing the behavior when our systems are under high load.

We've applied a code change after which this problem is no longer observed. We've removed the counter, and check the queues for emptiness instead:

{code:java}
void flushScheduledEvents() {
    if (sslLock.tryLock()) {
        try {
            do {
                while ((event = filterWriteEventQueue.poll()) != null) {
                    // ...
                }
                while ((event = messageReceivedEventQueue.poll()) != null) {
                    // ...
                }
            } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty());
        } finally {
            sslLock.unlock();
        }
    }
}
{code}

This code change, as illustrated above, does introduce a new potential problem. Theoretically, an event could be added to the queues and {{flushScheduledEvents}} could be invoked with {{sslLock.tryLock()}} returning {{false}}, exactly after another thread has finished the {{while}} loop but before it releases the lock. This again would cause events to go unprocessed. We've not observed this problem in the wild yet, but we're uncomfortable applying this change as-is.
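The queue-draining variant can be condensed into a runnable, single-threaded sketch (event handling replaced by a counter; names follow the report, the rest is a stand-in). It demonstrates the drain loop itself, but not, of course, the remaining race between {{tryLock()}} and {{unlock()}} discussed above.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class FlushSketch {
    static final Queue<String> filterWriteEventQueue = new ConcurrentLinkedQueue<>();
    static final Queue<String> messageReceivedEventQueue = new ConcurrentLinkedQueue<>();
    static final ReentrantLock sslLock = new ReentrantLock();
    static final AtomicInteger processed = new AtomicInteger();

    // Queue-draining variant from the description above; event handling stubbed.
    static void flushScheduledEvents() {
        if (sslLock.tryLock()) {
            try {
                String event;
                do {
                    while ((event = filterWriteEventQueue.poll()) != null) {
                        processed.incrementAndGet(); // stand-in for the write path
                    }
                    while ((event = messageReceivedEventQueue.poll()) != null) {
                        processed.incrementAndGet(); // stand-in for the read path
                    }
                    // Re-check both queues: events enqueued while draining are
                    // still picked up by the lock-holding thread.
                } while (!filterWriteEventQueue.isEmpty() || !messageReceivedEventQueue.isEmpty());
            } finally {
                sslLock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        filterWriteEventQueue.add("write-1");
        messageReceivedEventQueue.add("read-1");
        flushScheduledEvents();
        System.out.println("processed=" + processed.get());
    }
}
```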
[jira] [Commented] (DIRMINA-1101) InvalidMarkException on session.write when using CompressionFilter.
[ https://issues.apache.org/jira/browse/DIRMINA-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798213#comment-16798213 ] Guus der Kinderen commented on DIRMINA-1101: This issue is also being discussed in the mailing list thread that starts here: [http://mail-archives.apache.org/mod_mbox/mina-dev/201903.mbox/%3CCA+GjePVToJ1hzFDTy50BwpEY1AFjPea+ETKe=leoohbwfre...@mail.gmail.com%3E]
> InvalidMarkException on session.write when using CompressionFilter.
> ---
>
> Key: DIRMINA-1101
> URL: https://issues.apache.org/jira/browse/DIRMINA-1101
> Project: MINA
> Issue Type: Bug
> Affects Versions: 2.0.20
> Reporter: Jörg Michelberger
> Priority: Major
>
> I updated from MINA 2.0.7 to 2.0.20 and am a user of CompressionFilter. Writing of messages fails with an InvalidMarkException. A reproducible test is:
> * Copy the MINA Core test org.apache.mina.core.service.AbstractIoServiceTest to the MINA Compression Filter test package org.apache.mina.filter.compression.
> * Adapt the package statement to org.apache.mina.filter.compression.
> * Add compression to acceptor and connector:
> * acceptor.getFilterChain().addLast("compression", new CompressionFilter()); acceptor.getFilterChain().addLast("logger", new LoggingFilter()); acceptor.getFilterChain().addLast("codec", new ProtocolCodecFilter(new TextLineCodecFactory(StandardCharsets.UTF_8)));
> * connector.getFilterChain().addLast("compression", new CompressionFilter()); connector.getFilterChain().addLast("logger", new LoggingFilter()); connector.getFilterChain().addLast("codec", new ProtocolCodecFilter(new TextLineCodecFactory(StandardCharsets.UTF_8)));
> * Set a breakpoint in the java.nio.Buffer.reset() method, where the InvalidMarkException is thrown.
> * Run/debug the test on org.apache.mina.filter.compression.AbstractIoServiceTest.
> After the exception the session is immediately scheduled for disconnect.
> It seems that there is a discrepancy between the mark() and reset() calls on the underlying Buffer. In the case of compression, a Buffer with the compressed content is created and is wrapped together with the original Buffer in a FilteredWriteRequest, because CompressionFilter is a WriteRequestFilter. This happens in WriteRequestFilter.filterWrite().
> The mark() call is then made in DefaultIoFilterChain$HeadFilter.filterWrite(), on the compressed Buffer.
> The reset() call is made in AbstractPollingIoProcessor.writeBuffer(), in this case on the original Buffer, leading to the Exception.
> It seems that the change dated 16.02.2016
> SHA-1: 44b58469f84ce991074cdc187b1c1f23b94cf445
> * Don't try to reset a message when it's not a IoBuffer
> which reassigns {{buf}} before reset() is called, broke it. The {{buf}} before the reassignment looks much more like the right one to reset() in this case.
>
> {{java.nio.InvalidMarkException}}
> {{ at java.nio.Buffer.reset(Buffer.java:306)}}
> {{ at org.apache.mina.core.buffer.AbstractIoBuffer.reset(AbstractIoBuffer.java:425)}}
> {{ at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.writeBuffer(AbstractPollingIoProcessor.java:1131)}}
> {{ at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.flushNow(AbstractPollingIoProcessor.java:994)}}
> {{ at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.flush(AbstractPollingIoProcessor.java:921)}}
> {{ at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:688)}}
> {{ at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)}}
> {{ at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{ at java.lang.Thread.run(Thread.java:748)}}
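The mark/reset mismatch is easy to reproduce in isolation with plain NIO buffers (a minimal illustration, not MINA code): mark() lands on one buffer (the compressed one) while reset() lands on another (the original), whose mark was never set, which is exactly when java.nio.Buffer.reset() throws InvalidMarkException.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class MarkResetMismatch {
    public static void main(String[] args) {
        ByteBuffer compressed = ByteBuffer.allocate(16);
        ByteBuffer original = ByteBuffer.allocate(16);
        compressed.mark();            // mark set on the "wrong" buffer
        try {
            original.reset();         // this buffer has no mark defined
            System.out.println("no exception");
        } catch (InvalidMarkException e) {
            System.out.println("InvalidMarkException as expected");
        }
    }
}
```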
[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized
[ https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562094#comment-16562094 ] Guus der Kinderen commented on DIRMINA-1078: Awesome, thanks! > OrderedThreadPoolExecutor should allow sessions to be prioritized > - > > Key: DIRMINA-1078 > URL: https://issues.apache.org/jira/browse/DIRMINA-1078 > Project: MINA > Issue Type: New Feature > Components: Core >Reporter: Guus der Kinderen >Assignee: Jonathan Valliere >Priority: Minor > Attachments: 20180216.patch, 20180706.patch > > > The functionality provided in {{OrderedThreadPoolExecutor}} should be > augmented to, optionally, allow for assignment of priority to certain > sessions over others. > We've introduced this functionality after observing issues in a deployment > where system resources were being starved by the sheer number of sessions > that attempted to perform TLS. Without the class introduced by this commit, > events for any session could eventually fail (time-out), causing the session > to fail. If that session happened to be a session that had already > established TLS, the resources that were spent on establishing TLS are > wasted. The negative effect is amplified by the fact that a typical client in > such a situation would attempt > to reconnect, which further adds to the load of the system already being > starved. > With the modifications introduced by the patch provided in this issue, > priority can be given to sessions that have already established TLS. This > dramatically reduces the issue described above, as the first sessions to fail > will be those that are still negotiating TLS. Using a specialized > {{Comparator}}, one can even prioritize between these, causing sessions for > which the least effort has been performed to fail before sessions that are > more likely to be near TLS completion. 
> The patch intends to add this feature as optional functionality to the > existing implementation, with little side effects to the existing, default > behavior. > The implementation provided here was initially based on a copy of the > implementation of {{OrderedThreadPoolExecutor}} that introduced a > considerable amount of code duplication. For illustrative purposes, the line > of commits leading from that initial commit to the patch attached to this > JIRA issue can be found at > [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7] > and later commits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
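[Editor's note] The prioritization idea from the description can be sketched with plain JDK classes. This is an illustration of the concept only, not the attached patch: Session and TLS_FIRST are hypothetical stand-ins for MINA's IoSession and the Comparator a user would supply, and a PriorityBlockingQueue stands in for the executor's work queue.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class TlsPriorityDemo {
    /** Hypothetical stand-in for an IoSession: a name plus a TLS flag. */
    static final class Session {
        final String name;
        final boolean tlsEstablished;
        Session(String name, boolean tlsEstablished) {
            this.name = name;
            this.tlsEstablished = tlsEstablished;
        }
    }

    // Sessions that already completed the TLS handshake sort first, so under
    // load their queued events are drained before half-negotiated sessions'.
    static final Comparator<Session> TLS_FIRST =
            Comparator.comparing((Session s) -> !s.tlsEstablished);

    /** Queues one handshaking and one established session; returns who dequeues first. */
    static String firstOut() {
        PriorityBlockingQueue<Session> queue = new PriorityBlockingQueue<>(11, TLS_FIRST);
        queue.add(new Session("handshaking", false));
        queue.add(new Session("established", true));
        return queue.poll().name;
    }

    public static void main(String[] args) {
        System.out.println(firstOut()); // prints "established"
    }
}
```

Under starvation, the sessions sorted last are the first to time out, which matches the goal stated above: sessions still negotiating TLS fail before sessions whose handshake investment would otherwise be wasted.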
[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized
[ https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555608#comment-16555608 ] Guus der Kinderen commented on DIRMINA-1078: Thanks! Please note that for Java 10, the changes applied to resolve DIRMINA-1088 should also be applied to this new class. > OrderedThreadPoolExecutor should allow sessions to be prioritized > - > > Key: DIRMINA-1078 > URL: https://issues.apache.org/jira/browse/DIRMINA-1078 > Project: MINA > Issue Type: New Feature > Components: Core >Reporter: Guus der Kinderen >Assignee: Jonathan Valliere >Priority: Minor > Attachments: 20180216.patch, 20180706.patch > > > The functionality provided in {{OrderedThreadPoolExecutor}} should be > augmented to, optionally, allow for assignment of priority to certain > sessions over others. > We've introduced this functionality after observing issues in a deployment > where system resources where being starved by the sheer amount of sessions > that attempted to perform TLS. Without the class introduced by this commit, > events for any session could eventually fail (time-out), causing the session > to fail. If that session happened to be a session that had already > established TLS, the resources that were spent on establishing TLS are > wasted. The negative effect is amplified by the fact that a typical client in > such a situation would attempt > to reconnect, which further adds to the load of the system already being > starved. > With the modifications introduced by the patch provided in this issue, > priority can be given to sessions that have already established TLS. This > dramatically reduces the issue described above, as the first sessions to fail > will be those that are still negotiating TLS. Using a specialized > {{Comparator}}, one can even prioritize between these, causing sessions for > which least effort has performed to fail before sessions that are more likely > to near TLS completion. 
> The patch intends to add this feature as optional functionality to the > existing implementation, with little side effects to the existing, default > behavior. > The implementation provided here was initially based on a copy of the > implementation of {{OrderedThreadPoolExecutor}} that introduced a > considerable amount of code duplication. For illustrative purposes, the line > of commits leading from that initial commit to the patch attached to this > JIRA issue can be found at > [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7] > and later commits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1088) OrderedThreadPool implementation should be compatible with Java 10
[ https://issues.apache.org/jira/browse/DIRMINA-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555606#comment-16555606 ] Guus der Kinderen commented on DIRMINA-1088: [~elecharny], would you mind applying the same change to https://git1-us-west.apache.org/repos/asf?p=mina.git;a=blob;f=mina-core/src/main/java/org/apache/mina/filter/executor/PriorityThreadPoolExecutor.java;h=dd6b6e140f2ed953efbb291a0410a7a37f691aa0;hb=56ca189e1b2de1b6d9ab3635dd8f331f13762009 which got introduced as part of DIRMINA-1078? > OrderedThreadPool implementation should be compatible with Java 10 > -- > > Key: DIRMINA-1088 > URL: https://issues.apache.org/jira/browse/DIRMINA-1088 > Project: MINA > Issue Type: Bug > Components: Core >Reporter: Guus der Kinderen >Priority: Major > Attachments: orderedthreadpool.patch > > > {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from > {{java.util.concurrent.ThreadPoolExecutor}} > OrderedThreadPoolExecutor, in its constructor, calls these two methods from > its parent to determine pool sizing: > {code:java} > super.setCorePoolSize(corePoolSize); > super.setMaximumPoolSize(maximumPoolSize);{code} > This works fine up until Java 8. 
In Java 10 (possibly 9 - I did not check), > an additional input validation was added to > {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}} > ThreadPoolExecutor Java 8: > {code:java} > public void setCorePoolSize(int corePoolSize) { > if (corePoolSize < 0) > throw new IllegalArgumentException(); > public void setMaximumPoolSize(int maximumPoolSize) { > if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > {code} > ThreadPoolExecutor Java 10: > {code:java} > public void setCorePoolSize(int corePoolSize) { > if (corePoolSize < 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > public void setMaximumPoolSize(int maximumPoolSize) { > if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > {code} > As a result, the first line of this part of the constructor of > OrderedThreadPoolExecutor now throws an IllegalArgumentException. > {code:java} > super.setCorePoolSize(corePoolSize); > super.setMaximumPoolSize(maximumPoolSize);{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized
[ https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guus der Kinderen updated DIRMINA-1078: --- Attachment: 20180706.patch > OrderedThreadPoolExecutor should allow sessions to be prioritized > - > > Key: DIRMINA-1078 > URL: https://issues.apache.org/jira/browse/DIRMINA-1078 > Project: MINA > Issue Type: New Feature > Components: Core >Reporter: Guus der Kinderen >Assignee: Jonathan Valliere >Priority: Minor > Attachments: 20180216.patch, 20180706.patch > > > The functionality provided in {{OrderedThreadPoolExecutor}} should be > augmented to, optionally, allow for assignment of priority to certain > sessions over others. > We've introduced this functionality after observing issues in a deployment > where system resources where being starved by the sheer amount of sessions > that attempted to perform TLS. Without the class introduced by this commit, > events for any session could eventually fail (time-out), causing the session > to fail. If that session happened to be a session that had already > established TLS, the resources that were spent on establishing TLS are > wasted. The negative effect is amplified by the fact that a typical client in > such a situation would attempt > to reconnect, which further adds to the load of the system already being > starved. > With the modifications introduced by the patch provided in this issue, > priority can be given to sessions that have already established TLS. This > dramatically reduces the issue described above, as the first sessions to fail > will be those that are still negotiating TLS. Using a specialized > {{Comparator}}, one can even prioritize between these, causing sessions for > which least effort has performed to fail before sessions that are more likely > to near TLS completion. > The patch intends to add this feature as optional functionality to the > existing implementation, with little side effects to the existing, default > behavior. 
> The implementation provided here was initially based on a copy of the > implementation of {{OrderedThreadPoolExecutor}} that introduced a > considerable amount of code duplication. For illustrative purposes, the line > of commits leading from that initial commit to the patch attached to this > JIRA issue can be found at > [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7] > and later commits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1088) OrderedThreadPool implementation should be compatible with Java 10
[ https://issues.apache.org/jira/browse/DIRMINA-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527732#comment-16527732 ] Guus der Kinderen commented on DIRMINA-1088: I think the problem is resolved by setting Max before Core. That should have negligible side-effects otherwise. I've attached a patch to this issue. > OrderedThreadPool implementation should be compatible with Java 10 > -- > > Key: DIRMINA-1088 > URL: https://issues.apache.org/jira/browse/DIRMINA-1088 > Project: MINA > Issue Type: Bug > Components: Core >Reporter: Guus der Kinderen >Priority: Major > Attachments: orderedthreadpool.patch > > > {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from > {{java.util.concurrent.ThreadPoolExecutor}} > OrderedThreadPoolExecutor, in its constructor, calls these two methods from > its parent to determine pool sizing: > {code:java} > super.setCorePoolSize(corePoolSize); > super.setMaximumPoolSize(maximumPoolSize);{code} > This works fine up until Java 8. 
In Java 10 (possibly 9 - I did not check), > an additional input validation was added to > {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}} > ThreadPoolExecutor Java 8: > {code:java} > public void setCorePoolSize(int corePoolSize) { > if (corePoolSize < 0) > throw new IllegalArgumentException(); > public void setMaximumPoolSize(int maximumPoolSize) { > if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > {code} > ThreadPoolExecutor Java 10: > {code:java} > public void setCorePoolSize(int corePoolSize) { > if (corePoolSize < 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > public void setMaximumPoolSize(int maximumPoolSize) { > if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > {code} > As a result, the first line of this part of the constructor of > OrderedThreadPoolExecutor now throws an IllegalArgumentException. > {code:java} > super.setCorePoolSize(corePoolSize); > super.setMaximumPoolSize(maximumPoolSize);{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
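[Editor's note] The fix proposed in the comment above (set Max before Core) can be demonstrated against the plain JDK ThreadPoolExecutor. This is a sketch, not MINA's constructor: resize is a hypothetical helper. When growing a pool, calling setCorePoolSize() first would fail on Java 9+ because the new core size momentarily exceeds the old maximum; setting the maximum first keeps the invariant satisfied at every step.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizeOrder {
    /** Grows a (1, 1) pool to (8, 16) using the order that is safe on Java 9+. */
    static String resizeDemo() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 30L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // Maximum first: guarantees maximumPoolSize >= corePoolSize at the
        // moment setCorePoolSize() runs its validation. The reverse order
        // would throw IllegalArgumentException here on Java 9 and later,
        // since core 8 would briefly exceed the old maximum of 1.
        pool.setMaximumPoolSize(16);
        pool.setCorePoolSize(8);

        String sizes = pool.getCorePoolSize() + "/" + pool.getMaximumPoolSize();
        pool.shutdown();
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(resizeDemo()); // prints "8/16"
    }
}
```

The same reasoning applies when shrinking, just mirrored: reduce the core size before the maximum.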
[jira] [Updated] (DIRMINA-1088) OrderedThreadPool implementation should be compatible with Java 10
[ https://issues.apache.org/jira/browse/DIRMINA-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guus der Kinderen updated DIRMINA-1088: --- Attachment: orderedthreadpool.patch > OrderedThreadPool implementation should be compatible with Java 10 > -- > > Key: DIRMINA-1088 > URL: https://issues.apache.org/jira/browse/DIRMINA-1088 > Project: MINA > Issue Type: Bug > Components: Core >Reporter: Guus der Kinderen >Priority: Major > Attachments: orderedthreadpool.patch > > > {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from > {{java.util.concurrent.ThreadPoolExecutor}} > OrderedThreadPoolExecutor, in its constructor, calls these two methods from > its parent to determine pool sizing: > {code:java} > super.setCorePoolSize(corePoolSize); > super.setMaximumPoolSize(maximumPoolSize);{code} > This works fine up until Java 8. In Java 10 (possibly 9 - I did not check), > an additional input validation was added to > {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}} > ThreadPoolExecutor Java 8: > {code:java} > public void setCorePoolSize(int corePoolSize) { > if (corePoolSize < 0) > throw new IllegalArgumentException(); > public void setMaximumPoolSize(int maximumPoolSize) { > if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > {code} > ThreadPoolExecutor Java 10: > {code:java} > public void setCorePoolSize(int corePoolSize) { > if (corePoolSize < 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > public void setMaximumPoolSize(int maximumPoolSize) { > if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > {code} > As a result, the first line of this part of the constructor of > OrderedThreadPoolExecutor now throws an IllegalArgumentException. 
> {code:java} > super.setCorePoolSize(corePoolSize); > super.setMaximumPoolSize(maximumPoolSize);{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DIRMINA-1088) OrderedThreadPool implementation should be compatible with Java 10
[ https://issues.apache.org/jira/browse/DIRMINA-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guus der Kinderen updated DIRMINA-1088: --- Description: {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from {{java.util.concurrent.ThreadPoolExecutor}} OrderedThreadPoolExecutor, in its constructor, calls these two methods from its parent to determine pool sizing: {code:java} super.setCorePoolSize(corePoolSize); super.setMaximumPoolSize(maximumPoolSize);{code} This works fine up until Java 8. In Java 10 (possibly 9 - I did not check), an additional input validation was added to {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}} ThreadPoolExecutor Java 8: {code:java} public void setCorePoolSize(int corePoolSize) { if (corePoolSize < 0) throw new IllegalArgumentException(); public void setMaximumPoolSize(int maximumPoolSize) { if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) throw new IllegalArgumentException(); {code} ThreadPoolExecutor Java 10: {code:java} public void setCorePoolSize(int corePoolSize) { if (corePoolSize < 0 || maximumPoolSize < corePoolSize) throw new IllegalArgumentException(); public void setMaximumPoolSize(int maximumPoolSize) { if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) throw new IllegalArgumentException(); {code} As a result, the first line of this part of the constructor of OrderedThreadPoolExecutor now throws an IllegalArgumentException. {code:java} super.setCorePoolSize(corePoolSize); super.setMaximumPoolSize(maximumPoolSize);{code} was: {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from {{java.util.concurrent.ThreadPoolExecutor}} OrderedThreadPool, in its constructor, calls these two methods from its parent to determine pool sizing: {code:java} super.setCorePoolSize(corePoolSize); super.setMaximumPoolSize(maximumPoolSize);{code} This works fine up until Java 8. 
In Java 10 (possibly 9 - I did not check), an additional input validation was added to {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}} ThreadPoolExecutor Java 8: {code:java} public void setCorePoolSize(int corePoolSize) { if (corePoolSize < 0) throw new IllegalArgumentException(); public void setMaximumPoolSize(int maximumPoolSize) { if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) throw new IllegalArgumentException(); {code} ThreadPoolExecutor Java 10: {code:java} public void setCorePoolSize(int corePoolSize) { if (corePoolSize < 0 || maximumPoolSize < corePoolSize) throw new IllegalArgumentException(); public void setMaximumPoolSize(int maximumPoolSize) { if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) throw new IllegalArgumentException(); {code} As a result, the first line of this part of the constructor of OrderedThreadPool now throws an IllegalArgumentException. {code:java} super.setCorePoolSize(corePoolSize); super.setMaximumPoolSize(maximumPoolSize);{code} > OrderedThreadPool implementation should be compatible with Java 10 > -- > > Key: DIRMINA-1088 > URL: https://issues.apache.org/jira/browse/DIRMINA-1088 > Project: MINA > Issue Type: Bug > Components: Core >Reporter: Guus der Kinderen >Priority: Major > > {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from > {{java.util.concurrent.ThreadPoolExecutor}} > OrderedThreadPoolExecutor, in its constructor, calls these two methods from > its parent to determine pool sizing: > {code:java} > super.setCorePoolSize(corePoolSize); > super.setMaximumPoolSize(maximumPoolSize);{code} > This works fine up until Java 8. 
In Java 10 (possibly 9 - I did not check), > an additional input validation was added to > {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}} > ThreadPoolExecutor Java 8: > {code:java} > public void setCorePoolSize(int corePoolSize) { > if (corePoolSize < 0) > throw new IllegalArgumentException(); > public void setMaximumPoolSize(int maximumPoolSize) { > if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > {code} > ThreadPoolExecutor Java 10: > {code:java} > public void setCorePoolSize(int corePoolSize) { > if (corePoolSize < 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > public void setMaximumPoolSize(int maximumPoolSize) { > if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) > throw new IllegalArgumentException(); > {code} > As a result, the first line of this part of the constructor of > OrderedThreadPoolExecutor now throws an IllegalArgumentException. > {code:java} > super.setCorePoolSize(corePoolSize); > super.setMaximumPoolSize(maximumPoolSize);{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DIRMINA-1088) OrderedThreadPool implementation should be compatible with Java 10
Guus der Kinderen created DIRMINA-1088: -- Summary: OrderedThreadPool implementation should be compatible with Java 10 Key: DIRMINA-1088 URL: https://issues.apache.org/jira/browse/DIRMINA-1088 Project: MINA Issue Type: Bug Components: Core Reporter: Guus der Kinderen {{org.apache.mina.filter.executor.OrderedThreadPoolExecutor}} inherits from {{java.util.concurrent.ThreadPoolExecutor}} OrderedThreadPool, in its constructor, calls these two methods from its parent to determine pool sizing: {code:java} super.setCorePoolSize(corePoolSize); super.setMaximumPoolSize(maximumPoolSize);{code} This works fine up until Java 8. In Java 10 (possibly 9 - I did not check), an additional input validation was added to {{ThreadPoolExecutor#setCorePoolSize}}: {{maximumPoolSize < corePoolSize}} ThreadPoolExecutor Java 8: {code:java} public void setCorePoolSize(int corePoolSize) { if (corePoolSize < 0) throw new IllegalArgumentException(); public void setMaximumPoolSize(int maximumPoolSize) { if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) throw new IllegalArgumentException(); {code} ThreadPoolExecutor Java 10: {code:java} public void setCorePoolSize(int corePoolSize) { if (corePoolSize < 0 || maximumPoolSize < corePoolSize) throw new IllegalArgumentException(); public void setMaximumPoolSize(int maximumPoolSize) { if (maximumPoolSize <= 0 || maximumPoolSize < corePoolSize) throw new IllegalArgumentException(); {code} As a result, the first line of this part of the constructor of OrderedThreadPool now throws an IllegalArgumentException. {code:java} super.setCorePoolSize(corePoolSize); super.setMaximumPoolSize(maximumPoolSize);{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized
[ https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527330#comment-16527330 ] Guus der Kinderen commented on DIRMINA-1078: No worries, thanks for the update. > OrderedThreadPoolExecutor should allow sessions to be prioritized > - > > Key: DIRMINA-1078 > URL: https://issues.apache.org/jira/browse/DIRMINA-1078 > Project: MINA > Issue Type: New Feature > Components: Core >Reporter: Guus der Kinderen >Priority: Minor > Attachments: 20180216.patch > > > The functionality provided in {{OrderedThreadPoolExecutor}} should be > augmented to, optionally, allow for assignment of priority to certain > sessions over others. > We've introduced this functionality after observing issues in a deployment > where system resources where being starved by the sheer amount of sessions > that attempted to perform TLS. Without the class introduced by this commit, > events for any session could eventually fail (time-out), causing the session > to fail. If that session happened to be a session that had already > established TLS, the resources that were spent on establishing TLS are > wasted. The negative effect is amplified by the fact that a typical client in > such a situation would attempt > to reconnect, which further adds to the load of the system already being > starved. > With the modifications introduced by the patch provided in this issue, > priority can be given to sessions that have already established TLS. This > dramatically reduces the issue described above, as the first sessions to fail > will be those that are still negotiating TLS. Using a specialized > {{Comparator}}, one can even prioritize between these, causing sessions for > which least effort has performed to fail before sessions that are more likely > to near TLS completion. > The patch intends to add this feature as optional functionality to the > existing implementation, with little side effects to the existing, default > behavior. 
> The implementation provided here was initially based on a copy of the > implementation of {{OrderedThreadPoolExecutor}} that introduced a > considerable amount of code duplication. For illustrative purposes, the line > of commits leading from that initial commit to the patch attached to this > JIRA issue can be found at > [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7] > and later commits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized
[ https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500061#comment-16500061 ] Guus der Kinderen commented on DIRMINA-1078: [~johnnyv], [~elecharny], is the current state of the patches that I provided to your liking? > OrderedThreadPoolExecutor should allow sessions to be prioritized > - > > Key: DIRMINA-1078 > URL: https://issues.apache.org/jira/browse/DIRMINA-1078 > Project: MINA > Issue Type: New Feature > Components: Core >Reporter: Guus der Kinderen >Priority: Minor > Attachments: 20180216.patch > > > The functionality provided in {{OrderedThreadPoolExecutor}} should be > augmented to, optionally, allow for assignment of priority to certain > sessions over others. > We've introduced this functionality after observing issues in a deployment > where system resources where being starved by the sheer amount of sessions > that attempted to perform TLS. Without the class introduced by this commit, > events for any session could eventually fail (time-out), causing the session > to fail. If that session happened to be a session that had already > established TLS, the resources that were spent on establishing TLS are > wasted. The negative effect is amplified by the fact that a typical client in > such a situation would attempt > to reconnect, which further adds to the load of the system already being > starved. > With the modifications introduced by the patch provided in this issue, > priority can be given to sessions that have already established TLS. This > dramatically reduces the issue described above, as the first sessions to fail > will be those that are still negotiating TLS. Using a specialized > {{Comparator}}, one can even prioritize between these, causing sessions for > which least effort has performed to fail before sessions that are more likely > to near TLS completion. 
> The patch intends to add this feature as optional functionality to the > existing implementation, with little side effects to the existing, default > behavior. > The implementation provided here was initially based on a copy of the > implementation of {{OrderedThreadPoolExecutor}} that introduced a > considerable amount of code duplication. For illustrative purposes, the line > of commits leading from that initial commit to the patch attached to this > JIRA issue can be found at > [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7] > and later commits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized
[ https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445808#comment-16445808 ] Guus der Kinderen commented on DIRMINA-1078: Apologies for the long wait, gentlemen. I have now: * separated my implementation from the existing implementation, creating a new class * added a couple of unit tests. This effectively created two new files, without changes to other files. The patch can be downloaded here: https://github.com/guusdk/mina/compare/2.0...feature/prioritised-ordered-thread-pool-executor_standalone.patch > OrderedThreadPoolExecutor should allow sessions to be prioritized > - > > Key: DIRMINA-1078 > URL: https://issues.apache.org/jira/browse/DIRMINA-1078 > Project: MINA > Issue Type: New Feature > Components: Core >Reporter: Guus der Kinderen >Priority: Minor > Attachments: 20180216.patch > > > The functionality provided in {{OrderedThreadPoolExecutor}} should be > augmented to, optionally, allow for assignment of priority to certain > sessions over others. > We've introduced this functionality after observing issues in a deployment > where system resources where being starved by the sheer amount of sessions > that attempted to perform TLS. Without the class introduced by this commit, > events for any session could eventually fail (time-out), causing the session > to fail. If that session happened to be a session that had already > established TLS, the resources that were spent on establishing TLS are > wasted. The negative effect is amplified by the fact that a typical client in > such a situation would attempt > to reconnect, which further adds to the load of the system already being > starved. > With the modifications introduced by the patch provided in this issue, > priority can be given to sessions that have already established TLS. This > dramatically reduces the issue described above, as the first sessions to fail > will be those that are still negotiating TLS. 
Using a specialized > {{Comparator}}, one can even prioritize between these, causing sessions for > which least effort has performed to fail before sessions that are more likely > to near TLS completion. > The patch intends to add this feature as optional functionality to the > existing implementation, with little side effects to the existing, default > behavior. > The implementation provided here was initially based on a copy of the > implementation of {{OrderedThreadPoolExecutor}} that introduced a > considerable amount of code duplication. For illustrative purposes, the line > of commits leading from that initial commit to the patch attached to this > JIRA issue can be found at > [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7] > and later commits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized
[ https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369191#comment-16369191 ] Guus der Kinderen commented on DIRMINA-1078: I'm sorry, I don't understand the request. Change the name to what? > OrderedThreadPoolExecutor should allow sessions to be prioritized > - > > Key: DIRMINA-1078 > URL: https://issues.apache.org/jira/browse/DIRMINA-1078 > Project: MINA > Issue Type: New Feature > Components: Core >Reporter: Guus der Kinderen >Priority: Minor > Attachments: 20180216.patch > > > The functionality provided in {{OrderedThreadPoolExecutor}} should be > augmented to, optionally, allow for assignment of priority to certain > sessions over others. > We've introduced this functionality after observing issues in a deployment > where system resources where being starved by the sheer amount of sessions > that attempted to perform TLS. Without the class introduced by this commit, > events for any session could eventually fail (time-out), causing the session > to fail. If that session happened to be a session that had already > established TLS, the resources that were spent on establishing TLS are > wasted. The negative effect is amplified by the fact that a typical client in > such a situation would attempt > to reconnect, which further adds to the load of the system already being > starved. > With the modifications introduced by the patch provided in this issue, > priority can be given to sessions that have already established TLS. This > dramatically reduces the issue described above, as the first sessions to fail > will be those that are still negotiating TLS. Using a specialized > {{Comparator}}, one can even prioritize between these, causing sessions for > which least effort has performed to fail before sessions that are more likely > to near TLS completion. 
> The patch intends to add this feature as optional functionality to the > existing implementation, with little side effects to the existing, default > behavior. > The implementation provided here was initially based on a copy of the > implementation of {{OrderedThreadPoolExecutor}} that introduced a > considerable amount of code duplication. For illustrative purposes, the line > of commits leading from that initial commit to the patch attached to this > JIRA issue can be found at > [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7] > and later commits. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized
[ https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368890#comment-16368890 ] Guus der Kinderen commented on DIRMINA-1078: Thanks for the feedback, guys! I initially had this as a separate {{Executor}}, exactly for the reasons that you mention. You can see this variant of the code here: https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7 (note that I fixed a bug later on - don't blindly use this version!) There was a _lot_ of code duplication going on. I thought about creating an abstract base class from which both this implementation and {{OrderedThreadPoolExecutor}} would extend - but that proved too much of a hassle. With the patch that I introduced, I believe I was able to add the new functionality to the existing implementation of {{OrderedThreadPoolExecutor}}, as a completely _optional_, _not enabled by default_ behavior, that introduces next to _no behavioral changes_ if unused. Given this, I would personally prefer my addition to remain in {{OrderedThreadPoolExecutor}} instead of introducing a different {{ThreadPool}} implementation, but that's not a hill I'm willing to die on. [~johnnyv], the {{FIFOEntry}} implementation is used as a tie-breaker, in case the {{Comparator}} that is provided does not distinguish between the sessions. It arguably can be left out, but I'd prefer to keep it in: it allows one to use relatively simple {{Comparator}} implementations, while still being ensured of defined behavior for those elements that are not covered by the {{Comparator}}. Again, not a hill I'm willing to die on though. [~elecharny], if you're willing to transform my patch into a separate {{Executor}} (in case my arguments above were not compelling), please, be my guest. If anything, it'd save me time, and having a second set of eyes looking at this code will not hurt either. 
> OrderedThreadPoolExecutor should allow sessions to be prioritized
> -
>
> Key: DIRMINA-1078
> URL: https://issues.apache.org/jira/browse/DIRMINA-1078
> Project: MINA
> Issue Type: New Feature
> Components: Core
> Reporter: Guus der Kinderen
> Priority: Minor
> Attachments: 20180216.patch
>
> The functionality provided in {{OrderedThreadPoolExecutor}} should be augmented to, optionally, allow for assignment of priority to certain sessions over others.
>
> We've introduced this functionality after observing issues in a deployment where system resources were being starved by the sheer amount of sessions that attempted to perform TLS. Without the class introduced by this commit, events for any session could eventually fail (time out), causing the session to fail. If that session happened to be a session that had already established TLS, the resources that were spent on establishing TLS are wasted. The negative effect is amplified by the fact that a typical client in such a situation would attempt to reconnect, which further adds to the load of the system already being starved.
>
> With the modifications introduced by the patch provided in this issue, priority can be given to sessions that have already established TLS. This dramatically reduces the issue described above, as the first sessions to fail will be those that are still negotiating TLS. Using a specialized {{Comparator}}, one can even prioritize between these, causing sessions for which the least effort has been performed to fail before sessions that are more likely to be near TLS completion.
>
> The patch intends to add this feature as optional functionality to the existing implementation, with little side effect on the existing, default behavior.
>
> The implementation provided here was initially based on a copy of the implementation of {{OrderedThreadPoolExecutor}} that introduced a considerable amount of code duplication. For illustrative purposes, the line of commits leading from that initial commit to the patch attached to this JIRA issue can be found at [https://github.com/guusdk/mina/commit/c0a421cf445696fbfd4d5b10d650d7c71d8faab7] and later commits.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
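For illustration, a comparator that gives precedence to sessions that have completed the TLS handshake might look like the sketch below. The {{Session}} class and its {{tlsEstablished}} flag are hypothetical stand-ins; a real implementation would inspect state on MINA's {{IoSession}} (e.g. an attribute set by the SSL filter).

```java
import java.util.Comparator;

// Hypothetical stand-in for a real MINA IoSession; only the detail relevant
// to the comparator is modeled here.
class Session {
    final boolean tlsEstablished;

    Session(boolean tlsEstablished) {
        this.tlsEstablished = tlsEstablished;
    }
}

// Comparator sketch: sessions that have already completed the TLS handshake
// sort first, so under load the executor sheds still-negotiating sessions
// rather than wasting handshake work that was already invested.
class TlsFirstComparator implements Comparator<Session> {
    @Override
    public int compare(Session a, Session b) {
        // Boolean.compare(b, a): 'true' (established) sorts before 'false'.
        return Boolean.compare(b.tlsEstablished, a.tlsEstablished);
    }
}
```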
[jira] [Updated] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized
[ https://issues.apache.org/jira/browse/DIRMINA-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guus der Kinderen updated DIRMINA-1078:
---
Attachment: 20180216.patch
[jira] [Created] (DIRMINA-1078) OrderedThreadPoolExecutor should allow sessions to be prioritized
Guus der Kinderen created DIRMINA-1078:
--
Summary: OrderedThreadPoolExecutor should allow sessions to be prioritized
Key: DIRMINA-1078
URL: https://issues.apache.org/jira/browse/DIRMINA-1078
Project: MINA
Issue Type: New Feature
Components: Core
Reporter: Guus der Kinderen
[jira] [Updated] (DIRMINA-1072) SslFilter does not account for SSLEngine runtime exceptions
[ https://issues.apache.org/jira/browse/DIRMINA-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guus der Kinderen updated DIRMINA-1072:
---
Attachment: sslengine-exception-with-destroy.patch

Ah, I missed that. I've modified the patch, and attached the modification as [^sslengine-exception-with-destroy.patch]. This modification will, upon detecting the failed state, destroy the handler. With outbound done, it's unlikely to be useful anymore.

> SslFilter does not account for SSLEngine runtime exceptions
> ---
>
> Key: DIRMINA-1072
> URL: https://issues.apache.org/jira/browse/DIRMINA-1072
> Project: MINA
> Issue Type: Bug
> Components: SSL
> Affects Versions: 2.0.16
> Reporter: Guus der Kinderen
> Attachments: sslengine-exception.patch, sslengine-exception-with-destroy.patch
>
> MINA's {{SslFilter}} wraps MINA's {{SslHandler}}, which itself wraps Java's {{SSLEngine}}.
>
> {{SslFilter}} does not catch runtime exceptions that are thrown by {{SSLEngine}} - I am unsure if this is by design. Ideally, we'd prevent the engine from getting into a state where it can throw such exceptions, but I'm not sure that's completely feasible. Nonetheless, I'm providing an improvement here that prevents at least one occurrence of an unchecked exception from being thrown (instead, my patch preemptively throws an {{SSLException}} that is then caught by the exception handling that's already in place). An alternative to this fix could be an additional catch block that handles unchecked exceptions.
>
> The scenario that causes the unchecked exception caught by this patch is this:
> * client connects, causing an SslFilter to be initialized, which causes the SSLEngine to begin its handshake
> * server shuts down the input (for instance, for inactivity, or as a side effect of resource starvation)
> * client sends data
>
> The corresponding stack trace starts with this:
> {code}java.lang.IllegalStateException: Internal error
> at sun.security.ssl.SSLEngineImpl.initHandshaker(SSLEngineImpl.java:470)
> at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1007)
> at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
> at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
> at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624){code}
> Inspiration for this fix was obtained from the Jetty project, notably this change: https://github.com/eclipse/jetty.project/issues/1228

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
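The "additional catch block" alternative mentioned in the issue could be sketched as below. The helper class and method are made up for illustration and are not part of MINA's API; the idea is only to convert unchecked {{SSLEngine}} failures into the checked {{SSLException}} that the existing error handling already processes.

```java
import javax.net.ssl.SSLException;
import java.util.concurrent.Callable;

// Illustrative sketch: run an SSLEngine operation (e.g. engine.unwrap(src, dst))
// and convert any unchecked failure - such as the IllegalStateException in the
// stack trace above - into the checked SSLException that the filter's existing
// exception handling already knows how to process.
class SslExceptionGuard {
    static <T> T call(Callable<T> sslOperation) throws SSLException {
        try {
            return sslOperation.call();
        } catch (SSLException e) {
            throw e; // already the expected checked type
        } catch (Exception e) {
            throw new SSLException("SSLEngine operation failed", e);
        }
    }
}
```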
[jira] [Created] (DIRMINA-1072) SslFilter does not account for SSLEngine runtime exceptions
Guus der Kinderen created DIRMINA-1072:
--
Summary: SslFilter does not account for SSLEngine runtime exceptions
Key: DIRMINA-1072
URL: https://issues.apache.org/jira/browse/DIRMINA-1072
Project: MINA
Issue Type: Bug
Components: SSL
Affects Versions: 2.0.16
Reporter: Guus der Kinderen
Attachments: sslengine-exception.patch