Re: observing hangs after update to MINA 2.0.14

2016-10-17 Thread Christoph John

Hi Emmanuel,

thanks for your input.

Best regards,
Chris.


On 15/10/16 01:10, Emmanuel Lecharny wrote:

On Fri, Oct 14, 2016 at 11:47 PM, Christoph John 
wrote:


Hi Emmanuel,

I think I have found the problem in our code and it is most likely related
to https://issues.apache.org/jira/browse/DIRMINA-1025 .
In the code we were calling ioSession.closeNow(). This worked before but
due to the changed flushing behaviour messages sometimes got lost on
disconnect which lead to hangs because expected messages were not received.


closeNow() don't try to flush anything anymore : it kind of 'brutally'
close the connection. Calling such a method will also make it impossibe to
receive any message. Is that the expected behaviour for you ?




I have another question regarding this. Here is the code that is called
when disconnecting the session:
--
 // This is primarily to allow logout messages to be sent before
 // closing the socket. Future versions of MINA may have support
 // in close() to force all pending messages to be written before
 // the socket close is performed.
 //
 // Only wait for a limited time since MINA may deadlock
 // in some rare cases where a socket dies in a strange way.
 for (int i = 0; i < 5 && ioSession.getScheduledWriteMessages() >
0; i++) {
 try {
 Thread.sleep(10L);
 } catch (InterruptedException e) {
 Thread.currentThread().interrupt();
 }
 }

 // We cannot call join() on the CloseFuture returned
 // by the following call. We are using a minimal
 // threading model and calling join will prevent the
 // close event from being processed by this thread (if
 // this thread is the MINA IO processor thread.
 ioSession.closeOnFlush();
--

This code was already there when we used MINA 1.x, so I do not know if the
for-loop is still necessary when calling closeOnFlush() afterwards.


closeOnFlush() is trying to send all the messages in the write queue,
before actually closing the session. I do think teh loop is useless.




It also did not help in our situation when we still called closeNow(). So
either waiting 5*10 milliseconds was too short or there were no scheduled
write messages present.

What about the comment about the deadlock?


My understanding is that the session disconnection was never detected, and
there were some pending message, which oviously weren't able to be
transmitted, so the closeOnFlush() call was blocking forever. MINA 2.0.14
should not behave this way because an exception occuring because the
session is closed will empty the write queue.



If I understand correctly then I have a probability of hitting
https://issues.apache.org/jira/browse/DIRMINA-893 ?? So I should probably
go for await() or await(timeout)?


I don't know. The JIRA you are mentionning is very secific : it's about
using MINA to implement a proxy, with one connection dying, leving the
other pending for ever. As this is an application we haven't written, it's
hard to reproduce it without a clear exemple that we don't have.



--
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:christoph.j...@macd.com



http://www.macd.com 



MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
 Amtsgericht Aachen: HRB 8151
Ust.-Id: DE 813021663

Geschäftsführer: George Macdonald




take care of the environment - print only if necessary


Re: observing hangs after update to MINA 2.0.14

2016-10-14 Thread Emmanuel Lecharny
On Fri, Oct 14, 2016 at 11:47 PM, Christoph John 
wrote:

> Hi Emmanuel,
>
> I think I have found the problem in our code and it is most likely related
> to https://issues.apache.org/jira/browse/DIRMINA-1025 .
> In the code we were calling ioSession.closeNow(). This worked before but
> due to the changed flushing behaviour messages sometimes got lost on
> disconnect which lead to hangs because expected messages were not received.
>

closeNow() don't try to flush anything anymore : it kind of 'brutally'
close the connection. Calling such a method will also make it impossibe to
receive any message. Is that the expected behaviour for you ?



> I have another question regarding this. Here is the code that is called
> when disconnecting the session:
> --
> // This is primarily to allow logout messages to be sent before
> // closing the socket. Future versions of MINA may have support
> // in close() to force all pending messages to be written before
> // the socket close is performed.
> //
> // Only wait for a limited time since MINA may deadlock
> // in some rare cases where a socket dies in a strange way.
> for (int i = 0; i < 5 && ioSession.getScheduledWriteMessages() >
> 0; i++) {
> try {
> Thread.sleep(10L);
> } catch (InterruptedException e) {
> Thread.currentThread().interrupt();
> }
> }
>
> // We cannot call join() on the CloseFuture returned
> // by the following call. We are using a minimal
> // threading model and calling join will prevent the
> // close event from being processed by this thread (if
> // this thread is the MINA IO processor thread.
> ioSession.closeOnFlush();
> --
>
> This code was already there when we used MINA 1.x, so I do not know if the
> for-loop is still necessary when calling closeOnFlush() afterwards.


closeOnFlush() is trying to send all the messages in the write queue,
before actually closing the session. I do think teh loop is useless.



> It also did not help in our situation when we still called closeNow(). So
> either waiting 5*10 milliseconds was too short or there were no scheduled
> write messages present.
>
> What about the comment about the deadlock?


My understanding is that the session disconnection was never detected, and
there were some pending message, which oviously weren't able to be
transmitted, so the closeOnFlush() call was blocking forever. MINA 2.0.14
should not behave this way because an exception occuring because the
session is closed will empty the write queue.


> If I understand correctly then I have a probability of hitting
> https://issues.apache.org/jira/browse/DIRMINA-893 ?? So I should probably
> go for await() or await(timeout)?
>

I don't know. The JIRA you are mentionning is very secific : it's about
using MINA to implement a proxy, with one connection dying, leving the
other pending for ever. As this is an application we haven't written, it's
hard to reproduce it without a clear exemple that we don't have.

>
> Thanks in advance for your help.
> Best regards,
> Chris.
>
>
>
> On 26/09/16 11:22, Emmanuel Lécharny wrote:
>
>> Le 26/09/16 à 09:01, Christoph John a écrit :
>>
>>> Until now I could not reproduce the problem reliably. However, I think
>>> it might not be a "hang" but have to do with the changed behaviour
>>> regarding flushing that has been fixed in MINA 2.0.14 (DIRMINA-1025/1021)
>>> E.g. the tests wait for a message to arrive that the other side has
>>> sent but it doesn't seem to be received.
>>>
>>> I guess I'll have to review the closeNow/closeOnFlush calls in
>>> QuickFIX/J then? Maybe it is only related to the test suite, though.
>>>
>> I think you sent a test case that reproduce the issue (AFAIR). I ran
>> this test and got a hang, if I try to close both ends, that is what I'd
>> like to investigate.
>>
>> Sadly, I have been pretty busy lately, and I had to cut a new release
>> 2.0.15 urgently, due to a big bug in SSL, so I didn't had chance to go
>> any further. I hope to have time to do that this week.
>>
>>
> --
> Christoph John
> Development & Support
> Direct: +49 241 557080-28
> Mailto:christoph.j...@macd.com
>
>
>
> http://www.macd.com 
> 
> 
>
> 
> 
> MACD GmbH
> Oppenhoffallee 103
> D-52066 Aachen
> Tel: +49 241 557080-0 | Fax: +49 241 557080-10
>  Amtsgericht Aachen: HRB 8151
> Ust.-Id: DE 813021663
>
> Geschäftsführer: George Macdonald
> 
> 
>
> 
> 
>
> take care of the environment -

Re: observing hangs after update to MINA 2.0.14

2016-10-14 Thread Christoph John

Hi Emmanuel,

I think I have found the problem in our code and it is most likely related to 
https://issues.apache.org/jira/browse/DIRMINA-1025 .
In the code we were calling ioSession.closeNow(). This worked before but due to the changed flushing 
behaviour messages sometimes got lost on disconnect which lead to hangs because expected messages 
were not received.


I have another question regarding this. Here is the code that is called when 
disconnecting the session:
--
// This is primarily to allow logout messages to be sent before
// closing the socket. Future versions of MINA may have support
// in close() to force all pending messages to be written before
// the socket close is performed.
//
// Only wait for a limited time since MINA may deadlock
// in some rare cases where a socket dies in a strange way.
for (int i = 0; i < 5 && ioSession.getScheduledWriteMessages() > 0; 
i++) {
try {
Thread.sleep(10L);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}

// We cannot call join() on the CloseFuture returned
// by the following call. We are using a minimal
// threading model and calling join will prevent the
// close event from being processed by this thread (if
// this thread is the MINA IO processor thread.
ioSession.closeOnFlush();
--

This code was already there when we used MINA 1.x, so I do not know if the for-loop is still 
necessary when calling closeOnFlush() afterwards. It also did not help in our situation when we 
still called closeNow(). So either waiting 5*10 milliseconds was too short or there were no 
scheduled write messages present.


What about the comment about the deadlock? If I understand correctly then I have a probability of 
hitting https://issues.apache.org/jira/browse/DIRMINA-893 ?? So I should probably go for await() or 
await(timeout)?


Thanks in advance for your help.
Best regards,
Chris.


On 26/09/16 11:22, Emmanuel Lécharny wrote:

Le 26/09/16 à 09:01, Christoph John a écrit :

Until now I could not reproduce the problem reliably. However, I think
it might not be a "hang" but have to do with the changed behaviour
regarding flushing that has been fixed in MINA 2.0.14 (DIRMINA-1025/1021)
E.g. the tests wait for a message to arrive that the other side has
sent but it doesn't seem to be received.

I guess I'll have to review the closeNow/closeOnFlush calls in
QuickFIX/J then? Maybe it is only related to the test suite, though.

I think you sent a test case that reproduce the issue (AFAIR). I ran
this test and got a hang, if I try to close both ends, that is what I'd
like to investigate.

Sadly, I have been pretty busy lately, and I had to cut a new release
2.0.15 urgently, due to a big bug in SSL, so I didn't had chance to go
any further. I hope to have time to do that this week.



--
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:christoph.j...@macd.com



http://www.macd.com 



MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
 Amtsgericht Aachen: HRB 8151
Ust.-Id: DE 813021663

Geschäftsführer: George Macdonald




take care of the environment - print only if necessary


Re: observing hangs after update to MINA 2.0.14

2016-09-26 Thread Emmanuel Lécharny
Le 26/09/16 à 09:01, Christoph John a écrit :
> Until now I could not reproduce the problem reliably. However, I think
> it might not be a "hang" but have to do with the changed behaviour
> regarding flushing that has been fixed in MINA 2.0.14 (DIRMINA-1025/1021)
> E.g. the tests wait for a message to arrive that the other side has
> sent but it doesn't seem to be received.
>
> I guess I'll have to review the closeNow/closeOnFlush calls in
> QuickFIX/J then? Maybe it is only related to the test suite, though.

I think you sent a test case that reproduce the issue (AFAIR). I ran
this test and got a hang, if I try to close both ends, that is what I'd
like to investigate.

Sadly, I have been pretty busy lately, and I had to cut a new release
2.0.15 urgently, due to a big bug in SSL, so I didn't had chance to go
any further. I hope to have time to do that this week.



Re: observing hangs after update to MINA 2.0.14

2016-09-26 Thread Christoph John
Until now I could not reproduce the problem reliably. However, I think it might not be a "hang" but 
have to do with the changed behaviour regarding flushing that has been fixed in MINA 2.0.14 
(DIRMINA-1025/1021)
E.g. the tests wait for a message to arrive that the other side has sent but it doesn't seem to be 
received.


I guess I'll have to review the closeNow/closeOnFlush calls in QuickFIX/J then? Maybe it is only 
related to the test suite, though.


Thanks,
Chris.



On 21/09/16 14:51, Christoph John wrote:
OK, will open a JIRA issue and try to get a test case. It might very well be a race condition, but 
the question is why it is manifesting every time I run the test suite with MINA 2.0.14 and never 
with 2.0.13.

Will keep you updated.

Thanks,
Chris.


On 21/09/16 14:48, Emmanuel Lécharny wrote:

Le 21/09/16 à 13:17, Christoph John a écrit :

Hi,

do you need any more information from my side? Should I create a JIRA
issue? The problem can be reproduced with QuickFIX/J code but I could
check if can condense a MINA-only test case if required.

A test case would be very valuable, and, yes, a JIRA will definitvely help.

My take on this issue is that if both the client and the server close
the session more or less at the same time, one side will hang, due to
some race condition.





--
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:christoph.j...@macd.com



http://www.macd.com 



MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
 Amtsgericht Aachen: HRB 8151
Ust.-Id: DE 813021663

Geschäftsführer: George Macdonald




take care of the environment - print only if necessary


Re: observing hangs after update to MINA 2.0.14

2016-09-21 Thread Christoph John
OK, will open a JIRA issue and try to get a test case. It might very well be a race condition, but 
the question is why it is manifesting every time I run the test suite with MINA 2.0.14 and never 
with 2.0.13.

Will keep you updated.

Thanks,
Chris.


On 21/09/16 14:48, Emmanuel Lécharny wrote:

Le 21/09/16 à 13:17, Christoph John a écrit :

Hi,

do you need any more information from my side? Should I create a JIRA
issue? The problem can be reproduced with QuickFIX/J code but I could
check if can condense a MINA-only test case if required.

A test case would be very valuable, and, yes, a JIRA will definitvely help.

My take on this issue is that if both the client and the server close
the session more or less at the same time, one side will hang, due to
some race condition.





Re: observing hangs after update to MINA 2.0.14

2016-09-21 Thread Emmanuel Lécharny
Le 21/09/16 à 13:17, Christoph John a écrit :
> Hi,
>
> do you need any more information from my side? Should I create a JIRA
> issue? The problem can be reproduced with QuickFIX/J code but I could
> check if can condense a MINA-only test case if required.

A test case would be very valuable, and, yes, a JIRA will definitvely help.

My take on this issue is that if both the client and the server close
the session more or less at the same time, one side will hang, due to
some race condition.



Re: observing hangs after update to MINA 2.0.14

2016-09-21 Thread Christoph John

Hi,

do you need any more information from my side? Should I create a JIRA issue? The problem can be 
reproduced with QuickFIX/J code but I could check if can condense a MINA-only test case if required.


Thanks,
Chris.

On 15/09/16 11:20, Christoph John wrote:

Great, thanks in advance.

Cheers,
Chris.

On 13/09/16 11:11, Emmanuel Lecharny wrote:

Might be caused by some problem when the connection is closed on both sides
at the same time.

As soon as I get back my computer (it's being repaired atm), I will be able
to have a look at the pb.

On Mon, Sep 12, 2016 at 9:57 PM, Christoph John
wrote:


Hi,

I am maintaining the QuickFIX/J project (https://github.com/quickfix-
j/quickfixj) which uses MINA as communication library. Recently we
integrated a pull request which included calling dispose(true) on an
IoAcceptor to clean up the resources. This was no problem with MINA 2.0.13.
However, when running the QuickFIX/J test suite with MINA 2.0.14 there are
intermittent hangs when stopping the connection. I am attaching a stack
trace.
I browsed the changes in MINA 2.0.14 and found nothing that caught my eye.
But I am not so familiar with the MINA code.

Update: I just ran the test suite without the "dispose(true)" change and
am also observing the hangs with MINA 2.0.14.

Maybe someone can shed some light on this.
Many thanks in advance and best regards,
Chris.


--
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:christoph.j...@macd.com  



http://www.macd.com
--
--
MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
  Amtsgericht Aachen: HRB 8151
Ust.-Id: DE 813021663

Geschäftsführer: George Macdonald
--
--
take care of the environment - print only if necessary





--
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:christoph.j...@macd.com



http://www.macd.com 



MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
 Amtsgericht Aachen: HRB 8151
Ust.-Id: DE 813021663

Geschäftsführer: George Macdonald




take care of the environment - print only if necessary


--
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:christoph.j...@macd.com



http://www.macd.com 



MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
 Amtsgericht Aachen: HRB 8151
Ust.-Id: DE 813021663

Geschäftsführer: George Macdonald




take care of the environment - print only if necessary


Re: observing hangs after update to MINA 2.0.14

2016-09-15 Thread Christoph John

Great, thanks in advance.

Cheers,
Chris.

On 13/09/16 11:11, Emmanuel Lecharny wrote:

Might be caused by some problem when the connection is closed on both sides
at the same time.

As soon as I get back my computer (it's being repaired atm), I will be able
to have a look at the pb.

On Mon, Sep 12, 2016 at 9:57 PM, Christoph John 
wrote:


Hi,

I am maintaining the QuickFIX/J project (https://github.com/quickfix-
j/quickfixj) which uses MINA as communication library. Recently we
integrated a pull request which included calling dispose(true) on an
IoAcceptor to clean up the resources. This was no problem with MINA 2.0.13.
However, when running the QuickFIX/J test suite with MINA 2.0.14 there are
intermittent hangs when stopping the connection. I am attaching a stack
trace.
I browsed the changes in MINA 2.0.14 and found nothing that caught my eye.
But I am not so familiar with the MINA code.

Update: I just ran the test suite without the "dispose(true)" change and
am also observing the hangs with MINA 2.0.14.

Maybe someone can shed some light on this.
Many thanks in advance and best regards,
Chris.


--
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:christoph.j...@macd.com 



http://www.macd.com
--
--
MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
  Amtsgericht Aachen: HRB 8151
Ust.-Id: DE 813021663

Geschäftsführer: George Macdonald
--
--
take care of the environment - print only if necessary






--
Christoph John
Development & Support
Direct: +49 241 557080-28
Mailto:christoph.j...@macd.com



http://www.macd.com 



MACD GmbH
Oppenhoffallee 103
D-52066 Aachen
Tel: +49 241 557080-0 | Fax: +49 241 557080-10
 Amtsgericht Aachen: HRB 8151
Ust.-Id: DE 813021663

Geschäftsführer: George Macdonald




take care of the environment - print only if necessary


Re: observing hangs after update to MINA 2.0.14

2016-09-13 Thread Emmanuel Lecharny
Might be caused by some problem when the connection is closed on both sides
at the same time.

As soon as I get back my computer (it's being repaired atm), I will be able
to have a look at the pb.

On Mon, Sep 12, 2016 at 9:57 PM, Christoph John 
wrote:

> Hi,
>
> I am maintaining the QuickFIX/J project (https://github.com/quickfix-
> j/quickfixj) which uses MINA as communication library. Recently we
> integrated a pull request which included calling dispose(true) on an
> IoAcceptor to clean up the resources. This was no problem with MINA 2.0.13.
> However, when running the QuickFIX/J test suite with MINA 2.0.14 there are
> intermittent hangs when stopping the connection. I am attaching a stack
> trace.
> I browsed the changes in MINA 2.0.14 and found nothing that caught my eye.
> But I am not so familiar with the MINA code.
>
> Update: I just ran the test suite without the "dispose(true)" change and
> am also observing the hangs with MINA 2.0.14.
>
> Maybe someone can shed some light on this.
> Many thanks in advance and best regards,
> Chris.
>
>
> --
> Christoph John
> Development & Support
> Direct: +49 241 557080-28
> Mailto:christoph.j...@macd.com 
>
>
>
> http://www.macd.com
> --
> --
> MACD GmbH
> Oppenhoffallee 103
> D-52066 Aachen
> Tel: +49 241 557080-0 | Fax: +49 241 557080-10
>  Amtsgericht Aachen: HRB 8151
> Ust.-Id: DE 813021663
>
> Geschäftsführer: George Macdonald
> --
> --
> take care of the environment - print only if necessary
>



-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com