On Sun, 2007-03-25 at 09:43 +0000, ant elder wrote:
> The symptoms I get do seem to match what you describe. There are still
> two problems with that though which I'd like to understand better.
>
> 1) Why don't I see this with the non-NIO transport? For example I can
> run the Synapse server samples in either the Synapse sample server,
> which uses the NIO transport, or I can just use a separate axis2-1.1.1
> distro with the non-NIO transport. When using JMeter against
> axis2-1.1.1 it works fine and I can send tens of thousands of requests
> without any errors. What's different here? The underlying TCP stack and
> config are the same, aren't they?

Anthony, Asankha, et al.

The problem appears to be caused by Synapse opening an I/O pipe for
*every* incoming and outgoing HTTP message. On some platforms this can
be a very expensive operation both in terms of performance and system
resources. On Windows, opening an I/O pipe apparently requires a local
IP port to be allocated. No wonder Synapse chokes after only a few
thousand requests.

I see absolutely no reason why Synapse should make use of I/O pipes.
Essentially, pipes are being used to bridge event-driven NIO and
stream-based classic I/O. There are other ways to get the job done. A
trivial shared buffer with synchronized access should perfectly
suffice. I'll happily lend you a helping hand if necessary.

> 2) Synapse often hangs after the IO error and needs to be restarted.
> Is there any way we can make it recover from this without requiring a
> restart? By handling the exception differently or something?
>

Please let me know if you see any unchecked exceptions thrown by I/O
reactors, as those exceptions cause I/O dispatch threads to terminate,
effectively locking up the I/O reactor.

Oleg

> ...ant
>
> On 3/24/07, Asankha C. Perera <[EMAIL PROTECTED]> wrote:
> Ant
>
> This is the same error seen by Indika on Windows, and I think
> my analysis is correct. If you run the test for the first time,
> or a few minutes after last running the test, you should be
> able to go to around 1000 iterations. After you start to hit
> this issue, even 200 iterations would give you the error. At
> this time, doing a netstat -na should show you that most of
> the TCP ports are in TIME_WAIT state. Usually it could take at
> least one minute till a port is cleared up by the OS. The
> tuning parameters I specified for Linux tell the OS to use
> the full port range for applications, and to set the TCP FIN
> timeout to 30 secs, to clear up the ports as quickly as
> possible. Without *any* OS tuning and on a Windows XP system,
> you definitely will encounter this issue.
>
> asankha
>
> ant elder wrote:
> > I've tried again with the latest Synapse and HTTP components
> > code and several JVMs. The results feel slightly different
> > than before but the end result is still always the root
> > exception included below. Sometimes it doesn't occur until
> > around 1000 requests, but sometimes it happens after not
> > many requests at all.
> >
> >    ...ant
> >
> > java.io.IOException: Unable to establish loopback connection
> >     at sun.nio.ch.PipeImpl$Initializer.run(Unknown Source)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at sun.nio.ch.PipeImpl.<init>(Unknown Source)
> >     at sun.nio.ch.SelectorProviderImpl.openPipe(Unknown Source)
> >     at java.nio.channels.Pipe.open(Unknown Source)
> >     at org.apache.axis2.transport.nhttp.ServerHandler.requestReceived(ServerHandler.java:108)
> >     at org.apache.axis2.transport.nhttp.LoggingNHttpServiceHandler.requestReceived(LoggingNHttpServiceHandler.java:83)
> >     at org.apache.http.impl.nio.DefaultNHttpServerConnection.consumeInput(DefaultNHttpServerConnection.java:96)
> >     at org.apache.axis2.transport.nhttp.PlainServerIOEventDispatch.inputReady(PlainServerIOEventDispatch.java:67)
> >     at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:68)
> >     at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:160)
> >     at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:145)
> >     at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:127)
> >     at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:153)
> >     at java.lang.Thread.run(Unknown Source)
> > Caused by: java.net.BindException: Address already in use: connect
> >     at sun.nio.ch.Net.connect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
> >     at java.nio.channels.SocketChannel.open(Unknown Source)
> >
> > On 3/23/07, Asankha C. Perera <[EMAIL PROTECTED]> wrote:
> > Ant
> >
> > I am quite sure that the problem seen by Indika now
> > was related to the ports being exhausted - see the
> > following articles and esp. the "MaxUserPort" and
> > "TcpTimedWaitDelay" parameters that could be tweaked -
> > to be consistent with what I am using before running
> > a load test on Linux. I will ask Indika to check
> > these on Monday - but you may try this in the
> > meantime if you get a chance.
> >
> > http://www.microsoft.com/technet/network/deploy/depovg/tcpip2k.mspx
> > http://www.microsoft.com/technet/community/columns/cableguy/cg1205.mspx
> > http://www.psc.edu/networking/projects/tcptune/OStune/winxp/winxp_stepbystep.html
> >
> > asankha
> >
> > Asankha C. Perera wrote:
> > > Hi Ant
> > >
> > > I fixed this for Linux and JDK 1.5 - I am
> > > confident of this fix as I was able to first
> > > recreate the issue consistently and then see the
> > > fix in action using 5 concurrent users sending a
> > > total of 5000 messages multiple times. However
> > > Indika is still seeing a 'similar' issue in
> > > Windows using JDK 1.4. We will try to see if it's
> > > related to JDK 1.4 or Windows. If you get the
> > > latest nhttp code and build the nhttp JAR you
> > > could verify this fix - and let me know.
> > >
> > > I am listing some of the Linux commands that came
> > > in handy for the resolution in case someone wants
> > > to check this.
> > >
> > > lsof -p 7426 => lists the open files for the pid
> > > given after the -p option
> > >
> > > ls -l /proc/9976/fd | wc -l => for each process
> > > the /proc filesystem lists the files used and thus
> > > you could count the open files with this command
> > >
> > > asankha
> > >
> > > Asankha C. Perera wrote:
> > > > Ant / Oleg
> > > >
> > > > I can recreate this issue on both Windows and
> > > > Linux and think it's caused by my code related to
> > > > use of Pipes.. and I am actively looking into
> > > > this right now.. will get back to you on what I
> > > > find.
> > > >
> > > > asankha
> > > >
> > > > ant elder wrote:
> > > > > I've tried on several JDKs now and _always_
> > > > > get similar intermittent I/O related errors. I
> > > > > can use JMeter directly against Axis2-1.1.1
> > > > > without any problems at all, so this does look
> > > > > like some issue with the NIO transport. It would
> > > > > be really good to hear from other Windows users
> > > > > to see if this is just my specific environment
> > > > > or a more general problem.
> > > > >
> > > > > To recreate:
> > > > >
> > > > > 1) build the Synapse server sample by running
> > > > > 'ant' in the
> > > > > samples\axis2Server\src\SimpleStockQuoteService
> > > > > directory
> > > > > 2) start the sample service by running
> > > > > samples\axis2Server\axis2server.bat
> > > > > 3) get the Synapse config (either 8 or 501)
> > > > > from http://people.apache.org/~antelder/temp/,
> > > > > put it in repository\conf\sample and start
> > > > > synapse: bin\synapse.bat -sample=8
> > > > > 4) get the JMeter config test1.jmx from
> > > > > http://people.apache.org/~antelder/temp/,
> > > > > start JMeter and File -> Open and point to the
> > > > > test1.jmx file
> > > > > 5) JMeter Run -> Start and after not too long
> > > > > IO errors should appear in the Synapse
> > > > > console
> > > > >
> > > > >    ...ant
> > > > >
> > > > > ---------- Forwarded message ----------
> > > > > From: Asankha C. Perera <[EMAIL PROTECTED]>
> > > > > Date: Mar 22, 2007 4:58 PM
> > > > > Subject: Re: [jira] Resolved: (HTTPCORE-60)
> > > > > Transport appears to be hanging because an
> > > > > unchecked exception caused the I/O dispatch
> > > > > thread to terminate
> > > > > To: HttpComponents Project
> > > > > <[email protected]>
> > > > >
> > > > > Oleg/Ant
> > > > >
> > > > > I am guessing this is something to do with
> > > > > Windows or the JDK you use.. But I am unable
> > > > > to test this week, so will try my best to
> > > > > try this sometime next week. As I said, on
> > > > > Linux I have run the system through thousands
> > > > > of messages and multiple threads concurrently
> > > > > and have fixed all the issues I came across.
> > > > >
> > > > > So Oleg, I do not see this as a blocker for
> > > > > the HttpCore release - but I will use your
> > > > > latest snapshots in Synapse to check on this
> > > > > in future if it occurs again.
> > > > >
> > > > > thanks
> > > > > asankha
> > > > >
> > > > > Oleg Kalnichevski (JIRA) wrote:
> > > > > > [ https://issues.apache.org/jira/browse/HTTPCORE-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> > > > > >
> > > > > > Oleg Kalnichevski resolved HTTPCORE-60.
> > > > > > ---------------------------------------
> > > > > >
> > > > > > Resolution: Fixed
> > > > > >
> > > > > > Anthony
> > > > > >
> > > > > > It turned out ClosedChannelException is a checked I/O
> > > > > > exception so it cannot kill the I/O dispatch thread. So,
> > > > > > apparently I was wrong in my initial assertion about the
> > > > > > cause of the Synapse I/O transport lockup. I tweaked the
> > > > > > HttpCore code a little and changed the IOSessionImpl to
> > > > > > catch all ClosedChannelExceptions thrown by the
> > > > > > underlying byte channel, just in case.
> > > > > >
> > > > > > Please review the changes and let me know if it is okay
> > > > > > to proceed with the release
> > > > > >
> > > > > > Oleg
> > > > > >
> > > > > > > Transport appears to be hanging because an unchecked exception caused the I/O dispatch thread to terminate
> > > > > > > ----------------------------------------------------------------------------------------------------------
> > > > > > >
> > > > > > >                 Key: HTTPCORE-60
> > > > > > >                 URL: https://issues.apache.org/jira/browse/HTTPCORE-60
> > > > > > >             Project: HttpComponents Core
> > > > > > >          Issue Type: Bug
> > > > > > >    Affects Versions: 4.0-alpha4
> > > > > > >            Reporter: ant elder
> > > > > > >         Assigned To: Oleg Kalnichevski
> > > > > > >             Fix For: 4.0-alpha4
> > > > > > >
> > > > > > > See discussion on synapse-dev mailing list:
> > > > > > > http://www.nabble.com/Intermittent-IO-Errors-using-Synapse-tf3439957.html
> > > > > > >
> > > > > > > The transport appears to be hanging because an unchecked
> > > > > > > exception caused the I/O dispatch thread to terminate. I
> > > > > > > believe there are several different types of problems
> > > > > > > (at least two) that we are seeing here.
> > > > > > >
> > > > > > > [I/O reactor worker thread 5] ERROR ServerHandler - I/O Error : null
> > > > > > >
> > > > > > > > java.nio.channels.ClosedChannelException
> > > > > > > >     at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:112)
> > > > > > > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:139)

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
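
Illustration for Oleg's suggestion at the top of this thread that a
"trivial shared buffer with synchronized access" could replace the
per-message java.nio.channels.Pipe. This is only a minimal sketch of
that idea under stated assumptions, not the code used in Synapse or
HttpCore. The class name SharedInputBuffer and its methods offer() and
endOfStream() are hypothetical names made up for this example. The I/O
dispatch thread would call offer() from its input-ready callback (it
never blocks), while a worker thread reads through the ordinary
blocking InputStream API. No loopback connection is opened, so no local
port is consumed per message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: a fixed-size circular buffer shared between an
 * NIO dispatch thread (producer) and a worker thread that consumes the
 * data through the blocking InputStream API.
 */
final class SharedInputBuffer extends InputStream {
    private final byte[] buf;
    private int readPos;   // index of the next byte to hand to the reader
    private int count;     // number of unread bytes currently buffered
    private boolean eof;   // set once the producer has no more data

    SharedInputBuffer(int capacity) {
        buf = new byte[capacity];
    }

    /** Producer side: copies what fits and returns at once, never blocks. */
    synchronized int offer(ByteBuffer src) {
        int copied = 0;
        while (src.hasRemaining() && count < buf.length) {
            buf[(readPos + count) % buf.length] = src.get();
            count++;
            copied++;
        }
        if (copied > 0) {
            notifyAll(); // wake a reader that is waiting for data
        }
        return copied;
    }

    /** Producer side: signals that no more data will arrive. */
    synchronized void endOfStream() {
        eof = true;
        notifyAll();
    }

    /** Consumer side: blocks until a byte is available or EOF is signalled. */
    @Override
    public synchronized int read() throws IOException {
        while (count == 0 && !eof) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting for input");
            }
        }
        if (count == 0) {
            return -1; // buffer drained and producer is done
        }
        int b = buf[readPos] & 0xff;
        readPos = (readPos + 1) % buf.length;
        count--;
        return b;
    }

    public static void main(String[] args) throws Exception {
        SharedInputBuffer in = new SharedInputBuffer(8);

        // Stands in for the I/O dispatch thread delivering input events.
        Thread dispatcher = new Thread(() -> {
            ByteBuffer data = ByteBuffer.wrap(
                    "hello from the reactor".getBytes(StandardCharsets.US_ASCII));
            while (data.hasRemaining()) {
                if (in.offer(data) == 0) {
                    // Buffer full. A real reactor would stop listening for
                    // read events here instead of spinning.
                    Thread.yield();
                }
            }
            in.endOfStream();
        });
        dispatcher.start();

        // Stands in for the worker thread using classic stream I/O.
        StringBuilder received = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            received.append((char) b);
        }
        dispatcher.join();
        System.out.println(received);
    }
}
```

The producer side deliberately returns the number of bytes it managed
to copy rather than waiting for space, since blocking the dispatch
thread would stall every connection it serves. How back-pressure is
applied when the buffer is full is left open here.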
