On Sun, 2007-03-25 at 09:43 +0000, ant elder wrote:
> The symptoms I get do seem to match what you describe. There are still
> two problems with that, though, which I'd like to understand better.
> 
> 1) Why don't I see this with the non-NIO transport? For example, I can
> run the Synapse server samples in either the Synapse sample server,
> which uses the NIO transport, or a separate axis2-1.1.1
> distro with the non-NIO transport. When using JMeter against
> axis2-1.1.1 it works fine and I can send tens of thousands of requests
> without any errors. What's different here? The underlying TCP stack and
> config are the same, aren't they?

Anthony, Asankha, et al.

The problem appears to be caused by Synapse opening an I/O pipe for
*every* incoming and outgoing HTTP message. On some platforms this can
be a very expensive operation, both in terms of performance and system
resources. On Windows, opening an I/O pipe apparently requires a local
port to be allocated for a loopback connection (note the Pipe.open and
SocketChannel.open frames in the stack trace quoted below). No wonder
Synapse chokes after only a few thousand requests.

I see absolutely no reason why Synapse should make use of I/O pipes.
Essentially, pipes are being used to bridge event-driven NIO and
stream-based classic IO. There are other ways to get the job done. A trivial
shared buffer with synchronized access should perfectly suffice. I'll
happily lend you a helping hand if necessary.
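
To make the shared-buffer idea concrete, here is a minimal sketch under
stated assumptions. It is not Synapse or HttpCore code: the class name
SharedBuffer and its methods produce, read and close are hypothetical and
chosen only for illustration. The I/O dispatch thread hands over bytes
without blocking, and a worker thread performs blocking reads, so no
Pipe (and on Windows no loopback socket) is opened per message.

```java
import java.nio.ByteBuffer;

/** Hypothetical bounded buffer shared by an I/O dispatch thread and a worker thread. */
final class SharedBuffer {
    // Kept in "write mode" between calls: position == number of bytes stored.
    private final ByteBuffer buffer;
    private boolean endOfStream = false;

    SharedBuffer(int capacity) {
        this.buffer = ByteBuffer.allocate(capacity);
    }

    /** Called from the event-driven side; never blocks. Returns the number of bytes accepted. */
    synchronized int produce(ByteBuffer src) {
        int n = Math.min(src.remaining(), buffer.remaining());
        for (int i = 0; i < n; i++) {
            buffer.put(src.get());
        }
        notifyAll(); // wake up a reader waiting for data
        return n;
    }

    /** Signals that no more data will arrive. */
    synchronized void close() {
        endOfStream = true;
        notifyAll();
    }

    /** Called from the worker side; blocks until data is available. Returns -1 at end of stream. */
    synchronized int read(byte[] dst, int off, int len) {
        while (buffer.position() == 0 && !endOfStream) {
            try {
                wait();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException("Interrupted while waiting for data");
            }
        }
        if (buffer.position() == 0) {
            return -1; // drained and closed
        }
        buffer.flip();                       // switch to read mode
        int n = Math.min(len, buffer.remaining());
        buffer.get(dst, off, n);
        buffer.compact();                    // back to write mode, keeping unread bytes
        return n;
    }
}
```

When produce accepts fewer bytes than offered, the buffer is full; a real
transport would presumably stop reading from the connection until the
worker has drained some data. That back-pressure logic is left out of
this sketch.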

> 2) Synapse often hangs after the IO error and needs to be restarted.
> Is there any way we can make it recover from this without requiring a
> restart? By handling the exception differently or something?
> 

Please let me know if you see any unchecked exceptions thrown by I/O
reactors, as those exceptions cause I/O dispatch threads to terminate,
effectively locking up the I/O reactor.
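
As an illustration of why an unchecked exception matters here, the sketch
below shows a generic dispatch loop that records a RuntimeException thrown
while handling one event and carries on with the next, instead of letting
the exception end the loop (and with it the thread). This is not the
HttpCore I/O reactor: GuardedDispatchLoop, EventHandler and dispatchAll are
hypothetical names, and whether swallowing such exceptions is the right
policy for Synapse is an open question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical dispatch loop; a stand-in for illustration, not the HttpCore I/O reactor. */
final class GuardedDispatchLoop {

    /** Hypothetical callback standing in for a protocol handler. */
    interface EventHandler {
        void handle(String event);
    }

    private final List<String> failures = new ArrayList<String>();

    /**
     * Dispatches every event in order. A RuntimeException raised for one
     * event is recorded and the loop moves on to the next event.
     * Returns the number of events handled without an exception.
     */
    int dispatchAll(List<String> events, EventHandler handler) {
        int handled = 0;
        for (String event : events) {
            try {
                handler.handle(event);
                handled++;
            } catch (RuntimeException ex) {
                // Without this catch the exception would propagate out of
                // the loop and terminate the calling thread.
                failures.add(event + ": " + ex);
            }
        }
        return handled;
    }

    /** Events that failed, with the exception that was thrown for each. */
    List<String> failures() {
        return failures;
    }
}
```

The recorded failures are exactly the kind of information worth reporting
back: which event triggered the exception and what the exception was.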

Oleg


>    ...ant 
> 
> On 3/24/07, Asankha C. Perera < [EMAIL PROTECTED]> wrote:
>         Ant
>         
>         This is the same error seen by Indika on Windows, and I think
>         my analysis is correct. If you run the test for the first time,
>         or a few minutes after the last run, you should be
>         able to get to around 1000 iterations. After you start to hit
>         this issue, even 200 iterations will give you the error. At
>         this point, running netstat -na should show you that most of
>         the TCP ports are in the TIME_WAIT state. It can usually take at
>         least one minute until a port is cleared up by the OS. The
>         tuning parameters I specified for Linux tell the OS to use
>         the full port range for applications, and set the TCP FIN
>         timeout to 30 secs, to clear up the ports as quickly as
>         possible. Without *any* OS tuning, and on a Windows XP system,
>         you will definitely encounter this issue.
>         
>         
>         asankha
>         
>         ant elder wrote: 
>         > I've tried again with the latest Synapse and HTTP components
>         > code and several JVMs. The results feel slightly different
>         > than before but the end result is still always the root
>         > exception included below. Sometimes it doesn't occur until
>         > around 1000 requests, but sometimes it happens after not
>         > many requests at all.
>         > 
>         >    ...ant
>         > 
>         > java.io.IOException: Unable to establish loopback connection
>         >         at sun.nio.ch.PipeImpl$Initializer.run(Unknown Source)
>         >         at java.security.AccessController.doPrivileged(Native Method)
>         >         at sun.nio.ch.PipeImpl.<init>(Unknown Source)
>         >         at sun.nio.ch.SelectorProviderImpl.openPipe(Unknown Source)
>         >         at java.nio.channels.Pipe.open(Unknown Source)
>         >         at org.apache.axis2.transport.nhttp.ServerHandler.requestReceived(ServerHandler.java:108)
>         >         at org.apache.axis2.transport.nhttp.LoggingNHttpServiceHandler.requestReceived(LoggingNHttpServiceHandler.java:83)
>         >         at org.apache.http.impl.nio.DefaultNHttpServerConnection.consumeInput(DefaultNHttpServerConnection.java:96)
>         >         at org.apache.axis2.transport.nhttp.PlainServerIOEventDispatch.inputReady(PlainServerIOEventDispatch.java:67)
>         >         at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:68)
>         >         at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:160)
>         >         at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:145)
>         >         at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:127)
>         >         at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:153)
>         >         at java.lang.Thread.run(Unknown Source)
>         > Caused by: java.net.BindException: Address already in use: connect
>         >         at sun.nio.ch.Net.connect(Native Method)
>         >         at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
>         >         at java.nio.channels.SocketChannel.open(Unknown Source)
>         > 
>         > On 3/23/07, Asankha C. Perera <[EMAIL PROTECTED]> wrote: 
>         >         Ant
>         >         
>         >         I am quite sure that the problem seen by Indika now
>         >         was related to the ports being exhausted - see the
>         >         following articles, and especially the "MaxUserPort" and
>         >         "TcpTimedWaitDelay" parameters that could be tweaked
>         >         to be consistent with what I use before running
>         >         a load test on Linux. I will ask Indika to check
>         >         these on Monday - but you may try this in the
>         >         meantime if you get a chance
>         >         
>         >         
>         >         http://www.microsoft.com/technet/network/deploy/depovg/tcpip2k.mspx
>         >         http://www.microsoft.com/technet/community/columns/cableguy/cg1205.mspx
>         >         http://www.psc.edu/networking/projects/tcptune/OStune/winxp/winxp_stepbystep.html
>         >         
>         >         
>         >         asankha
>         >         
>         >         Asankha C. Perera wrote: 
>         >         > Hi Ant
>         >         > 
>         >         > I fixed this for Linux and JDK 1.5 - I am
>         >         > confident of this fix as I was able to first
>         >         > recreate the issue consistently and then see the
>         >         > fix in action using 5 concurrent users sending a
>         >         > total of 5000 messages multiple times. However,
>         >         > Indika is still seeing a 'similar' issue on
>         >         > Windows using JDK 1.4. We will try to see if it's
>         >         > related to JDK 1.4 or Windows. If you get the
>         >         > latest nhttp code and build the nhttp JAR you
>         >         > could verify this fix - and let me know.
>         >         > 
>         >         > I am listing some of the Linux commands that came
>         >         > in handy for the resolution, in case someone wants
>         >         > to check this.
>         >         > 
>         >         > lsof -p 7426 => lists the open files for the pid
>         >         > given after the -p option
>         >         > 
>         >         > ls -l /proc/9976/fd | wc -l => for each process
>         >         > the /proc filesystem lists the files used and thus
>         >         > you could count the open files with this command
>         >         > 
>         >         > asankha
>         >         > 
>         >         > Asankha C. Perera wrote: 
>         >         > > Ant / Oleg
>         >         > > 
>         >         > > I can recreate this issue on both Windows and
>         >         > > Linux and think it's caused by my code related to
>         >         > > the use of Pipes. I am actively looking into
>         >         > > this right now and will get back to you on what I
>         >         > > find.
>         >         > > 
>         >         > > asankha
>         >         > > 
>         >         > > ant elder wrote: 
>         >         > > > I've tried on several JDKs now and _always_
>         >         > > > get similar intermittent I/O related errors. I
>         >         > > > can use JMeter directly against Axis2-1.1.1
>         >         > > > without any problems at all, so this does look
>         >         > > > like some issue with the NIO transport. It would be
>         >         > > > really good to hear from other Windows users
>         >         > > > to see if this is just my specific environment
>         >         > > > or a more general problem.
>         >         > > > 
>         >         > > > To recreate:
>         >         > > > 
>         >         > > > 1) build the Synapse server sample by running
>         >         > > > 'ant' in the
>         >         > > > samples\axis2Server\src\SimpleStockQuoteService
>         >         > > > directory
>         >         > > > 2) start the sample service by running
>         >         > > > samples\axis2Server\axis2server.bat
>         >         > > > 3) get the Synapse config (either 8 or 501)
>         >         > > > from http://people.apache.org/~antelder/temp/,
>         >         > > > put it in repository\conf\sample and start
>         >         > > > Synapse: bin\synapse.bat -sample=8
>         >         > > > 4) get the JMeter config test1.jmx from
>         >         > > > http://people.apache.org/~antelder/temp/,
>         >         > > > start JMeter, then File -> Open and point to the
>         >         > > > test1.jmx file
>         >         > > > 5) JMeter Run -> Start, and before too long
>         >         > > > I/O errors should appear in the Synapse
>         >         > > > console
>         >         > > > 
>         >         > > >    ...ant 
>         >         > > > 
>         >         > > > ---------- Forwarded message ----------
>         >         > > > From: Asankha C. Perera <[EMAIL PROTECTED]>
>         >         > > > Date: Mar 22, 2007 4:58 PM 
>         >         > > > Subject: Re: [jira] Resolved: (HTTPCORE-60)
>         >         > > > Transport appears to be hanging because an
>         >         > > > unchecked exception caused the I/O dispatch
>         >         > > > thread to terminate
>         >         > > > To: HttpComponents Project
>         >         > > > <[email protected]>
>         >         > > > 
>         >         > > > Oleg/Ant 
>         >         > > > 
>         >         > > > I am guessing this is something to do with
>         >         > > > Windows or the JDK you use. I am unable
>         >         > > > to test this week, but will do my best to
>         >         > > > try this sometime next week. As I said, on
>         >         > > > Linux I have run the system through thousands
>         >         > > > of messages and multiple threads concurrently
>         >         > > > and have fixed all the issues I came across.
>         >         > > > 
>         >         > > > So Oleg, I do not see this as a blocker for
>         >         > > > the HttpCore release - but I will use your
>         >         > > > latest snapshots in Synapse to check on this
>         >         > > > in future if it occurs again
>         >         > > > 
>         >         > > > thanks
>         >         > > > asankha
>         >         > > > 
>         >         > > > Oleg Kalnichevski (JIRA) wrote: 
>         >         > > > >      [ https://issues.apache.org/jira/browse/HTTPCORE-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>         >         > > > > 
>         >         > > > > Oleg Kalnichevski resolved HTTPCORE-60.
>         >         > > > > ---------------------------------------
>         >         > > > > 
>         >         > > > > 
>         >         > > > > 
>         >         > > > >     Resolution: Fixed
>         >         > > > > 
>         >         > > > > Anthony
>         >         > > > > It turned out ClosedChannelException is a checked I/O
>         >         > > > > exception, so it cannot kill the I/O dispatch thread. So,
>         >         > > > > apparently I was wrong in my initial assertion about the
>         >         > > > > cause of the Synapse I/O transport lockup. I tweaked the
>         >         > > > > HttpCore code a little and changed the IOSessionImpl to
>         >         > > > > catch all ClosedChannelException-s thrown by the underlying
>         >         > > > > byte channel, just in case.
>         >         > > > > 
>         >         > > > > 
>         >         > > > > 
>         >         > > > > 
>         >         > > > > 
>         >         > > > > Please review the changes and let me know if it is
>         >         > > > > okay to proceed with the release
>         >         > > > > 
>         >         > > > > Oleg
>         >         > > > > 
>         >         > > > >   
>         >         > > > > > Transport appears to be hanging because an unchecked exception caused the I/O dispatch thread to terminate
>         >         > > > > > ----------------------------------------------------------------------------------------------------------
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > >                 Key: HTTPCORE-60
>         >         > > > > >                 URL: https://issues.apache.org/jira/browse/HTTPCORE-60
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > >             Project: HttpComponents Core
>         >         > > > > >          Issue Type: Bug
>         >         > > > > >    Affects Versions: 4.0-alpha4
>         >         > > > > >            Reporter: ant elder
>         >         > > > > >         Assigned To: Oleg Kalnichevski
>         >         > > > > >             Fix For: 4.0-alpha4
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > See discussion on synapse-dev mailing list:
>         >         > > > > > http://www.nabble.com/Intermittent-IO-Errors-using-Synapse-tf3439957.html
>         >         > > > > > 
>         >         > > > > > The transport appears to be hanging because an unchecked exception
>         >         > > > > > caused the I/O dispatch thread to terminate. I believe there are several
>         >         > > > > > different types of problems (at least two) that we are seeing here.
>         >         > > > > > 
>         >         > > > > > [I/O reactor worker thread 5] ERROR ServerHandler - I/O Error : null
>         >         > > > > > > java.nio.channels.ClosedChannelException
>         >         > > > > > >         at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:112)
>         >         > > > > > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:139)
>         >         > > > >   
>         >         > > > 
> --------------------------------------------------------------------- To 
> unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: 
> [EMAIL PROTECTED] 
>         >         > > 
> 

