Re: tomcat 5.0.16 Replication

2004-01-09 Thread jean-philippe . belanger
Hurray for Filip! :)

I'll get the CVS head for the module today and test this out.
Happy to see that it got fixed that quickly!
Thanks again and I'll let you know how it goes

Jean-Philippe

Filip Hanik wrote:

Jean-Philippe and Steve,
I fixed the bug and tried replication on RH9. At first it didn't work.
The problem is that when RH9 tries to write the ACK back to the NIO socket,
it never reaches the other node, and the request times out after a long time.
I set LD_ASSUME_KERNEL=2.4 and it started to work.

Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication
OK guys,
good news. The 100% CPU is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest op, which is not good :)
Checking in the working code in 15 min, after some more regression tests.
Filip
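
To illustrate why a permanent OP_WRITE interest can pin the CPU, here is a minimal Java sketch. It is not the actual ReplicationListener code; the class name WriteInterestDemo and the method countImmediateWakeups are hypothetical, and it assumes a plain loopback connection. A connected socket with room in its send buffer is almost always writable, so a selector that is asked about OP_WRITE reports the key as ready on every pass instead of waiting.

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not part of Tomcat.
final class WriteInterestDemo {

    /** Returns how many of `rounds` non-blocking selects report the key as ready. */
    static int countImmediateWakeups(int interestOps, int rounds) throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // any free port
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                accepted.register(selector, interestOps);
                int wakeups = 0;
                for (int i = 0; i < rounds; i++) {
                    // selectNow() never blocks; a non-zero result means a key is ready now
                    if (selector.selectNow() > 0) {
                        wakeups++;
                    }
                    selector.selectedKeys().clear();
                }
                return wakeups;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("OP_READ only:     "
                + countImmediateWakeups(SelectionKey.OP_READ, 100));
        System.out.println("OP_READ|OP_WRITE: "
                + countImmediateWakeups(SelectionKey.OP_READ | SelectionKey.OP_WRITE, 100));
    }
}
```

With OP_READ alone nothing is ready because the peer sends no data, while adding OP_WRITE should make every pass report a ready key. A common way around this, offered here only as a general pattern, is to switch OP_WRITE on with key.interestOps(...) while there is pending data to send and switch it off again afterwards.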
-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication
Another code change is that I am now accepting keys for both OP_READ and
OP_WRITE. Before it was only OP_READ,
but for synchronous replication I need both.
This is good info. I just got RH9 installed and will be trying it out this and
next week.
Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication
The only change in the ReplicationListener class is the try/catch that
was added.
The code logic is the same, weirdly enough. So it's probably elsewhere
that something changed in the state of the SelectionKey.
Jean-Philippe Bélanger

Steve Nelson wrote:

 

I was just about to try this, actually. Through googling I found a lot of
people having problems with select with 1.4 and NIO on Red Hat 9. They were
actually experiencing crashes, though.
To verify your results I just put a Thread.sleep(1); where you suggested, and
I also see the jump in performance.

Something must have changed in ReplicationListener that causes this, because
the 5.0.16 version doesn't seem to have the problem. I'll see if I can figure
it out when I get back to where I can diff the files.
-Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication
More content for you Filip.

I've checked and followed the code of the listen event in
ReplicationListener.java.
Here's what's happening:

selector.select(timeout) returns immediately with one SelectionKey available.

That key is neither acceptable nor readable, so it immediately skips those
ifs and loops back to the beginning.
I've put traces in, and this is executed once every millisecond, hence the
100% load on the server.
Just to make sure, I put a Thread.sleep(10) at the end of the loop:
the CPU dropped back to 0% and the replication still worked nicely,
though probably a little slower because of the 10 ms wait.
I don't know much about the NIO packages, but it seems like the
select(timeout) method shouldn't return a SelectionKey in that state
without any waiting.
Let me know what you can dig out of this.

Jean-Philippe Bélanger
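
When a selected key is neither acceptable nor readable, printing its ready set usually shows which operation woke the selector. The helper below is a small sketch for that kind of tracing. KeyOps and describe are hypothetical names, not part of Tomcat or the JDK; only the SelectionKey constants come from java.nio.

```java
import java.nio.channels.SelectionKey;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracing helper for select loops.
final class KeyOps {

    /** Renders a bit mask such as key.readyOps() or key.interestOps() as text. */
    static String describe(int ops) {
        List<String> names = new ArrayList<>();
        if ((ops & SelectionKey.OP_ACCEPT) != 0) names.add("OP_ACCEPT");
        if ((ops & SelectionKey.OP_CONNECT) != 0) names.add("OP_CONNECT");
        if ((ops & SelectionKey.OP_READ) != 0) names.add("OP_READ");
        if ((ops & SelectionKey.OP_WRITE) != 0) names.add("OP_WRITE");
        return names.isEmpty() ? "none" : String.join("|", names);
    }

    public static void main(String[] args) {
        System.out.println(describe(SelectionKey.OP_WRITE));                        // OP_WRITE
        System.out.println(describe(SelectionKey.OP_READ | SelectionKey.OP_WRITE)); // OP_READ|OP_WRITE
    }
}
```

Inside a select loop this could be called as KeyOps.describe(key.readyOps()) for each selected key. A key that reports only OP_WRITE on every pass would be consistent with the symptom described above.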

[EMAIL PROTECTED] wrote:



   

Hi Filip.

I did some profiling of 40 minutes of Tomcat with and without a 2nd node
up. Here are the results with
-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:
These numbers are from cpu=times and not cpu=samples, since the latter freezes
on my systems.
So the list shows the time spent in each method.
The major difference is the number of calls to the sun.nio.ch.PollArrayWrapper
class. I don't know much about the NIO packages, but 819000 calls in
40 minutes is a lot.
The socket interface was called more than twice as often with 2 hosts as with
a single one, which seems normal.
Maybe this can help.
If you need the complete hprof files I can send them to you.
1 host in cluster:
CPU TIME (ms) BEGIN (total = 19701) Thu Jan  8 10:00:59 2004
rank   self  accum   count trace method
   1 11.48% 11.48%  5485 java.lang.Object.wait
   2 11.46% 22.94% 11786 java.lang.Object.wait
   3 10.95% 33.89%    4115   215 java.net.PlainDatagramSocketImpl.receive
   4 10.93% 44.81%    4114   224 java.lang.Thread.sleep
   5 10.91% 55.73%   19005   214 sun.nio.ch.PollArrayWrapper.poll0
   6  7.37% 63.09%      28   495 java.lang.Object.wait
   7  7.24% 70.34%      10   576 java.lang.Object.wait
   8  4.57% 74.90%      90   716 java.lang.Thread.sleep
   9  4.48% 79.38%       1   909 java.lang.Object.wait
  10  4.48% 83.86%       1   908 java.lang.Object.wait
  11  4.48% 88.34%      15   810 java.lang.Object.wait
  12  4.47% 92.81%       1   910 java.net.PlainSocketImpl.socketAccept
  13  0.71% 93.52%       2   623 java.lang.Object.wait
  14  0.56% 94.08%       2   706 java.lang.Object.wait
  15  0.38% 94.46%       2   914 java.lang.Object.wait
  16  0.24% 94.70%     775   913

Re: tomcat 5.0.16 Replication

2004-01-09 Thread jean-philippe . belanger
Just tried the CVS head and everything works without any CPU going crazy,
but only if LD_ASSUME_KERNEL is set to 2.4.
One more question for you Filip: is the useDirtyFlag working at all? It
seems like even if it's set to true, the whole session gets replicated
after each request. :(

Jean-Philippe

[EMAIL PROTECTED] wrote:

Hurray for Filip! :)

I'll get the CVS head for the module today and test this out.
Happy to see that it got fixed that quickly!
Thanks again and I'll let you know how it goes

Jean-Philippe


RE: tomcat 5.0.16 Replication

2004-01-09 Thread Filip Hanik
I will be implementing some performance improvements today.
I'll let you know how it goes

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 4:33 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Hurray for Filip! :)

I'll get the CVS head for the module today and test this out.
Happy to see that it got fixed that quickly!

Thanks again and I'll let you know how it goes

Jean-Philippe


RE: tomcat 5.0.16 Replication

2004-01-09 Thread Filip Hanik
useDirtyFlag=true

means that the session (yes, the whole session) only gets replicated when
setAttribute or removeAttribute is called.
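
The behaviour described here can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Tomcat's session implementation: DirtyFlagSession, endRequest and replicationCount are hypothetical names, and "replication" is modelled as returning a copy of the whole attribute map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a session that replicates only when marked dirty.
final class DirtyFlagSession {
    private final Map<String, Object> attributes = new HashMap<>();
    private boolean dirty = false;
    private int replications = 0;

    /** Reading an attribute does not mark the session dirty. */
    Object getAttribute(String name) {
        return attributes.get(name);
    }

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
        dirty = true;
    }

    void removeAttribute(String name) {
        attributes.remove(name);
        dirty = true;
    }

    /**
     * Called once per request. Returns a snapshot of the WHOLE session if it
     * was modified since the last call, or null if nothing needs to be sent.
     */
    Map<String, Object> endRequest() {
        if (!dirty) {
            return null;
        }
        dirty = false;
        replications++;
        return new HashMap<>(attributes);
    }

    int replicationCount() {
        return replications;
    }
}
```

One consequence of this model is that changing an object fetched with getAttribute, without calling setAttribute again, never sets the flag, so the change would not be replicated.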

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:33 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Just tried the CVS head and everything works without any CPU going crazy,
but only if LD_ASSUME_KERNEL is set to 2.4.

One more question for you Filip: is the useDirtyFlag working at all? It
seems like even if it's set to true, the whole session gets replicated
after each request. :(

Jean-Philippe


RE: tomcat 5.0.16 Replication

2004-01-09 Thread Steve Nelson
I don't seem to need the ld_assume_kernel thing. What are the symptoms when
it is required?


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Just tried the CVS head and everything works without any CPU going crazy,
but only if LD_ASSUME_KERNEL is set to 2.4.

One more question for you Filip: is the useDirtyFlag working at all? It
seems like even if it's set to true, the whole session gets replicated
after each request. :(

Jean-Philippe


Re: tomcat 5.0.16 Replication

2004-01-09 Thread jean-philippe . belanger
Ah... I see.

So there is no way (yet...) to have only the value that setAttribute was
called on replicated?

Thanks

Jean-Philippe Bélanger

Filip Hanik wrote:

useDirtyFlag=true

means that the session (yes, the whole session) only gets replicated when
setAttribute or removeAttribute is called.

Re: tomcat 5.0.16 Replication

2004-01-09 Thread jean-philippe . belanger
The replication message ACK never gets back to the sender,
so my web pages never load without that flag.
I think it is only needed under Red Hat 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

I don't seem to need the ld_assume_kernel thing. What are the symptoms when
it is required?
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication
Just tried the CVS head and everything works with any CPU going crazy!
only if ld_assume_kernel is set to 2.4
One more question for you Filip, is the useDirtyFlag working at all? It 
seams like even if it's set to true, the whole session gets replicated 
after each request. :(

Jean-Philippe

[EMAIL PROTECTED] wrote:

 

Hurray for Fillip! :)

I'll get the CVS head for the module today and test this out.
Happy to see that it got fixed that quickly!
Thanks again and I'll let you know how it goes

Jean-Philippe

Filip Hanik wrote:

   

Jean-Philippe and Steve,
I fixed the bug, and tried replication on RH9. Immediately it didn't 
work.
The problem is that when RH9 tries to write the ACK back to the NIO 
socket,
it never reaches the other node. and times out after a long time.

I set LD_ASSUME_KERNEL=2.4 and it started to work

Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication
ok guys,
good news. The 100% cpu is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest
this is not good :)
checking in the working code in 15 min, some more regression tests
Filip
-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication
another code change was that I am now accepting keys for OP_READ and
OP_WRITE. Before it was only OP_READ,
but for synchronous replication I need both.
this is good info, I just got RH9 installed. Will be trying it out this and
next week.

Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication
The only change in the ReplicationListener class is the try/catch that
was added.
The code logic is the same. Weird enough. So it's probably elsewhere
that something changed in the state of the SelectionKey.
Jean-Philippe Bélanger

Steve Nelson wrote:



 

I was just about to try this actually. I found through googling that a lot of
people are having problems with select with 1.4 and NIO on Red Hat 9. They were
actually experiencing crashes though.
To verify your results I just put a Thread.sleep(1); where you suggested and
I also see the jump in performance.

Something must have changed in ReplicationListener that causes this, because
the 5.0.16 version doesn't seem to have the problem. I'll see if I can figure it out
when I get back to where I can diff the files.

-Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication
More content for you Filip.

I've checked and followed the code of the listen event in
ReplicationListener.java
Here's what's happening:

selector.select(timeout) returns immediately with one SelectionKey available.

That key is neither acceptable nor readable, so it immediately skips those
IFs and loops back to the beginning.
I've put traces and this is executed once every millisecond, hence the
100% load on the server.
Just to make sure, I've put a Thread.sleep(10) at the end of the loop
and the CPU dropped back to 0% and the replication still worked nicely
but probably a little slower because of the 10 ms wait.
I don't know much about those NIO packages but it seems like the
select(timeout) method shouldn't return a SelectionKey in that state
without any waiting.
Let me know what you can dig from those.

Jean-Philippe Bélanger
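
The behaviour described above can be reproduced outside Tomcat. The following is a rough sketch under stated assumptions, not the ReplicationListener code: it opens an idle loopback connection, runs a listener-style loop for a fixed time, and counts how often select() returns. The class name, method name, and the 50 ms timeout are invented for the example, and it uses current Java syntax.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical reproduction for illustration; not Tomcat code.
public class SelectSpinSketch {

    // Runs a listener-style loop against an idle connection for runMs
    // milliseconds and returns how many times select() came back.
    static int countSelectReturns(boolean registerWrite, long runMs) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                int ops = SelectionKey.OP_READ;
                if (registerWrite) {
                    ops |= SelectionKey.OP_WRITE;
                }
                accepted.register(selector, ops);

                int returns = 0;
                long deadline = System.currentTimeMillis() + runMs;
                while (System.currentTimeMillis() < deadline) {
                    selector.select(50); // blocks up to 50 ms when nothing is ready
                    returns++;
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isAcceptable()) { /* a listener would accept here */ }
                        if (key.isReadable()) { /* a listener would read here */ }
                        // A key that is only write-ready matches neither branch.
                    }
                    selector.selectedKeys().clear();
                }
                return returns;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("OP_READ only:       " + countSelectReturns(false, 300));
        System.out.println("OP_READ | OP_WRITE: " + countSelectReturns(true, 300));
    }
}
```

With OP_READ only, the loop should wake a handful of times because each select() waits out its timeout. With OP_WRITE added, the count should be far higher, which matches the once-per-millisecond traces and the 100% load.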

[EMAIL PROTECTED] wrote:



 

   

Hi Filip.

I did some profiling of 40 mins of Tomcat with and without a 2nd node
up. Here are the results with

-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:

Those numbers are from cpu=times and not samples, since the latter freezes
on my systems.
So that list shows the time spent in each method.
The major difference is the calls to the sun.nio.ch.PollArrayWrapper
class. I don't know much about those NIO packages but 819000 calls in
40 mins is a lot.
The socket interface was called more than twice as often with 2 hosts as with
a single one, which seems normal.
Maybe this can help.
If you need the complete hprof files I can send them to you.
1 host in cluster:
CPU TIME (ms) BEGIN (total = 19701) Thu Jan  8 10:00:59 2004
rank   self  accum   count trace method
1 11.48% 11.48%  5485

RE: tomcat 5.0.16 Replication

2004-01-09 Thread Steve Nelson

Now that's really very strange. I am running RH9 and everything seems to go
through just fine.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The replication message ACK never gets back to the sender.
So my web pages never load without that flag.

I think it is only needed under Red Hat 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

I don't seem to need the ld_assume_kernel thing. What are the symptoms when
it is required?


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Just tried the CVS head and everything works without any CPU going crazy!
only if LD_ASSUME_KERNEL is set to 2.4

One more question for you Filip: is the useDirtyFlag working at all? It
seems like even if it's set to true, the whole session gets replicated
after each request. :(

Jean-Philippe

[EMAIL PROTECTED] wrote:

  

Hurray for Fillip! :)

I'll get the CVS head for the module today and test this out.
Happy to see that it got fixed that quickly!

Thanks again and I'll let you know how it goes

Jean-Philippe

Filip Hanik wrote:



Jean-Philippe and Steve,
I fixed the bug, and tried replication on RH9. Immediately it didn't 
work.
The problem is that when RH9 tries to write the ACK back to the NIO 
socket,
it never reaches the other node. and times out after a long time.

I set LD_ASSUME_KERNEL=2.4 and it started to work

Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


ok guys,
good news. The 100% cpu is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest
this is not good :)
checking in the working code in 15 min, some more regression tests
Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


another code change was, that I am now accepting keys for OP_READ and
OP_WRITE. before it was only OP_READ,
but for synchronous replication I need both.

this is good info, I just got RH9 installed. will be trying it out 
this and
next week.

Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The only changes in the ReplicationListener class is the try catch that
was added.

the code logic is the same. Weird enough. So it's probably elsewhere
that something changed in the state of the SelectionKey.

Jean-Philippe Bélanger

Steve Nelson wrote:

 

  

I was just about to try this actually. I found through googling alot of
people
having problems with select with 1.4 and NIO with Redhat 9. They were
actually
experiencing crashes though.

To verify your results I just put a Thread.sleep(1); where you suggested and
I also see the jump in performance.

Something must have changed in ReplicationListener that causes this 
because
the 5.0.16
version doesn't seem to have the problem. I'll see if I can figure 
it out
when I get back to where I can diff the files.

-Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


More content for you Filip.

I've checked and followed the code of the listen event in
ReplicationListener.java

Here's what happening:

selector.select(timeout) returns immediately with one SelectionKey available.

That key is neither acceptable nor readable, so it immediately skips those
IFs and loops back to the beginning.

I've put traces and this is executed once every millisecond hence the
100% load on the server.
Just to make sure, I've put a Thread.sleep(10) at the end of the loop
and the CPU dropped back to 0% and the replication still worked nicely
but probably a little slower since the wait of 10ms.

I don't know much about those NIO packages but it seems like the
select(timeout) method shouldn't return a SelectionKey in that state
without any waiting.

Let me know what you can dig from those.

Jean-Philippe Bélanger

[EMAIL PROTECTED] wrote:



  



Hi Filip.

I did some profiling of 40mins of tomcat with and without a 2nd node
up. here are the results with

  

-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10: 
  

Those number are cpu=times and not samples since the later one freezes
on my systems.
So that list shows the time spent in each methods.

Major difference the some call to the sun.nio.ch.PollArrayWrapper
class. I don't know much about those NIOs packages but 819000 call in
40 mins is a lot.
The Socket Interface

RE: tomcat 5.0.16 Replication

2004-01-09 Thread Filip Hanik
interesting, mine doesn't work at all unless I set LD_ASSUME_KERNEL

what VM (version and name) are you using?

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:59 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication



Now that's really very strange. I am running RH9 and everything seems to go
through just fine.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The replication message ACK never get back to the sender.
So my webpages never loads without that flag.

I think it is only needed under REDHAT 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

I don't seem to need the ld_assume_kernel thing. What are the symptoms when
it is required?


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Just tried the CVS head and everything works without any CPU going crazy!
only if LD_ASSUME_KERNEL is set to 2.4

One more question for you Filip: is the useDirtyFlag working at all? It
seems like even if it's set to true, the whole session gets replicated
after each request. :(

Jean-Philippe

[EMAIL PROTECTED] wrote:



Hurray for Fillip! :)

I'll get the CVS head for the module today and test this out.
Happy to see that it got fixed that quickly!

Thanks again and I'll let you know how it goes

Jean-Philippe

Filip Hanik wrote:



Jean-Philippe and Steve,
I fixed the bug, and tried replication on RH9. Immediately it didn't
work.
The problem is that when RH9 tries to write the ACK back to the NIO
socket,
it never reaches the other node. and times out after a long time.

I set LD_ASSUME_KERNEL=2.4 and it started to work

Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


ok guys,
good news. The 100% cpu is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest
this is not good :)
checking in the working code in 15 min, some more regression tests
Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


another code change was, that I am now accepting keys for OP_READ and
OP_WRITE. before it was only OP_READ,
but for synchronous replication I need both.

this is good info, I just got RH9 installed. will be trying it out
this and
next week.

Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The only changes in the ReplicationListener class is the try catch that
was added.

the code logic is the same. Weird enough. So it's probably elsewhere
that something changed in the state of the SelectionKey.

Jean-Philippe Bélanger

Steve Nelson wrote:





I was just about to try this actually. I found through googling alot of
people
having problems with select with 1.4 and NIO with Redhat 9. They were
actually
experiencing crashes though.

To verify your results I just put a Thread.sleep(1); where you suggested and
I also see the jump in performance.

Something must have changed in ReplicationListener that causes this
because
the 5.0.16
version doesn't seem to have the problem. I'll see if I can figure
it out
when I get back to where I can diff the files.

-Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


More content for you Filip.

I've checked and followed the code of the listen event in
ReplicationListener.java

Here's what happening:

selector.select(timeout) returns immediately with one SelectionKey available.

That key is neither acceptable nor readable, so it immediately skips those
IFs and loops back to the beginning.

I've put traces and this is executed once every millisecond hence the
100% load on the server.
Just to make sure, I've put a Thread.sleep(10) at the end of the loop
and the CPU dropped back to 0% and the replication still worked nicely
but probably a little slower since the wait of 10ms.

I don't know much about those NIO packages but it seems like the
select(timeout) method shouldn't return a SelectionKey in that state
without any waiting.

Let me know what you can dig from those.

Jean-Philippe Bélanger

[EMAIL PROTECTED] wrote:







Hi Filip.

I did some profiling of 40mins of tomcat with and without a 2nd node
up. here are the results with



-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:


Those number are cpu=times and not samples since the later one freezes
on my systems.
So

RE: tomcat 5.0.16 Replication

2004-01-09 Thread Steve Nelson
sun JDK 1.4.2 for Linux
Kernel 2.4.20-8smp
Tomcat 5.0.16 with catalina-cluster.jar from CVS head

Hrmmm... are yours SMP servers? Could be something odd with synch if that is
the case.


-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:01 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


interesting, mine doesn't work at all unless I set the LD_ASSUME_KERNEL

what VM (version and name) are you using?

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:59 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication



Now that's really very strange. I am running RH9 and everything seems to go
through just fine.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The replication message ACK never get back to the sender.
So my webpages never loads without that flag.

I think it is only needed under REDHAT 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

I don't seem to need the ld_assume_kernel thing. What are the symptoms when
it is required?


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Just tried the CVS head and everything works without any CPU going crazy!
only if LD_ASSUME_KERNEL is set to 2.4

One more question for you Filip: is the useDirtyFlag working at all? It
seems like even if it's set to true, the whole session gets replicated
after each request. :(

Jean-Philippe

[EMAIL PROTECTED] wrote:



Hurray for Fillip! :)

I'll get the CVS head for the module today and test this out.
Happy to see that it got fixed that quickly!

Thanks again and I'll let you know how it goes

Jean-Philippe

Filip Hanik wrote:



Jean-Philippe and Steve,
I fixed the bug, and tried replication on RH9. Immediately it didn't
work.
The problem is that when RH9 tries to write the ACK back to the NIO
socket,
it never reaches the other node. and times out after a long time.

I set LD_ASSUME_KERNEL=2.4 and it started to work

Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


ok guys,
good news. The 100% cpu is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest
this is not good :)
checking in the working code in 15 min, some more regression tests
Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


another code change was, that I am now accepting keys for OP_READ and
OP_WRITE. before it was only OP_READ,
but for synchronous replication I need both.

this is good info, I just got RH9 installed. will be trying it out
this and
next week.

Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The only changes in the ReplicationListener class is the try catch that
was added.

the code logic is the same. Weird enough. So it's probably elsewhere
that something changed in the state of the SelectionKey.

Jean-Philippe Bélanger

Steve Nelson wrote:





I was just about to try this actually. I found through googling alot of
people
having problems with select with 1.4 and NIO with Redhat 9. They were
actually
experiencing crashes though.

To verify your results I just put a Thread.sleep(1); where you suggested and
I also see the jump in performance.

Something must have changed in ReplicationListener that causes this
because
the 5.0.16
version doesn't seem to have the problem. I'll see if I can figure
it out
when I get back to where I can diff the files.

-Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


More content for you Filip.

I've checked and followed the code of the listen event in
ReplicationListener.java

Here's what happening:

selector.select(timeout) returns immediately with one SelectionKey available.

That key is neither acceptable nor readable, so it immediately skips those
IFs and loops back to the beginning.

I've put traces and this is executed once every millisecond hence the
100% load on the server.
Just to make sure, I've put a Thread.sleep(10) at the end of the loop
and the CPU dropped back to 0% and the replication still worked nicely
but probably a little slower since the wait of 10ms.

I don't know much about those NIO packages but it seems like the
select(timeout) method shouldn't return a SelectionKey in that state
without any waiting.

Let me know what you

Re: tomcat 5.0.16 Replication

2004-01-09 Thread jean-philippe . belanger
uname -a reports:
2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux
Filip Hanik wrote:

interesting, mine doesn't work at all unless I set the LD_ASSUME_KERNEL

what VM (version and name) are you using?

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:59 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication


Now that's really very strange. I am running RH9 and everything seems to go
through just fine.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication
The replication message ACK never get back to the sender.
So my webpages never loads without that flag.
I think it is only needed under REDHAT 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

 

I don't seem to need the ld_assume_kernel thing. What are the symptoms when
it is required?
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication
Just tried the CVS head and everything works without any CPU going crazy!
only if LD_ASSUME_KERNEL is set to 2.4
One more question for you Filip: is the useDirtyFlag working at all? It
seems like even if it's set to true, the whole session gets replicated
after each request. :(
Jean-Philippe

[EMAIL PROTECTED] wrote:



   

Hurray for Fillip! :)

I'll get the CVS head for the module today and test this out.
Happy to see that it got fixed that quickly!
Thanks again and I'll let you know how it goes

Jean-Philippe

Filip Hanik wrote:



 

Jean-Philippe and Steve,
I fixed the bug, and tried replication on RH9. Immediately it didn't
work.
The problem is that when RH9 tries to write the ACK back to the NIO
socket,
it never reaches the other node. and times out after a long time.
I set LD_ASSUME_KERNEL=2.4 and it started to work

Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication
ok guys,
good news. The 100% cpu is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest
this is not good :)
checking in the working code in 15 min, some more regression tests
Filip
-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication
another code change was, that I am now accepting keys for OP_READ and
OP_WRITE. before it was only OP_READ,
but for synchronous replication I need both.
this is good info, I just got RH9 installed. will be trying it out
this and
next week.
Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication
The only changes in the ReplicationListener class is the try catch that
was added.
the code logic is the same. Weird enough. So it's probably elsewhere
that something changed in the state of the SelectionKey.
Jean-Philippe Bélanger

Steve Nelson wrote:





   

I was just about to try this actually. I found through googling alot of
people
having problems with select with 1.4 and NIO with Redhat 9. They were
actually
experiencing crashes though.
To verify your results I just put a Thread.sleep(1); where you suggested and
I also see the jump in performance.

Something must have changed in ReplicationListener that causes this
because
the 5.0.16
version doesn't seem to have the problem. I'll see if I can figure
it out
when I get back to where I can diff the files.
-Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication
More content for you Filip.

I've checked and followed the code of the listen event in
ReplicationListener.java
Here's what happening:

selector.select(timeout) returns immediately with one SelectionKey available.

That key is neither acceptable nor readable, so it immediately skips those
IFs and loops back to the beginning.
I've put traces and this is executed once every millisecond hence the
100% load on the server.
Just to make sure, I've put a Thread.sleep(10) at the end of the loop
and the CPU dropped back to 0% and the replication still worked nicely
but probably a little slower since the wait of 10ms.
I don't know much about those NIO packages but it seems like the
select(timeout) method shouldn't return a SelectionKey in that state
without any waiting.
Let me know what you can dig from those.

Jean-Philippe Bélanger

[EMAIL PROTECTED] wrote:







 

Hi Filip.

I did some profiling of 40mins of tomcat with and without a 2nd node
up. here are the results with


   

-Xrunhprof:cpu

RE: tomcat 5.0.16 Replication

2004-01-09 Thread Filip Hanik
[EMAIL PROTECTED] bin]# uname -a
Linux rh9 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux

[EMAIL PROTECTED] bin]# java -version
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
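
Since everyone here is comparing kernel, VM, and SMP details by hand, a small program run with the same JVM that runs Tomcat can collect the same facts in one go, including whether LD_ASSUME_KERNEL is actually visible to the process. This is only a sketch: the class name and output layout are invented, and reading environment variables through System.getenv(String) needs Java 5 or later, so it would not run as-is on the 1.4.2 VMs discussed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper for illustration; not part of Tomcat.
public class EnvReportSketch {

    // Gathers the details being compared in this thread: OS and kernel
    // release, VM name and version, CPU count, and LD_ASSUME_KERNEL.
    static Map<String, String> collect() {
        Map<String, String> info = new LinkedHashMap<>();
        String[] props = {
            "os.name", "os.version", "os.arch",
            "java.version", "java.vm.name", "java.vm.version"
        };
        for (String prop : props) {
            info.put(prop, System.getProperty(prop));
        }
        // More than one processor suggests an SMP machine.
        info.put("processors",
                 String.valueOf(Runtime.getRuntime().availableProcessors()));
        String assumed = System.getenv("LD_ASSUME_KERNEL");
        info.put("LD_ASSUME_KERNEL", assumed == null ? "(not set)" : assumed);
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Running it from the same shell or startup script that launches Tomcat should show whether the variable was exported to the JVM or only set in an interactive session.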


-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 11:05 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication


sun JDK 1.4.2 for Linux
Kernel 2.4.20-8smp
Tomcat 5.0.16 with catalina-cluster.jar from CVS head

Hrmmm... are yours SMP servers? Could be something odd with synch if that is
the case.


-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:01 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


interesting, mine doesn't work at all unless I set the LD_ASSUME_KERNEL

what VM (version and name) are you using?

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:59 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication



Now that's really very strange. I am running RH9 and everything seems to go
through just fine.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The replication message ACK never get back to the sender.
So my webpages never loads without that flag.

I think it is only needed under REDHAT 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

I don't seem to need the ld_assume_kernel thing. What are the symptoms when
it is required?


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Just tried the CVS head and everything works without any CPU going crazy!
only if LD_ASSUME_KERNEL is set to 2.4

One more question for you Filip: is the useDirtyFlag working at all? It
seems like even if it's set to true, the whole session gets replicated
after each request. :(

Jean-Philippe

[EMAIL PROTECTED] wrote:



Hurray for Fillip! :)

I'll get the CVS head for the module today and test this out.
Happy to see that it got fixed that quickly!

Thanks again and I'll let you know how it goes

Jean-Philippe

Filip Hanik wrote:



Jean-Philippe and Steve,
I fixed the bug, and tried replication on RH9. Immediately it didn't
work.
The problem is that when RH9 tries to write the ACK back to the NIO
socket,
it never reaches the other node. and times out after a long time.

I set LD_ASSUME_KERNEL=2.4 and it started to work

Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


ok guys,
good news. The 100% cpu is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest
this is not good :)
checking in the working code in 15 min, some more regression tests
Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


another code change was, that I am now accepting keys for OP_READ and
OP_WRITE. before it was only OP_READ,
but for synchronous replication I need both.

this is good info, I just got RH9 installed. will be trying it out
this and
next week.

Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The only changes in the ReplicationListener class is the try catch that
was added.

the code logic is the same. Weird enough. So it's probably elsewhere
that something changed in the state of the SelectionKey.

Jean-Philippe Bélanger

Steve Nelson wrote:





I was just about to try this actually. I found through googling alot of
people
having problems with select with 1.4 and NIO with Redhat 9. They were
actually
experiencing crashes though.

To verify your results I just put a Thread.sleep(1); where you suggested and
I also see the jump in performance.

Something must have changed in ReplicationListener that causes this
because
the 5.0.16
version doesn't seem to have the problem. I'll see if I can figure
it out
when I get back to where I can diff the files.

-Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


More content for you Filip.

I've checked and followed the code of the listen event in
ReplicationListener.java

Here's what happening:

selector.select(timeout) returns immediately with one SelectionKey available.

That key is neither acceptable nor readable, so it immediately skips those
IFs and loops back

RE: tomcat 5.0.16 Replication

2004-01-09 Thread Steve Nelson
uname -a
machine #1) Linux draco 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux
machine #2) Linux scorpio 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux


java -version:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

same on both


-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:56 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


[EMAIL PROTECTED] bin]# uname -a
Linux rh9 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux

[EMAIL PROTECTED] bin]# java -version
java version 1.4.2_03
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)


-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 11:05 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication


sun JDK 1.4.2 for Linux
Kernel 2.4.20-8smp
Tomcat 5.0.16 with catalina-cluster.jar from CVS head

Hrmmm... are yours SMP servers? Could be something odd with synch if that is
the case.


-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:01 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


interesting, mine doesn't work at all unless I set the LD_ASSUME_KERNEL

what VM (version and name) are you using?

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:59 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication



Now that's really very strange. I am running RH9 and everything seems to go
through just fine.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The replication message ACK never get back to the sender.
So my webpages never loads without that flag.

I think it is only needed under REDHAT 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

I don't seem to need the ld_assume_kernel thing. What are the symptoms when
it is required?


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Just tried the CVS head and everything works without any CPU going crazy!
only if LD_ASSUME_KERNEL is set to 2.4

One more question for you Filip: is the useDirtyFlag working at all? It
seems like even if it's set to true, the whole session gets replicated
after each request. :(

Jean-Philippe


RE: tomcat 5.0.16 Replication

2004-01-09 Thread Steve Nelson


Hrmmm, perhaps I should reboot using the non-SMP kernel and try it. I'll
have to do that when I get back to the servers.


-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 2:04 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication


uname -a:
machine #1) Linux draco 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux
machine #2) Linux scorpio 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux


java -version:
java version 1.4.2_03
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

same on both


-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:56 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


[EMAIL PROTECTED] bin]# uname -a
Linux rh9 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux

[EMAIL PROTECTED] bin]# java -version
java version 1.4.2_03
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)


-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 11:05 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication


sun JDK 1.4.2 for Linux
Kernel 2.4.20-8smp
Tomcat 5.0.16 with catalina-cluster.jar from CVS head

Hrmmm... are yours SMP servers? Could be something odd with synch if that is
the case.


-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:01 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


interesting, mine doesn't work at all unless I set the LD_ASSUME_KERNEL

what VM (version and name) are you using?

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:59 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication



Now that's really very strange. I am running RH9 and everything seems to go
through just fine.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The replication message ACK never gets back to the sender,
so my web pages never load without that flag.

I think it is only needed under Red Hat 9.

Jean-Philippe Bélanger


Re: tomcat 5.0.16 Replication

2004-01-08 Thread jean-philippe . belanger
Hi Filip.

I did some profiling of 40 minutes of Tomcat with and without a 2nd node up.
Here are the results with
-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:

Those numbers are from cpu=times and not cpu=samples, since the latter
freezes on my systems.
So the list shows the time spent in each method.

The major difference is the number of calls into the
sun.nio.ch.PollArrayWrapper class. I don't know much about the NIO
packages, but 819,000 calls in 40 minutes is a lot.
The socket interface was called more than twice as often with 2 hosts as
with a single one, which seems normal.

Maybe this can help.
If you need the complete hprof files I can send them to you.

1 host in cluster:
CPU TIME (ms) BEGIN (total = 19701) Thu Jan  8 10:00:59 2004
rank   self  accum   count trace method
   1 11.48% 11.48%      54    85 java.lang.Object.wait
   2 11.46% 22.94%     117    86 java.lang.Object.wait
   3 10.95% 33.89%    4115   215 java.net.PlainDatagramSocketImpl.receive
   4 10.93% 44.81%    4114   224 java.lang.Thread.sleep
   5 10.91% 55.73%   19005   214 sun.nio.ch.PollArrayWrapper.poll0
   6  7.37% 63.09%      28   495 java.lang.Object.wait
   7  7.24% 70.34%      10   576 java.lang.Object.wait
   8  4.57% 74.90%      90   716 java.lang.Thread.sleep
   9  4.48% 79.38%       1   909 java.lang.Object.wait
  10  4.48% 83.86%       1   908 java.lang.Object.wait
  11  4.48% 88.34%      15   810 java.lang.Object.wait
  12  4.47% 92.81%       1   910 java.net.PlainSocketImpl.socketAccept
  13  0.71% 93.52%       2   623 java.lang.Object.wait
  14  0.56% 94.08%       2   706 java.lang.Object.wait
  15  0.38% 94.46%       2   914 java.lang.Object.wait
  16  0.24% 94.70%     775   913 java.lang.String.toCharArray
  17  0.23% 94.93%       3   475 java.lang.Thread.sleep
  18  0.16% 95.09%       2   472 java.lang.Object.wait
  19  0.15% 95.24%       2   595 java.lang.Thread.sleep
  20  0.15% 95.40%       2   586 java.lang.Thread.sleep
  21  0.15% 95.55%       2   703 java.lang.Thread.sleep
  22  0.15% 95.70%       2   476 java.lang.Thread.sleep
  23  0.15% 95.85%       2   692 java.lang.Thread.sleep
  24  0.12% 95.97%  218595   385 java.lang.CharacterDataLatin1.toLowerCase
  25  0.12% 96.09%  218595   408 java.lang.Character.toLowerCase
  26  0.11% 96.20%  218595   433 java.lang.CharacterDataLatin1.getProperties
  27  0.10% 96.30%  210925   389 java.lang.String.equalsIgnoreCase
  28  0.08% 96.38%  157259   387 java.lang.String.charAt
  29  0.08% 96.46%       1   646 java.lang.Thread.sleep
  30  0.08% 96.53%       1   634 java.lang.Thread.sleep
  31  0.08% 96.61%       1   903 java.lang.Thread.sleep
  32  0.08% 96.69%       1   714 java.lang.Thread.sleep
  33  0.08% 96.76%       1   811 java.lang.Thread.sleep
  34  0.08% 96.84%       1   715 java.lang.Thread.sleep

2 hosts:
CPU TIME (ms) BEGIN (total = 37247) Thu Jan  8 11:01:28 2004
rank   self  accum   count trace method
   1  9.56%  9.56%      52    85 java.lang.Object.wait
   2  9.56% 19.12%      29    86 java.lang.Object.wait
   3  9.30% 28.43%       3   267 java.lang.Object.wait
   4  9.25% 37.68%    6644   224 java.lang.Thread.sleep
   5  9.23% 46.91%   13116   215 java.net.PlainDatagramSocketImpl.receive
   6  7.67% 54.58%       3   266 java.lang.Object.wait
   7  5.90% 60.47%      39   847 java.lang.Object.wait
   8  5.76% 66.24%      12   503 java.lang.Object.wait
   9  3.90% 70.14%     145   975 java.lang.Thread.sleep
  10  3.90% 74.04%       1  1174 java.lang.Object.wait
  11  3.90% 77.94%       1  1173 java.lang.Object.wait
  12  3.90% 81.84%      25   973 java.lang.Object.wait
  13  3.90% 85.74%       1  1175 java.net.PlainSocketImpl.socketAccept
  14  3.88% 89.62%  819692   214 sun.nio.ch.PollArrayWrapper.poll0
  15  0.75% 90.37%       2   958 java.lang.Object.wait
  16  0.28% 90.65%       2   457 java.lang.Object.wait
  17  0.26% 90.91%       2  1181 java.lang.Object.wait

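
To put the poll0 counts from the two tables in perspective, here is the arithmetic as a small sketch. It assumes each run really covered the stated 40 minutes; PollRate is a hypothetical helper name used only for this calculation.

```java
// Hypothetical helper for the arithmetic only; the counts come from the hprof tables above.
final class PollRate {
    // Average calls per second over a run of the given length in minutes.
    static double callsPerSecond(long calls, int minutes) {
        return calls / (minutes * 60.0);
    }

    public static void main(String[] args) {
        int minutes = 40;        // assumed: each profiling run lasted 40 minutes
        long oneHost = 19005;    // sun.nio.ch.PollArrayWrapper.poll0 count, 1 host
        long twoHosts = 819692;  // sun.nio.ch.PollArrayWrapper.poll0 count, 2 hosts
        System.out.printf("1 host:  %.1f poll0 calls/s%n", callsPerSecond(oneHost, minutes));
        System.out.printf("2 hosts: %.1f poll0 calls/s%n", callsPerSecond(twoHosts, minutes));
        System.out.printf("ratio:   %.1fx%n", (double) twoHosts / oneHost);
    }
}
```

Under that assumption the poll rate goes from roughly 8 calls per second with one host to roughly 340 per second with two, about 43 times as many.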
Filip Hanik wrote:

I'll try to get an instance going today. Will let you know how it goes.
Also, try asynchronous replication: does it still go to 100%?
Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 12:08 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication


Okay, did that, got this:

BEGIN TO RECEIVE
SENT:Default 1
RECEIVED:Default 1 FROM /10.0.0.110:
SENT:Default 2
BEGIN TO RECEIVE
RECEIVED:Default 2 FROM /10.0.0.110:
SENT:Default 3
BEGIN TO RECEIVE
RECEIVED:Default 3 FROM /10.0.0.110:
SENT:Default 4
BEGIN TO RECEIVE
RECEIVED:Default 4 FROM /10.0.0.110:
*shrug*

BTW, it didn't go to 100% CPU utilization before I started using the code
from CVS.
Of course, the Manager would almost always time out before it would receive
the message.
Now it gets the message right away, but maxes my machine out.






-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 1:58 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication
100% cpu can mean that you have a multicast problem, try to run

java -cp tomcat-replication.jar MCaster

download the jar from http://cvs.apache.org/~fhanik/

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 6:51 AM
To: '[EMAIL PROTECTED]'
Subject: tomcat 5.0.16 Replication


I was having random problems with clustering when starting up. Mostly it
had to do with timing out when the manager was starting up. I built the
CVS version and it solved that problem. But it has caused some serious
performance problems.

First a little background.

I have 2 servers, dual 300 MHz Compaq ProLiants, both running Red Hat 9
and Tomcat 5.0.16 (with catalina-cluster.jar built from CVS). The
multicast packets are restricted to a crossover link between the servers.
There are 3 hosts in the server.xml, all with clustering set up. They all
function just fine.

But... the CPUs spike up to 100% if I start up both servers. I know this
didn't happen without the new catalina-cluster.jar. If I shut down 1
server (doesn't matter which), everything returns to normal. But when both
are running, both servers are at 100% CPU. I am trying to profile it now,
but I figured if someone has already experienced this they could save me
some time.

Oh, and there isn't anything relevant in my logs. It's not throwing
millions of errors or something.

-Steve Nelson



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--
Jean-Philippe Bélanger
(514)228-8800 ext 3060
111 Duke
CGI


RE: tomcat 5.0.16 Replication

2004-01-08 Thread Steve Nelson


I was just about to try this actually. I found through googling a lot of
people having problems with select with 1.4 and NIO with Red Hat 9. They
were actually experiencing crashes though.

To verify your results I just put a Thread.sleep(1); where you suggested and
I also see the jump in performance.

Something must have changed in ReplicationListener that causes this, because
the 5.0.16 version doesn't seem to have the problem. I'll see if I can
figure it out when I get back to where I can diff the files.

-Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


More content for you Filip.

I've checked and followed the code of the listen event in
ReplicationListener.java.

Here's what's happening:

selector.select(timeout) returns immediately with one SelectionKey available.
That key is neither acceptable nor readable, so the loop immediately skips
those ifs and goes back to the beginning.

I've put in traces and this is executed once every millisecond, hence the
100% load on the server.
Just to make sure, I've put a Thread.sleep(10) at the end of the loop
and the CPU dropped back to 0% and the replication still worked nicely,
but probably a little slower because of the 10 ms wait.

I don't know much about the NIO packages, but it seems like the
select(timeout) method shouldn't return a SelectionKey in that state
without any waiting.

Let me know what you can dig from those.

Jean-Philippe Bélanger
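
A trace like the one described above is easier to read if the key's readiness bits are printed by name. This is a minimal sketch of such a helper, not the code in ReplicationListener; ReadyOpsTrace and its two methods are hypothetical names.

```java
import java.nio.channels.SelectionKey;

// Hypothetical tracing helper; not part of Tomcat.
final class ReadyOpsTrace {
    // Turns a readyOps or interestOps bitmask into text such as "READ|WRITE".
    static String describe(int ops) {
        StringBuilder sb = new StringBuilder();
        if ((ops & SelectionKey.OP_ACCEPT) != 0) sb.append("ACCEPT|");
        if ((ops & SelectionKey.OP_CONNECT) != 0) sb.append("CONNECT|");
        if ((ops & SelectionKey.OP_READ) != 0) sb.append("READ|");
        if ((ops & SelectionKey.OP_WRITE) != 0) sb.append("WRITE|");
        if (sb.length() == 0) {
            return "NONE";
        }
        sb.setLength(sb.length() - 1); // drop the trailing '|'
        return sb.toString();
    }

    // True when a key is ready only for operations the loop does not handle
    // (neither accept nor read). A key like that makes select() return at
    // once on every pass, so the loop spins.
    static boolean unhandled(int readyOps) {
        int handled = SelectionKey.OP_ACCEPT | SelectionKey.OP_READ;
        return readyOps != 0 && (readyOps & handled) == 0;
    }
}
```

Inside the select loop one could log ReadyOpsTrace.describe(key.readyOps()) for each selected key; a steady stream of "WRITE" with nothing else would point at the interest set rather than at select() itself.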


Re: tomcat 5.0.16 Replication

2004-01-08 Thread jean-philippe . belanger
The only change in the ReplicationListener class is the try/catch that
was added.

The code logic is the same, weirdly enough. So it's probably elsewhere
that something changed in the state of the SelectionKey.

Jean-Philippe Bélanger


RE: tomcat 5.0.16 Replication

2004-01-08 Thread Filip Hanik
Another code change was that I am now accepting keys for OP_READ and
OP_WRITE. Before, it was only OP_READ,
but for synchronous replication I need both.

This is good info. I just got RH9 installed and will be trying it out this
and next week.

Filip


RE: tomcat 5.0.16 Replication

2004-01-08 Thread Filip Hanik
ok guys,
good news. The 100% CPU is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest.
This is not good :)
Checking in the working code in 15 min, some more regression tests.
Filip
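
Why a standing OP_WRITE interest burns CPU can be shown on a loopback connection: an idle socket with room in its send buffer is always writable, so select() returns at once. The sketch below is only an illustration under that assumption; WriteInterestDemo is a hypothetical class and is not the code that was checked in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class; not Tomcat code.
final class WriteInterestDemo {
    // Returns {keys ready with OP_READ|OP_WRITE interest, keys ready with OP_READ only}
    // for a connection on which nobody sends any data.
    static int[] run() {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                peer.configureBlocking(false);
                SelectionKey key = peer.register(selector,
                        SelectionKey.OP_READ | SelectionKey.OP_WRITE);

                // The socket is writable, so this returns right away with one key.
                int withWrite = selector.select(1000);
                selector.selectedKeys().clear();

                // Drop the write interest: now select() waits out its timeout.
                key.interestOps(SelectionKey.OP_READ);
                int readOnly = selector.select(200);
                return new int[] { withWrite, readOnly };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("ready keys with OP_WRITE interest: " + r[0]);
        System.out.println("ready keys with OP_READ only:      " + r[1]);
    }
}
```

A common way to keep both behaviours is to add OP_WRITE to the interest set only while there is data queued to send, and to remove it again once the write completes.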


RE: tomcat 5.0.16 Replication

2004-01-08 Thread Filip Hanik

Jean-Philippe and Steve,
I fixed the bug and tried replication on RH9. At first it didn't work.
The problem is that when RH9 tries to write the ACK back to the NIO socket,
it never reaches the other node and times out after a long time.

I set LD_ASSUME_KERNEL=2.4 and it started to work.

Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


OK guys,
good news. The 100% CPU is totally my fault; I messed up on that one.
I was registering OP_WRITE as an interest op, which is not good :)
I'm checking in the working code in 15 min, after some more regression tests.
Filip
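
For readers unfamiliar with NIO, the effect Filip describes can be reproduced
in isolation. A channel with room in its send buffer is almost always
writable, so a selector that keeps OP_WRITE in a key's interest set returns
from select() right away instead of blocking. The following is only a minimal
sketch, not the ReplicationListener code: the class and method names are made
up for illustration, and a java.nio.channels.Pipe stands in for the
replication socket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical demo class; not part of Tomcat. */
final class WriteInterestSpinDemo {

    /**
     * Registers the two ends of a fresh pipe with a selector and counts how
     * many of `rounds` select(timeoutMs) calls report at least one ready key.
     * The read end never has data, so only OP_WRITE can make a key ready.
     */
    static int readyRounds(boolean registerWrite, int rounds, long timeoutMs) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);
                if (registerWrite) {
                    pipe.sink().configureBlocking(false);
                    pipe.sink().register(selector, SelectionKey.OP_WRITE);
                }
                int ready = 0;
                for (int i = 0; i < rounds; i++) {
                    if (selector.select(timeoutMs) > 0) {
                        ready++;
                    }
                    // Clear the selected set so the next round is counted afresh.
                    selector.selectedKeys().clear();
                }
                return ready;
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("OP_READ only:       " + readyRounds(false, 5, 20));
        System.out.println("OP_READ + OP_WRITE: " + readyRounds(true, 5, 20));
    }
}
```

With only OP_READ registered every select() should wait out its timeout; with
OP_WRITE registered every call should come back ready, which in a real
listener loop means spinning at full CPU.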

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


Another code change was that I am now accepting keys for OP_READ and
OP_WRITE. Before it was only OP_READ,
but for synchronous replication I need both.
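
A common way to use both operations without leaving OP_WRITE switched on
permanently is to add it to the interest set only while there is something
queued to send (for example an ACK) and to remove it once the data is written.
This is a hedged sketch of that general NIO idiom, not a description of what
the cluster code actually does; enableWrite, disableWrite and demo are
hypothetical names, and a Pipe sink is used in place of a socket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical sketch; names are illustrative and not taken from Tomcat. */
final class WriteInterestToggle {

    /** Adds OP_WRITE to the key's interest set (call when data is queued). */
    static void enableWrite(SelectionKey key) {
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
    }

    /** Removes OP_WRITE again (call once the queued data has been written). */
    static void disableWrite(SelectionKey key) {
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
    }

    /** Returns the selectNow() result before enabling, while enabled, and after disabling. */
    static int[] demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                sink.configureBlocking(false);
                // Start with an empty interest set: nothing to write yet.
                SelectionKey key = sink.register(selector, 0);

                int before = selector.selectNow();
                selector.selectedKeys().clear();

                enableWrite(key);
                int during = selector.selectNow();
                selector.selectedKeys().clear();

                disableWrite(key);
                int after = selector.selectNow();
                selector.selectedKeys().clear();

                return new int[] {before, during, after};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("before=" + r[0] + " during=" + r[1] + " after=" + r[2]);
    }
}
```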

This is good info. I just got RH9 installed and will be trying it out this
and next week.

Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


The only change in the ReplicationListener class is the try/catch that
was added.

The code logic is the same, weirdly enough. So something elsewhere
probably changed the state of the SelectionKey.

Jean-Philippe Bélanger

Steve Nelson wrote:

I was just about to try this actually. Through googling I found a lot of
people having problems with select with 1.4 and NIO on Red Hat 9. They were
actually experiencing crashes though.

To verify your results I just put a Thread.sleep(1); where you suggested and
I also see the jump in performance.

Something must have changed in ReplicationListener that causes this, because
the 5.0.16 version doesn't seem to have the problem. I'll see if I can figure
it out when I get back to where I can diff the files.

-Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


More content for you Filip.

I've checked and followed the code of the listen event in
ReplicationListener.java

Here's what's happening:

selector.select(timeout) returns immediately with one SelectionKey
available.
That key is neither acceptable nor readable, so the code immediately skips
those ifs and loops back to the beginning.

I added traces, and this loop executes about once every millisecond, hence
the 100% load on the server.
Just to make sure, I put a Thread.sleep(10) at the end of the loop: the CPU
dropped back to 0% and replication still worked nicely, though probably a
little slower because of the 10 ms wait.

I don't know much about the NIO packages, but it seems like the
select(timeout) method shouldn't return a SelectionKey in that state
without any waiting.
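
A key that is neither acceptable nor readable has to be ready for one of the
two remaining operations, connect or write. A trace along the following lines
would show which one. It is a small illustrative helper, not code from
ReplicationListener; the class name ReadyOpsTrace and the method describe are
assumptions made for this sketch.

```java
import java.nio.channels.SelectionKey;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tracing helper; not part of ReplicationListener. */
final class ReadyOpsTrace {

    /** Turns a readyOps() or interestOps() bit mask into a readable string. */
    static String describe(int ops) {
        List<String> names = new ArrayList<>();
        if ((ops & SelectionKey.OP_ACCEPT) != 0) names.add("ACCEPT");
        if ((ops & SelectionKey.OP_CONNECT) != 0) names.add("CONNECT");
        if ((ops & SelectionKey.OP_READ) != 0) names.add("READ");
        if ((ops & SelectionKey.OP_WRITE) != 0) names.add("WRITE");
        return names.isEmpty() ? "NONE" : String.join("|", names);
    }

    public static void main(String[] args) {
        // In a selector loop one would log describe(key.readyOps()) and
        // describe(key.interestOps()) for every selected key.
        System.out.println(describe(SelectionKey.OP_WRITE));
        System.out.println(describe(SelectionKey.OP_READ | SelectionKey.OP_WRITE));
    }
}
```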

Let me know what you can dig from those.

Jean-Philippe Bélanger

[EMAIL PROTECTED] wrote:



Hi Filip.

I did some profiling of Tomcat over 40 minutes, with and without a 2nd node
up. Here are the results with
-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:

Those numbers are from cpu=times and not samples, since the latter freezes
on my systems.
So the list shows the time spent in each method.

The major difference is the number of calls to the sun.nio.ch.PollArrayWrapper
class. I don't know much about the NIO packages, but 819000 calls in
40 minutes is a lot.
The socket interface was called more than twice as often with 2 hosts as with
a single one, which seems normal.
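
To put that count into perspective, 819000 calls over 40 minutes averages out
to roughly 341 calls per second, or about one call every 3 ms. The arithmetic
is sketched below; CallRate and callsPerSecond are hypothetical names used
only for this illustration.

```java
/** Hypothetical helper for putting a profiler call count into perspective. */
final class CallRate {

    /** Average calls per second for `calls` observed over `minutes` minutes. */
    static double callsPerSecond(long calls, double minutes) {
        return calls / (minutes * 60.0);
    }

    public static void main(String[] args) {
        double rate = callsPerSecond(819_000, 40);
        System.out.printf("%.2f calls/s, one call every %.2f ms on average%n",
                rate, 1000.0 / rate);
    }
}
```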

Maybe this can help.
If you need the complete hprof files I can send them to you.

1 host in cluster:
CPU TIME (ms) BEGIN (total = 19701) Thu Jan  8 10:00:59 2004
rank   self  accum   count trace method
   1 11.48% 11.48%  5485 java.lang.Object.wait
   2 11.46% 22.94% 11786 java.lang.Object.wait
   3 10.95% 33.89%    4115   215 java.net.PlainDatagramSocketImpl.receive
   4 10.93% 44.81%    4114   224 java.lang.Thread.sleep
   5 10.91% 55.73%   19005   214 sun.nio.ch.PollArrayWrapper.poll0
   6  7.37% 63.09%      28   495 java.lang.Object.wait
   7  7.24% 70.34%      10   576 java.lang.Object.wait
   8  4.57% 74.90%      90   716 java.lang.Thread.sleep
   9  4.48% 79.38%       1   909 java.lang.Object.wait
  10  4.48% 83.86%       1   908 java.lang.Object.wait
  11  4.48% 88.34%      15   810 java.lang.Object.wait
  12  4.47% 92.81%       1   910 java.net.PlainSocketImpl.socketAccept
  13  0.71% 93.52%       2   623 java.lang.Object.wait
  14  0.56% 94.08%       2   706 java.lang.Object.wait
  15  0.38% 94.46%       2   914 java.lang.Object.wait
  16  0.24% 94.70%     775   913 java.lang.String.toCharArray
  17  0.23% 94.93%       3   475 java.lang.Thread.sleep
  18  0.16% 95.09%       2   472 java.lang.Object.wait
  19  0.15% 95.24%       2   595 java.lang.Thread.sleep
 20  0.15% 95.40

RE: tomcat 5.0.16 Replication (This Thread is a Duplicate, Please Ignore)

2004-01-07 Thread Steve Nelson


RE: tomcat 5.0.16 Replication

2004-01-07 Thread Filip Hanik
My only experience with Red Hat 9 is that it doesn't play well with NIO.
I have not successfully run Tomcat clustering on RH9; I use RH8.
I also don't have an RH9 machine at home yet, so I can't develop for it.

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 6:51 AM
To: '[EMAIL PROTECTED]'
Subject: tomcat 5.0.16 Replication



I was having random problems with clustering when starting up. Mostly it had
to do with timing out when the manager was starting up. I built the CVS
version and it solved that problem, but it has caused some serious
performance problems.

First a little background.

I have 2 servers, dual 300 MHz cpq ProLiants, both running Red Hat 9 and Tomcat
5.0.16 (with catalina-cluster.jar built from CVS). The multicast packets are
restricted to a crossover link between the servers. There are 3 hosts in the
server.xml, all with clustering set up. They all function just fine.

But the CPU spikes up to 100% if I start up both servers. I know this
didn't happen without the new catalina-cluster.jar. If I shut down 1 server
(doesn't matter which) everything returns to normal. But when both are
running both servers are at 100% CPU. I am trying to profile it now, but I
figured if someone has already experienced this they could save me some
time.

Oh, and there isn't anything relevant in my logs. It's not throwing millions
of errors or something.

-Steve Nelson



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: tomcat 5.0.16 Replication

2004-01-07 Thread jean-philippe . belanger
Currently running tomcat 5.0.16 with the CVS HEAD of the replication module.
This is under redhat 9. So far so good.
What kind of problem did you encounter under rh9?

Jean-Philippe Bélanger



RE: tomcat 5.0.16 Replication

2004-01-07 Thread Filip Hanik
I had socket deadlocks in java.io.OutputStream.write calls that never
returned, which eventually caused the system to hang.
In the next few weeks, I'll try to get an RH9 instance going.
So everything works for you?
Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 11:43 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Currently running tomcat 5.0.16 with the CVS HEAD of the replication module.
This is under redhat 9. So far so good.

What kind of problem did you encounter under rh9?

Jean-Philippe Bélanger




RE: tomcat 5.0.16 Replication

2004-01-07 Thread Steve Nelson


My CPU utilization jumps to 100% on both processes. It functions properly
other than maxing out the machine. BTW, this is with NO load. I am going to
try to profile it, but the EJP profile files total over 800 MB just for
starting up Tomcat, and I am off-site so I had to transfer them.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 1:43 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication


Currently running tomcat 5.0.16 with the CVS HEAD of the replication module.
This is under redhat 9. So far so good.

What kind of problem did you encounter under rh9?

Jean-Philippe Bélanger



Re: tomcat 5.0.16 Replication

2004-01-07 Thread jean-philippe . belanger
Well, just to make sure that I wasn't saying something untrue, I went to
check my Red Hat boxes.

Everything DOES work fine, but it's true that there is some loop
somewhere, because both my Tomcats show an abnormal load average (i.e. 1.15)
even when the servers are idle.

Jean-Philippe Bélanger



RE: tomcat 5.0.16 Replication

2004-01-07 Thread Filip Hanik
100% cpu can mean that you have a multicast problem, try to run

java -cp tomcat-replication.jar MCaster

download the jar from http://cvs.apache.org/~fhanik/

Filip




RE: tomcat 5.0.16 Replication

2004-01-07 Thread Steve Nelson


Okay, I did that and got this:

BEGIN TO RECEIVE
SENT:Default 1
RECEIVED:Default 1 FROM /10.0.0.110:
SENT:Default 2
BEGIN TO RECEIVE
RECEIVED:Default 2 FROM /10.0.0.110:
SENT:Default 3
BEGIN TO RECEIVE
RECEIVED:Default 3 FROM /10.0.0.110:
SENT:Default 4
BEGIN TO RECEIVE
RECEIVED:Default 4 FROM /10.0.0.110:

*shrug*

BTW, it didn't go to 100% CPU utilization before I started using the code from
CVS. Of course, the Manager would almost always time out before it would
receive the message.

Now it gets the message right away, but maxes my machine out.




-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 1:58 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


100% cpu can mean that you have a multicast problem, try to run

java -cp tomcat-replication.jar MCaster

download the jar from http://cvs.apache.org/~fhanik/

Filip



RE: tomcat 5.0.16 Replication

2004-01-07 Thread Filip Hanik
I'll try to get an instance going today and will let you know how it goes.
Also, try asynchronous replication: does it still go to 100%?

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 12:08 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication




Okay, I did that and got this:

BEGIN TO RECEIVE
SENT:Default 1
RECEIVED:Default 1 FROM /10.0.0.110:
SENT:Default 2
BEGIN TO RECEIVE
RECEIVED:Default 2 FROM /10.0.0.110:
SENT:Default 3
BEGIN TO RECEIVE
RECEIVED:Default 3 FROM /10.0.0.110:
SENT:Default 4
BEGIN TO RECEIVE
RECEIVED:Default 4 FROM /10.0.0.110:

*shrug*

BTW, it didn't go to 100% CPU utilization before I started using the code from
CVS. Of course, the Manager would almost always time out before it would
receive the message.

Now it gets the message right away, but maxes my machine out.







RE: tomcat 5.0.16 Replication

2004-01-07 Thread Steve Nelson

Yep, it also happens when I use asynch. I couldn't get the profiling files to
load on the machine I am using right now. When I get back to the servers I'll
try to figure out what is eating up all the CPU, although top tells me around
30% of the utilization is system level as opposed to the java executable.
Sounds like a lot of the load may be in system calls.

-Steve


-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 2:47 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


I'll try to get an instance going today and will let you know how it goes.
Also, try asynchronous replication: does it still go to 100%?

Filip



RE: tomcat 5.0.16 Replication

2004-01-07 Thread Steve Nelson


Okay, I reverted back to the 5.0.16 version and now I don't have the high
CPU utilization. But it takes almost 60 seconds for the Manager to request
the session state, which causes it to fail to synch about half the time.

It must be something in the synch code, which comes back to your original
comments about the NIO stuff and RH9 not liking Java in general. Is there a
known fix for making things right with RH9? I could try that.

-Steve



-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 2:53 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication



Yep, it also happens when I use asynch. I couldn't get the profiling files to
load on the machine I am using right now. When I get back to the servers I'll
try to figure out what is eating up all the CPU, although top tells me around
30% of the utilization is system level as opposed to the java executable.
Sounds like a lot of the load may be in system calls.

-Steve




RE: tomcat 5.0.16 Replication

2004-01-07 Thread Steve Nelson

Heh, now I am replying to myself :P

I tried
export set LD_ASSUME_KERNEL=2.4.1
No change in behaviour.
Then I tried
export set LD_ASSUME_KERNEL=2.2.5
Again, no change.

I restarted both servers between runs. I still get the CPU-going-crazy
scenario.

-Steve

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 3:03 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication




Okay, I reverted back to the 5.0.16 version and now I don't have the high
CPU utilization. But it takes almost 60 seconds for the Manager to request
the session state, which causes it to fail to synch about half the time.

It must be something in the synch code, which comes back to your original
comments about the NIO stuff and RH9 not liking Java in general. Is there a
known fix for making things right with RH9? I could try that.

-Steve



-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 2:53 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication



Yep, also happens when I use asynch. I couldn't get the profiling files to
load on the machine I am using right now, when I get back to the servers
I'll
try to figure out what is eating up all the CPUalthough TOP tells me
arround
30% of the ute is system level as opposed the the java executable. Sounds
like
alot of the load may be in system calls.

-Steve


-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 2:47 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


I'll try to get an instance going today. Will let you know how it goes.
Also, try asynchronous replication. Does it still go to 100%?

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 12:08 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication




Okay, did that and got this:

BEGIN TO RECEIVE
SENT:Default 1
RECEIVED:Default 1 FROM /10.0.0.110:
SENT:Default 2
BEGIN TO RECEIVE
RECEIVED:Default 2 FROM /10.0.0.110:
SENT:Default 3
BEGIN TO RECEIVE
RECEIVED:Default 3 FROM /10.0.0.110:
SENT:Default 4
BEGIN TO RECEIVE
RECEIVED:Default 4 FROM /10.0.0.110:

*shrug*

BTW, it didn't go to 100% CPU utilization before I started using the code from CVS.
Of course, the Manager would almost always time out before it would receive
the message.

Now it gets the message right away, but maxes my machine out.




-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 1:58 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


100% CPU can mean that you have a multicast problem. Try running

java -cp tomcat-replication.jar MCaster

Download the jar from http://cvs.apache.org/~fhanik/

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 6:51 AM
To: '[EMAIL PROTECTED]'
Subject: tomcat 5.0.16 Replication



I was having random problems with clustering when starting up. Mostly it had
to do with timing out when the manager was starting up. I built the CVS
version and it solved that problem, but it has caused some serious
performance problems.

First a little background.

I have 2 servers, dual 300 MHz Compaq ProLiants, both running Red Hat 9 and Tomcat
5.0.16 (with catalina-cluster.jar built from CVS). The multicast packets are
restricted to a crossover link between the servers. There are 3 hosts in the
server.xml, all with clustering set up. They all function just fine.

But... the CPU spikes up to 100% if I start up both servers. I know this
didn't happen without the new catalina-cluster.jar. If I shut down 1 server
(doesn't matter which) everything returns to normal. But when both are
running both servers are at 100% CPU. I am trying to profile it now, but I
figured if someone has already experienced this they could save me some
time.

Oh, and there isn't anything relevant in my logs. It's not throwing millions
of errors or something.

-Steve Nelson





RE: tomcat 5.0.16 Replication

2004-01-07 Thread Filip Hanik
You should do

export LD_ASSUME_KERNEL=2.4.1

not

export set LD_ASSUME_KERNEL=2.4.1

in a regular bash shell.

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 1:38 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication



Heh, now I am replying to myself :P

I tried
export set LD_ASSUME_KERNEL=2.4.1
No change in behaviour.
Then I tried
export set LD_ASSUME_KERNEL=2.2.5
Again, no change.

I restarted both servers between runs. I still get the CPU-going-crazy
scenario.

-Steve







RE: tomcat 5.0.16 Replication

2004-01-07 Thread Steve Nelson

Ends up doing the same thing.

The variable was set. I checked it with an echo.



-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 4:05 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


You should do

export LD_ASSUME_KERNEL=2.4.1

not

export set LD_ASSUME_KERNEL=2.4.1

in a regular bash shell.





RE: tomcat 5.0.16 Replication

2004-01-05 Thread jean-philippe . belanger
I built the latest CVS branch for the clustering module and replaced my
catalina-cluster.jar.

Seems like everything is synchronous as stated.

I had another unrelated problem with an IFRAME that IE seems to load before
the server (Tomcat) ends the request. So even if everything was synchronous,
the IFRAME request could be made by IE before the actual parent page was
done replicating.

I'll let you know if any other problem comes up, since that release will be
going through intensive testing in the coming weeks.

Thanks

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Saturday, January 03, 2004 8:07 PM
To: Tomcat Users List; [EMAIL PROTECTED]
Subject: RE: tomcat 5.0.16 Replication


it will come out in the next release.

Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 31, 2003 9:41 AM
To: Tomcat-user
Subject: tomcat 5.0.16 Replication


The new Tomcat 5.0.16 replication seems to work oddly.

From what I've read in the documentation and on the mailing list,
the clustering is supposed to be done synchronously.

Right?

Well, that's not what's happening on my end: the client receives the
response before the whole replication is done.

Example:

I have a web page that fetches data and, if data is found, puts it in
the session and returns a web page containing an IFRAME.
The IFRAME's src hits a web page on the same cluster, which loads
the data found in the session and displays it.

Sometimes, when the IFRAME is shown and that hit is forwarded
to a different server than the first access, the data in the session is empty.
Then, once the page is loaded (empty) and returned to the client, I
get the replication message in my logs (the message containing the
data that was supposed to be already replicated).

[Cluster config]
<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
  name="PortalClusterJP"
  debug="10"
  serviceclass="org.apache.catalina.cluster.mcast.McastService"
  mcastAddr="228.0.0.5"
  mcastPort="45564"
  mcastFrequency="500"
  mcastDropTime="3000"
  tcpThreadCount="2"
  tcpListenAddress="auto"
  tcpListenPort="4001"
  tcpSelectorTimeout="100"
  printToScreen="true"
  expireSessionsOnShutdown="false"
  useDirtyFlag="true"
  replicationMode="synchronous"
/>

<Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
   filter=".*\.gif;.*\.js;.*\.jpg;.*\.htm;.*\.html;.*\.txt;"/>

[end Cluster Config]
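
To get a feel for which request URIs the ReplicationValve filter above would cover, here is a rough sketch. It assumes the filter is a semicolon-separated list of regular expressions, each matched against the whole URI. It uses grep -E as a stand-in for the valve's own regex engine, so it is an approximation and not the valve's actual code. `is_filtered` is a hypothetical helper name.

```shell
#!/bin/sh
# Filter string copied from the config above.
filter='.*\.gif;.*\.js;.*\.jpg;.*\.htm;.*\.html;.*\.txt;'

# Hypothetical helper: succeeds if the URI fully matches any pattern in the list.
is_filtered() {
  candidate=$1
  old_ifs=$IFS
  IFS=';'
  set -f                      # keep the shell from glob-expanding the patterns
  for pat in $filter; do
    if [ -n "$pat" ] && printf '%s\n' "$candidate" | grep -Eqx -- "$pat"; then
      set +f; IFS=$old_ifs
      return 0
    fi
  done
  set +f; IFS=$old_ifs
  return 1
}

for uri in /images/logo.gif /scripts/menu.js /portal/search.jsp; do
  if is_filtered "$uri"; then
    echo "$uri: matches filter"
  else
    echo "$uri: does not match filter"
  fi
done
```

Under these assumptions the static .gif and .js requests match the filter, while the .jsp request does not.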

Any idea on what could be going wrong here?

Jean-Philippe Bélanger
CGI





RE: tomcat 5.0.16 Replication

2004-01-05 Thread Filip Hanik
Clustering doesn't support frames.
Synchronizing everything down to that level would cause overhead, so I
decided against supporting it.

Filip

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Monday, January 05, 2004 6:46 AM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication


I built the latest CVS branch for the clustering module and replaced my
catalina-cluster.jar.

Seems like everything is synchronous as stated.

I had another unrelated problem with an IFRAME that IE seems to load before
the server (Tomcat) ends the request. So even if everything was synchronous,
the IFRAME request could be made by IE before the actual parent page was
done replicating.

I'll let you know if any other problem comes up, since that release will be
going through intensive testing in the coming weeks.

Thanks






RE: tomcat 5.0.16 Replication

2004-01-03 Thread Filip Hanik
it will come out in the next release.

Filip


