Re: Network performance issues under heavy load

2023-01-20 Thread Emmanuel Lécharny


On Fri, Jan 20, 2023 at 1:29 PM Marc Boorshtein wrote:

>> I would say that we only have a limited number of threads dedicated to
>> processing the incoming messages, and this number is computed based on
>> the number of cores you have on your machine.
>
> Is this configurable?  I'd like to be able to adjust to figure out if it
> makes an impact.



There is an ads-transportNbThreads parameter that defaults to 3, which is
pretty low. You should be able to tweak it:



dn: ads-transportid=ldap,ou=transports,ads-serverId=ldapServer,ou=servers,ads-directoryServiceId=default,ou=config
ads-systemport: 10389
ads-transportnbthreads: 8  <- Here
ads-transportaddress: 0.0.0.0
ads-transportid: ldap
objectclass: ads-transport
objectclass: ads-tcpTransport
objectClass: ads-base
objectclass: top
ads-enabled: TRUE
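
For what it's worth, the same change could probably be applied over LDAP instead of editing the entry by hand. Below is a minimal sketch using plain JNDI. The DN, attribute name and port come from the entry above; the host, bind DN and password are assumptions for a local test instance, and the class and method names are made up for illustration. The server will most likely need a restart before the new value is picked up.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.ModificationItem;

public class TransportThreadsTweak {
    // DN of the LDAP transport entry, as quoted in the message above.
    static final String TRANSPORT_DN =
        "ads-transportid=ldap,ou=transports,ads-serverId=ldapServer,"
        + "ou=servers,ads-directoryServiceId=default,ou=config";

    /** Builds a single "replace" modification for ads-transportNbThreads. */
    static ModificationItem[] nbThreadsChange(int nbThreads) {
        return new ModificationItem[] {
            new ModificationItem(
                DirContext.REPLACE_ATTRIBUTE,
                new BasicAttribute("ads-transportNbThreads", Integer.toString(nbThreads)))
        };
    }

    public static void main(String[] args) throws NamingException {
        // Assumptions: host, bind DN and password are placeholders for a
        // local test server; adjust them to your own setup.
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://localhost:10389");
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "uid=admin,ou=system");
        env.put(Context.SECURITY_CREDENTIALS, "secret");

        DirContext ctx = new InitialDirContext(env);
        try {
            // Replace the current value with 8, as in the entry above.
            ctx.modifyAttributes(TRANSPORT_DN, nbThreadsChange(8));
        } finally {
            ctx.close();
        }
    }
}
```

Trying a few values (say 8, 16, 32) under the same JMeter load should show fairly quickly whether the thread count has any impact.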





>> First thing: is your server topping out at 100% CPU? With ZGC it should
>> not stop anything...
>
> No, not even close
>
>> Have you profiled the server while running under your stress test (not
>> tracing, but sampling)?
>
> Do you have a tool you could recommend?



I'm a user of YourKit, but any profiler can do.
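
If no profiler is at hand, the idea behind sampling (as opposed to tracing) can be approximated in a few lines: take periodic stack snapshots and count which frames show up most often. This is only a rough sketch, not a replacement for a real profiler. The class name is made up, and it has to run inside the JVM being observed, for example started from a helper thread in the server.

```java
import java.util.Map;
import java.util.TreeMap;

public class StackSampler {
    /**
     * Takes `samples` stack snapshots, `intervalMs` apart, and counts the
     * top frame of every RUNNABLE thread other than the sampler itself.
     * Note that the JVM also reports threads waiting in native I/O
     * (socket reads, selectors) as RUNNABLE.
     */
    static Map<String, Integer> sample(int samples, long intervalMs) {
        Map<String, Integer> hits = new TreeMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e
                    : Thread.getAllStackTraces().entrySet()) {
                Thread t = e.getKey();
                StackTraceElement[] stack = e.getValue();
                if (t == Thread.currentThread() || stack.length == 0
                        || t.getState() != Thread.State.RUNNABLE) {
                    continue;
                }
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException ex) {
                // Restore the interrupt flag and stop sampling early.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 200 samples at 10 ms intervals, i.e. roughly two seconds of observation.
        sample(200, 10).forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```

Frames with the highest counts are where the runnable threads spend most of their time during the test.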




>> Second thing: are you sure that it occurs when the GC kicks in?
>
> It's a guess.

--
*Emmanuel Lécharny - CTO* 205 Promenade des Anglais – 06200 NICE
T. +33 (0)4 89 97 36 50
P. +33 (0)6 08 33 32 61
emmanuel.lecha...@busit.com https://www.busit.com/

-
To unsubscribe, e-mail: dev-unsubscr...@directory.apache.org
For additional commands, e-mail: dev-h...@directory.apache.org



Re: Network performance issues under heavy load

2023-01-20 Thread Emmanuel Lécharny




On 20/01/2023 20:28, Marc Boorshtein wrote:
> Here's an additional datapoint.  Enabling TLS eliminates the issue
> entirely (not LDAP+StartTLS, but just straight LDAPS).  Once I enabled
> TLS I was able to hammer MyVD with 300+ inbound connections and 300
> outbound connections and it worked great!



This is extra weird...

TLS is managed entirely by MINA, and I can't see how enabling something
that eats CPU can make the server run faster :/


At this point, a thread dump could help...
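
The usual way to get one is `jstack <pid>` or `jcmd <pid> Thread.print` against the running server. If it is easier to capture the dump from inside the process at the moment the errors show up, something along these lines could work. It is a sketch only: the class and method names are made up, and it relies on nothing beyond the standard `java.lang.management` API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadDumpSummary {
    /** Counts live threads per state (RUNNABLE, BLOCKED, WAITING, ...). */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Renders name, state and full stack trace of every live thread. */
    static String fullDump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fullDump());
        System.out.println("Threads by state: " + countByState());
    }
}
```

Comparing the per-state counts with and without TLS enabled might show whether the threads handling the connections are all busy or blocked when the "broken pipe" errors occur.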




> So while I'd love to say "we'll just use LDAPS", the customer would
> prefer not to so I need to keep digging into this.





Re: Network performance issues under heavy load

2023-01-20 Thread Marc Boorshtein
Here's an additional datapoint.  Enabling TLS eliminates the issue entirely
(not LDAP+StartTLS, but just straight LDAPS).  Once I enabled TLS I was
able to hammer MyVD with 300+ inbound connections and 300 outbound
connections and it worked great!

So while I'd love to say "we'll just use LDAPS", the customer would prefer
not to so I need to keep digging into this.

On Fri, Jan 20, 2023 at 1:29 PM Marc Boorshtein wrote:

>
>> I would say that we only have a limited number of threads dedicated to
>> processing the incoming messages, and this number is computed based on
>> the number of cores you have on your machine.
>>
>
> Is this configurable?  I'd like to be able to adjust to figure out if it
> makes an impact.
>
>
>>
>> First thing: is your server topping out at 100% CPU? With ZGC it should
>> not stop anything...
>>
>
> No, not even close
>
>
>>
>> Have you profiled the server while running under your stress test (not
>> tracing, but sampling)?
>>
>>
> Do you have a tool you could recommend?
>
>
>> Second thing: are you sure that it occurs when the GC kicks in?
>>
>>
> It's a guess.


Re: Network performance issues under heavy load

2023-01-20 Thread Emmanuel Lécharny

Hi Marc,

no obvious clue.

I would say that we only have a limited number of threads dedicated to
processing the incoming messages, and this number is computed based on the
number of cores you have on your machine.


First thing: is your server topping out at 100% CPU? With ZGC it should not
stop anything...


Have you profiled the server while running under your stress test (not 
tracing, but sampling)?


Second thing: are you sure that it occurs when the GC kicks in?
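
One way to check this is to log GC activity with timestamps and line it up with the time of the client errors. The `-Xlog:gc*` JVM option gives that directly. Alternatively, a small poller based on the standard `GarbageCollectorMXBean` API could be embedded in the server process, roughly as sketched below. The class name is made up, and the one-second polling interval is an arbitrary choice.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.time.Instant;

public class GcWatch {
    /** One-line summary of cumulative collection counts and times per collector. */
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append("; ");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a timestamped snapshot every second for ten seconds.
        // A jump in "count" between two lines means a collection happened
        // in that interval.
        for (int i = 0; i < 10; i++) {
            System.out.println(Instant.now() + " " + snapshot());
            Thread.sleep(1000);
        }
    }
}
```

If the "broken pipe" errors show up in intervals where the counters did not move, the GC is probably not the trigger.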

On 20/01/2023 15:57, Marc Boorshtein wrote:
> We're using ApacheDS as a frontend for MyVD, running 2.0.0.AM27-SNAPSHOT.
> We're finding that under heavy load (~300 concurrent connections) we'll
> periodically get "broken pipe" errors from the client.  I can reproduce
> this pretty easily with JMeter's LDAP module.  The errors tend to come in
> bunches and when there is a garbage collection event (under really heavy
> loads you can see the logs slow down momentarily and then the errors
> occur).
>
> My test bed is a Mac M2 running Java 17, however the server is an Amazon
> m5a. running the Corretto Java 18 JVM with the ZGC garbage collector.
> Here are the JVM switches:
>
> -Xms4g -Xmx4g -XX:+UnlockExperimentalVMOptions -XX:+UseZGC
> -Dsun.net.client.defaultConnectTimeout=1
> -Dsun.net.client.defaultReadTimeout=2
>
> We're not seeing any issues in the MyVD portion of the system, nor in
> the downstream directories being proxied.  Also, if we add an artificial
> bottleneck by dropping the connection pool size from 300 to 50, but still
> maintaining 300 clients, the issue decreases dramatically.
>
> Any thoughts as to where I can start debugging this issue?  A thread
> dump analysis doesn't show any deadlocks.
>
> Thanks
> Marc

