Re: [Mono-dev] Memory usage on Mono remoting

2009-07-15 Thread pablosantosl...@terra.es
Hi Gonzalo,

First, thanks for the detailed answers.


 Considering Boehm GC seems to have really hard times releasing memory
 and we're delivering GBs of data... it could be.
 

 Delivering GBs of data and having hundreds of connections should not be
 a problem. Years ago, when testing iFolder under those conditions
 everything worked just fine. But it was mod-mono-server/apache.

   
Dick is actually checking this. I hope it's not the issue.
 I'm not 100% sure, but it seems reusing buffers could be a very good idea.
 

 Xsp does it too and it's much better than allocating 32kB for every
 request every time.

   
Good.
 Also, you mentioned in a previous email that the TcpChannel should be
 changed so it uses Asynch sockets. I've seen you use AsyncCallBack on XSP.

 My question is: I guess AsynchCallback will use a thread underneath,
 won't it? If so: what's the advantage over launching threads to accept
 calls?
 

 Your guess is wrong. Those asynchronous calls from Socket are treated as
 if they were a WorkItem for a ThreadPool, only that when they are made,
 the socket is added to an epoll fd (if you're on linux with support for
 epoll). And when there's an event in the socket, there's a dedicated IO
 threadpool to take care of reading/writing data and invoking the
 callbacks. The advantages: if you have 10k connections, you don't need
 10k threads, threads are reused (no creation overhead), ...

   
Ok, of course. Well, when I said launching a thread I really meant
launching a thread on a thread pool.

Well, I'll try to use the async sockets then, but I guess to get the
best out of them I'll need not only to use them during accept, but also
to read data asynchronously, right?

BTW, I already replaced the built-in remoting threadpool by the
System.Threading one.


 You mentioned it is better to use the default ThreadPool instead of the
 internal one in the TcpChannel, why is it going to be better?
 

 Coupled with asynchronous I/O, it will make better use of the resources
 available. There's no need to create 100 threads for 100 client or
 having 1 threadpool thread blocking on a socket asynchronous
 operation,... Also, if you're thinking of reusing buffers, this helps
 too, as the number of buffers will be bound to the maximum number of
 threads in the threadpool, ...
   
Good, so, whenever I wait for a read or a write using async, the thread
should be able to work on another request?

I think this is the way it's implemented on Windows, but I can tell you
it still creates a huge number of threads, almost 1:1 with clients,
under the most overloaded scenarios.


Thanks,

pablo


 -Gonzalo




   
___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] Memory usage on Mono remoting

2009-07-15 Thread Gonzalo Paniagua Javier
On Wed, 2009-07-15 at 13:29 +0200, pablosantosl...@terra.es wrote:
[...]
  Your guess is wrong. Those asynchronous calls from Socket are treated as
  if they were a WorkItem for a ThreadPool, only that when they are made,
  the socket is added to an epoll fd (if you're on linux with support for
  epoll). And when there's an event in the socket, there's a dedicated IO
  threadpool to take care of reading/writing data and invoking the
  callbacks. The advantages: if you have 10k connections, you don't need
  10k threads, threads are reused (no creation overhead), ...
 

 Ok, of course. Well, when I said launching a thread I really meant
 launching a thread on a thread pool.
 
 Well, I'll try to use the async sockets then, but I guess to get the
 best out of them I'll need not only to use them during accept, but also
 to read data asynchronously, right?

Correct. If possible, Write should also be asynchronous, but as long as
the OS buffers everything, there should be no problem.
[...]
  Coupled with asynchronous I/O, it will make better use of the resources
  available. There's no need to create 100 threads for 100 client or
  having 1 threadpool thread blocking on a socket asynchronous
  operation,... Also, if you're thinking of reusing buffers, this helps
  too, as the number of buffers will be bound to the maximum number of
  threads in the threadpool, ...

 Good, so, whenever I wait for a read or a write using async, the thread
 should be able to work on another request?

Correct. In fact, you don't 'wait' for an asynchronous read or write.
For instance, when you call BeginRead, the socket is added to an epoll
fd and your BeginRead call returns immediately. The callback you
provided, if any, will be called from a different thread as soon as new
data is available. Just don't spoil it by doing something like
socket.EndRead (socket.BeginRead (...))   ;-)
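
A minimal sketch of the callback pattern described above (not code from
the thread). It assumes a connected System.Net.Sockets.Socket and uses
Socket.BeginReceive/EndReceive, the socket-level counterparts of the
stream-level BeginRead/EndRead. The AsyncReader class and its onData and
onClosed delegates are hypothetical names introduced for illustration.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper (not from the thread): reads from a connected socket
// without parking a thread while waiting for data.
sealed class AsyncReader
{
    readonly Socket socket;
    readonly byte[] buffer = new byte[32 * 1024];   // one buffer, reused for every read
    readonly Action<byte[], int> onData;            // receives the buffer and the byte count
    readonly Action onClosed;                       // invoked once when reading stops

    public AsyncReader(Socket socket, Action<byte[], int> onData, Action onClosed)
    {
        this.socket = socket;
        this.onData = onData;
        this.onClosed = onClosed;
    }

    public void Start()
    {
        try
        {
            // Returns without waiting for data; OnReceived normally runs
            // later on a pool thread.
            socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, OnReceived, null);
        }
        catch (SocketException) { onClosed(); }
        catch (ObjectDisposedException) { onClosed(); }
    }

    void OnReceived(IAsyncResult ar)
    {
        int n;
        try { n = socket.EndReceive(ar); }
        catch (SocketException) { onClosed(); return; }
        catch (ObjectDisposedException) { onClosed(); return; }

        if (n == 0) { onClosed(); return; }   // the peer closed the connection

        onData(buffer, n);   // the buffer is only valid until the next read is queued
        Start();             // queue the next read instead of blocking for it
    }
}
```

Calling EndReceive directly on the result of BeginReceive, as in the
joke above, would block the calling thread until data arrives and so
give up the benefit of the asynchronous call.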

 I think this is the way it's implemented on Windows, but I can tell you
 it still creates a huge number of threads, almost 1:1 with clients,
 under the most overloaded scenarios.

The number of threads in the current threadpool can be configured using
MONO_THREADS_PER_CPU (see mono(1)) and SetMaxThreads. I don't remember
the numbers, but if you have a dual quad-core machine, it means ~140
threads. I would keep raising the maximum number of threads until adding
a few more no longer changes the performance, but this may vary
depending on what the threads are doing...
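
A small sketch of the SetMaxThreads side of this, assuming the standard
System.Threading.ThreadPool API. The PoolLimits class and the TrySetMax
name are hypothetical; the values to pass would come from the kind of
measurement described above, not from this example.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not from the thread).
static class PoolLimits
{
    // Prints the current limits and tries to apply new ones.
    // SetMaxThreads returns false when the runtime rejects the values,
    // for example when they are below the configured minimum.
    public static bool TrySetMax(int workerThreads, int ioThreads)
    {
        int curWorkers, curIo;
        ThreadPool.GetMaxThreads(out curWorkers, out curIo);
        Console.WriteLine("current max: {0} worker, {1} I/O", curWorkers, curIo);
        return ThreadPool.SetMaxThreads(workerThreads, ioThreads);
    }
}
```

MONO_THREADS_PER_CPU is an environment variable, so it would be set
before launching the runtime rather than from code.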

-Gonzalo







Re: [Mono-dev] Memory usage on Mono remoting

2009-07-15 Thread pablosantosl...@terra.es
Hi Gonzalo,

 Well, I'll try to use the async sockets then, but I guess to get the
 best out of them I'll need not only to use them during accept, but also
 to read data asynchronously, right?
 

 Correct. If possible, Write should also be asynchronous, but as long as
 the OS buffers everything, there should be no problem.
 [...]
   

Ok, well, in fact I think I'm experiencing this right now: I have 112
clients against the same server, and each of them will download about
300 MB.

I see how the thread pool gets full (quad core) and then new requests
are accepted but not scheduled.

The CPU is almost idle, so I guess all threads are waiting for the
synchronous socket write to complete, so performance could be much higher.

 Coupled with asynchronous I/O, it will make better use of the resources
 available. There's no need to create 100 threads for 100 client or
 having 1 threadpool thread blocking on a socket asynchronous
 operation,... Also, if you're thinking of reusing buffers, this helps
 too, as the number of buffers will be bound to the maximum number of
 threads in the threadpool, ...
   
   
 Good, so, whenever I wait for a read or a write using async, the thread
 should be able to work on another request?
 

 Correct. In fact, you don't 'wait' for an asynchronous read or write.
 For instance, when you call BeginRead, the socket is added to an epoll
 fd and your BeginRead call returns immediately. The callback you
 provided, if any, will be called from a different thread as soon as new
 data is available. Just don't spoil it by doing something like
 socket.EndRead (socket.BeginRead (...))   ;-)
   
ok ! :-P

Well, I see it will mean a good number of changes in the Channel.


Besides, as I also told Dick and Dave on a separate thread, it seems the
latest MySql provider has a HUGE memory leak (ok, or it never frees
memory), which is causing a good number of problems in my test.

And it seems I also have to do something about my code, since all data
requests read byte[] in blocks of 4 MB without reusing the buffers at
all, so when a lot of threads are doing their job, huge amounts of
memory are allocated, giving the GC extra work that should
be completely avoided.
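
A minimal sketch of the kind of buffer reuse meant here, not code from
PlasticSCM or Mono. BufferPool, Rent and Return are hypothetical names,
and the lock-protected stack is just one simple way to do it. Current
.NET versions ship System.Buffers.ArrayPool<T> for the same purpose,
but it did not exist when this thread was written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-size buffer pool (not from the thread).
sealed class BufferPool
{
    readonly int bufferSize;
    readonly int maxRetained;                       // cap on idle buffers kept alive
    readonly Stack<byte[]> free = new Stack<byte[]>();
    readonly object gate = new object();

    public BufferPool(int bufferSize, int maxRetained)
    {
        this.bufferSize = bufferSize;
        this.maxRetained = maxRetained;
    }

    // Hands out an idle buffer if one exists, otherwise allocates a new one.
    public byte[] Rent()
    {
        lock (gate)
        {
            if (free.Count > 0)
                return free.Pop();
        }
        return new byte[bufferSize];
    }

    // Gives a buffer back; buffers of the wrong size or beyond the cap are
    // simply dropped and left to the GC.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != bufferSize)
            return;
        lock (gate)
        {
            if (free.Count < maxRetained)
                free.Push(buffer);
        }
    }
}
```

A request handler would call Rent() before reading a block and Return()
in a finally clause once the block has been sent, so the number of live
4 MB arrays stays close to the number of requests in flight.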

I'm a little bit concerned about the following, though:

If you have a method like byte[] GetData() which is going to be invoked
through remoting, even if you do custom serialization (maybe creating a
DataArray class), two copies of the data will be created: the original
one you created, and a second one in the serialization buffer prior
to being sent to the network... :-(

pablo


Re: [Mono-dev] Memory usage on Mono remoting

2009-07-14 Thread pablosantosl...@terra.es
Hi Gonzalo,

 Testing PlasticSCM under really heavy load (hundreds of clients against
 a single server delivering hundreds of Gb over the network).
 

 So no profiling...

   
Comparing the same code, the same hardware and the same test under Linux
and Windows. On Windows we run under .NET, and the test passes successfully.

Dick (in CC) is looking into GC problems.

 System.Web uses unmanaged chunks of memory together with an unmanaged
 memory based stream.
   
   
 That's what I need. Can you point me to the right class?
 

 It's HttpResponseStream.cs all the *Bucket classes that then use an
 IntPtrStream.

 I still don't think that allocating MemoryStream (256 bytes by default)
 is hurting that bad. 
Considering Boehm GC seems to have really hard times releasing memory
and we're delivering GBs of data... it could be.

I'm not 100% sure, but it seems reusing buffers could be a very good idea.

 A wild guess is that the BufferedStream wrapping
 the NetworkStream is allocating much more memory (4kB by default). But
 if the code is rewritten following what xsp does, this should not be a
 problem any more.
   
Ok, I'm not familiar with xsp, I'll take a look.

I've just noticed that you use send from libc instead of the socket
functions... I guess it is due to performance reasons, right?

Also, you mentioned in a previous email that the TcpChannel should be
changed so it uses async sockets. I've seen you use AsyncCallback in XSP.

My question is: I guess AsyncCallback will use a thread underneath,
won't it? If so: what's the advantage over launching threads to accept
calls?

You mentioned it is better to use the default ThreadPool instead of the
internal one in the TcpChannel, why is it going to be better?

Thanks again Gonzalo,

pablo


www.plasticscm.com





Re: [Mono-dev] Memory usage on Mono remoting

2009-07-14 Thread Robert Jordan
pablosantosl...@terra.es wrote:
 Also, you mentioned in a previous email that the TcpChannel should be
 changed so it uses Asynch sockets. I've seen you use AsyncCallBack on XSP.
 
 My question is: I guess AsynchCallback will use a thread underneath,
 won't it? If so: what's the advantage over launching threads to accept
 calls?

The thread pool threads are already launched and they are
maintained by the runtime which is known to be faster than a
managed thread pool implementation.

Additionally, the default thread pool is already configurable
(see mono(1)).

 You mentioned it is better to use the default ThreadPool instead of the
 internal one in the TcpChannel, why is it going to be better?

See above.

Robert



Re: [Mono-dev] Memory usage on Mono remoting

2009-07-14 Thread pablosantosl...@terra.es
Thanks Robert,

So:

I'll modify the TcpChannel to use ThreadPool at System.Threading instead
of the built-in one.



Robert Jordan wrote:
 pablosantosl...@terra.es wrote:
   
 Also, you mentioned in a previous email that the TcpChannel should be
 changed so it uses Asynch sockets. I've seen you use AsyncCallBack on XSP.

 My question is: I guess AsynchCallback will use a thread underneath,
 won't it? If so: what's the advantage over launching threads to accept
 calls?
 

 The thread pool threads are already launched and they are
 maintained by the runtime which is known to be faster than a
 managed thread pool implementation.

 Additionally, the default thread pool is already configurable
 (see mono(1)).

   
 You mentioned it is better to use the default ThreadPool instead of the
 internal one in the TcpChannel, why is it going to be better?
 

 See above.

 Robert



Re: [Mono-dev] Memory usage on Mono remoting

2009-07-14 Thread Gonzalo Paniagua Javier
On Tue, 2009-07-14 at 11:12 +0200, pablosantosl...@terra.es wrote:
 Hi Gonzalo,
 
  Testing PlasticSCM under really heavy load (hundreds of clients against
  a single server delivering hundreds of Gb over the network).
  
 
  So no profiling...
 

 Comparing the same code, the same hardware and the same test under Linux
 and Windows. On Windows we run under .NET, and the test passes successfully.

Oh, when I said profiling I meant mono --profile=stat or similar,
which can tell you what is being allocated and where.
[...]
  It's HttpResponseStream.cs all the *Bucket classes that then use an
  IntPtrStream.
 
  I still don't think that allocating MemoryStream (256 bytes by default)
  is hurting that bad. 
 Considering Boehm GC seems to have really hard times releasing memory
 and we're delivering GBs of data... it could be.

Delivering GBs of data and having hundreds of connections should not be
a problem. Years ago, when testing iFolder under those conditions
everything worked just fine. But it was mod-mono-server/apache.

 I'm not 100% sure, but it seems reusing buffers could be a very good idea.

Xsp does it too and it's much better than allocating 32kB for every
request every time.

  A wild guess is that the BufferedStream wrapping
  the NetworkStream is allocating much more memory (4kB by default). But
  if the code is rewritten following what xsp does, this should not be a
  problem any more.

 Ok, I'm not familiar with xsp, I'll take a look.
 
 I've just noticed that you use send from libc instead of the socket
 functions... I guess it is due to performance reasons, right?

That's because I wanted to use the TCP_CORK option to avoid sending
headers and the beginning of the content in separate packets.

 Also, you mentioned in a previous email that the TcpChannel should be
 changed so it uses Asynch sockets. I've seen you use AsyncCallBack on XSP.
 
 My question is: I guess AsynchCallback will use a thread underneath,
 won't it? If so: what's the advantage over launching threads to accept
 calls?

Your guess is wrong. Those asynchronous calls from Socket are treated as
if they were a WorkItem for a ThreadPool, only that when they are made,
the socket is added to an epoll fd (if you're on linux with support for
epoll). And when there's an event in the socket, there's a dedicated IO
threadpool to take care of reading/writing data and invoking the
callbacks. The advantages: if you have 10k connections, you don't need
10k threads, threads are reused (no creation overhead), ...

 You mentioned it is better to use the default ThreadPool instead of the
 internal one in the TcpChannel, why is it going to be better?

Coupled with asynchronous I/O, it will make better use of the resources
available. There's no need to create 100 threads for 100 clients or to
have 1 threadpool thread blocking on a socket asynchronous
operation... Also, if you're thinking of reusing buffers, this helps
too, as the number of buffers will be bounded by the maximum number of
threads in the threadpool...

-Gonzalo





[Mono-dev] Memory usage on Mono remoting

2009-07-13 Thread pablosantosl...@terra.es
Hi all,

In BinaryServerFormatterSink.cs, a new MemoryStream is created to
serve every remoting call.

Under high load conditions it will make the GC work harder than
required, both decreasing performance and potentially causing memory
problems.

It should be replaced by some sort of MemoryStream based on preallocated
memory buffers.
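
One simple way to approximate this, shown only as a sketch and not as
the chunked design mentioned below: keep one MemoryStream per thread
and truncate it between calls, which keeps its internal buffer
allocated. ReusableStream and Get are hypothetical names, and the
32 kB initial capacity is an arbitrary choice.

```csharp
using System;
using System.IO;

// Hypothetical helper (not from the thread or from Mono).
static class ReusableStream
{
    // One cached stream per thread, so no locking is needed.
    [ThreadStatic] static MemoryStream cached;

    // Returns an empty stream whose internal buffer survives between calls.
    // The caller must be done with the stream before the same thread asks
    // for it again.
    public static MemoryStream Get()
    {
        if (cached == null)
            cached = new MemoryStream(32 * 1024);
        cached.SetLength(0);    // drops the content but keeps the capacity
        cached.Position = 0;
        return cached;
    }
}
```

This only helps when each call finishes with the stream on the same
thread that obtained it. A stream that escapes the call, or one very
large message that grows the buffer permanently, would need the chunked
approach instead.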

Maybe something like ChunkedMemoryStream in Rotor.

http://www.koders.com/csharp/fid0C95C784238E26C8EAC95C7A852A34A0CE9305BB.aspx?s=chunkedmemorystream

Are you aware of an open source implementation of this?

Thanks,

pablo


www.plasticscm.com


Re: [Mono-dev] Memory usage on Mono remoting

2009-07-13 Thread Gonzalo Paniagua Javier
On Mon, 2009-07-13 at 19:39 +0200, pablosantosl...@terra.es wrote:
 Hi all,
 
 On BinaryServerFormatterSink.cs, a new MemoryStream is being created to
 attend every remoting call.
 
 Under high load conditions it will make the GC work harder than
 required, both decreasing performance and potentially causing memory
 problems.
 
 It should be replaced by some sort of MemoryStream based on preallocated
 memory buffers.

Are you volunteering? Have you profiled the application or is this just
a guess?

 Maybe something like ChunkedMemoryStream in Rotor.
 
 http://www.koders.com/csharp/fid0C95C784238E26C8EAC95C7A852A34A0CE9305BB.aspx?s=chunkedmemorystream

If you look at that code you are not allowed to contribute to mono
anything based on what you saw.

System.Web uses unmanaged chunks of memory together with an unmanaged
memory based stream.

-Gonzalo




Re: [Mono-dev] Memory usage on Mono remoting

2009-07-13 Thread pablosantosl...@terra.es
Hi,


 On Mon, 2009-07-13 at 19:39 +0200, pablosantosl...@terra.es wrote:
   
 Hi all,

 On BinaryServerFormatterSink.cs, a new MemoryStream is being created to
 attend every remoting call.

 Under high load conditions it will make the GC work harder than
 required, both decreasing performance and potentially causing memory
 problems.

 It should be replaced by some sort of MemoryStream based on preallocated
 memory buffers.
 

 Are you volunteering? Have you profiled the application or is this just
 a guess?

   
Testing PlasticSCM under really heavy load (hundreds of clients against
a single server delivering hundreds of GB over the network).

Yes, I'm volunteering.


 System.Web uses unmanaged chunks of memory together with an unmanaged
 memory based stream.
   
That's what I need. Can you point me to the right class?


pablo


www.plasticscm.com



Re: [Mono-dev] Memory usage on Mono remoting

2009-07-13 Thread pablosantosl...@terra.es
Ok, I think you mean IntPtrStream, but I'm not sure it's what I'm
looking for.

I think managed memory would be enough (unless the other is faster :-P),
but I need the stream to reuse preallocated buffers.

pablo

www.plasticscm.com


pablosantosl...@terra.es wrote:
 Hi,


   
 On Mon, 2009-07-13 at 19:39 +0200, pablosantosl...@terra.es wrote:
   
 
 Hi all,

 On BinaryServerFormatterSink.cs, a new MemoryStream is being created to
 attend every remoting call.

 Under high load conditions it will make the GC work harder than
 required, both decreasing performance and potentially causing memory
 problems.

 It should be replaced by some sort of MemoryStream based on preallocated
 memory buffers.
 
   
 Are you volunteering? Have you profiled the application or is this just
 a guess?

   
 
 Testing PlasticSCM under really heavy load (hundreds of clients against
 a single server delivering hundreds of Gb over the network).

 Yes, I'm volunteering.

   
 System.Web uses unmanaged chunks of memory together with an unmanaged
 memory based stream.
   
 
 That's what I need. Can you point me to the right class?


 pablo


 www.plasticscm.com



Re: [Mono-dev] Memory usage on Mono remoting

2009-07-13 Thread Gonzalo Paniagua Javier
On Mon, 2009-07-13 at 19:59 +0200, pablosantosl...@terra.es wrote:
[...]

 
  Are you volunteering? Have you profiled the application or is this just
  a guess?
 

 Testing PlasticSCM under really heavy load (hundreds of clients against
 a single server delivering hundreds of Gb over the network).

So no profiling...

 Yes, I'm volunteering.

The problem is that you can't contribute anything that is related to
that class that you linked before. See
http://mono-project.com/Contributing under 'Important Rules'.

 
  System.Web uses unmanaged chunks of memory together with an unmanaged
  memory based stream.

 That's what I need. Can you point me to the right class?

It's HttpResponseStream.cs: all the *Bucket classes, which then use an
IntPtrStream.

I still don't think that allocating MemoryStream (256 bytes by default)
is hurting that bad. A wild guess is that the BufferedStream wrapping
the NetworkStream is allocating much more memory (4kB by default). But
if the code is rewritten following what xsp does, this should not be a
problem any more.

-Gonzalo

