Re: [Mono-dev] Memory usage on Mono remoting
Hi Gonzalo,

First, thanks for the detailed answers.

>> Considering Boehm GC seems to have really hard times releasing memory and we're delivering GBs of data... it could be.
> Delivering GBs of data and having hundreds of connections should not be a problem. Years ago, when testing iFolder under those conditions everything worked just fine. But it was mod-mono-server/apache.

Dick is actually checking this. I hope it's not the issue.

>> I'm not 100% sure, but it seems reusing buffers could be a very good idea.
> Xsp does it too and it's much better than allocating 32kB for every request every time.

Good.

>> Also, you mentioned in a previous email that the TcpChannel should be changed so it uses async sockets. I've seen you use AsyncCallback on XSP. My question is: I guess AsyncCallback will use a thread underneath, won't it? If so: what's the advantage over launching threads to accept calls?
> Your guess is wrong. Those asynchronous calls from Socket are treated as if they were a WorkItem for a ThreadPool, only that when they are made, the socket is added to an epoll fd (if you're on linux with support for epoll). And when there's an event in the socket, there's a dedicated IO threadpool to take care of reading/writing data and invoking the callbacks. The advantages: if you have 10k connections, you don't need 10k threads, threads are reused (no creation overhead), ...

Ok, of course. Well, when I said launching a thread I really meant launching a thread on a thread pool. Well, I'll try to use the async sockets then, but I guess to get the best out of them I'll need not only to use them during accept, but also read data asynchronously, right?

BTW, I already replaced the built-in remoting threadpool by the System.Threading one.

>> You mentioned it is better to use the default ThreadPool instead of the internal one in the TcpChannel, why is it going to be better?
> Coupled with asynchronous I/O, it will make better use of the resources available. There's no need to create 100 threads for 100 clients or having 1 threadpool thread blocking on a socket asynchronous operation,... Also, if you're thinking of reusing buffers, this helps too, as the number of buffers will be bound to the maximum number of threads in the threadpool, ...

Good, so, whenever I wait for a read or a write using async, the thread should be able to work on another request? I think this is the way it's implemented on Windows, but I can tell you it still creates a huge number of threads, almost 1-1 with clients under the most overloaded scenarios.

Thanks,
pablo

> -Gonzalo

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
Re: [Mono-dev] Memory usage on Mono remoting
On Wed, 2009-07-15 at 13:29 +0200, pablosantosl...@terra.es wrote:
[...]
>> Your guess is wrong. Those asynchronous calls from Socket are treated as if they were a WorkItem for a ThreadPool, only that when they are made, the socket is added to an epoll fd (if you're on linux with support for epoll). And when there's an event in the socket, there's a dedicated IO threadpool to take care of reading/writing data and invoking the callbacks. The advantages: if you have 10k connections, you don't need 10k threads, threads are reused (no creation overhead), ...
> Ok, of course. Well, when I said launching a thread I really meant launching a thread on a thread pool. Well, I'll try to use the async sockets then, but I guess to get the best out of them I'll need not only to use them during accept, but also read data asynchronously, right?

Correct. If possible, Write should also be asynchronous, but as long as the OS buffers everything, there should be no problem.

[...]
>> Coupled with asynchronous I/O, it will make better use of the resources available. There's no need to create 100 threads for 100 clients or having 1 threadpool thread blocking on a socket asynchronous operation,... Also, if you're thinking of reusing buffers, this helps too, as the number of buffers will be bound to the maximum number of threads in the threadpool, ...
> Good, so, whenever I wait for a read or a write using async, the thread should be able to work on another request?

Correct. In fact, you don't 'wait' for an asynchronous read or write. For instance, when you call BeginRead, the socket is added to an epoll fd and your BeginRead call returns immediately. The callback you provided, if any, will be called from a different thread as soon as new data is available.

Just don't spoil it by doing something like socket.EndRead (socket.BeginRead (...)) ;-)

> I think this is the way it's implemented on Windows, but I can tell you it still creates a huge number of threads, almost 1-1 with clients under the most overloaded scenarios.

The number of threads in the current threadpool can be configured using MONO_THREADS_PER_CPU (see mono(1)) and SetMaxThreads. I don't remember the numbers, but if you have a dual-quad core, it means ~140 threads. I would adjust the maximum number of threads until adding a few more threads no longer changes the performance noticeably, but this may vary depending on what the threads are doing...

-Gonzalo
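The "don't wait" point can be sketched in C# roughly as follows. This is an illustration only, not the TcpChannel code: the class and member names (AsyncReadSketch, ReadState, OnReceive) are made up for the example, it uses Socket.BeginReceive/EndReceive (the Socket-level counterparts of the BeginRead/EndRead calls mentioned above), and it assumes a short message on loopback arrives in a single receive.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

static class AsyncReadSketch
{
    // Hypothetical per-connection state handed to the callback.
    sealed class ReadState
    {
        public Socket Socket;
        public byte[] Buffer = new byte[4096];
        public ManualResetEvent Done = new ManualResetEvent(false);
        public string Text;
    }

    // Invoked by the runtime on a pool thread once data is available.
    static void OnReceive(IAsyncResult ar)
    {
        ReadState state = (ReadState)ar.AsyncState;
        int n = state.Socket.EndReceive(ar); // the operation has already completed
        state.Text = Encoding.ASCII.GetString(state.Buffer, 0, n);
        state.Done.Set();
    }

    public static string Run()
    {
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        using (TcpClient client = new TcpClient())
        {
            client.Connect(IPAddress.Loopback, port);
            using (Socket server = listener.AcceptSocket())
            {
                ReadState state = new ReadState { Socket = server };

                // Returns at once; the calling thread is free to do other work.
                server.BeginReceive(state.Buffer, 0, state.Buffer.Length,
                                    SocketFlags.None, OnReceive, state);

                // Anti-pattern from the message above: calling EndReceive
                // directly on the BeginReceive result blocks this thread
                // until data arrives, which defeats the purpose.

                byte[] msg = Encoding.ASCII.GetBytes("ping");
                client.GetStream().Write(msg, 0, msg.Length);

                state.Done.WaitOne(); // only so the demo can return a value
                listener.Stop();
                return state.Text;
            }
        }
    }
}
```

In a real channel the callback would typically issue the next BeginReceive itself instead of signalling an event, so no thread ever sits waiting on a particular socket.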
Re: [Mono-dev] Memory usage on Mono remoting
Hi Gonzalo,

>> Well, I'll try to use the async sockets then, but I guess to get the best out of them I'll need not only to use them during accept, but also read data asynchronously, right?
> Correct. If possible, Write should also be asynchronous, but as long as the OS buffers everything, there should be no problem.
> [...]

Ok, well, in fact I think I'm experiencing this right now: I've 112 clients against the same server, each of them will download about 300Mb. I see how the thread pool gets full (quad core) and then new requests are accepted but not scheduled. CPU is almost doing nothing, so I guess all threads are waiting for the synchronous socket write to complete, so performance could be much higher.

>>> Coupled with asynchronous I/O, it will make better use of the resources available. There's no need to create 100 threads for 100 clients or having 1 threadpool thread blocking on a socket asynchronous operation,... Also, if you're thinking of reusing buffers, this helps too, as the number of buffers will be bound to the maximum number of threads in the threadpool, ...
>> Good, so, whenever I wait for a read or a write using async, the thread should be able to work on another request?
> Correct. In fact, you don't 'wait' for an asynchronous read or write. For instance, when you call BeginRead, the socket is added to an epoll fd and your BeginRead call returns immediately. The callback you provided, if any, will be called from a different thread as soon as new data is available. Just don't spoil it by doing something like socket.EndRead (socket.BeginRead (...)) ;-)

ok ! :-P

Well, I see it will mean a good number of changes in the Channel.

Besides, as I also told Dick and Dave on a separate thread, it seems the latest MySql provider has a HUGE memory leak (ok, or it never frees mem), which is causing a good number of problems to my test.

And, it seems I also have to do something with my code since all data requests are reading byte[] in blocks of 4Mb, but not reusing the buffers at all, so when a lot of threads are doing their job, huge amounts of mem are being allocated, giving the GC extra work that should be completely avoided.

I'm a little bit concerned about the following, though: if you have a method like byte[] GetData() which is going to be invoked through remoting, even if you do custom serialization (maybe creating a DataArray class), two copies of the data will be created: the original one you created, and a second one in the serialization buffer prior to being sent to the network... :-(

pablo
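Reusing the 4Mb blocks instead of allocating a fresh byte[] per request could look roughly like the sketch below. It is a minimal illustration under assumptions: BufferPool and its members are hypothetical names, not an existing Mono or PlasticSCM class, and the pool is unbounded (it keeps every buffer that is returned).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-size buffer pool; thread-safe through a simple lock.
sealed class BufferPool
{
    readonly Stack<byte[]> free = new Stack<byte[]>();
    readonly object sync = new object();
    readonly int bufferSize;

    public BufferPool(int bufferSize, int preallocate)
    {
        this.bufferSize = bufferSize;
        for (int i = 0; i < preallocate; i++)
            free.Push(new byte[bufferSize]);
    }

    // Number of idle buffers currently held by the pool.
    public int Available
    {
        get { lock (sync) { return free.Count; } }
    }

    // Hands out an idle buffer, allocating only when the pool is empty.
    public byte[] Rent()
    {
        lock (sync)
        {
            return free.Count > 0 ? free.Pop() : new byte[bufferSize];
        }
    }

    // Gives a buffer back; contents are not cleared, so callers must
    // track how many bytes of a rented buffer are actually valid.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != bufferSize)
            throw new ArgumentException("buffer does not belong to this pool");
        lock (sync) { free.Push(buffer); }
    }
}
```

A request handler would call Rent() before reading a block and Return() in a finally clause once the data has been written out, so the number of live 4Mb arrays tracks the number of concurrent requests rather than the total number of requests served.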
Re: [Mono-dev] Memory usage on Mono remoting
Hi Gonzalo,

>> Testing PlasticSCM under really heavy load (hundreds of clients against a single server delivering hundreds of Gb over the network).
> So no profiling...

Comparing the same code, the same hardware and the same test under Linux and Windows. On Windows we run under .NET, and the test passes successfully.

Dick (in CC) is looking into GC problems.

>>> System.Web uses unmanaged chunks of memory together with an unmanaged memory based stream.
>> That's what I need. Can you point me to the right class?
> It's HttpResponseStream.cs all the *Bucket classes that then use an IntPtrStream. I still don't think that allocating MemoryStream (256 bytes by default) is hurting that bad.

Considering Boehm GC seems to have really hard times releasing memory and we're delivering GBs of data... it could be.

I'm not 100% sure, but it seems reusing buffers could be a very good idea.

> A wild guess is that the BufferedStream wrapping the NetworkStream is allocating much more memory (4kB by default). But if the code is rewritten following what xsp does, this should not be a problem any more.

Ok, I'm not familiar with xsp, I'll take a look. I've just noticed that you use send from libc instead of the socket functions... I guess it is due to performance reasons, right?

Also, you mentioned in a previous email that the TcpChannel should be changed so it uses async sockets. I've seen you use AsyncCallback on XSP. My question is: I guess AsyncCallback will use a thread underneath, won't it? If so: what's the advantage over launching threads to accept calls?

You mentioned it is better to use the default ThreadPool instead of the internal one in the TcpChannel, why is it going to be better?

Thanks again Gonzalo,

pablo
www.plasticscm.com
Re: [Mono-dev] Memory usage on Mono remoting
pablosantosl...@terra.es wrote:
> Also, you mentioned in a previous email that the TcpChannel should be changed so it uses async sockets. I've seen you use AsyncCallback on XSP. My question is: I guess AsyncCallback will use a thread underneath, won't it? If so: what's the advantage over launching threads to accept calls?

The thread pool threads are already launched and they are maintained by the runtime which is known to be faster than a managed thread pool implementation. Additionally, the default thread pool is already configurable (see mono(1)).

> You mentioned it is better to use the default ThreadPool instead of the internal one in the TcpChannel, why is it going to be better?

See above.

Robert
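For reference, handing work to the runtime's default pool and adjusting its size from code might be sketched as below. SharedPoolSketch and its method names are hypothetical; only ThreadPool.QueueUserWorkItem, GetMaxThreads and SetMaxThreads are real System.Threading calls. The blocking wait exists purely so the example can return a result.

```csharp
using System;
using System.Threading;

static class SharedPoolSketch
{
    // Runs `work` on a thread from the runtime's shared pool and
    // blocks the caller until it finishes (demo only).
    public static int RunOnPool(Func<int> work)
    {
        int result = 0;
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(delegate
            {
                try { result = work(); }
                finally { done.Set(); }
            });
            done.WaitOne();
        }
        return result;
    }

    // Changes the worker-thread ceiling and leaves the I/O ceiling alone.
    // SetMaxThreads returns false when the runtime rejects the value
    // (for example, a value below the number of processors).
    public static bool TrySetMaxWorkers(int maxWorkers)
    {
        int workers, io;
        ThreadPool.GetMaxThreads(out workers, out io);
        return ThreadPool.SetMaxThreads(maxWorkers, io);
    }
}
```

On Mono the same ceiling can also be influenced from outside the process through the MONO_THREADS_PER_CPU environment variable described in mono(1).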
Re: [Mono-dev] Memory usage on Mono remoting
Thanks Robert,

So: I'll modify the TcpChannel to use the ThreadPool in System.Threading instead of the built-in one.

Robert Jordan wrote:
> pablosantosl...@terra.es wrote:
>> Also, you mentioned in a previous email that the TcpChannel should be changed so it uses async sockets. I've seen you use AsyncCallback on XSP. My question is: I guess AsyncCallback will use a thread underneath, won't it? If so: what's the advantage over launching threads to accept calls?
> The thread pool threads are already launched and they are maintained by the runtime which is known to be faster than a managed thread pool implementation. Additionally, the default thread pool is already configurable (see mono(1)).
>> You mentioned it is better to use the default ThreadPool instead of the internal one in the TcpChannel, why is it going to be better?
> See above.
>
> Robert
Re: [Mono-dev] Memory usage on Mono remoting
On Tue, 2009-07-14 at 11:12 +0200, pablosantosl...@terra.es wrote:
> Hi Gonzalo,
>>> Testing PlasticSCM under really heavy load (hundreds of clients against a single server delivering hundreds of Gb over the network).
>> So no profiling...
> Comparing the same code, the same hardware and the same test under Linux and Windows. On Windows we run under .NET, and the test passes successfully.

Oh, when I said profiling I meant the mono --profile=stat or similar that can tell you what is being allocated and where.

[...]
>> It's HttpResponseStream.cs all the *Bucket classes that then use an IntPtrStream. I still don't think that allocating MemoryStream (256 bytes by default) is hurting that bad.
> Considering Boehm GC seems to have really hard times releasing memory and we're delivering GBs of data... it could be.

Delivering GBs of data and having hundreds of connections should not be a problem. Years ago, when testing iFolder under those conditions everything worked just fine. But it was mod-mono-server/apache.

> I'm not 100% sure, but it seems reusing buffers could be a very good idea.

Xsp does it too and it's much better than allocating 32kB for every request every time.

>> A wild guess is that the BufferedStream wrapping the NetworkStream is allocating much more memory (4kB by default). But if the code is rewritten following what xsp does, this should not be a problem any more.
> Ok, I'm not familiar with xsp, I'll take a look. I've just noticed that you use send from libc instead of the socket functions... I guess it is due to performance reasons, right?

That's because I wanted to use the TCP_CORK option to avoid sending headers and the beginning of the content in separate packets.

> Also, you mentioned in a previous email that the TcpChannel should be changed so it uses async sockets. I've seen you use AsyncCallback on XSP. My question is: I guess AsyncCallback will use a thread underneath, won't it? If so: what's the advantage over launching threads to accept calls?

Your guess is wrong. Those asynchronous calls from Socket are treated as if they were a WorkItem for a ThreadPool, only that when they are made, the socket is added to an epoll fd (if you're on linux with support for epoll). And when there's an event in the socket, there's a dedicated IO threadpool to take care of reading/writing data and invoking the callbacks. The advantages: if you have 10k connections, you don't need 10k threads, threads are reused (no creation overhead), ...

> You mentioned it is better to use the default ThreadPool instead of the internal one in the TcpChannel, why is it going to be better?

Coupled with asynchronous I/O, it will make better use of the resources available. There's no need to create 100 threads for 100 clients or having 1 threadpool thread blocking on a socket asynchronous operation,... Also, if you're thinking of reusing buffers, this helps too, as the number of buffers will be bound to the maximum number of threads in the threadpool, ...

-Gonzalo
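One way to read the remark that the number of buffers ends up bound to the number of pool threads is a per-thread scratch stream, sketched below. This is an assumption about how it could be done, not how xsp or the remoting channel does it; PerThreadStream and Acquire are hypothetical names. The stream must not be kept across an asynchronous callback, since the continuation may run on a different thread.

```csharp
using System;
using System.IO;

static class PerThreadStream
{
    // One cached stream per thread: at most as many instances as
    // there are threads calling Acquire.
    [ThreadStatic]
    static MemoryStream cached;

    // Returns this thread's stream, emptied but with its internal
    // buffer (capacity) retained from earlier use.
    public static MemoryStream Acquire(int initialCapacity)
    {
        if (cached == null)
            cached = new MemoryStream(initialCapacity);
        cached.SetLength(0);
        cached.Position = 0;
        return cached;
    }
}
```

Because SetLength(0) keeps the capacity, a thread that has once serialized a large reply does not allocate again for later replies of the same size or smaller.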
[Mono-dev] Memory usage on Mono remoting
Hi all,

In BinaryServerFormatterSink.cs, a new MemoryStream is created to serve every remoting call. Under high load conditions this makes the GC work harder than required, both decreasing performance and potentially causing memory problems.

It should be replaced by some sort of MemoryStream based on preallocated memory buffers.

Maybe something like ChunkedMemoryStream in Rotor. http://www.koders.com/csharp/fid0C95C784238E26C8EAC95C7A852A34A0CE9305BB.aspx?s=chunkedmemorystream

Are you aware of an open source implementation of this?

Thanks,

pablo
www.plasticscm.com
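As a rough idea of what a MemoryStream over preallocated memory could mean with the standard class alone, the sketch below wraps an existing byte[] in a fixed-capacity MemoryStream. It is not based on the Rotor code linked above and is not the change that was eventually made to Mono; PreallocatedStreamSketch and Wrap are hypothetical names.

```csharp
using System;
using System.IO;

static class PreallocatedStreamSketch
{
    // Wraps a caller-owned array. The resulting stream writes straight
    // into `buffer` and cannot grow beyond buffer.Length.
    public static MemoryStream Wrap(byte[] buffer)
    {
        // writable: true, publiclyVisible: true (so GetBuffer works).
        MemoryStream ms = new MemoryStream(buffer, 0, buffer.Length, true, true);
        // This constructor starts with Length == buffer.Length;
        // reset it so the stream begins empty.
        ms.SetLength(0);
        return ms;
    }
}
```

The limitation is the fixed size: a write past the end of the array throws NotSupportedException, so a real replacement would need either generously sized buffers or a chain of chunks.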
Re: [Mono-dev] Memory usage on Mono remoting
On Mon, 2009-07-13 at 19:39 +0200, pablosantosl...@terra.es wrote:
> Hi all, In BinaryServerFormatterSink.cs, a new MemoryStream is created to serve every remoting call. Under high load conditions this makes the GC work harder than required, both decreasing performance and potentially causing memory problems. It should be replaced by some sort of MemoryStream based on preallocated memory buffers.

Are you volunteering? Have you profiled the application or is this just a guess?

> Maybe something like ChunkedMemoryStream in Rotor. http://www.koders.com/csharp/fid0C95C784238E26C8EAC95C7A852A34A0CE9305BB.aspx?s=chunkedmemorystream

If you look at that code you are not allowed to contribute to mono anything based on what you saw.

System.Web uses unmanaged chunks of memory together with an unmanaged memory based stream.

-Gonzalo
Re: [Mono-dev] Memory usage on Mono remoting
Hi,

> On Mon, 2009-07-13 at 19:39 +0200, pablosantosl...@terra.es wrote:
>> Hi all, In BinaryServerFormatterSink.cs, a new MemoryStream is created to serve every remoting call. Under high load conditions this makes the GC work harder than required, both decreasing performance and potentially causing memory problems. It should be replaced by some sort of MemoryStream based on preallocated memory buffers.
> Are you volunteering? Have you profiled the application or is this just a guess?

Testing PlasticSCM under really heavy load (hundreds of clients against a single server delivering hundreds of Gb over the network).

Yes, I'm volunteering.

> System.Web uses unmanaged chunks of memory together with an unmanaged memory based stream.

That's what I need. Can you point me to the right class?

pablo
www.plasticscm.com
Re: [Mono-dev] Memory usage on Mono remoting
Ok, I think you mean IntPtrStream, but I'm not sure it's what I'm looking for. I think managed memory would be enough (unless the other is faster :-P), but I need the stream to reuse preallocated buffers.

pablo
www.plasticscm.com

pablosantosl...@terra.es wrote:
> Hi,
>
>> On Mon, 2009-07-13 at 19:39 +0200, pablosantosl...@terra.es wrote:
>>> Hi all, In BinaryServerFormatterSink.cs, a new MemoryStream is created to serve every remoting call. Under high load conditions this makes the GC work harder than required, both decreasing performance and potentially causing memory problems. It should be replaced by some sort of MemoryStream based on preallocated memory buffers.
>> Are you volunteering? Have you profiled the application or is this just a guess?
> Testing PlasticSCM under really heavy load (hundreds of clients against a single server delivering hundreds of Gb over the network).
>
> Yes, I'm volunteering.
>
>> System.Web uses unmanaged chunks of memory together with an unmanaged memory based stream.
> That's what I need. Can you point me to the right class?
>
> pablo
> www.plasticscm.com
Re: [Mono-dev] Memory usage on Mono remoting
On Mon, 2009-07-13 at 19:59 +0200, pablosantosl...@terra.es wrote:
[...]
>> Are you volunteering? Have you profiled the application or is this just a guess?
> Testing PlasticSCM under really heavy load (hundreds of clients against a single server delivering hundreds of Gb over the network).

So no profiling...

> Yes, I'm volunteering.

The problem is that you can't contribute anything that is related to that class that you linked before. See http://mono-project.com/Contributing under 'Important Rules'.

>> System.Web uses unmanaged chunks of memory together with an unmanaged memory based stream.
> That's what I need. Can you point me to the right class?

It's HttpResponseStream.cs all the *Bucket classes that then use an IntPtrStream.

I still don't think that allocating MemoryStream (256 bytes by default) is hurting that bad. A wild guess is that the BufferedStream wrapping the NetworkStream is allocating much more memory (4kB by default). But if the code is rewritten following what xsp does, this should not be a problem any more.

-Gonzalo