Sean Chittenden wrote:
PostgreSQL will never be single proc, multi-threaded, and I don't
think it should be for reliability's sake. See my above post,
however, as I think I may have a better way to handle "lots of
connections" without using threads. -sc
Never is a VERY long time ... Also, the single proc/multiple proc
thing does not have to be exclusive, meaning you could "tune" the
system so that it could do either.
True. This topic has come up a zillion times in the past, though. The
memory segmentation and reliability that independent processes give
you is huge, and it's the biggest reason why, _if_ PostgreSQL does
spontaneously wedge itself (like MySQL does all too often), you're
only having to cope with a single DB connection being corrupt,
invalid, etc. Imagine a threaded model where the process was horked
and you lose 1000 connections' worth of data in a SEGV. *shudder*
Unix is reliable at the cost of memory segmentation... something that
I dearly believe in. If that weren't worth anything, then I'd run
everything in kernel and avoid the context switching, which is pretty
expensive.
Yep, but if you design it right, you can have both. A rare occasion
where you can have your cake and eat it too.
I have developed a single process server that handled thousands of
connections. I've also developed a single process database (a while
back) that handled multiple connections, but I'm not sure I would do
it the "hard" way again, as the cost of writing the code for keeping
per-connection context was not insignificant, although there are much
better ways of doing it than how I did it 15 years ago.
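
Just so we're talking about the same thing, "keeping context" looks
roughly like this today: one process, one poll() loop, and an explicit
state struct per fd instead of a stack per connection. This is only a
minimal sketch for illustration, not the old code; the struct members,
table size, and names are made up:

/* One process, many connections: every client fd carries an explicit
 * context struct, and a single poll() loop drives all of them.  All
 * names here are illustrative.
 */
#include <poll.h>
#include <stddef.h>

enum conn_state { CONN_READING_STARTUP, CONN_IDLE, CONN_IN_QUERY };

struct conn {
    int             fd;        /* client socket, -1 if slot unused    */
    enum conn_state state;     /* where this connection left off      */
    char            buf[8192]; /* partial input carried across polls  */
    size_t          buflen;
};

#define MAX_CONNS 1024
static struct conn conns[MAX_CONNS];

void conns_init(void)
{
    for (int i = 0; i < MAX_CONNS; i++)
        conns[i].fd = -1;              /* mark every slot unused */
}

void event_loop(void)
{
    struct pollfd pfds[MAX_CONNS];

    for (;;) {
        int n = 0;
        for (int i = 0; i < MAX_CONNS; i++) {
            if (conns[i].fd < 0)
                continue;
            pfds[n].fd = conns[i].fd;  /* watch every live connection */
            pfds[n].events = POLLIN;
            n++;
        }
        if (poll(pfds, n, -1) <= 0)
            continue;
        for (int i = 0; i < n; i++) {
            if (pfds[i].revents & POLLIN) {
                /* look up the conn that owns pfds[i].fd, read what is
                 * there, append it to buf, and advance state -- never
                 * blocking on any single client. */
            }
        }
    }
}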
Not saying it's not possible, just that at this point, reliability is
more important than handling additional connections. With copy-on-write
VMs being abundant these days, a lot of the size that you see
with PostgreSQL is shared. Memory profiling and increasing the number
of read-only pages would be an extremely interesting exercise that
could yield some slick results in terms of reducing the memory
footprint of PG's children.
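
For anyone who wants to try that exercise, something like this quick
hack is one way to do it on a modern Linux box -- assuming
/proc/<pid>/smaps is available there; it just sums the shared vs.
private pages of one backend so you can see how much of its apparent
size is really copy-on-write. On the BSDs, pmap/procstat report
similar numbers.

/* Sum shared vs. private pages for one process from /proc/<pid>/smaps.
 * Linux-specific; purely an illustration of the profiling exercise.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <backend-pid>\n", argv[0]);
        return 1;
    }

    char path[64];
    snprintf(path, sizeof(path), "/proc/%s/smaps", argv[1]);

    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror(path);
        return 1;
    }

    long shared_kb = 0, private_kb = 0, kb;
    char line[256];
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "Shared_Clean: %ld kB", &kb) == 1 ||
            sscanf(line, "Shared_Dirty: %ld kB", &kb) == 1)
            shared_kb += kb;                /* pages shared with others */
        else if (sscanf(line, "Private_Clean: %ld kB", &kb) == 1 ||
                 sscanf(line, "Private_Dirty: %ld kB", &kb) == 1)
            private_kb += kb;               /* pages unique to this one */
    }
    fclose(f);

    printf("shared: %ld kB, private: %ld kB\n", shared_kb, private_kb);
    return 0;
}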
Context switching and cache thrashing are the killers in a
multiple-process model. I observed a 6-10x performance penalty for
running in separate processes vs. running in a single process (and
single thread) when benchmarking a streaming server. Perhaps a better
scheduler (like the O(1) scheduler in Linux 2.6.*) would improve that,
but I just don't know.
What you talk about is very fundamental and I would love to have
another go at it ... however, you're right that this won't happen
any time soon. Connection pooling is a fundamentally flawed way of
overcoming this problem; a different design could yield a
significantly higher feasible connection count.
Surprisingly, it's not that complex, at least for handling a large
number of FDs and figuring out which ones have data on them and need
to be passed to a backend. I'm actually using the model for monitoring
FDs from thttpd and reapplying bits where appropriate. Its abstraction
of kqueue()/poll()/select() is nice enough that I don't want to
reinvent the wheel (same with its license). Hopefully ripping through
the incoming
data and figuring out which backend pool to send a connection to won't
be that bad, but I have next to no experience with writing that kind
of code and my Stevens is hidden away in one of 23 boxes from a move
earlier this month. I only know that Apache 1.3 does this with
obviously huge success on basically every *nix, so it can't be too
hard.
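
To make that a little more concrete, the dispatcher I'm picturing is
roughly the following. It's only a sketch against plain poll() rather
than thttpd's fdwatch layer, and backend_pool_for()/hand_off_to_pool()
are made-up placeholders for the routing logic that still has to be
written:

/* Front-end dispatcher sketch: watch the listen socket plus every
 * not-yet-routed client, peek at whatever arrives, and hand the fd
 * off to a backend pool.  The two extern hooks are hypothetical.
 */
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_FDS 4096

extern int  backend_pool_for(int client_fd, const char *startup, ssize_t len);
extern void hand_off_to_pool(int pool, int client_fd);

void dispatcher(int listen_fd)
{
    struct pollfd pfds[MAX_FDS];
    int nfds = 0;

    pfds[nfds].fd = listen_fd;            /* slot 0: the listen socket */
    pfds[nfds].events = POLLIN;
    nfds++;

    for (;;) {
        if (poll(pfds, nfds, -1) <= 0)
            continue;

        /* New connection: just start watching its fd. */
        if ((pfds[0].revents & POLLIN) && nfds < MAX_FDS) {
            int c = accept(listen_fd, NULL, NULL);
            if (c >= 0) {
                pfds[nfds].fd = c;
                pfds[nfds].events = POLLIN;
                nfds++;
            }
        }

        /* A client became readable: peek at its startup bytes and
         * route the whole fd to a backend pool. */
        for (int i = 1; i < nfds; i++) {
            if (!(pfds[i].revents & POLLIN))
                continue;
            char startup[512];
            ssize_t len = recv(pfds[i].fd, startup, sizeof(startup), MSG_PEEK);
            if (len > 0)
                hand_off_to_pool(backend_pool_for(pfds[i].fd, startup, len),
                                 pfds[i].fd);
            else
                close(pfds[i].fd);         /* EOF or error: drop it */
            pfds[i] = pfds[--nfds];        /* remove slot i from the set */
            i--;                           /* re-check the moved entry */
        }
    }
}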
No epoll?