Re: [PERFORM] sustained update load of 1-2k/sec
Thanks again everyone for the excellent suggestions. I looked into IO::Reactor, but after a few hours of fiddling decided I was getting the kind of performance I wanted from a slightly-more-than-modest number of threads, and (due to dev timelines) decided to come back later and patch the Ruby SNMP libraries to do async I/O with a reactor. I am unfortunately stuck with updates, but I think (with your suggestions) I've made it work for me.

MySQL = 1500 updates/sec
PostgreSQL w/10k tx per commit, single thread = 1400 updates/sec

Given the update-heavy nature of this table I felt it was necessary to test during a vacuum. Turns out the hit wasn't that bad:

PostgreSQL w/10k tx per commit, single thread, during a vacuum = 1300 updates/sec

100-200 updates/sec is a small price to pay for mature stored procedures, more stored-procedure language options, ACID compliance, MVCC, very few if any corrupt tables (I get about 2 a week from MySQL on the 40 DBs I manage), a more crash-resistant DB (about one crash a month across my 40 MySQL DBs), and replication that actually works for more than a day before quitting for no apparent reason ;) [/flame off]

For those of you with Cox Communications cable modems, look forward to better customer service and cable plant management. :) And if anyone's curious, here's the app I'm rebuilding/updating: http://www.mysql.com/customers/customer.php?id=16 We were runner-up behind the Sabre airline reservation system for MySQL application of the year. Needless to say, they weren't too happy when they heard we might be switching DBs.

'njoy,
Mark

On 8/19/05 1:12 PM, J. Andrew Rogers [EMAIL PROTECTED] wrote:

> On 8/19/05 1:24 AM, Mark Cotner [EMAIL PROTECTED] wrote:
>> I'm currently working on an application that will poll thousands of cable
>> modems per minute and I would like to use PostgreSQL to maintain state
>> between polls of each device. This requires a very heavy amount of updates
>> in place on a reasonably large table (100k-500k rows, ~7 columns, mostly
>> integers/bigint). Each row will be refreshed every 15 minutes, or at least
>> that's how fast I can poll via SNMP. I hope I can tune the DB to keep up.
>> The app is threaded and will likely have well over 100 concurrent db
>> connections. Temp tables for storage aren't a preferred option since this
>> is designed to be a shared-nothing approach and I will likely have several
>> polling processes.
>
> Mark,
>
> We have PostgreSQL databases on modest hardware doing exactly what you are
> attempting (a massive, scalable SNMP monitoring system). The monitoring
> volume for a single database server appears to exceed what you are trying
> to do by a few orders of magnitude with no scaling or performance issues,
> so I can state without reservation that PostgreSQL can easily handle your
> application in theory. However, that is predicated on having a
> well-architected system that minimizes resource contention and unnecessary
> blocking, and based on your description you may be going about it a bit
> wrong.
>
> The biggest obvious bottleneck is the use of threads and massive
> process-level parallelization. As others have pointed out, async queues are
> your friends, as is partitioning the workload horizontally rather than
> vertically through the app stack. A very scalable, high-throughput engine
> for SNMP polling only requires two or three threads handling different
> parts of the workload to saturate the network, and by choosing carefully
> what each thread does you can all but eliminate blocking when there is work
> to be done.
> We only use a single database connection to insert all the data into
> PostgreSQL, and that process/thread receives its data from a work queue.
> Depending on how you design your system, you can batch many records in your
> queue as a single transaction. In our case, we also use very few updates,
> mostly just inserts, which is probably advantageous in terms of throughput
> if you have the disk for it. The insert I/O load is easily handled, and our
> disk array is a modest 10k SCSI rig. The only thing that really hammers the
> server is when multiple reporting processes are running, which frequently
> touch several million rows each (the database is much larger than the
> system memory), and even this is manageable with clever database design.
>
> In short, what you are trying to do is easily doable on PostgreSQL in
> theory. However, restrictions on design choices may pose significant
> hurdles. We did not start out with an ideal system either; it took a fair
> amount of re-engineering to solve all the bottlenecks and problems that
> pop up.
>
> Good luck,
>
> J. Andrew Rogers [EMAIL PROTECTED]
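Rogers' design (one database connection fed by a work queue, with many records batched into each transaction) is easy to sketch. The following minimal illustration is not from the original thread: it assumes Python with a DB-API driver (psycopg2 here), and the table name, columns, batch size, and connection string are all invented for the example.

    import queue
    import threading

    import psycopg2  # any DB-API 2.0 driver would work the same way

    BATCH_SIZE = 1000      # records folded into one transaction (illustrative)
    FLUSH_TIMEOUT = 1.0    # seconds to wait before flushing a partial batch

    work_queue = queue.Queue()   # poller threads put (value, modem_id) tuples here

    def writer_loop(dsn):
        """The single DB connection: drain the queue, commit in batches."""
        conn = psycopg2.connect(dsn)
        cur = conn.cursor()
        while True:
            batch = [work_queue.get()]        # block until something arrives
            try:
                while len(batch) < BATCH_SIZE:
                    batch.append(work_queue.get(timeout=FLUSH_TIMEOUT))
            except queue.Empty:
                pass                          # flush whatever we have
            # Hypothetical table/columns; the whole batch is one transaction.
            cur.executemany(
                "UPDATE modem_state SET last_value = %s, polled_at = now() "
                "WHERE modem_id = %s",
                batch,
            )
            conn.commit()

    # Pollers never touch the database; they only enqueue results:
    #     work_queue.put((value, modem_id))
    threading.Thread(target=writer_loop, args=("dbname=poller",), daemon=True).start()

With this shape, all of the application's locking collapses into a single queue, and the database sees large, infrequent commits instead of a hundred chattering connections.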
Re: [PERFORM] sustained update load of 1-2k/sec
On Aug 18, 2005, at 10:24 PM, Mark Cotner wrote:

> I'm currently working on an application that will poll thousands of cable
> modems per minute and I would like to use PostgreSQL to maintain state
> between polls of each device. This requires a very heavy amount of updates
> in place on a reasonably large table (100k-500k rows, ~7 columns, mostly
> integers/bigint). Each row will be refreshed every 15 minutes, or at least
> that's how fast I can poll via SNMP. I hope I can tune the DB to keep up.
> The app is threaded and will likely have well over 100 concurrent db
> connections. Temp tables for storage aren't a preferred option since this
> is designed to be a shared-nothing approach and I will likely have several
> polling processes.

Somewhat OT, but... the easiest way to speed that up is to use fewer threads. You're adding a whole TON of overhead with that many threads that you just don't want or need. You should probably be using something event-driven to solve this problem, with just a few database threads to store all that state. Less is definitely more in this case. See http://www.kegel.com/c10k.html (and there's plenty of other literature out there saying that event-driven is an extremely good way to do this sort of thing).

Here are some frameworks to look at for this kind of network code:

(Python) Twisted - http://twistedmatrix.com/
(Perl) POE - http://poe.perl.org/
(Java) java.nio (not familiar enough with Java to know whether there's a high-level wrapper)
(C++) ACE - http://www.cs.wustl.edu/~schmidt/ACE.html
(Ruby) IO::Reactor - http://www.deveiate.org/code/IO-Reactor.html
(C) libevent - http://monkey.org/~provos/libevent/

... and of course you have select/poll/kqueue/WaitNextEvent/whatever that you could use directly if you wanted to roll your own solution, but don't do that.

If you don't want to optimize the whole application, I'd at least push the DB operations down to a very small number of connections (*one* might even be optimal!), waiting on some kind of thread-safe queue for updates from the rest of the system. This way you can easily batch those updates into transactions and you won't be putting so much unnecessary synchronization overhead into your application and the database.

Generally, once you have more worker threads (or processes) than CPUs, you're going to get diminishing returns in a bad way, assuming those threads are making good use of their time.

-bob
Re: [PERFORM] sustained update load of 1-2k/sec
Excellent feedback. Thank you. Please do keep in mind I'm storing the results of SNMP queries. The majority of the time each thread is in a wait state, listening on a UDP port for a return packet. The number of threads is high because, in order to sustain poll speed, I need to minimize the impact of timeouts and all this waiting for return packets.

I had intended to have a fallback plan which would build a thread-safe queue for db stuffs, but the application isn't currently architected that way. It's not completely built yet, so now is the time for change. I hadn't thought of building up a batch of queries and creating a transaction from them.

I've been looking into memcached as a persistent object store as well, and hadn't seen the reactor pattern yet. Still trying to get my puny brain around that one.

Again, thanks for the help.

'njoy,
Mark

On 8/19/05 5:09 AM, Bob Ippolito [EMAIL PROTECTED] wrote:

> If you don't want to optimize the whole application, I'd at least push the
> DB operations down to a very small number of connections (*one* might even
> be optimal!), waiting on some kind of thread-safe queue for updates from
> the rest of the system. [...]
Re: [PERFORM] sustained update load of 1-2k/sec
On Aug 19, 2005, at 12:14 AM, Mark Cotner wrote:

> Excellent feedback. Thank you. Please do keep in mind I'm storing the
> results of SNMP queries. The majority of the time each thread is in a wait
> state, listening on a UDP port for a return packet. The number of threads
> is high because, in order to sustain poll speed, I need to minimize the
> impact of timeouts and all this waiting for return packets.

Asynchronous IO via select/poll/etc. basically says: given these 100 sockets, wake me up when any of them has something to tell me, or wake me up anyway in N milliseconds. From one thread you can usually deal with thousands of connections without breaking a sweat, whereas with thread-per-connection you have so much overhead just for the threads that you probably run out of RAM before your network is throttled. The reactor pattern abstracts this a bit, so that you only worry about what to do when the sockets have something to say, and it also lets you schedule timed events, rather than having to work out how to implement that correctly *and* write your application.

With 100 threads you are basically invoking a special case of the same mechanism that only looks at one socket, but this makes for 100 different data structures that end up in both userspace and kernel space, plus the thread stacks (which can easily be a few megs each) and context switching whenever any of them wakes up. You're throwing a lot of RAM and CPU cycles out the window with this design. Also, preemptive threads are hard.

> I had intended to have a fallback plan which would build a thread-safe
> queue for db stuffs, but the application isn't currently architected that
> way. It's not completely built yet, so now is the time for change. I hadn't
> thought of building up a batch of queries and creating a transaction from
> them.

It should be *really* easy to just swap out the implementation of your "change this record" function with one that simply puts its arguments on a queue, with another thread that gets them from the queue and actually does the work.

> I've been looking into memcached as a persistent object store as well, and
> hadn't seen the reactor pattern yet. Still trying to get my puny brain
> around that one.

memcached is RAM based; it's not persistent at all... unless you are sure all of your nodes will be up at all times and will never go down. IIRC, it also just starts throwing away data once you hit its size limit. Of course, this isn't really any different from MySQL's MyISAM tables if you hit the row limit, but I think memcached might not even give you an error when this happens. Also, memcached is just key/value pairs over a network, not much of a database going on there. If you can fit all this data in RAM and you don't care so much about integrity, you might not benefit much from an RDBMS at all. However, I don't really know what you're doing with the data once you have it, so I might be very wrong here...

-bob
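For anyone who hasn't written one of these loops, here is a bare-bones, single-threaded sketch of the mechanism Bob describes, using Python's standard selectors module rather than any of the frameworks listed earlier. The port range and the handle_response stub are invented for the example; at the time of this thread you would have used select/poll or one of those frameworks directly, but the shape of the loop is the same.

    import selectors
    import socket

    sel = selectors.DefaultSelector()

    # A handful of non-blocking UDP sockets; one thread can watch thousands.
    for port in range(16100, 16110):          # invented port range
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("0.0.0.0", port))
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)

    def handle_response(data, addr):
        """Placeholder: decode the SNMP reply and enqueue it for the DB writer."""
        pass

    while True:
        # Wake up when any socket is readable, or after 500 ms regardless,
        # which gives us a chance to expire polls that have timed out.
        events = sel.select(timeout=0.5)
        for key, _mask in events:
            data, addr = key.fileobj.recvfrom(65535)
            handle_response(data, addr)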
Re: [PERFORM] sustained update load of 1-2k/sec
I have managed tx speeds that high from PostgreSQL, going even as high as 2500/sec for small tables, but it does require a good RAID controller card (yes, I'm even running with fsync on). I'm using a 3ware 9500S-8MI with Raptor drives in multiple RAID 10s. The box wasn't too $$$ at just around $7k. I have two independent controllers on two independent PCI buses to give max throughput: one with a 6-drive RAID 10 and the other with two 4-drive RAID 10s.

Alex Turner
NetEconomist

On 8/19/05, Mark Cotner [EMAIL PROTECTED] wrote:

> Hi all,
> I bet you get tired of the same ole questions over and over.
>
> I'm currently working on an application that will poll thousands of cable
> modems per minute and I would like to use PostgreSQL to maintain state
> between polls of each device. This requires a very heavy amount of updates
> in place on a reasonably large table (100k-500k rows, ~7 columns, mostly
> integers/bigint). Each row will be refreshed every 15 minutes, or at least
> that's how fast I can poll via SNMP. I hope I can tune the DB to keep up.
>
> The app is threaded and will likely have well over 100 concurrent db
> connections. Temp tables for storage aren't a preferred option since this
> is designed to be a shared-nothing approach and I will likely have several
> polling processes.
>
> Here are some of my assumptions so far . . .
>
> HUGE WAL
> Vacuum hourly if not more often
>
> I'm getting 1700tx/sec from MySQL and I would REALLY prefer to use PG. I
> don't need to match the number, just get close.
>
> Is there a global temp table option? In-memory tables would be very
> beneficial in this case. I could just flush it to disk occasionally with an
> insert into blah select from memory table.
>
> Any help or creative alternatives would be greatly appreciated. :)
>
> 'njoy,
> Mark
>
> --
> Writing software requires an intelligent person, creating functional art
> requires an artist. -- Unknown
Re: [PERFORM] sustained update load of 1-2k/sec
Bob Ippolito [EMAIL PROTECTED] writes:

> If you don't want to optimize the whole application, I'd at least push the
> DB operations down to a very small number of connections (*one* might even
> be optimal!), waiting on some kind of thread-safe queue for updates from
> the rest of the system.

While I agree that hundreds of threads seems like overkill, I think the above advice might be going too far in the other direction. The problem with single-threaded operation is that any delay affects the whole system --- eg, if you're blocked waiting for disk I/O, the CPU doesn't get anything done either. You want enough DB connections doing things in parallel to make sure that there's always something else useful to do for each major component. This is particularly important for Postgres, which doesn't do any internal query parallelization (not that it would help much anyway for the sorts of trivial queries you are worried about). If you have, say, a 4-way CPU you want at least 4 active connections to make good use of the CPUs.

I'd suggest trying to build the system so that it uses a dozen or two active database connections. If that doesn't match up to the number of polling activities you want to have in flight at any instant, then you can do something like what Bob suggested on the client side to bridge the gap.

As far as the question "can PG do 1-2k xact/sec", the answer is yes if you throw enough hardware at it. Spending enough money on the disk subsystem is the key ...

regards, tom lane
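Tom's "dozen or two active connections" composes naturally with the queue approach sketched earlier in this digest. A rough continuation of that hypothetical sketch, with writer_loop and work_queue as defined there, not anything from the original thread:

    import threading

    # Instead of one writer, start "a dozen or two", each with its own
    # connection, all draining the same work_queue, so commits can overlap
    # and keep the CPUs and the WAL disk busy.
    NUM_WRITERS = 16

    for _ in range(NUM_WRITERS):
        threading.Thread(
            target=writer_loop, args=("dbname=poller",), daemon=True
        ).start()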
Re: [PERFORM] sustained update load of 1-2k/sec
Tom Lane wrote:

> As far as the question "can PG do 1-2k xact/sec", the answer is yes if you
> throw enough hardware at it. Spending enough money on the disk subsystem
> is the key ...

The 1-2k xact/sec for MySQL seems suspicious; it sounds very much like write-back cached, not write-through, especially considering that heavy concurrent write access isn't said to be MySQL's strength...

I wonder if preserving the database after a fatal crash is really necessary, since the data stored sounds quite volatile; in this case, fsync=false might be sufficient.

Regards,
Andreas
Re: [PERFORM] sustained update load of 1-2k/sec
Andreas Pflug [EMAIL PROTECTED] writes:

> Tom Lane wrote:
>> As far as the question "can PG do 1-2k xact/sec", the answer is yes if you
>> throw enough hardware at it. Spending enough money on the disk subsystem
>> is the key ...
>
> The 1-2k xact/sec for MySQL seems suspicious; it sounds very much like
> write-back cached, not write-through, especially considering that heavy
> concurrent write access isn't said to be MySQL's strength...
>
> I wonder if preserving the database after a fatal crash is really
> necessary, since the data stored sounds quite volatile; in this case,
> fsync=false might be sufficient.

Yeah, that's something to think about. If you do need full transaction safety, then you *must* have a decent battery-backed write-cache setup, else your transaction commit rate will be limited by disk rotation speed --- for instance, a single connection can commit at most 250 xacts per second if the WAL log is on a 15000RPM drive. (You can improve this to the extent that you can spread activity across multiple connections, but I'm not sure you can expect to reliably have 8 or more connections ready to commit each time the disk goes 'round.)

regards, tom lane
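Tom's 250-per-second figure falls straight out of the drive's rotation rate: with fsync on and no battery-backed write cache, a single connection can complete at most one WAL flush per platter revolution. A quick back-of-the-envelope check (the drive speeds below are only the usual examples):

    # One fsync'd commit per platter revolution, for a single connection.
    for rpm in (7200, 10000, 15000):
        revs_per_sec = rpm / 60.0
        print(f"{rpm:>5} RPM drive: at most ~{revs_per_sec:.0f} commits/sec per connection")

    # 15000 RPM -> ~250 commits/sec, matching the figure above.  Reaching
    # 1-2k updates/sec therefore means batching many updates per commit,
    # spreading commits across connections, or adding a write cache.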
Re: [PERFORM] sustained update load of 1-2k/sec
Alex mentions a nice setup, but I'm pretty sure I know how to beat that IO subsystem's performance by at least 1.5x or 2x. Possibly more. (No, I do NOT work for any vendor I'm about to discuss.)

Start by replacing the WD Raptors with Maxtor Atlas 15K IIs. At 5.5ms average access, 97.4MB/s outer-track throughput, 85.9MB/s average, and 74.4MB/s inner-track throughput, they have the best performance characteristics of any tested shipping HDs I know of. (Supposedly the new SAS versions will _sustain_ ~98MB/s, but I'll believe that only if I see it under independent testing.) In comparison, the numbers on the WD740GD are 8.1ms average access and 71.8, 62.9, and 53.9 MB/s outer, average, and inner-track throughputs respectively. Be prepared to use as many of them as possible (read: as many as you can afford) if you want to maximize transaction rates, particularly for small transactions like this application seems to involve.

Next, use a better RAID card. The TOL enterprise stuff (Xyratex, Engino, Dot-hill) is probably too expensive, but in the commodity market, benchmarks indicate that Areca's 1GB-buffer RAID cards currently outperform all the other commodity RAID stuff. 9 Atlas IIs per card in a RAID 5 set, or 16 per card in a RAID 10 set, should max out the RAID card's throughput and come very close to, if not attain, the real-world peak bandwidth of the 64b 133MHz PCI-X bus they are plugged into. Say somewhere in the 700-800MB/s range.

Repeat the above for as many independent PCI-X buses as you have, and you get a very fast commodity RAID IO subsystem. Two cards configured this way, used in the same manner as Alex describes, should easily attain 1.5x-2x the transaction numbers Alex mentions, unless there's a bottleneck somewhere else in the system design.

Hope this helps,
Ron Peacetree

At 08:40 AM 8/19/2005, Alex Turner wrote:

> I have managed tx speeds that high from PostgreSQL, going even as high as
> 2500/sec for small tables, but it does require a good RAID controller card
> (yes, I'm even running with fsync on). I'm using a 3ware 9500S-8MI with
> Raptor drives in multiple RAID 10s. [...]
Re: [PERFORM] sustained update load of 1-2k/sec
At 09:58 AM 8/19/2005, Andreas Pflug wrote:

> The 1-2k xact/sec for MySQL seems suspicious; it sounds very much like
> write-back cached, not write-through, especially considering that heavy
> concurrent write access isn't said to be MySQL's strength...

Don't be suspicious. I haven't seen the code under discussion, but I have seen MySQL easily achieve these kinds of numbers using the MyISAM storage engine in write-through cache mode. MyISAM can be =FAST=. Particularly when decent HW is thrown at it.

Ron