On 8/19/05 1:24 AM, "Mark Cotner" <[EMAIL PROTECTED]> wrote:

> I'm currently working on an application that will poll thousands of cable
> modems per minute and I would like to use PostgreSQL to maintain state
> between polls of each device. This requires a very heavy amount of updates
> in place on a reasonably large table (100k-500k rows, ~7 columns, mostly
> integers/bigint). Each row will be refreshed every 15 minutes, or at least
> that's how fast I can poll via SNMP. I hope I can tune the DB to keep up.
>
> The app is threaded and will likely have well over 100 concurrent db
> connections. Temp tables for storage aren't a preferred option since this
> is designed to be a shared nothing approach and I will likely have several
> polling processes.
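(For concreteness, the kind of state table and in-place refresh being
described might look roughly like the sketch below; the schema, names, DSN,
and values are purely illustrative, not anything from the original post.)

# Purely illustrative sketch: one row of state per modem, rewritten in place
# on every poll cycle.
import psycopg2  # assumed driver

conn = psycopg2.connect("dbname=netmon")  # hypothetical database
cur = conn.cursor()
cur.execute("""
    CREATE TABLE modem_state (
        device_id  bigint PRIMARY KEY,
        octets_in  bigint,
        octets_out bigint,
        last_poll  timestamptz
    )
""")
conn.commit()

# Each poll cycle refreshes every row in place, roughly every 15 minutes.
cur.execute(
    "UPDATE modem_state SET octets_in = %s, octets_out = %s, last_poll = now() "
    "WHERE device_id = %s",
    (123456, 654321, 42),
)
conn.commit()
conn.close()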
Mark,

We have PostgreSQL databases on modest hardware doing exactly what you are
attempting to do (a massive, scalable SNMP monitoring system). The monitoring
volume a single database server handles appears to exceed what you are trying
to do by a few orders of magnitude, with no scaling or performance issues, so
I can state without reservation that PostgreSQL can easily handle your
application in theory. However, that is predicated on having a
well-architected system that minimizes resource contention and unnecessary
blocking, and based on your description you may be going about it a bit wrong.

The biggest obvious bottleneck is the use of threads and massive
process-level parallelization. As others have pointed out, async queues are
your friends, as is partitioning the workload horizontally rather than
vertically through the app stack. A very scalable, high-throughput SNMP
polling engine needs only two or three threads handling different parts of
the workload to saturate the network, and by choosing carefully what each
thread does, you can all but eliminate blocking whenever there is work to be
done.

We use only a single database connection to insert all the data into
PostgreSQL, and that process/thread receives its data from a work queue.
Depending on how you design your system, you can batch many records from
your queue into a single transaction. In our case we also use very few
updates, mostly just inserts, which is probably advantageous in terms of
throughput if you have the disk for it. The insert I/O load is easily
handled, and our disk array is a modest 10k SCSI rig. The only thing that
really hammers the server is when multiple reporting processes are running,
each of which frequently touches several million rows (the database is much
larger than system memory), and even this is manageable with clever database
design.

In short, what you are trying to do is easily doable on PostgreSQL in theory.
However, restrictions on design choices may pose significant hurdles. We did
not start out with an ideal system either; it took a fair amount of
re-engineering to solve all the bottlenecks and problems that popped up.

Good luck,

J. Andrew Rogers
[EMAIL PROTECTED]
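P.S. A minimal sketch of the work-queue / single-writer pattern described
above, assuming Python and psycopg2; the DSN, table, and column names are
illustrative, not our actual schema:

import queue
import threading

import psycopg2  # assumed driver

work_queue = queue.Queue()
BATCH_SIZE = 500        # rows per commit; tune for your workload
SENTINEL = None         # signals the writer to flush and exit

def poller(device_ids):
    # Stand-in for an SNMP polling thread: one result row per device.
    for dev in device_ids:
        # A real poller would put actual SNMP GET results here.
        work_queue.put((dev, 123456, 654321))  # (device_id, octets_in, octets_out)

def writer(dsn):
    # The single database writer: drain the queue and commit rows in batches,
    # so many records ride in one transaction instead of one commit per row.
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    batch = []
    while True:
        item = work_queue.get()
        if item is not SENTINEL:
            batch.append(item)
        if batch and (item is SENTINEL or len(batch) >= BATCH_SIZE):
            cur.executemany(
                "INSERT INTO modem_samples (device_id, octets_in, octets_out) "
                "VALUES (%s, %s, %s)",
                batch,
            )
            conn.commit()
            batch = []
        if item is SENTINEL:
            break
    conn.close()

# Two or three pollers feeding one writer is enough to keep the network busy.
writer_thread = threading.Thread(target=writer, args=("dbname=netmon",))
writer_thread.start()
pollers = [threading.Thread(target=poller, args=(range(i, 1000, 2),)) for i in range(2)]
for p in pollers:
    p.start()
for p in pollers:
    p.join()
work_queue.put(SENTINEL)   # all pollers are done; tell the writer to finish up
writer_thread.join()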