On 8/19/05 1:24 AM, "Mark Cotner" <[EMAIL PROTECTED]> wrote:
> I'm currently working on an application that will poll
> thousands of cable modems per minute and I would like
> to use PostgreSQL to maintain state between polls of
> each device. This requires a very heavy amount of
> updates in place on a reasonably large table(100k-500k
> rows, ~7 columns mostly integers/bigint). Each row
> will be refreshed every 15 minutes, or at least that's
> how fast I can poll via SNMP. I hope I can tune the
> DB to keep up.
> The app is threaded and will likely have well over 100
> concurrent db connections. Temp tables for storage
> aren't a preferred option since this is designed to be
> a shared nothing approach and I will likely have
> several polling processes.
We have PostgreSQL databases on modest hardware doing exactly what you are
attempting to (massive scalable SNMP monitoring system). The monitoring
volume for a single database server appears to exceed what you are trying to
do by a few orders of magnitude with no scaling or performance issues, so I
can state without reservation that PostgreSQL can easily handle your
application in theory.
However, that is predicated on having a well-architected system that
minimizes resource contention and unnecessary blocking, and based on your
description you may be going about it a bit wrong.
The biggest obvious bottleneck is the use of threads and massive
process-level parallelization. As others have pointed out, async queues are
your friends, as is partitioning the workload horizontally rather than
vertically through the app stack. A very scalable high-throughput engine
for SNMP polling only requires two or three threads handling different parts
of the workload to saturate the network, and by choosing what each thread
does carefully you can all but eliminate blocking when there is work to be
We only use a single database connection to insert all the data into
PostgreSQL, and that process/thread receives its data from a work queue.
Depending on how you design your system, you can batch many records in your
queue as a single transaction. In our case, we also use very few updates,
mostly just inserts, which is probably advantageous in terms of throughput
if you have the disk for it. The insert I/O load is easily handled, and our
disk array is a modest 10k SCSI rig. The only thing that really hammers the
server is when multiple reporting processes are running, which frequently
touch several million rows each (the database is much larger than the system
memory), and even this is manageable with clever database design.
In short, what you are trying to do is easily doable on PostgreSQL in
theory. However, restrictions on design choices may pose significant
hurdles. We did not start out with an ideal system either; it took a fair
amount of re-engineering to solve all the bottlenecks and problems that pop
J. Andrew Rogers
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?