On Aug 19, 2005, at 12:14 AM, Mark Cotner wrote:

Excellent feedback. Thank you. Please do keep in mind I'm storing the results of SNMP queries. The majority of the time each thread is in a wait state, listening on a UDP port for a return packet. The number of threads is high because, in order to sustain poll speed, I need to minimize the impact of timeouts and all this waiting for return packets.

Asynchronous IO via select/poll/etc. basically says: "given these 100 sockets, wake me up when any of them has something to tell me, or wake me up anyway in N milliseconds". From one thread, you can usually deal with thousands of connections without breaking a sweat, whereas with thread-per-connection you have so much overhead just for the threads that you probably run out of RAM before your network is throttled. The reactor pattern basically just abstracts this a bit so that you worry about what to do when the sockets have something to say, and it also allows you to schedule timed events, rather than having to worry about how to implement all that correctly *and* write your application.
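The single-threaded select-based approach can be sketched with Python's standard-library selectors module (socket counts and addresses here are illustrative, not from the original setup):

```python
# One thread watches several UDP sockets and wakes up only when one of
# them has a datagram to read, or after a timeout -- the core of the
# reactor idea.
import selectors
import socket

sel = selectors.DefaultSelector()

# Stand-ins for the per-device SNMP sockets: a few UDP sockets bound
# to ephemeral local ports.
receivers = []
for _ in range(3):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    s.setblocking(False)
    sel.register(s, selectors.EVENT_READ)
    receivers.append(s)

# Simulate replies arriving on two of the three sockets.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for s in receivers[:2]:
    sender.sendto(b"reply", s.getsockname())

# "Wake me when any socket is readable, or in 500 ms anyway."
ready = sel.select(timeout=0.5)
for key, _ in ready:
    data, addr = key.fileobj.recvfrom(1500)
    print(f"got {data!r} from {addr}")
```

The same loop scales to thousands of registered sockets without adding a single extra thread.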

With 100 threads you are basically invoking a special case of the same mechanism that only looks at one socket, but this makes for 100 different data structures that end up in both userspace and kernel space, plus the thread stacks (which can easily be a few megs each) and context switching whenever any of them wakes up. You're throwing a lot of RAM and CPU cycles out the window with this design.
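Back-of-envelope numbers (assuming a typical 2 MiB default pthread stack; the real figure depends on platform and ulimit settings):

```python
# Stack memory alone for thread-per-connection vs. one event-loop thread.
# 2 MiB per thread is an assumed, typical Linux/pthreads default.
threads = 100
stack_mib_per_thread = 2
total_mib = threads * stack_mib_per_thread
print(total_mib)  # MiB reserved just for thread stacks
```

That reservation buys you nothing while the threads sit blocked on recvfrom(); a single event-loop thread needs one stack.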

Also, preemptive threads are hard.

I had intended to have a fallback plan which would build a thread-safe queue for db stuffs, but the application isn't currently architected that way. It's not completely built yet, so now is the time for change. I hadn't thought of building up a batch of queries and creating a transaction from them.

It should be *really* easy to just swap out the implementation of your "change this record" function with one that simply puts its arguments on a queue, with another thread that gets them from the queue and actually does the work.
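A minimal sketch of that swap, with hypothetical names throughout (change_record is a stand-in for your existing update function; the list append stands in for the real DB call):

```python
# change_record() keeps its old signature but now just enqueues; a
# single writer thread dequeues and does the actual work.
import queue
import threading

updates: "queue.Queue[tuple]" = queue.Queue()

def change_record(device_id, value):
    """Same signature the rest of the app already calls."""
    updates.put((device_id, value))

applied = []

def writer():
    while True:
        item = updates.get()
        if item is None:          # sentinel: shut down cleanly
            break
        applied.append(item)      # stand-in for the actual DB update
        updates.task_done()

t = threading.Thread(target=writer, daemon=True)
t.start()

change_record(42, "online")
change_record(43, "offline")
updates.put(None)
t.join()
print(applied)
```

Because only the writer thread touches the database, the rest of the application never blocks on the DB and never needs a connection of its own.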

I've been looking into memcached as a persistent object store as well and hadn't seen the reactor pattern yet. Still trying to get my puny brain
around that one.

memcached is RAM-based; it's not persistent at all, unless you are sure all of your nodes will be up at all times and never go down. IIRC, it also just starts throwing away data once you hit its size limit. Of course, this isn't really any different from MySQL's MyISAM tables if you hit the row limit, but I think memcached might not even give you an error when this happens. Also, memcached is just key/value pairs over a network; there's not much of a database going on there.

If you can fit all this data in RAM and you don't care so much about the integrity, you might not benefit much from a RDBMS at all. However, I don't really know what you're doing with the data once you have it so I might be very wrong here...


Again, thanks for the help.


On 8/19/05 5:09 AM, "Bob Ippolito" <[EMAIL PROTECTED]> wrote:

On Aug 18, 2005, at 10:24 PM, Mark Cotner wrote:

I'm currently working on an application that will poll
thousands of cable modems per minute and I would like
to use PostgreSQL to maintain state between polls of
each device.  This requires a very heavy amount of
updates in place on a reasonably large table (100k-500k
rows, ~7 columns, mostly integers/bigint).  Each row
will be refreshed every 15 minutes, or at least that's
how fast I can poll via SNMP.  I hope I can tune the
DB to keep up.

The app is threaded and will likely have well over 100
concurrent db connections.  Temp tables for storage
aren't a preferred option since this is designed to be
a shared nothing approach and I will likely have
several polling processes.

Somewhat OT, but..

The easiest way to speed that up is to use fewer threads.  You're
adding a whole TON of overhead with that many threads that you just
don't want or need.  You should probably be using something event-
driven to solve this problem, with just a few database threads to
store all that state.  Less is definitely more in this case.  See
<http://www.kegel.com/c10k.html> (and there's plenty of other
literature out there saying that event driven is an extremely good
way to do this sort of thing).

Here are some frameworks to look at for this kind of network code:
(Python) Twisted - <http://twistedmatrix.com/>
(Perl) POE - <http://poe.perl.org/>
(Java) java.nio (not familiar enough with the Java thing to know
whether or not there's a high-level wrapper)
(C++) ACE - <http://www.cs.wustl.edu/~schmidt/ACE.html>
(Ruby) IO::Reactor - <http://www.deveiate.org/code/IO-Reactor.html>
(C) libevent - <http://monkey.org/~provos/libevent/>

.. and of course, you have select/poll/kqueue/WaitNextEvent/whatever
that you could use directly, if you wanted to roll your own solution,
but don't do that.

If you don't want to optimize the whole application, I'd at least
just push the DB operations down to a very small number of
connections (*one* might even be optimal!), waiting on some kind of
thread-safe queue for updates from the rest of the system.  This way
you can easily batch those updates into transactions and you won't be
putting so much unnecessary synchronization overhead into your
application and the database.
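A sketch of that drain-the-queue-and-batch idea, using stdlib sqlite3 as a stand-in for PostgreSQL (the table and column names are made up for illustration):

```python
# A single DB writer drains whatever has accumulated on the queue and
# applies the whole batch in one transaction instead of one per update.
import queue
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE modem_state (id INTEGER PRIMARY KEY, state TEXT)")

updates = queue.Queue()
for item in [(1, "up"), (2, "down"), (3, "up")]:
    updates.put(item)

def drain(q, max_batch=500):
    """Pull everything currently queued, up to max_batch items."""
    batch = []
    try:
        while len(batch) < max_batch:
            batch.append(q.get_nowait())
    except queue.Empty:
        pass
    return batch

batch = drain(updates)
with db:  # one transaction per batch instead of one per update
    db.executemany(
        "INSERT OR REPLACE INTO modem_state (id, state) VALUES (?, ?)",
        batch,
    )

count = db.execute("SELECT COUNT(*) FROM modem_state").fetchone()[0]
print(count)
```

With one commit per batch rather than per row, the fsync and lock overhead is amortized across hundreds of updates.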

Generally, once you have more worker threads (or processes) than
CPUs, you're going to get diminishing returns in a bad way, assuming
those threads are making good use of their time.


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
