Re: [Pytables-users] pytables or pyroot?

David Fokkema Fri, 18 Sep 2009 06:25:39 -0700

On Thu, 2009-07-30 at 10:32 -0400, Kenneth Arnold wrote:
> On Thu, Jul 30, 2009 at 9:00 AM, Francesc Alted<[email protected]> wrote:
> > On Thursday 30 July 2009 14:32:42 you wrote:
> >> Yes, but I can't figure out how to have multiple apache processes
> >> connect to the same Queue object.
> >
> > Ah, correct, you are using different *processes*, not threads.  Well, I
> > think that some kind of communications package must be used then.  There
> > are plenty of options, but Pyro [1] or Ice [2] (I was recently told about
> > this) seem to be powerful and easy enough to program.  If you want more
> > performance, you may want to use MPI via mpi4py [3], but I don't really
> > think you are going to need this.
> >
> > [1] http://pyro.sourceforge.net/
> > [2] http://www.zeroc.com/icepy.html
> > [3] http://mpi4py.scipy.org/
> 
> Insert obligatory warning against over-engineering here: simple
> problems should have simple solutions.


I wholeheartedly agree. Our 'new' database scheme took me two years to
fix after delivery by a professional, is very complex, did not scale as
well as promised, and is pretty much everything an over-engineered
solution might be. We've now officially dropped it. I got to keep the
pieces, though, ;-)

> Simplest: a single-threaded, single-process Python server that
> directly handles the HTTP input and writes to PyTables. Concurrent
> requests just have to wait. Downside: a slow client can tie up the
> server for a long time. Multithreading/multiprocessing (both in the
> Python standard library) can help, but if that's an issue, try:

Indeed, the simplest one. This won't work for us, because we have a
_very_ fast connection (we're connected to a router with a 10 m optical
fiber into AMS-IX, one floor up) while the schools only have an ADSL
connection (slow upload). A saturated school connection will tie up all
our resources.

> Also pretty simple: a lightweight mod_python or fcgi script, written
> with, say, Django/CherryPy/web.py, that buffers the data in some
> temporary place while waiting for it to be written. Could be in memory
> or a file, or even a conventional relational database. Then the
> PyTables writer process just needs to know about that data. Files are
> easy; when you're done writing a file, move it into an "incoming"
> directory; then the PyTables writer can just poll 'incoming' for a
> file, process it, move it out of the way, repeat.

While this is indeed pretty simple (we can just use pickle, for example,
to 'move' the Python objects around via disk), it is basically a
custom message passing interface. Whether this will indeed be simpler than
just using an existing library, I don't know. However, reading through
some of the documentation, some of the above-mentioned libraries are huge
beasts!
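
For what it's worth, a minimal sketch of that file-based hand-off could
look like the following. The directory layout (tmp/, incoming/, done/) and
the function names spool_message and drain_incoming are made up for
illustration and are not from any library mentioned in this thread. It
assumes tmp/ and incoming/ are on the same filesystem, so the rename is
atomic and the writer never sees a half-written file.

```python
import os
import pickle
import uuid


def spool_message(obj, spool_dir):
    """Pickle obj to tmp/, then rename it into incoming/ (hypothetical layout)."""
    tmp_dir = os.path.join(spool_dir, "tmp")
    incoming = os.path.join(spool_dir, "incoming")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(incoming, exist_ok=True)

    name = uuid.uuid4().hex + ".pkl"
    tmp_path = os.path.join(tmp_dir, name)
    with open(tmp_path, "wb") as f:
        pickle.dump(obj, f)

    # The rename is atomic when both directories share a filesystem,
    # so a poller only ever sees complete files in incoming/.
    final_path = os.path.join(incoming, name)
    os.replace(tmp_path, final_path)
    return final_path


def drain_incoming(spool_dir, handle):
    """Pass every object currently in incoming/ to handle, then move it to done/."""
    incoming = os.path.join(spool_dir, "incoming")
    done = os.path.join(spool_dir, "done")
    os.makedirs(incoming, exist_ok=True)
    os.makedirs(done, exist_ok=True)

    count = 0
    for name in sorted(os.listdir(incoming)):
        path = os.path.join(incoming, name)
        with open(path, "rb") as f:
            obj = pickle.load(f)
        handle(obj)  # e.g. append the rows to the PyTables file
        os.replace(path, os.path.join(done, name))
        count += 1
    return count
```

The web-facing script would call spool_message once per upload, and the
single writer process would call drain_incoming in a loop with a short
sleep in between. Note that unpickling is only safe for files our own
scripts wrote; pickle should not be used on data taken directly from
clients.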

> If you're concerned about being able to report failure, you have to
> consider all the possible points of failure. The first solution has
> very simple failure reporting: "I wrote this to PyTables" or "I
> didn't". The second is a two-stage process, where all the client can
> report is "I passed this on to the writer process". But if your buffer
> is somewhere persistent and reliable (like disk), that's perhaps a
> _better_ report: even if the PyTables db gets corrupted somehow, you
> still have the data at least until you clean out the old stuff (which
> you can do after backing up the HDF5 file, for example).

Good point. If we have it on disk _somewhere_, the client can drop its
data.
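
A rough sketch of that rule, assuming the server only acknowledges an
upload after the bytes have been flushed to disk. The names store_durably
and unacknowledged are hypothetical, and the send callable stands in for
whatever HTTP request the client really makes.

```python
import os


def store_durably(payload, path):
    """Return True only once payload has been written and fsynced at path."""
    try:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to disk
    except OSError:
        return False
    return True


def unacknowledged(pending, send):
    """Send each pending item; return the ones the server did not acknowledge."""
    return [item for item in pending if not send(item)]
```

The client keeps whatever unacknowledged returns and retries it later, so
nothing is dropped before the server has confirmed it is on disk.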

> >> Not innovative as opposed to Pro, but OPSI itself seems to be
> >> innovative. And fast, apparently. Oh well, certainly don't want you to
> >> stop working on pytables, :-)
> >
> > Thank you :-)
> 
> OPSI, on my brief look at it, seemed to be optimized for write-once,
> read-many. There are many other scenarios possible; for example, we
> have one scenario that requires checking if an item is already stored
> before writing it. The re-indexing that OPSI would require would hurt
> performance, though there may be ways around that. The point: if you
> know your problem well, you can probably make a more efficient
> implementation of just about anything than commercial general-purpose
> products.

Interesting. How _do_ you check for already-stored items if OPSI is too
slow?

Best regards,

David

