On 6/29/05, Rudi Starcevic <[EMAIL PROTECTED]> wrote:
> Hi,
>
> > I do my batch processing daily using a python script I've written. I
> > found that trying to do it with pl/pgsql took more than 24 hours to
> > process 24 hours worth of logs. I then used C# and in-memory hash
> > tables to drop the time to 2 hours, but I couldn't get Mono installed
> > on some of my older servers. Python proved the fastest and I can
> > process 24 hours worth of logs in about 15 minutes. Common reports run
> > in < 1 sec and custom reports run in < 15 seconds (usually).
>
> When you say you do your batch processing in a Python script, do you mean
> you are using 'plpython' inside PostgreSQL, or using Python to execute
> select statements and crunch the data 'outside' PostgreSQL?
>
> Your reply is very interesting.
Sorry for not making that clear... I don't use plpython; I'm using an
external Python program that makes database connections, creates
dictionaries, and does the normalization/batch processing in memory. It
then saves the changes to a text file which is copied in using psql.
I've tried many things, and while this is RAM intensive, it is by far
the fastest approach I've found.

I've also modified the Python program to optionally use disk-based
dictionaries based on (I think) gdb. This significantly increases the
time, to closer to 25 min. ;-) but drops the memory usage by an order
of magnitude.

To be fair to C# and .Net, I think that Python and C# can do it equally
fast, but between the time of creating the C# version and the Python
version I learned some new optimization techniques. I feel that both
are powerful languages. (To be fair to Python, I can write the
dictionary lookup code in approx. 25% fewer lines than similar hash
table code in C#. I could go on, but I think I'm starting to get off
topic.)

--
Matthew Nuzum
www.bearfruit.org

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
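P.S. For anyone curious, the in-memory approach described above can be
sketched roughly like this. This is a minimal illustration, not my actual
script: the log format, field layout, and table name are all hypothetical,
and a real run would read from files and connect to the database.

```python
# Sketch of the batch-processing pattern: aggregate log lines into an
# in-memory dict, then emit tab-separated rows suitable for loading
# with PostgreSQL's COPY (e.g. via psql \copy).
import csv
import io

def summarize(log_lines):
    """Aggregate hit counts per (host, path) in an in-memory dict."""
    counts = {}
    for line in log_lines:
        host, path = line.split()[:2]  # hypothetical log format
        key = (host, path)
        counts[key] = counts.get(key, 0) + 1
    return counts

def to_copy_text(counts):
    """Render the dict as tab-separated rows for COPY FROM."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for (host, path), hits in sorted(counts.items()):
        writer.writerow([host, path, hits])
    return buf.getvalue()

# Tiny demo with made-up log lines:
logs = [
    "example.com /index.html",
    "example.com /index.html",
    "example.com /about.html",
]
print(to_copy_text(summarize(logs)), end="")
```

The resulting text file would then be loaded with something like
`psql -c "\copy hits FROM 'hits.txt'"` (table name hypothetical). The
disk-based variant I mentioned swaps the plain dict for a dbm-style
on-disk mapping, trading speed for a much smaller memory footprint.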