I've got an application that needs to chunk through ~2GB of data. The data is ~7000 different sets of 300 records each. I put all of the data into a postgres database but that doesn't look like its going to work because of how the data lives on the disk.

When the app runs on a 500 Mhz G4 the CPU is 30% idle... the processing application eating about 50%, postgres taking about 10%. I don't know how to tell for sure but it looks like postgres is blocking on disk i/o.

For a serial scan of the postgres table (e.g. "select * from datatable"), "iostat" reports 128K per transfer, ~140 tps and between 14 and 20 MB/s from disk0 - with postgres taking more than 90% CPU.

If I then run a loop asking for only the 300 records at a time (e.g. "select from datatable where group_id='123'"), iostat reports 8k per transfer, ~200 tps, less than 1MB/s throughput and postgres taking ~10% CPU. (There is an index defined for group_id and EXPLAIN says it's being used.)

So I'm guessing that postgres is jumping all over the disk and my app is just waiting on data. Is there a way to fix this? Or should I move to a scientific data file format like NCSA's HDF?

I need to push new values into each of the 7000 datasets once or twice a day and then read-process the entire data set as many times as I can in a 12 hour period - nearly every day of the year. Currently there is only single table but I had planned to add several others.

- Chris

---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly

Reply via email to