Matt Olson <[EMAIL PROTECTED]> writes:

> I've done other things that make sense, like using indexes, playing with the
> planner constants and turning up the postgres cache buffers.
> 
> Even playing with extream hdparm read-ahead numbers (i.e. 64738) yields no
> apparent difference in database performance.  The random nature of the I/O
> drops disk reads down to about 1MB/sec for the array.  A linear table scan
> can easily yield 70-80MB/sec on this system.  Total table size is usually
> around 1GB and with indexes should be able to fit completely in main memory.

Actually forcing things to use indexes is the wrong direction to go if you're
trying to process lots of data and want to stream it off disk as rapidly as
possible. I would think about whether you can structure your data such that
you can use sequential scans. That might mean partitioning your raw data into
separate tables and then accessing only the partitions that are relevant to
the query.

In your application that might be hard. It sounds like you would need more or
less one table per stock ticker which would really be hard to manage.

One thing you might look into is using the CLUSTER command. But postgres
doesn't maintain the cluster ordering so it would require periodically
rerunning it.

I'm a bit surprised by your 1MB/s rate. I would expect to see about 10MB/s
even for completely random reads. Is it possible you're seeing something else
interfering? Do you have INSERT/UPDATE/DELETE transactions happening
concurrently with this select scan? If so you should strongly look into
separating the transaction log from the data files.

-- 
greg


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Reply via email to