Greg Stark <[EMAIL PROTECTED]> writes:
> Actually forcing things to use indexes is the wrong direction to go if you're
> trying to process lots of data and want to stream it off disk as rapidly as
> possible. I would think about whether you can structure your data such that
> you can use sequential scans.

Agreed.

> In your application that might be hard. It sounds like you would need more or
> less one table per stock ticker which would really be hard to manage.

Actually, in a previous lifetime I used to do pretty much the same stuff
Matt is working on.  The reason I suggested parallelizing is that what
you want is usually not so much the 200day moving average of FOO, as the
200day moving averages of a whole bunch of things.  If your input table
contains time-ordered data for all those things, then a seqscan works
out pretty well.

> One thing you might look into is using the CLUSTER command. But postgres
> doesn't maintain the cluster ordering so it would require periodically
> rerunning it.

If the desired sort order is time-based, it falls out pretty much for
free in this application, because historical data doesn't change -- you
are only interested in appending at the right.

In said previous lifetime, we used Postgres for tracking our actual
transactions, but we built a custom file format for storing the
individual tick data.  That's not stuff you need transactional semantics
for; the historical data is what it is.  Besides, you need to compress
it as much as you can because there's always too much of it.  Machines
are faster and disk space cheaper than they were at the time, but I'd
still question the wisdom of using a Postgres row for each daily bar,
let alone finer-grain data.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Reply via email to