Re: [PERFORM] Prefetch

Matt Olson Tue, 10 May 2005 11:39:12 -0700

My postgres binaries and WAL are on a separate disk from the RAID array.  The 
table I'm doing the selects from is probably about 4GB in size and holds 18-20 
million records.  No concurrent or dependent inserts or deletes are going on.


Tom's point and your points about optimizing the application are well taken.  
I know my approach is suboptimal and prone to getting caught by latency 
issues (seek times, cache hit rates, etc.).  However, the question of 
prefetch in my mind is all about eliminating latencies, so I thought my 
problem would be a good case for discussing prefetching.

The two approaches I'm in the process of testing are Rod and Greg's suggestion 
of using 'CLUSTER' and, for the sake of not letting a good idea get away, the 
parallel query approach which Tom suggested, which I'll probably spend some 
time on.
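
One possible reading of the parallel query approach, sketched in Python: split
the key range into contiguous slices and run one query per slice on its own
connection, so that several reads are outstanding at once. Everything here is
an assumption made for illustration. The table "ticks", its columns, and the
helper names are hypothetical, and the standard-library sqlite3 module stands
in for PostgreSQL only so that the sketch runs without a server. It is not
necessarily what Tom had in mind.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_demo_db(path, n_rows):
    """Create a small hypothetical table so the sketch has data to read."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE ticks (id INTEGER PRIMARY KEY, price REAL)")
    conn.executemany(
        "INSERT INTO ticks (id, price) VALUES (?, ?)",
        ((i, float(i % 100)) for i in range(n_rows)),
    )
    conn.commit()
    conn.close()


def key_ranges(lo, hi, parts):
    """Split [lo, hi) into at most `parts` contiguous half-open ranges.

    Assumes hi > lo and parts >= 1.
    """
    step = -(-(hi - lo) // parts)  # ceiling division
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def sum_range(path, bounds):
    """Run one slice of the query on its own connection."""
    lo, hi = bounds
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT COALESCE(SUM(price), 0) FROM ticks WHERE id >= ? AND id < ?",
            (lo, hi),
        ).fetchone()
        return row[0]
    finally:
        conn.close()


def parallel_sum(path, n_rows, workers):
    """Issue the slices concurrently and combine the partial results."""
    ranges = key_ranges(0, n_rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: sum_range(path, r), ranges))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        make_demo_db(db_path, 1000)
        print(parallel_sum(db_path, 1000, workers=4))
```

Whether this helps on the real system depends on whether the array can service
several outstanding random reads at once, which is exactly what I'd want the
rough benchmarks to show.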

I'll report back to the list what I find and maybe do some _rough_ 
benchmarking.  This is a production app, so I can't get too much in the way 
of the daily batches.  

-- 
Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

On Tuesday 10 May 2005 11:13 am, Greg Stark wrote:
> Matt Olson writes:
> > I've done other things that make sense, like using indexes, playing with
> > the planner constants and turning up the postgres cache buffers.
> >
> > Even playing with extreme hdparm read-ahead numbers (e.g. 64738) yields
> > no apparent difference in database performance.  The random nature of the
> > I/O drops disk reads down to about 1MB/sec for the array.  A linear table
> > scan can easily yield 70-80MB/sec on this system.  Total table size is
> > usually around 1GB and with indexes should be able to fit completely in
> > main memory.
>
> Actually forcing things to use indexes is the wrong direction to go if
> you're trying to process lots of data and want to stream it off disk as
> rapidly as possible. I would think about whether you can structure your
> data such that you can use sequential scans. That might mean partitioning
> your raw data into separate tables and then accessing only the partitions
> that are relevant to the query.
>
> In your application that might be hard. It sounds like you would need more
> or less one table per stock ticker which would really be hard to manage.
>
> One thing you might look into is using the CLUSTER command. But postgres
> doesn't maintain the cluster ordering so it would require periodically
> rerunning it.
>
> I'm a bit surprised by your 1MB/s rate. I would expect to see about 10MB/s
> even for completely random reads. Is it possible you're seeing something
> else interfering? Do you have INSERT/UPDATE/DELETE transactions happening
> concurrently with this select scan? If so, you should look seriously into
> separating the transaction log from the data files.
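
To put the throughput figures quoted above in terms of individual reads, here
is a rough back-of-the-envelope calculation in Python. It assumes each random
read fetches exactly one page of 8192 bytes (the default PostgreSQL block
size) and that 1MB means 2^20 bytes. Both are simplifying assumptions, and the
function name is made up for this example.

```python
PAGE_BYTES = 8192          # assumed: default PostgreSQL block size
BYTES_PER_MB = 1024 * 1024  # assumed: binary megabytes


def reads_per_second(mb_per_s, page_bytes=PAGE_BYTES):
    """Random page reads per second implied by a given throughput."""
    return mb_per_s * BYTES_PER_MB / page_bytes


if __name__ == "__main__":
    for label, rate in [("observed random I/O", 1), ("Greg's expectation", 10)]:
        print(f"{label}: {rate} MB/s is about {reads_per_second(rate):.0f} page reads/s")
```

Under those assumptions, 1MB/s corresponds to about 128 page reads per second
and 10MB/s to about 1280.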



