Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-28 Thread PFC

(For those not knowing - it's ReadFile/WriteFile where you pass an array
of this many bytes to this address as parameters)
Isn't that like the BSD writev()/readv() that Linux supports also?  Is
that something we should be using on Unix if it is supported by the OS?
	Nope, readv()/writev() read/write from/to the file sequentially to/from a  
list of buffers in memory. The Windows calls read/write at random file  
offsets to/from a list of buffers.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-21 Thread Merlin Moncure
 Magnus Hagander wrote:
  I don't think that's correct either. Scatter/Gather I/O is used to
SQL
  Server can issue reads for several blocks from disks into it's own
  buffer cache with a single syscall even if these buffers are not
  sequential. It did make significant performance improvements when
they
  added it, though.
 
  (For those not knowing - it's ReadFile/WriteFile where you pass an
array
  of this many bytes to this address as parameters)
 
 Isn't that like the BSD writev()/readv() that Linux supports also?  Is
 that something we should be using on Unix if it is supported by the
OS?

readv and writev are in the single unix spec...and yes they are
basically just like the win32 versions except that that are synchronous
(and therefore better, IMO).

On some systems they might just be implemented as a loop inside the
library, or even as a macro.

http://www.opengroup.org/onlinepubs/007908799/xsh/sysuio.h.html

On operating systems that optimize vectored read operations, it's pretty
reasonable to assume good or even great performance gains, in addition
to (or instead of) recent changes to xlog.c to group writes together for
a file...it just takes things one stop further.

Is there a reason why readv/writev have not been considered in the past?

Merlin

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-21 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 Is there a reason why readv/writev have not been considered in the past?

Lack of portability, and lack of obvious usefulness that would justify
dealing with the lack of portability.

I don't think there's any value in trying to write ordinary buffers this
way; making the buffer manager able to write multiple buffers at once
sounds like a great deal of complexity and deadlock risk in return for
not much.  It might be an alternative to the existing proposed patch for
writing multiple WAL buffers at once, but frankly I consider that patch
a waste of effort.  In real scenarios you seldom get to write more than
one WAL page without a forced sync occurring because someone committed.
Even if *your* transaction is long, any other backend committing a small
transaction still fsyncs.  On top of that, the bgwriter will be flushing
WAL in order to maintain the write-ahead rule any time it dumps a dirty
buffer.  I have a personal to-do item to make the bgwriter explicitly
responsible for writing completed WAL pages as part of its duties, but
I haven't done anything about it because I think that it will write lots
of such pages without any explicit code, thanks to the bufmgr's LSN
interlock.  Even if it doesn't get it done that way, the simplest answer
is to add a little bit of code to make sure bgwriter generally does the
writes, and then we don't care.

If you want to experiment with writev, feel free, but I'll want to see
demonstrable performance benefits before any such code actually goes in.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-21 Thread Ron Mayer
Merlin Moncure wrote:
readv and writev are in the single unix spec...and yes ...
On some systems they might just be implemented as a loop inside the
library, or even as a macro.
You sure?
Requirements like this:
 http://www.opengroup.org/onlinepubs/007908799/xsh/write.html
 Write requests of {PIPE_BUF} bytes or less will not be
  interleaved with data from other processes doing writes
  on the same pipe.
make me think that it couldn't be just a macro; and if it
were a loop in the library it seems it'd still have to
make sure it's done with a single write system call.
(yeah, I know that requirement is just for pipes; and I
suppose they could write a loop for normal files and a
different special case for pipes; but I'd be surprised).
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-18 Thread Bruce Momjian
Magnus Hagander wrote:
 I don't think that's correct either. Scatter/Gather I/O is used to SQL
 Server can issue reads for several blocks from disks into it's own
 buffer cache with a single syscall even if these buffers are not
 sequential. It did make significant performance improvements when they
 added it, though.
 
 (For those not knowing - it's ReadFile/WriteFile where you pass an array
 of this many bytes to this address as parameters)

Isn't that like the BSD writev()/readv() that Linux supports also?  Is
that something we should be using on Unix if it is supported by the OS?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-15 Thread Merlin Moncure
 It seems inevitable that Postgres will eventually eliminate that
redundant
 layer of buffering. Since mmap is not workable, that means using
O_DIRECT
 to
 read table and index data.

What about going the other way and simply letting the o/s do all the
caching?  How bad (or good) would the performance really be?  

Merlin

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Rod Taylor
 My concern is that this kind of testing has very little relevance to the 
 real world of multiuser processing where contention for the cache becomes an 
 issue.  It may be that, at least in the current situation, postgres is 
 giving too much weight to seq scans based on single user, straight line 

To be fair, a large index scan can easily throw the buffers out of whack
as well. An index scan on 0.1% of a table with 1 billion tuples will
have a similar impact to buffers as a sequential scan of a table with 1
million tuples.

Any solution fixing buffers should probably not take into consideration
the method being performed (do you really want to skip caching a
sequential scan of a 2 tuple table because it didn't use an index) but
the volume of data involved as compared to the size of the cache.

I've often wondered if a single 1GB toasted tuple could wipe out the
buffers. I would suppose that toast doesn't bypass them.
-- 
Rod Taylor [EMAIL PROTECTED]


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Christopher Browne
The world rejoiced as [EMAIL PROTECTED] (Mark Aufflick) wrote:
 Hi All,

 I have boiled my situation down to the following simple case:
 (postgres version 7.3)

 * Query 1 is doing a sequential scan over a table (courtesy of field
 ILIKE 'foo%') and index joins to a few others
 * Query 2 is doing a functional index scan over the same table
 (lower(field) LIKE 'foo%') and index joins to a few others
 * neither query has an order by clause
 * for the purpose of testing, both queries are designed to return the
 same result set

 Obviously Q2 is faster than Q1, but if I ever run them both at the
 same time (lets say I run two of Q1 and one of Q2 at the same time)
 then Q2 consistently returns WORSE times than Q1 (explain analyze
 confirms that it is using the index).

 My assumption is that the sequential scan is blowing the index from
 any cache it might live in, and simultaneously stealing all the disk
 IO that is needed to access the index on disk (the table has 200,000
 rows).

There's something to be said for that...

 If I simplify the case to not do the index joins (ie. operate on the
 one table only) the situation is not as dramatic, but similar.

 My thoughts are:

 1) kill the sequential scan - but unfortunately I don't have direct
 control over that code

This is a good choice, if plausible...

 2) change the way the server allocates/prioritizes different caches -
 i don't know enough about how postgres caches work to do this (if it's
 possible)

That's what the 8.0 cache changes did...  Patent claim issues are
leading to some changes to the prioritization, which is liable to
change 8.0.something and 8.1.

 3) try it on postgres 7.4 - possible, but migrating the system to 7.4
 in production will be hard because the above code that I am not
 responsible for has a lot of (slightly wacky) implicit date casts

Moving to 7.4 wouldn't materially change the situation; you'd have to
go all the way to version 8.
-- 
(format nil [EMAIL PROTECTED] cbbrowne gmail.com)
http://linuxdatabases.info/~cbbrowne/postgresql.html
Rules of the Evil Overlord #32. I will not fly into a rage and kill a
messenger who brings me bad news  just to illustrate how evil I really
am. Good messengers are hard to come by.
http://www.eviloverlord.com/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Iain
Hi Rod,
 
 Any solution fixing buffers should probably not take into consideration
 the method being performed (do you really want to skip caching a
 sequential scan of a 2 tuple table because it didn't use an index) but
 the volume of data involved as compared to the size of the cache.

Yes, in fact indexes aren't so different to tables really in that regard.

It sounds like version 8 may help out anyway.

regards
Iain

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings