Re: [PATCHES] Interval input: usec, msec

2007-05-29 Thread Michael Glaesemann


On May 29, 2007, at 0:06 , Neil Conway wrote:


Applied to HEAD, backported to 8.2 and 8.1


One thing I noticed when looking over the patch is that there are a  
few bare numbers in datetime.c such as 10, 1000, 1e-3, and 1e-6.  
In timestamp.[hc] we've defined macros for conversions such as  
#define USECS_PER_SEC	INT64CONST(1000000)


I'd like to work up a patch that would add similar macros for  
datetime.c, in particular using the INT64CONST construction where  
appropriate. Thoughts?
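For illustration, the kind of named constants I have in mind might look like this (the macro names below are hypothetical stand-ins for discussion, not the ones in timestamp.h, and INT64CONST is re-sketched here only so the example is self-contained):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the PostgreSQL INT64CONST macro, for self-containment. */
#define INT64CONST(x) ((int64_t) x##LL)

/* Named replacements for the bare 10, 1000, 1e-3, 1e-6 in datetime.c. */
#define MSECS_PER_SEC   1000
#define USECS_PER_MSEC  1000
#define USECS_PER_SEC   INT64CONST(1000000)

/* e.g. converting a fractional-second input to microseconds */
static int64_t
fsec_to_usecs(double fsec)
{
    return (int64_t) (fsec * (double) USECS_PER_SEC);
}
```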


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] [pgsql-patches] Ctid chain following enhancement

2007-05-29 Thread Zdenek Kotala

Pavan Deolasee wrote:


On 1/28/07, *Tom Lane* [EMAIL PROTECTED] wrote:


OTOH it might be
cleaner to refactor things that way, if we were going to apply this.


Here is a revised patch which includes refactoring of
heap_get_latest_tid(), as per Tom's suggestion.



I'm looking at your patch. I have one comment:

If you have the old tid and the new tid, you can easily check whether the new
tid points to a different page. And if the page is still the same, there is
no reason to unlock it and lock it again. I think adding an inner loop,
something like:



ReadBuffer
Lock
do {

...

} while (ctid.block_id == tid.block_id)
ReleaseAndUnlock

can save some extra locking/unlocking cycles. What do you think?


Zdenek



Re: [PATCHES] Seq scans status update

2007-05-29 Thread Alvaro Herrera
Tom Lane wrote:
 Gregory Stark [EMAIL PROTECTED] writes:
  Is there a reason UnpinBuffer has to be the one to increment the usage count
  anyways? Why can't ReadBuffer handle incrementing the count and just trust
  that it won't be decremented until the buffer is unpinned anyways?
 
 That's a good question.  I think the idea was that if we hold a buffer
 pinned for awhile (long enough that the bgwriter's clock sweep passes
 over it one or more times), we want the usage count decrementing to
 start when we release the pin, not when we acquire it.  But maybe that
 could be fixed if the clock sweep doesn't touch the usage_count of a
 pinned buffer.  Which in fact it may not do already --- didn't look.

It does -- in BgBufferSync the all scan calls SyncOneBuffer with
skip_pinned=false.  The lru scan does skip pinned buffers.

-- 
Alvaro Herrera  Developer, http://www.PostgreSQL.org/
World domination is proceeding according to plan (Andrew Morton)



Re: [PATCHES] Concurrent psql patch

2007-05-29 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Andrew Dunstan [EMAIL PROTECTED] writes:
 if (pset.c->db->asyncStatus != PGASYNC_BUSY)
 {
 break;
 }

 There already is a defined API for this, namely PQisBusy().

 In any case, I rather concur with the XXX comment: busy-waiting like
 this sucks.  The correct way to do this is to get the socket numbers for
 the connections (via PQsocket), wait for any of them to be read-ready
 according to select() (or for the timeout to elapse, assuming that we
 think that behavior is good), then cycle through PQconsumeInput() and
 PQisBusy() on each connection.  See
 http://www.postgresql.org/docs/8.2/static/libpq-async.html

Huh, so it turns out we already have code that does exactly this in
pqSocketPoll and pqSocketCheck. Except that they have too little resolution
because they work with time_t which means we would have to wait at least 1-2
seconds.

And pqSocketCheck keeps looping when it gets an EINTR which doesn't seem like
the right thing for psql to do.

It would be nice to use these functions though because:

a) They get the SSL case right in that they check the SSL buffer before
   calling select/poll.

b) They use poll if available and fall back to select.

c) They would keep the select/poll system code out of psql, where there's none
   of it currently.

So would I be better off adding a PQSocketPollms() which works in milliseconds
instead of seconds? Or should I just copy all this code into psql?
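A millisecond-resolution helper along the lines of the suggested PQSocketPollms could look roughly like this. This is a sketch against plain select(2); the name and signature are hypothetical, not libpq's, and it deliberately returns -1 on EINTR rather than looping:

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd is read-ready or timeout_ms elapses.
 * Returns >0 if ready, 0 on timeout, -1 on error (EINTR included). */
static int
socket_poll_ms(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;    /* ms -> us */

    return select(fd + 1, &readfds, NULL, NULL, &tv);
}
```

The real version would presumably also have to check the SSL buffer first, as pqSocketCheck does, which is why reusing the libpq code is attractive.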

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Seq scans status update

2007-05-29 Thread Heikki Linnakangas

Alvaro Herrera wrote:

Tom Lane wrote:

Gregory Stark [EMAIL PROTECTED] writes:

Is there a reason UnpinBuffer has to be the one to increment the usage count
anyways? Why can't ReadBuffer handle incrementing the count and just trust
that it won't be decremented until the buffer is unpinned anyways?

That's a good question.  I think the idea was that if we hold a buffer
pinned for awhile (long enough that the bgwriter's clock sweep passes
over it one or more times), we want the usage count decrementing to
start when we release the pin, not when we acquire it.  But maybe that
could be fixed if the clock sweep doesn't touch the usage_count of a
pinned buffer.  Which in fact it may not do already --- didn't look.


It does -- in BgBufferSync the all scan calls SyncOneBuffer with
skip_pinned=false.  The lru scan does skip pinned buffers.


You're looking at the wrong place. StrategyGetBuffer drives the clock 
sweep, and it always decreases the usage_count, IOW it doesn't skip 
pinned buffers. SyncOneBuffer and BgBufferSync don't decrease the 
usage_count in any case.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Tom Lane
Updated version of Heikki's buffer ring patch, as per my comments here:
http://archives.postgresql.org/pgsql-patches/2007-05/msg00449.php

The COPY IN part of the patch is not there, pending resolution of
whether we think it adds enough value to be worth uglifying
heap_insert's API for.  Also, I tentatively reduced the threshold
at which heapscans switch to ring mode to NBuffers/16; that probably
needs more thought.  Lastly, I haven't done anything about making
non-btree indexes honor the access strategy during VACUUM scans.

regards, tom lane



binlsHkz85l0G.bin
Description: buffer-ring-2.patch.gz



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-29 Thread Heikki Linnakangas

Greg Smith wrote:
I'll take another stab at refining this can of worms I opened.  The one 
thing I noticed on a quick review is that it's almost possible to skip 
all the calls to gettimeofday if log_checkpoints is off now.  I'd like 
to make that a specific goal, because that will make me feel better that 
adding this code has almost no performance impact relative to now unless 
you turn the feature on.


Saving a couple of gettimeofday calls on an event that happens as 
infrequently as checkpoints is not going to make any difference. 
Especially if you compare it to all the other work that's done on 
checkpoint.


I agree with Simon that tracking create/drop separately is unnecessary. 
As for why all the timing info is in ms, given the scale of the numbers 
typically encountered I found it easier to work with.  I originally 
wanted resolution down to 0.1ms if the underlying OS supports it, which 
means 4 figures to the right of the decimal point if the unit was 
switched to seconds.  Quite often the times reported are less than 
100ms, so you'll normally be dealing with fractional part of a second.  
If we take Heikki's example:


LOG:  checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms 
sync=32.0 ms


And switch it to seconds:

LOG:  checkpoint complete; buffers written=3.1 MB (9.6%) write=0.0968 s 
sync=0.0320 s


I don't find that as easy to work with.  The only way a timing in 
seconds would look OK is if the resolution of the whole thing is reduced 
to ms, which then makes 3 decimal points--easy to read as ms instead.  
Having stared at a fair amount of this data now, that's probably fine; 
I'll collect up some more data on it from a fast server this week to 
confirm whether it's worthless precision or worth capturing.
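The readability trade-off can be seen in a small formatting sketch (the function names here are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render microseconds as "NNN.N ms", i.e. 0.1 ms resolution. */
static void
format_ms(long usecs, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%.1f ms", usecs / 1000.0);
}

/* Render the same value as seconds, needing four decimals to keep
 * the same resolution, as in the "write=0.0968 s" example above. */
static void
format_s(long usecs, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%.4f s", usecs / 1000000.0);
}
```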


The checkpoint will take at least a couple of seconds on any interesting 
system, so 0.1 s resolution should be enough IMHO.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Heikki Linnakangas

Tom Lane wrote:

 Also, I tentatively reduced the threshold
at which heapscans switch to ring mode to NBuffers/16; that probably
needs more thought.  


Yeah. One scenario where threshold < shared_buffers will hurt is if your 
shared_buffers = RAM size / 2. In that scenario, a scan on a table that 
would barely fit in shared_buffers, will use the ring instead and not 
fit in the OS cache either. Which means that repeatedly scanning that 
table will do physical I/O with the patch, but not without it.


swappiness, using linux terms, also makes a difference. When I started 
testing the patch, I saw unexpectedly high gains from the patch with the 
following configuration:

- RAM size 4 GB
- shared_buffers 1 GB
- table size 3GB

Without the patch, the table wouldn't fit in shared_buffers, and also 
wouldn't fit in the OS cache, so repeatedly scanning the table always 
read the table physically from disk, and it took ~20 seconds. With the 
patch, however, the ring only actively used a few pages from 
shared_buffers, and the kernel swapped out the rest. Thanks to that, 
there was more than 3GB of RAM available for OS caching, the table fit 
completely in the OS cache, and the query took < 2 seconds. It took me 
quite a while to figure out what's going on.



Lastly, I haven't done anything about making
non-btree indexes honor the access strategy during VACUUM scans.


Also there's no attempt to not inflate usage_count, which means that 
synchronized scans will spoil the buffer cache as if we didn't have the 
buffer ring patch. If there's no easy solution, I think we could live 
with that, but Greg's suggestion of bumping the usage_count in PinBuffer 
instead of UnpinBuffer sounds like a nice solution to me.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[PATCHES] OS X startup script patch

2007-05-29 Thread Les Hill

Hi,

I recently built and installed postgres 8.2.4 on my MBP (10.4.9).
Thanks for the great work!

The existing startup script worked with one tweak, the rotate logs
command was not redirecting stderr to the log.  A patch generated with
the make_diff scripts is attached.

--
Les Hill
[EMAIL PROTECTED]


osx-start.patch
Description: Binary data



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Also there's no attempt to not inflate usage_count, which means that 
 synchronized scans will spoil the buffer cache as if we didn't have the 
 buffer ring patch.

As I said, these patches are hardly independent.

 If there's no easy solution, I think we could live 
 with that, but Greg's suggestion of bumping the usage_count in PinBuffer 
 instead of UnpinBuffer sounds like a nice solution to me.

After thinking about it more, I'm a bit hesitant to do that because it
will change the interaction with the clock sweep for buffers that stay
pinned for awhile.  I had suggested making the clock sweep not decrement
usage_count of a pinned buffer, but I think that would change the
fairness of the algorithm.  OTOH it may not matter that much if we just
move the usage_count increment and leave the clock sweep alone.  Do we
have any decent way of measuring the effectiveness of the clock-sweep
allocation algorithm?

I also thought about having ReadBuffer decrement the usage count when it
has a nondefault strategy and finds the buffer already in cache; this
would then cancel out the later unconditional increment in UnpinBuffer.
But that makes twice as many cycles spent holding the buffer spinlock.

Either one of these methods would require PinBuffer to be aware of the
strategy argument, which it is not at present.  OTOH with the first way
we could get rid of the normalAccess argument to UnpinBuffer, so
there's some net conservation of cruft I guess.  I think I had
originally given this task to UnpinBuffer on the theory that we'd have
better information at unpin time than pin time about what the buffer
state had been and thus be able to make smarter decisions about whether
to bump the access count or not.  But at the moment it doesn't seem that
we really need any such info; AFAICS all the callers of PinBuffer know
what they want to happen.
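As a toy illustration of the interaction in question, here is a clock sweep that can either decrement or skip pinned buffers' usage counts (the names and structure are illustrative, not bufmgr.c):

```c
#include <assert.h>
#include <stdbool.h>

#define NBUF 4

typedef struct
{
    int  usage_count;
    bool pinned;
} ToyBuffer;

/* Advance the clock hand until a victim (unpinned, usage_count == 0) is
 * found, decrementing counts along the way.  If skip_pinned is true,
 * pinned buffers keep their usage_count, so a buffer that stays pinned
 * across a sweep is not recycled the instant it is unpinned. */
static int
sweep_for_victim(ToyBuffer *buf, int *hand, bool skip_pinned)
{
    for (;;)
    {
        int        idx = *hand % NBUF;
        ToyBuffer *b = &buf[idx];

        (*hand)++;
        if (b->pinned)
        {
            if (!skip_pinned && b->usage_count > 0)
                b->usage_count--;
            continue;           /* never evict a pinned buffer */
        }
        if (b->usage_count == 0)
            return idx;         /* victim found */
        b->usage_count--;
    }
}
```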

regards, tom lane



[PATCHES] Regression tests

2007-05-29 Thread Magnus Hagander
Joachim Wieland attempted to post this patch, but it appears to be gone.
I tried a repost, and noticed it got rejected because it was > 100kb.
Let me repeat previous objections that it really should be possible to
post a patch > 100kb.
That said, here's a gzipped version.

Joachim, once it comes through, feel free to post whatever comments you
had in your original mail.

//Magnus


pg_regression_msvc.3.diff.gz
Description: GNU Zip compressed data



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Tom Lane
I wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 If there's no easy solution, I think we could live 
 with that, but Greg's suggestion of bumping the usage_count in PinBuffer 
 instead of UnpinBuffer sounds like a nice solution to me.

 After thinking about it more, I'm a bit hesitant to do that because it
 will change the interaction with the clock sweep for buffers that stay
 pinned for awhile.  I had suggested making the clock sweep not decrement
 usage_count of a pinned buffer, but I think that would change the
 fairness of the algorithm.  OTOH it may not matter that much if we just
 move the usage_count increment and leave the clock sweep alone.  Do we
 have any decent way of measuring the effectiveness of the clock-sweep
 allocation algorithm?

Despite above misgivings, here's a version of the patch that moves
usage_count incrementing to PinBuffer instead of UnpinBuffer.  It does
seem a good bit cleaner.

regards, tom lane



binalDuLkt1Ft.bin
Description: buffer-ring-3.patch.gz



Re: [PATCHES] Regression tests

2007-05-29 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Joachim Wieland attempted to post this patch, but it appears to be gone.

I trust the applied version will contain neither Windows newlines nor
non-English comments.

regards, tom lane



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Greg Smith

On Tue, 29 May 2007, Tom Lane wrote:

Do we have any decent way of measuring the effectiveness of the 
clock-sweep allocation algorithm?


I put a view on top of the current pg_buffercache (now that it includes 
usage_count) that shows what the high usage_count buffers consist of. 
Since they were basically what I hoped for (like plenty of index blocks on 
popular tables) that seemed a reasonable enough measure of effectiveness 
for my purposes.  I briefly looked into adding some internal measurements 
in this area, like how many buffers are scanned on average to satisfy an 
allocation request; that would actually be easy to add to the buffer 
allocation stats part of the auto bgwriter_max_pages patch I submitted 
recently.


Based on my observations of buffer cache statistics, the number of pinned 
buffers at any time is small enough that in a reasonably sized buffer 
cache, I wouldn't expect a change in the pinned usage_count behavior to 
have any serious impact.  With what you're adjusting, the only time I can 
think of that there would be a noticeable shift in fairness would be if 
one's buffer cache was very small relative to the number of clients, which 
is kind of an unreasonable situation to go out of your way to accommodate.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 Based on my observations of buffer cache statistics, the number of pinned 
 buffers at any time is small enough that in a reasonably sized buffer 
 cache, I wouldn't expect a change in the pinned usage_count behavior to 
 have any serious impact.

Fair enough.  The patch I put up earlier tonight bumps usage_count at
PinBuffer instead of UnpinBuffer time, and leaves the clock sweep
behavior unchanged, which means that a buffer that had stayed pinned for
more than a clock-sweep cycle time could get recycled almost instantly
after being unpinned.  That seems intuitively bad.  If we make the clock
sweep code not decrement usage_count of a pinned buffer then the problem
goes away.  I had expressed some discomfort with that idea, but I've got
to admit that it's only a vague worry not anything concrete.  Barring
objections I'll adjust the patch to include the clock-sweep change.

regards, tom lane



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-29 Thread Greg Smith

On Tue, 29 May 2007, Heikki Linnakangas wrote:

The checkpoint will take at least a couple of seconds on any interesting 
system, so 0.1 s resolution should be enough IMHO.


You may be underestimating the resources some interesting systems are 
willing to put into speeding up checkpoints.  I'm sometimes dumping into a 
SAN whose cache is bigger than the shared_buffer cache in the server, and 
0.1s isn't really enough resolution in that situation.  A second is a 
really long checkpoint there.  Since even that's limited by fiber-channel 
speeds, I know it's possible to do better than what I'm seeing with 
something like a PCIe host adapter having on-board cache in the GB range 
(which isn't that expensive nowadays).


Also, even if the checkpoint total takes seconds, much of that is in the 
sync phase; the write time can still be in the small number of ms range, 
and I wouldn't want to see that truncated too much.


Anyway, I have a bunch of data on this subject being collected at this 
moment, and I'll rescale the results based on what I see after analyzing 
that this week.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
