Re: [HACKERS] Using multi-row technique with COPY

2005-11-28 Thread Martijn van Oosterhout
On Sun, Nov 27, 2005 at 05:45:31PM -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  COPY FROM can read in sufficient rows until it has a whole block worth
  of data, then get a new block and write it all with one pair of
  BufferLock calls.
 
  Comments?
 
 I don't see any way to do this without horrible modularity violations.
 The COPY code has no business going anywhere near individual buffers;
 for that matter, it doesn't even really know what a block worth of
 data is, since the tuples it's dealing with aren't toasted yet.

I wonder if you could get the same effect by having COPY copy into a
temporary table (thus no WAL, no bufferlocks) and then every hundred
rows do a SELECT INTO. Mind you, that just moves the modularity
violation elsewhere since only the bufmgr knows about the size of
buffers really.
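
Something along these lines, as a hedged sketch of that staging idea (table
and file names are invented for illustration, and this ignores the
every-hundred-rows batching; temp tables use local buffers and write no WAL):

CREATE TEMP TABLE staging (LIKE target);     -- local buffers, no WAL
COPY staging FROM '/tmp/data.copy';          -- bulk-parse into the temp table
INSERT INTO target SELECT * FROM staging;    -- then move the rows across in one statement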

Whatever happened to that idea to build an entire datafile with COPY or
some external tool and simply copy it into place and update the
catalog?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] Using multi-row technique with COPY

2005-11-28 Thread Simon Riggs
On Mon, 2005-11-28 at 09:40 +0100, Martijn van Oosterhout wrote:
 On Sun, Nov 27, 2005 at 05:45:31PM -0500, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   COPY FROM can read in sufficient rows until it has a whole block worth
   of data, then get a new block and write it all with one pair of
   BufferLock calls.
  
   Comments?

 Whatever happened to that idea to build an entire datafile with COPY or
 some external tool and simply copy it into place and update the
 catalog?

What's wrong with tuning the server to do this? 

Zapping the catalog as a normal operation is the wrong approach if you
want a robust system. All actions on the catalog must be under tight
control.

Most other RDBMSs support a fast-path loader, but all of them include
strong hooks into the main server to maintain the catalog correctly. That
is one approach, but it requires creation of an external API - which seems
like more work, plus a security risk. Copying data in a block at a time is
the basic technique they all use.

I never discuss implementing features that other RDBMSs have for any
reason other than that a similar use case exists for both. There are
many features where PostgreSQL is already ahead.

Best Regards, Simon Riggs




Re: [HACKERS][OT] Doubt

2005-11-28 Thread Csaba Nagy
http://acronymfinder.com/

Cheers,
Csaba.

On Fri, 2005-11-25 at 19:24, Gustavo Tonini wrote:
 What is ISTM?
 
 Sorry,
 Gustavo.




Re: [HACKERS] Using multi-row technique with COPY

2005-11-28 Thread Simon Riggs
On Sun, 2005-11-27 at 17:45 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  COPY FROM can read in sufficient rows until it has a whole block worth
  of data, then get a new block and write it all with one pair of
  BufferLock calls.
 
  Comments?
 
 I don't see any way to do this without horrible modularity violations.
 The COPY code has no business going anywhere near individual buffers;
 for that matter, it doesn't even really know what a block worth of
 data is, since the tuples it's dealing with aren't toasted yet.

I've taken on board your comments about modularity issues from earlier.
[I've not included anything on unique indexes, notice]

I was expecting to buffer this in the heap access method with a new
call, say, heap_bulk_insert(), rather than have all that code hanging
around in COPY. A lower-level routine, RelationGetBufferForTupleArray,
can handle the actual grunt work. It can work, without ugliness.

We'd need to handle a buffer bigger than a single tuple anyway, so you
keep adding tuples until the last one tips over the edge, which then
gets saved for the next block. Heap access method knows about blocks.

We could reasonably do a test for would-be-toasted within those
routines. I should have said that this wouldn't apply if any of the
tuples require toasting, which of course has to be a dynamic test.

COPY would only need to know whether it was invoking the normal or the
bulk mode, which is reasonable, since it knows about indexes, triggers
etc.

Best Regards, Simon Riggs




Re: [HACKERS] Using multi-row technique with COPY

2005-11-28 Thread Alvaro Herrera
Simon Riggs wrote:
 On Sun, 2005-11-27 at 17:45 -0500, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   COPY FROM can read in sufficient rows until it has a whole block worth
   of data, then get a new block and write it all with one pair of
   BufferLock calls.
  
   Comments?
  
  I don't see any way to do this without horrible modularity violations.
  The COPY code has no business going anywhere near individual buffers;
  for that matter, it doesn't even really know what a block worth of
  data is, since the tuples it's dealing with aren't toasted yet.
 
 I've taken on board your comments about modularity issues from earlier.
 [I've not included anything on unique indexes, notice]

I don't see why we couldn't have an additional index access method entry
point to insert multiple rows in one call.  For a unique index, we
could return an array of entries that violated the condition (if we were
to implement such a feature for single-entry insertion).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Using multi-row technique with COPY

2005-11-28 Thread Hannu Krosing
On Mon, 2005-11-28 at 00:56 +, Simon Riggs wrote:
 On Sun, 2005-11-27 at 17:45 -0500, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   COPY FROM can read in sufficient rows until it has a whole block worth
   of data, then get a new block and write it all with one pair of
   BufferLock calls.
  
   Comments?
  
  I don't see any way to do this without horrible modularity violations.
  The COPY code has no business going anywhere near individual buffers;
  for that matter, it doesn't even really know what a block worth of
  data is, since the tuples it's dealing with aren't toasted yet.
 
 I've taken on board your comments about modularity issues from earlier.
 [I've not included anything on unique indexes, notice]
 
 I was expecting to buffer this in the heap access method with a new
 call, say, heap_bulk_insert(), rather than have all that code hanging
 around in COPY. A lower-level routine, RelationGetBufferForTupleArray,
 can handle the actual grunt work. It can work, without ugliness.
 
 We'd need to handle a buffer bigger than a single tuple anyway, so you
 keep adding tuples until the last one tips over the edge, which then
 gets saved for the next block. Heap access method knows about blocks.
 
 We could reasonably do a test for would-be-toasted within those
 routines. I should have said that this wouldn't apply if any of the
 tuples require toasting, which of course has to be a dynamic test.

If we had a buffer big enough (say 10-100x the page size), then we would
not actually need to test toasting. We can just pass the big buffer to
heap_bulk_insert(), which inserts the whole buffer in chunks as big as
needed to fill the free space on pages (with a single page lock).

--
Hannu





Re: [HACKERS] Using multi-row technique with COPY

2005-11-28 Thread Martijn van Oosterhout
On Mon, Nov 28, 2005 at 09:39:36AM +, Simon Riggs wrote:
 On Mon, 2005-11-28 at 09:40 +0100, Martijn van Oosterhout wrote:
  Whatever happened to that idea to build an entire datafile with COPY or
  some external tool and simply copy it into place and update the
  catalog?
 
 What's wrong with tuning the server to do this? 

It was just a question. I remember it being mentioned but didn't hear
if it had been rejected or just not implemented.

I was wondering if maybe we can approach this another way. Whenever you
create a new block by fetching with P_NEW you pretty much know that no
other backends are going to be interested in this block. Doesn't this
mean you can simply hold this block exclusively until the end of the
transaction? Hence you can avoid all the locking on this block.

If this achieves the same reduction in block locking then it would
be active in far more places, such as SELECT INTO or any large copy (I
think index creation is already special-cased). Once a block has been
written once it would revert to standard locking.

Is this feasible?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




[HACKERS] Getting different number of results when using hashjoin on/off

2005-11-28 Thread Mario Weilguni
I've a problem that might be a bug in the core system (hashjoins) or with
ltree using a GiST index, but I fail miserably to produce a useful testcase
(using 8.1; it worked in 8.0):

A query produces wrong (=0) results; when a different plan is enforced, I
get a merge-join plan that looks similar, but produces the correct result
(=16 rows).

I can post a queryplan, but cannot post the data itself since it's
confidential (though I might be able to randomize some data and construct
a self-contained case, but this would take quite some time).


The working case is:
set enable_hashjoin to off;
 Seq Scan on foo1  (cost=0.00..423583.57 rows=10810 width=4) (actual time=675.422..706.815 rows=16 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Merge Join  (cost=19.49..19.55 rows=1 width=0) (actual time=0.028..0.028 rows=0 loops=21619)
           Merge Cond: ("outer".str_id = "inner".id)
           ->  Sort  (cost=6.49..6.50 rows=5 width=4) (actual time=0.023..0.023 rows=0 loops=21619)
                 Sort Key: bz.str_id
                 ->  Bitmap Heap Scan on foo2 bz  (cost=2.02..6.43 rows=5 width=4) (actual time=0.012..0.012 rows=0 loops=21619)
                       Recheck Cond: (bid = $0)
                       ->  Bitmap Index Scan on foo2_bid_key1  (cost=0.00..2.02 rows=5 width=0) (actual time=0.009..0.009 rows=0 loops=21619)
                             Index Cond: (bid = $0)
           ->  Sort  (cost=13.00..13.01 rows=6 width=4) (actual time=0.002..0.003 rows=1 loops=136)
                 Sort Key: str.id
                 ->  Bitmap Heap Scan on structure str  (cost=2.02..12.92 rows=6 width=4) (actual time=0.095..0.097 rows=1 loops=1)
                       Recheck Cond: (path ~ '142.2330445.2330598.2330676.*'::lquery)
                       ->  Bitmap Index Scan on str_uk4  (cost=0.00..2.02 rows=6 width=0) (actual time=0.086..0.086 rows=1 loops=1)
                             Index Cond: (path ~ '142.2330445.2330598.2330676.*'::lquery)
 Total runtime: 707.019 ms

16 rows...


The failing case is:
set enable_hashjoin to on;
 Seq Scan on foo1  (cost=0.00..421679.00 rows=10810 width=4) (actual time=154.663..154.663 rows=0 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Hash Join  (cost=8.47..19.46 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=21619)
           Hash Cond: ("outer".id = "inner".str_id)
           ->  Bitmap Heap Scan on structure str  (cost=2.02..12.92 rows=6 width=4) (actual time=0.100..30.095 rows=1 loops=1)
                 Recheck Cond: (path ~ '142.2330445.2330598.2330676.*'::lquery)
                 ->  Bitmap Index Scan on str_uk4  (cost=0.00..2.02 rows=6 width=0) (actual time=0.090..0.090 rows=1 loops=1)
                       Index Cond: (path ~ '142.2330445.2330598.2330676.*'::lquery)
           ->  Hash  (cost=6.43..6.43 rows=5 width=4) (actual time=0.032..0.032 rows=0 loops=1)
                 ->  Bitmap Heap Scan on foo2 bz  (cost=2.02..6.43 rows=5 width=4) (actual time=0.025..0.025 rows=0 loops=1)
                       Recheck Cond: (bid = $0)
                       ->  Bitmap Index Scan on foo2_bid_key1  (cost=0.00..2.02 rows=5 width=0) (actual time=0.021..0.021 rows=0 loops=1)
                             Index Cond: (bid = $0)
 Total runtime: 154.862 ms
No rows

The query itself is quite simple:
select foo1.id
from foo1
where 
  foo1.datloesch is null
  and exists (select 1 
from foo2 bz,
 structure str
   where bz.bid=foo1.id
 and str.id = bz.str_id
 and str.path ~ '*.2330676.*'
  );

The path field is an ltree column, with a GiST index on it.


Any ideas what I could try to track this down?

Best regards,
Mario Weilguni



Re: [HACKERS] Getting different number of results when using hashjoin

2005-11-28 Thread Christopher Kings-Lynne

The path field is an ltree column, with a GiST index on it.


Something to do with bitmap indexscans on lossy indexes?

Chris



Re: [HACKERS] Getting different number of results when using hashjoin on/off

2005-11-28 Thread Mario Weilguni
On Monday, 28 November 2005 14:12, Christopher Kings-Lynne wrote:
  The path field is an ltree column, with a GiST index on it.

 Something to do with bitmap indexscans on lossy indexes?

 Chris

I doubt that; set enable_bitmapscan to off produces the wrong result as
well.

Best regards
Mario 



[HACKERS] What's wrong with the lists?

2005-11-28 Thread Alvaro Herrera
Hi,

Is there something weird going on with the lists?

I ask because lately there has been spam showing up rather frequently,
something that used to happen only very occasionally.  Has there been a
change from human moderation to robotic approval of messages?

Also, my last auto-mails to pgsql-committers have been held for
moderator approval.  Maybe I should subscribe-nomail the @postgresql.org
account to the list so that it doesn't happen ... but I wonder why it
isn't working like it was before?

Not meant to criticize anyone -- I'd just like to adjust my config if
there's a need to.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
The easiest way to resolve [trivial code guidelines disputes] is to fire
one or both of the people involved.  (Damian Conway)



[HACKERS] Allow an alias for the target table in UPDATE

2005-11-28 Thread Atsushi Ogawa
I am interested in a following item in TODO.

o Allow an alias to be provided for the target table in
  UPDATE/DELETE
  This is not SQL-spec but many DBMSs allow it.

I think this functionality would be useful for migration from
other RDBMSs. However, the SQL92 spec does not allow an alias for
the target table in UPDATE.
Is it still a TODO? If it is, I will try it.
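
For reference, this is the kind of statement the TODO item would allow
(table and column names here are invented purely for illustration):

UPDATE orders AS o
   SET status = 'shipped'
 WHERE o.id = 42;

DELETE FROM orders AS o
 WHERE o.status = 'cancelled';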

--- Atsushi Ogawa



Re: [HACKERS] Using multi-row technique with COPY

2005-11-28 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 I don't see why we couldn't have an additional index access method entry
 point to insert multiple rows in one call.

I think Simon was mainly on about the idea of inserting multiple *heap*
entries with one call, ie, only one cycle of locking a heap buffer.
It's not apparent to me that there'd be any noticeable win from batching
index insertions, because even if the heap entries are all on one page,
it's unlikely their index entries would be.

I'm a bit dubious about the whole thing from a concurrency standpoint,
too: holding locks longer is not a great thing in general.  The recent
patch to avoid repeated locking cycles during a read seems OK, because
(1) checking visibility of a tuple is ordinarily pretty cheap, and
(2) it's only a shared lock so other people can be doing the same thing
concurrently.  However, heap insertion can make neither of those claims.
You're talking about an exclusive lock, and you're talking about holding
it while copying data around and then making multiple WAL entries.

regards, tom lane



[HACKERS] Ipv6 and Postgresql 8.0.3

2005-11-28 Thread R, Rajesh (STSD)
 
PLZ REPLY

Hello there,

I run Postgresql 8.0.3 on a Tru64 Unix machine.

I have included the IPv6 auth line in my pg_hba.conf file (::1/128), but I
keep getting error messages from the postmaster every time I try to connect.

Going by previous posts on the topic, I am unable to conclude.
Does this mean pg 8.0.3 doesn't support IPv6 client auth?
Or is there a patch somewhere?
Please reply. Thanks in advance.
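
For reference, an 8.0-era pg_hba.conf entry for IPv6 loopback connections
would normally look something like the following sketch -- the auth method
shown is only an example, not necessarily what this installation needs:

# TYPE  DATABASE  USER  CIDR-ADDRESS  METHOD
host    all       all   ::1/128       md5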

--
Rajesh R

SORRY FOR THE CROSS POST last time.



Re: [HACKERS] Ipv6 and Postgresql 8.0.3

2005-11-28 Thread Andrew Dunstan


First, do not name the mailing list in both the To: and Cc: lines - 
that's just redundant. Second, this is the wrong list to ask this 
question, as you were told last time. Ask on the pgsql-general list.


cheers

andrew

R, Rajesh (STSD) wrote:

 Does this mean pg 8.0.3 doesn't support IPv6 client auth?
 Or is there a patch somewhere?




Re: [HACKERS] Ipv6 and Postgresql 8.0.3

2005-11-28 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 First, do not name the mailing list in both the To: and Cc: lines - 
 that's just redundant. Second, this is the wrong list to ask this 
 question, as you were told last time. Ask on the pgsql-general list.

And third, show us the darn error messages ... the reason you weren't
answered any of the previous times is that you still haven't offered
enough information to let anyone help you.

regards, tom lane



Re: [HACKERS] Getting different number of results when using hashjoin on/off

2005-11-28 Thread Tom Lane
Mario Weilguni [EMAIL PROTECTED] writes:
 The failing case is:
 ...
    SubPlan
      ->  Hash Join  (cost=8.47..19.46 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=21619)
            Hash Cond: ("outer".id = "inner".str_id)
            ->  Bitmap Heap Scan on structure str  (cost=2.02..12.92 rows=6 width=4) (actual time=0.100..30.095 rows=1 loops=1)
                  Recheck Cond: (path ~ '142.2330445.2330598.2330676.*'::lquery)
                  ->  Bitmap Index Scan on str_uk4  (cost=0.00..2.02 rows=6 width=0) (actual time=0.090..0.090 rows=1 loops=1)
                        Index Cond: (path ~ '142.2330445.2330598.2330676.*'::lquery)
            ->  Hash  (cost=6.43..6.43 rows=5 width=4) (actual time=0.032..0.032 rows=0 loops=1)
                  ->  Bitmap Heap Scan on foo2 bz  (cost=2.02..6.43 rows=5 width=4) (actual time=0.025..0.025 rows=0 loops=1)
                        Recheck Cond: (bid = $0)
                        ->  Bitmap Index Scan on foo2_bid_key1  (cost=0.00..2.02 rows=5 width=0) (actual time=0.021..0.021 rows=0 loops=1)
                              Index Cond: (bid = $0)

Hmm, I wonder why the hash join's input nodes are showing loops=1 ...
the hash depends on the subplan parameter $0 so it needs to be
re-evaluated each time through.  It looks like that's not happening.
Do you have the corresponding results from 8.0 --- if so, what do
the loop counts look like?

regards, tom lane



[HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread James Robinson

G'day folks.

We have a production database running 8.0.3 which gets fully
pg_dump'd and vacuum analyze'd hourly by cron. Something strange
happened to us on the 5AM Friday Nov. 25th cron run -- the:

	/usr/local/pgsql/bin/vacuumdb -U postgres --all --analyze --verbose > $DATE/vacuum.log


step in our cron procedure never completed. Strange, since no known
event of note happened on Friday, as we were all out of the office
past Wed. for the American Thanksgiving holiday. Anyway, running the
vacuum line by hand shows it getting stuck -- it processes the majority
of our tables, then just stops, and the backend postmaster stops
accumulating CPU time.


Comparing the logs further with when it did complete, it seems that
one table in particular (at least) is afflicted:


social=# vacuum verbose analyze agency.swlog_client;

hangs up forever -- have to control-c the client. Likewise without
'analyze'.


pg_dump'ing the entire database works (phew!) and upon restoring on a
backup box, said table can be vacuumed:


social=# vacuum verbose analyze agency.swlog_client;
INFO:  vacuuming "agency.swlog_client"
INFO:  index "swlog_client_pkey" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "swlog_client": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "agency.swlog_client"
INFO:  "swlog_client": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
VACUUM

That's right -- completely empty table -- which is what we actually  
expect.


How should we proceed, such that we can learn from this as well as
get our entire database vacuuming again successfully?


Running on Linux 2.6.8-24.18-smp (SuSE 9.2). No juicy filesystem-related
messages in dmesg or /var/log/messages. 11% disk used on the
postgres-related partition.



The table in question is defined as:

social=# \d agency.swlog_client
 Table "agency.swlog_client"
 Column |  Type  | Modifiers
--------+--------+-----------
 swlog  | bigint | not null
 client | bigint | not null
Indexes:
    "swlog_client_pkey" PRIMARY KEY, btree (swlog, client)
Foreign-key constraints:
    "$2" FOREIGN KEY (client) REFERENCES agency.client(id)
    "$1" FOREIGN KEY (swlog) REFERENCES agency.swlog(id)


And the two fk'd tables:

social=# select count(*) from agency.client;
 count
-------
     0
(1 row)

social=# select count(*) from agency.swlog;
 count
-------
    69
(1 row)

We doubt that there could be any strange oddball extremely
long-running transaction in any of those related tables gumming up
this table.


Finally, the only potentially interesting database-wise event happened
on Wed. Nov. 23rd -- we SIGHUP'd the postmaster to have it learn a
higher value for work_mem (10240, up from the default of 1024). But the
hourly crons went great for the subsequent two days.
maintenance_work_mem is still at the default of 16384.



Many thanks in advance!
James


James Robinson
Socialserve.com




Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread Tom Lane
James Robinson [EMAIL PROTECTED] writes:
 Comparing the logs further with when it did complete, it seems that
 one table in particular (at least) is afflicted:

 social=# vacuum verbose analyze agency.swlog_client;

 hangs up forever -- have to control-c the client. Likewise for w/o  
 'analyze'.

Given that it's not eating CPU time, one would guess that it's blocked
waiting for a lock.  Can you find any relevant locks in pg_locks?
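
For instance, something along these lines (a sketch against the 8.0-era
pg_locks view; the regclass cast is just for readability):

SELECT relation::regclass AS rel, transaction, pid, mode, granted
  FROM pg_locks
 ORDER BY granted, pid;    -- ungranted (granted = f) entries sort first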

regards, tom lane



[HACKERS] Anonymous CVS working?

2005-11-28 Thread Michael Fuhr
Any problems with CVS or anonymous CVS since the work last evening?
Anonymous CVS hasn't given me the following commit yet; it's been
almost twelve hours since it was made:

http://archives.postgresql.org/pgsql-committers/2005-11/msg00553.php

-- 
Michael Fuhr



Re: [HACKERS] Getting different number of results when using hashjoin on/off

2005-11-28 Thread Tom Lane
Mario Weilguni [EMAIL PROTECTED] writes:
 Yes. This is from a 8.0.3 (with slightly older and different data,
 resulting in only 9 rows, but the rest is the same):

Yeah, that looks more reasonable.

I tried to reproduce this, without any luck:

regression=# explain analyze select count(*) from tenk1 a where exists (select 1 from tenk1 b, tenk1 c where b.unique1=c.unique2 and b.hundred in (4,5) and c.hundred=a.hundred);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3879742.37..3879742.38 rows=1 width=0) (actual time=46579.077..46579.082 rows=1 loops=1)
   ->  Seq Scan on tenk1 a  (cost=0.00..3879729.87 rows=5000 width=0) (actual time=5.129..46528.208 rows=8500 loops=1)
         Filter: (subplan)
         SubPlan
           ->  Hash Join  (cost=229.20..546.66 rows=2 width=0) (actual time=4.569..4.569 rows=1 loops=1)
                 Hash Cond: ("outer".unique1 = "inner".unique2)
                 ->  Bitmap Heap Scan on tenk1 b  (cost=4.69..321.15 rows=196 width=4) (actual time=0.947..1.698 rows=90 loops=1)
                       Recheck Cond: ((hundred = 4) OR (hundred = 5))
                       ->  BitmapOr  (cost=4.69..4.69 rows=197 width=0) (actual time=0.544..0.544 rows=0 loops=1)
                             ->  Bitmap Index Scan on tenk1_hundred  (cost=0.00..2.34 rows=98 width=0) (actual time=0.271..0.271 rows=100 loops=1)
                                   Index Cond: (hundred = 4)
                             ->  Bitmap Index Scan on tenk1_hundred  (cost=0.00..2.34 rows=98 width=0) (actual time=0.262..0.262 rows=100 loops=1)
                                   Index Cond: (hundred = 5)
                 ->  Hash  (cost=224.26..224.26 rows=100 width=4) (actual time=2.370..2.370 rows=100 loops=1)
                       ->  Bitmap Heap Scan on tenk1 c  (cost=2.35..224.26 rows=100 width=4) (actual time=0.492..1.616 rows=100 loops=1)
                             Recheck Cond: (hundred = $0)
                             ->  Bitmap Index Scan on tenk1_hundred  (cost=0.00..2.35 rows=100 width=0) (actual time=0.278..0.278 rows=100 loops=1)
                                   Index Cond: (hundred = $0)
 Total runtime: 46584.654 ms
(19 rows)

(I'm not bothering with setting up an ltree index, since the question
of what index is being used shouldn't affect hashjoin's decision to
rescan or not.)

That's using 8.1 branch CVS tip, but there aren't any related bug fixes
since 8.1 release.  We did have several bug fixes in the hash join code
during the 8.1 beta cycle though ... is it possible you are really
running an 8.1 beta and not 8.1.0?

regards, tom lane



Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread James Robinson
As fate would have it, the vacuumdb frontend and backend which were  
initially afflicted are still in existence:


sscadmin 19236 19235  0 Nov25 ?  00:00:00 /usr/local/pgsql/bin/vacuumdb -U postgres --all --analyze --verbose
postgres 19244  3596  0 Nov25 ?  00:00:02 postgres: postgres social [local] VACUUM


pid 19244.

And here's pg_locks:

social=# select * from pg_locks;
 relation | database | transaction |  pid  |           mode           | granted
----------+----------+-------------+-------+--------------------------+---------
          |          |    38790657 | 19244 | ExclusiveLock            | t
  6586066 |  6585892 |             | 28406 | ShareUpdateExclusiveLock | f
          |          |    39097312 | 28861 | ExclusiveLock            | t
          |          |    39089744 | 28756 | ExclusiveLock            | t
  6586066 |  6585892 |             | 28756 | ShareUpdateExclusiveLock | f
  6586066 |  6585892 |             | 19244 | ShareUpdateExclusiveLock | t
  6586066 |  6585892 |             | 19244 | ShareUpdateExclusiveLock | t
  8417138 |  6585892 |             | 19244 | ShareUpdateExclusiveLock | t
    16839 |  6585892 |             | 28861 | AccessShareLock          | t
          |          |    39063661 | 28560 | ExclusiveLock            | t
          |          |    39056736 | 28406 | ExclusiveLock            | t
  6586066 |  6585892 |             | 28560 | ShareUpdateExclusiveLock | f
(12 rows)


  pid
-------
 19244
 28406
 28560
 28756
 28861
(5 rows)

Of those 5 pids:

19244 -- vacuum backend initially afflicted -- status in argv:
'postgres: postgres social [local] VACUUM'
28406 -- a 10AM today vacuum started up by cron this morning after I
got things half-way working again early in the diagnosis of this
situation. args: 'postgres: postgres social [local] VACUUM waiting'
28560 -- a 10:16 today by-hand vacuum session further in diagnosis
land. args: 'postgres: postgres social [local] VACUUM waiting'
28756 -- 11AM cron'd process. Yes, I see a quickly mounting issue
here. args: 'postgres: postgres social [local] VACUUM waiting'
28861 -- production servicing backend, now back in idle state [ not
idle-in-transaction, just regular idle ].
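
For reference, a hedged way to double-check those backend states from SQL,
using the 8.0-era pg_stat_activity columns (it needs stats_command_string
turned on to show query text):

SELECT procpid, usename, current_query
  FROM pg_stat_activity
 ORDER BY procpid;    -- current_query shows <IDLE> vs. <IDLE> in transaction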




On Nov 28, 2005, at 11:09 AM, Tom Lane wrote:

 Given that it's not eating CPU time, one would guess that it's blocked
 waiting for a lock.  Can you find any relevant locks in pg_locks?

 regards, tom lane



James Robinson
Socialserve.com




Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread Tom Lane
James Robinson [EMAIL PROTECTED] writes:
 As fate would have it, the vacuumdb frontend and backend which were  
 initially afflicted are still in existence:

OK, so pid 19244 isn't blocked on any lmgr lock; else we'd see an entry
with granted = f for it in pg_locks.  It could be blocked on a lower
level lock though.  Can you attach to that PID with gdb and get a stack
trace?  Something like (as postgres user)

	gdb /path/to/postgres-executable 19244
	gdb> bt
	gdb> quit
	ok to detach? yes

regards, tom lane



Re: [HACKERS] Getting different number of results when using hashjoin on/off

2005-11-28 Thread Mario Weilguni
Thanks, Tom, for your quick answer!

No, I'm using 8.1.0, and tried it on different machines, always the same 
results.

SELECT version();
PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.4 20040623 
(Gentoo Hardened Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)

Best regards,
Mario Weilguni 


icomedias - Digitale Kommunikation

Mario Weilguni, Forschung und Entwicklung
[EMAIL PROTECTED], http://www.icomedias.com/

icomedias Österreich Systemhaus GmbH:
  8020 Graz, Entenplatz 1 
  Tel: +43 (316) 721.671-272, Fax: -103  

icomedias Deutschland Systemhaus GmbH:
  10969 Berlin, Alexandrinenstraße 2-3
  Tel: +49 (30) 695.399-272, Fax: -103



Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread James Robinson
Here ya go -- BTW -- you guys' support is the _best_. But you know
that already:


[EMAIL PROTECTED]:/home/sscadmin gdb /usr/local/pgsql/bin/postgres 19244
GNU gdb 6.2.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".


Attaching to program: /usr/local/pgsql/bin/postgres, process 19244
Reading symbols from /lib/libz.so.1...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /lib/libreadline.so.5...done.
Loaded symbols for /lib/libreadline.so.5
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libncurses.so.5...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_compat.so.2...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0xe410 in ?? ()
(gdb) bt
#0  0xe410 in ?? ()
#1  0xbfffd508 in ?? ()
#2  0x082aef97 in PqSendBuffer ()
#3  0xbfffd4f0 in ?? ()
#4  0xb7ec03e1 in send () from /lib/tls/libc.so.6
#5  0x08137d27 in secure_write ()
#6  0x0813c2a7 in internal_flush ()
#7  0x0813c4ff in pq_flush ()
#8  0x0820bfec in EmitErrorReport ()
#9  0x0820b5ac in errfinish ()
#10 0x0811d0a8 in lazy_vacuum_rel ()
#11 0x0811ac5a in vacuum_rel ()
#12 0x0811bb93 in vacuum ()
#13 0x0819c84d in PortalRunUtility ()
#14 0x0819d9b8 in PortalRun ()
#15 0x0819b221 in PostgresMain ()
#16 0x0816ffa9 in ServerLoop ()
#17 0x08170de9 in PostmasterMain ()
#18 0x0813e5e5 in main ()
(gdb) quit



James Robinson
Socialserve.com




Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread Tom Lane
James Robinson [EMAIL PROTECTED] writes:
 (gdb) bt
 #0  0xe410 in ?? ()
 #1  0xbfffd508 in ?? ()
 #2  0x082aef97 in PqSendBuffer ()
 #3  0xbfffd4f0 in ?? ()
 #4  0xb7ec03e1 in send () from /lib/tls/libc.so.6
 #5  0x08137d27 in secure_write ()
 #6  0x0813c2a7 in internal_flush ()
 #7  0x0813c4ff in pq_flush ()
 #8  0x0820bfec in EmitErrorReport ()
 #9  0x0820b5ac in errfinish ()
 #10 0x0811d0a8 in lazy_vacuum_rel ()
 #11 0x0811ac5a in vacuum_rel ()
 #12 0x0811bb93 in vacuum ()
 #13 0x0819c84d in PortalRunUtility ()
 #14 0x0819d9b8 in PortalRun ()
 #15 0x0819b221 in PostgresMain ()
 #16 0x0816ffa9 in ServerLoop ()
 #17 0x08170de9 in PostmasterMain ()
 #18 0x0813e5e5 in main ()
 (gdb) quit

Hmm, what this says is that the backend is blocked trying to send an
error or notice message to the client.  We can't tell anything about
what the message was, but that's not so relevant ... the focus now
has to shift to the network or client side, ie, why in the world is
the backend waiting on the client to accept a message?

Can you get a similar backtrace from the vacuumdb process?  (Obviously,
give gdb the vacuumdb executable not the postgres one.)

regards, tom lane



Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread James Robinson

On Nov 28, 2005, at 11:38 AM, Tom Lane wrote:

 Can you get a similar backtrace from the vacuumdb process?  (Obviously,
 give gdb the vacuumdb executable not the postgres one.)

OK:

(gdb) bt
#0  0xe410 in ?? ()
#1  0xbfffe4f8 in ?? ()
#2  0x0030 in ?? ()
#3  0x08057b68 in ?? ()
#4  0xb7e98533 in __write_nocancel () from /lib/tls/libc.so.6
#5  0xb7e4aae6 in _IO_new_file_write () from /lib/tls/libc.so.6
#6  0xb7e4a7e5 in new_do_write () from /lib/tls/libc.so.6
#7  0xb7e4aa63 in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#8  0xb7e413a2 in fputs () from /lib/tls/libc.so.6
#9  0xb7fd8f99 in defaultNoticeProcessor () from /usr/local/pgsql/lib/libpq.so.4
#10 0xb7fd8fe5 in defaultNoticeReceiver () from /usr/local/pgsql/lib/libpq.so.4
#11 0xb7fe2d34 in pqGetErrorNotice3 () from /usr/local/pgsql/lib/libpq.so.4
#12 0xb7fe3921 in pqParseInput3 () from /usr/local/pgsql/lib/libpq.so.4
#13 0xb7fdb174 in parseInput () from /usr/local/pgsql/lib/libpq.so.4
#14 0xb7fdca99 in PQgetResult () from /usr/local/pgsql/lib/libpq.so.4
#15 0xb7fdcc4b in PQexecFinish () from /usr/local/pgsql/lib/libpq.so.4
#16 0x0804942c in vacuum_one_database ()
#17 0x080497a1 in main ()


Things to know which could possibly be of use: this cron is kicked
off on the backup database box, and the vacuumdb is run via ssh to
the primary box. The primary box is running the vacuumdb operation
with --analyze --verbose, with the output being streamed to a logfile
on the backup box. Lemme guess: __write_nocancel calls syscall write,
and 0x0030 might well be the syscall entry point? Something
gumming up the networking or sshd itself could have stopped up the
output queues, and the back-pressure propagated all the way down to
this level?

If so, only dummies back up / vacuum direct to a remote box?



James Robinson
Socialserve.com




Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread Tom Lane
James Robinson [EMAIL PROTECTED] writes:
 On Nov 28, 2005, at 11:38 AM, Tom Lane wrote:
 Can you get a similar backtrace from the vacuumdb process?   

 OK:

 (gdb) bt
 #0  0xe410 in ?? ()
 #1  0xbfffe4f8 in ?? ()
 #2  0x0030 in ?? ()
 #3  0x08057b68 in ?? ()
 #4  0xb7e98533 in __write_nocancel () from /lib/tls/libc.so.6
 #5  0xb7e4aae6 in _IO_new_file_write () from /lib/tls/libc.so.6
 #6  0xb7e4a7e5 in new_do_write () from /lib/tls/libc.so.6
 #7  0xb7e4aa63 in _IO_new_file_xsputn () from /lib/tls/libc.so.6
 #8  0xb7e413a2 in fputs () from /lib/tls/libc.so.6
 #9  0xb7fd8f99 in defaultNoticeProcessor () from /usr/local/pgsql/lib/libpq.so.4
 #10 0xb7fd8fe5 in defaultNoticeReceiver () from /usr/local/pgsql/lib/libpq.so.4
 #11 0xb7fe2d34 in pqGetErrorNotice3 () from /usr/local/pgsql/lib/libpq.so.4
 #12 0xb7fe3921 in pqParseInput3 () from /usr/local/pgsql/lib/libpq.so.4
 #13 0xb7fdb174 in parseInput () from /usr/local/pgsql/lib/libpq.so.4
 #14 0xb7fdca99 in PQgetResult () from /usr/local/pgsql/lib/libpq.so.4
 #15 0xb7fdcc4b in PQexecFinish () from /usr/local/pgsql/lib/libpq.so.4
 #16 0x0804942c in vacuum_one_database ()
 #17 0x080497a1 in main ()

OK, so evidently the backend is sending NOTICE messages, and the
vacuumdb is blocked trying to copy those messages to stderr.

 Things to know which could possibly be of use: this cron is kicked
 off on the backup database box, and the vacuumdb is run via ssh to
 the primary box. The primary box is running the vacuumdb operation
 with --analyze --verbose, with the output being streamed to a logfile
 on the backup box. Lemme guess: __write_nocancel calls syscall write,
 and 0x0030 might well be the syscall entry point? Something
 gumming up the networking or sshd itself could have stopped up the
 output queues, and the back-pressure propagated all the way down to this level?

That's what it looks like: the output queue from the vacuumdb has
stopped up somehow.  Your next move is to look at the state of sshd
and whatever is running at the client end of the ssh tunnel.

regards, tom lane



Re: [HACKERS] Getting different number of results when using hashjoin on/off

2005-11-28 Thread Tom Lane
Mario Weilguni [EMAIL PROTECTED] writes:
 No, I'm using 8.1.0, and tried it on different machines, always the same 
 results.

I see it, I think: the recent changes to avoid work when one or the
other side of the hash join is empty would exit the hash join leaving
a state that confused ExecReScanHashJoin() into thinking it didn't
have to do anything.  Try the attached patch.

regards, tom lane


Index: src/backend/executor/nodeHashjoin.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.75.2.1
diff -c -r1.75.2.1 nodeHashjoin.c
*** src/backend/executor/nodeHashjoin.c	22 Nov 2005 18:23:09 -0000	1.75.2.1
--- src/backend/executor/nodeHashjoin.c	28 Nov 2005 17:04:43 -0000
***************
*** 152,163 ****
  		 * outer join, we can quit without scanning the outer relation.
  		 */
  		if (hashtable->totalTuples == 0 && node->js.jointype != JOIN_LEFT)
- 		{
- 			ExecHashTableDestroy(hashtable);
- 			node->hj_HashTable = NULL;
- 			node->hj_FirstOuterTupleSlot = NULL;
  			return NULL;
- 		}
  
  		/*
  		 * need to remember whether nbatch has increased since we began
--- 152,158 ----
***************
*** 487,493 ****
  	{
  		ExecHashTableDestroy(node->hj_HashTable);
  		node->hj_HashTable = NULL;
- 		node->hj_FirstOuterTupleSlot = NULL;
  	}
  
  	/*
--- 482,487 ----
***************
*** 805,841 ****
  ExecReScanHashJoin(HashJoinState *node, ExprContext *exprCtxt)
  {
  	/*
- 	 * If we haven't yet built the hash table then we can just return; nothing
- 	 * done yet, so nothing to undo.
- 	 */
- 	if (node->hj_HashTable == NULL)
- 		return;
- 
- 	/*
  	 * In a multi-batch join, we currently have to do rescans the hard way,
  	 * primarily because batch temp files may have already been released. But
  	 * if it's a single-batch join, and there is no parameter change for the
  	 * inner subnode, then we can just re-use the existing hash table without
  	 * rebuilding it.
  	 */
! 	if (node->hj_HashTable->nbatch == 1 &&
! 		((PlanState *) node)->righttree->chgParam == NULL)
! 	{
! 		/* okay to reuse the hash table; needn't rescan inner, either */
! 	}
! 	else
  	{
! 		/* must destroy and rebuild hash table */
! 		ExecHashTableDestroy(node->hj_HashTable);
! 		node->hj_HashTable = NULL;
! 		node->hj_FirstOuterTupleSlot = NULL;
  
! 		/*
! 		 * if chgParam of subnode is not null then plan will be re-scanned by
! 		 * first ExecProcNode.
! 		 */
! 		if (((PlanState *) node)->righttree->chgParam == NULL)
! 			ExecReScan(((PlanState *) node)->righttree, exprCtxt);
  	}
  
  	/* Always reset intra-tuple state */
--- 799,830 ----
  ExecReScanHashJoin(HashJoinState *node, ExprContext *exprCtxt)
  {
  	/*
  	 * In a multi-batch join, we currently have to do rescans the hard way,
  	 * primarily because batch temp files may have already been released. But
  	 * if it's a single-batch join, and there is no parameter change for the
  	 * inner subnode, then we can just re-use the existing hash table without
  	 * rebuilding it.
  	 */
! 	if (node->hj_HashTable != NULL)
  	{
! 		if (node->hj_HashTable->nbatch == 1 &&
! 			((PlanState *) node)->righttree->chgParam == NULL)
! 		{
! 			/* okay to reuse the hash table; needn't rescan inner, either */
! 		}
! 		else
! 		{
! 			/* must destroy and rebuild hash table */
! 			ExecHashTableDestroy(node->hj_HashTable);
! 			node->hj_HashTable = NULL;
  
! 			/*
! 			 * if chgParam of subnode is not null then plan will be re-scanned
! 			 * by first ExecProcNode.
! 			 */
! 			if (((PlanState *) node)->righttree->chgParam == NULL)
! 				ExecReScan(((PlanState *) node)->righttree, exprCtxt);
! 		}
  	}
  
  	/* Always reset intra-tuple state */
***************
*** 847,852 ****
--- 836,842 ----
  	node->js.ps.ps_TupFromTlist = false;
  	node->hj_NeedNewOuter = true;
  	node->hj_MatchedOuter = false;
+ 	node->hj_FirstOuterTupleSlot = NULL;
  
  	/*
  	 * if chgParam of subnode is not null then plan will be re-scanned by

Re: [HACKERS] Anonymous CVS working?

2005-11-28 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes:
 Any problems with CVS or anonymous CVS since the work last evening?

The master CVS is fine, but I agree that the anonymous mirror doesn't
seem to be tracking it ...

regards, tom lane



Re: [HACKERS] Anonymous CVS working?

2005-11-28 Thread Alvaro Herrera
Tom Lane wrote:
 Michael Fuhr [EMAIL PROTECTED] writes:
  Any problems with CVS or anonymous CVS since the work last evening?
 
 The master CVS is fine, but I agree that the anonymous mirror doesn't
 seem to be tracking it ...

Apparently the CVSup server is down too.  I wonder if the anon CVS copy
is taken from CVSup.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Getting different number of results when using hashjoin on/off

2005-11-28 Thread Mario Weilguni
Hello Tom,

Thanks for the quick response. I've tried the patch, but it did not work
as expected. When I set enable_hashjoin to off, everything works as
expected, but with hashjoin on I do not even get results anymore; CPU
goes up to 100%, and after 3 minutes I cancelled the query (it normally
would take ~100-500 milliseconds).

I will check the patch on a different machine again and inform you of
the results.

Best regards,
Mario Weilguni



Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread James Robinson


On Nov 28, 2005, at 12:00 PM, Tom Lane wrote:

 Your next move is to look at the state of sshd
 and whatever is running at the client end of the ssh tunnel.

backtrace of the sshd doesn't look good:

(gdb) bt
#0  0xe410 in ?? ()
#1  0xbfffdb48 in ?? ()
#2  0x080a1e28 in ?? ()
#3  0x080a1e78 in ?? ()
#4  0xb7d379fd in ___newselect_nocancel () from /lib/tls/libc.so.6
#5  0x08054d64 in ?? ()
#6  0x000a in ?? ()
#7  0x080a1e78 in ?? ()
#8  0x080a1e28 in ?? ()
#9  0x00000000 in ?? ()
#10 0xbfffdb30 in ?? ()
#11 0x00000000 in ?? ()
#12 0xbfffdb48 in ?? ()
#13 0x0806c796 in ?? ()
#14 0x080a9d3c in ?? ()
#15 0x0001 in ?? ()
#16 0xbfffdb64 in ?? ()
#17 0x08054c3d in ?? ()
#18 0x0019 in ?? ()
#19 0x000acda0 in ?? ()
#20 0x080a9d3c in ?? ()
#21 0x00000000 in ?? ()
#22 0xbfffdb6c in ?? ()
#23 0x00000000 in ?? ()
#24 0xbfffdb78 in ?? ()
---Type <return> to continue, or q <return> to quit---
#25 0x08055632 in ?? ()
#26 0xbfffdb6c in ?? ()
#27 0x in ?? ()
#28 0x080a1e78 in ?? ()
#29 0x08098ee8 in ?? ()
#30 0x080a1e78 in ?? ()
#31 0x080a1e28 in ?? ()
#32 0x0009 in ?? ()
#33 0x0004 in ?? ()
#34 0x0001 in ?? ()
#35 0x0001 in ?? ()
#36 0xbfffdbb8 in ?? ()
#37 0x0805b816 in ?? ()
#38 0x08098ee8 in ?? ()
#39 0x080a2e10 in ?? ()
#40 0x0007 in ?? ()
#41 0x08098ee8 in ?? ()
#42 0x08080fd2 in _IO_stdin_used ()
#43 0x08098ee8 in ?? ()
#44 0xbfffdbb8 in ?? ()
#45 0x080574a3 in ?? ()
#46 0x00000000 in ?? ()
#47 0x08098ee8 in ?? ()
#48 0x08098ee8 in ?? ()
#49 0x08098f30 in ?? ()
---Type <return> to continue, or q <return> to quit---
#50 0x08080fd2 in _IO_stdin_used ()
#51 0x08098ee8 in ?? ()
#52 0xbfffeb98 in ?? ()
#53 0x0804fc90 in ?? ()
#54 0x08098ee8 in ?? ()
#55 0x08098f74 in ?? ()
#56 0x08098f30 in ?? ()
#57 0xbfffe110 in ?? ()
#58 0xbfffe110 in ?? ()
#59 0x0808014a in _IO_stdin_used ()
#60 0xb7ffad95 in malloc () from /lib/ld-linux.so.2
Previous frame inner to this frame (corrupt stack?)

The client-side ssh is worse -- 507 frames before it reports  
'(corrupt stack?)'.



At this moment in time, should we kill off the offending processes
from Nov 25 -- starting from the client-most side all the way to the
vacuumdb process on the production server? The other vacuums would
probably then complete happily, and we'd be cool again, eh?

I suppose we're darn lucky the process ultimately got gummed up on a
table that sees no traffic at all, eh? The lock that vacuum has
taken out on it would prevent at least some things happening to the
table in question -- possibly even new inserts or updates?

Could this potentially be alleviated in the future by a little code
reordering in vacuumdb or the postmaster, by finishing work on the
current table completely before emitting output -- either postmaster ->
vacuumdb client, or possibly the vacuumdb client -> whatever stdout
is directed to -- so as to only get gummed up in a state where no locks
are being held? Or would that uglify the code too much, and/or would
people find the additional buffering a damnable offense?




James Robinson
Socialserve.com




Re: [HACKERS] Getting different number of results when using hashjoin on/off

2005-11-28 Thread Mario Weilguni
Yes. This is from an 8.0.3 (with slightly older and different data,
resulting in only 9 rows, but the rest is the same):

 Index Scan using ben_uk3 on foo1 ben  (cost=0.00..73867.23 rows=863 width=27) (actual time=38.591..501.839 rows=9 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Hash Join  (cost=14.25..42.53 rows=1 width=0) (actual time=0.284..0.284 rows=0 loops=1725)
           Hash Cond: ("outer".id = "inner".str_id)
           ->  Index Scan using str_uk4 on structure str  (cost=0.00..27.91 rows=13 width=4) (actual time=0.765..4.043 rows=1 loops=112)
                 Index Cond: (path ~ '*.2330676.*'::lquery)
           ->  Hash  (cost=14.23..14.23 rows=10 width=4) (actual time=0.012..0.012 rows=0 loops=1725)
                 ->  Index Scan using foo2_ben_id_key1 on foo2 bz  (cost=0.00..14.23 rows=10 width=4) (actual time=0.008..0.009 rows=1 loops=1725)
                       Index Cond: (ben_id = $0)
 Total runtime: 501.980 ms

Best regards

P.S. Sorry for the stupid quoting; I have to use Outlook.




Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread Tom Lane
James Robinson [EMAIL PROTECTED] writes:
 backtrace of the sshd doesn't look good:

Stripped executable :-( ... you won't get much info there.  What of
the client at the far end of the ssh connection?  You should probably
assume that the blockage is there, rather than in a commonly used bit
of software like ssh.

 At this moment in time, should we kill off the offending processes  
 from Nov 25 -- starting from client-most side all the way to the  
 vacuumdb process on the production server. The other vacuums would  
 probably then complete happily, and we'd be cool again, eh?

If you just want to get out of it, killing the vacuumdb should be the
least dangerous way to get out of the problem.  I'd suggest taking a
little more time to try to find out what's stuck though.

 I suppose we're darn lucky the process got ultimately gummed up on a  
 table that sees no traffic at all to it, eh? The lock that vacuum has  
 taken out on it would prevent at least some things happening to the  
 table in question -- possibly even new inserts or updates?

No, since it's a lazy vacuum it's not going to block either reads or
writes.  Just other vacuums and DDL changes on the table.

 Could this potentially be alleviated in the future by a little code  
 reordering in vacuumdb or postmaster by completing work on the  
 current table entirely before emitting output,

Wouldn't help.  I would imagine that by the time we got to this state,
the backend side is quite a few tables past the point where the end
client stopped accepting output.  You'd normally expect there to be
several Kb worth of buffering in the network channel, and the backend
isn't going to be blocked till that's *all* used up.

BTW, I suppose this was a VACUUM VERBOSE?  Without the verbose output,
you'd likely not have seen any problem ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread Tom Lane
James Robinson [EMAIL PROTECTED] writes:
 Given the other culprits in play are bash running a straightforward  
 shellscript line with redirected output to a simple file on a non- 
 full filesystem, I'm leaning more towards the odds that something  
 related to the sshd + tcp/ip + ssh client portion of things went  
 crazy.

Yeah, could be.  Anyway it doesn't seem like we can learn much more
today.  You might as well just zing the vacuumdb process and let
things get back to normal.  If it happens again, we'd have reason
to dig deeper.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Anonymous CVS working?

2005-11-28 Thread Marc G. Fournier


cvsup back up and running ...

On Mon, 28 Nov 2005, Alvaro Herrera wrote:


Tom Lane wrote:

Michael Fuhr [EMAIL PROTECTED] writes:

Any problems with CVS or anonymous CVS since the work last evening?


The master CVS is fine, but I agree that the anonymous mirror doesn't
seem to be tracking it ...


Apparently the CVSup server is down too.  I wonder if the anon CVS copy
is taken from CVSup.

--
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support





Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread James Robinson


On Nov 28, 2005, at 1:46 PM, Tom Lane wrote:


James Robinson [EMAIL PROTECTED] writes:

backtrace of the sshd doesn't look good:


Stripped executable :-( ... you won't get much info there.  What of
the client at the far end of the ssh connection?  You should probably
assume that the blockage is there, rather than in a commonly used bit
of software like ssh.


Ok: cron fired off a bash running our script which performs the  
backup + vacuuming on the backup box side, and that script was at the  
point of driving vacuumdb


sscadmin 20612 20610  0 Nov25 ?00:00:00 bash /usr/local/bin/ 
db_backup.sh
sscadmin 20622 20612  0 Nov25 ?00:00:00 ssh -c blowfish [ ssh  
identity file + host edited out ] /usr/local/pgsql/bin/vacuumdb -U  
postgres --all --analyze --verbose


[ yes, verbose vacuum. Who knew that'd be the camel-breaking straw ??! ]



The lines in the script invoking the ssh'd vacuumdb is:
--
# Vacuum all databases, storing log results.
$SSHCMD $DBHOST /usr/local/pgsql/bin/vacuumdb -U postgres --all --analyze --verbose > $DATE/vacuum.log
--

Unfortunately the dir holding that date + hour's vacuum.log was swept  
away by the next day's activities.


The stuck bash is backtraceable:

(gdb) bt
#0  0xe410 in ?? ()
#1  0xb928 in ?? ()
#2  0x in ?? ()
#3  0xb918 in ?? ()
#4  0xb7ed1513 in __waitpid_nocancel () from /lib/tls/libc.so.6
#5  0x080935bf in default_tty_job_signals ()
#6  0x080949ca in wait_for ()
#7  0x0808acd7 in execute_command_internal ()
#8  0x0808a4f0 in execute_command ()
#9  0x0808241d in reader_loop ()
#10 0x08081364 in main ()
(gdb)

Nothing unexpected there. Funny though, file reports /bin/bash as  
being stripped ( as it does for /usr/bin/ssh and /usr/sbin/sshd ),  
but I could get far better debugging support from it. Could the  
debugging issue be ssh / sshd's apparent multithreadedness:


gdb /usr/bin/ssh
GNU gdb 6.2.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and
you are welcome to change it and/or distribute copies of it under
certain conditions.

Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i586-suse-linux"...(no debugging symbols
found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".


(gdb) run localhost
Starting program: /usr/bin/ssh localhost
(no debugging symbols found)...(no debugging symbols found)...(no  
debugging symbols found)...(no debugging symbols found)...(no  
debugging symbols found)...(no debugging symbols found)...(no  
debugging symbols found)...(no debugging symbols found)...(no  
debugging symbols found)...(no debugging symbols found)...(no  
debugging symbols found)...(no debugging symbols found)...(no  
debugging symbols found)...(no debugging symbols found)...(no  
debugging symbols found)...(no debugging symbols found)...(no  
debugging symbols found)...(no debugging symbols found)...[Thread  
debugging using libthread_db enabled]

[New Thread 1078408704 (LWP 29932)]
(no debugging symbols found)...(no debugging symbols found)...(no  
debugging symbols found)...(no debugging symbols found)...(no  
debugging symbols found)...(no debugging symbols found)...The  
authenticity of host 'localhost (127.0.0.1)' can't be established.

RSA key fingerprint is f4:cd:bc:37:d7:08:bc:4f:04:91:45:9b:44:cf:d5:b9.
Are you sure you want to continue connecting (yes/no)?
Program received signal SIGINT, Interrupt.
[Switching to Thread 1078408704 (LWP 29932)]
0xe410 in ?? ()
(gdb) bt
#0  0xe410 in ?? ()
#1  0xbfffb9e8 in ?? ()
#2  0x0001 in ?? ()
#3  0xbfffb3af in ?? ()
#4  0x402f94b3 in __read_nocancel () from /lib/tls/libc.so.6
#5  0x080742e1 in mkstemp64 ()
#6  0x080684c8 in error ()
#7  0x08054e91 in ?? ()
#8  0xbfffcac0 in ?? ()
#9  0x0001 in ?? ()
#10 0x08096230 in ?? ()
#11 0xbfffcac0 in ?? ()


I know you have far better things to do than teach someone how to use  
gdb on multithreaded programs, but could a proper backtrace be  
salvageable on the ssh client? If you really care, that is, otherwise  
I'm off to kill that vacuumdb client.







At this moment in time, should we kill off the offending processes
from Nov 25 -- starting from client-most side all the way to the
vacuumdb process on the production server. The other vacuums would
probably then complete happily, and we'd be cool again, eh?


If you just want to get out of it, killing the vacuumdb should be the
least dangerous way to get out of the problem.  I'd suggest taking a
little more time to try to find out what's stuck though.


Given the other culprits in play are bash running a straightforward  
shellscript line with redirected output to a simple file on a non- 
full filesystem, I'm leaning more towards the odds that something  
related to the sshd + tcp/ip + ssh client portion of things went  
crazy. Just seems that's where more complexity is -- bash is 

Hashjoin startup strategy (was Re: [HACKERS] Getting different number of results when using hashjoin on/off)

2005-11-28 Thread Tom Lane
Mario Weilguni [EMAIL PROTECTED] writes:
 Thanks for the quick response, I've tried the patch, but it did not work
 as expected. When I set enable_hashjoin to off, everything works as
 expected, but with hashjoin on I do not even get results anymore, CPU is
 going up to 100% and after 3 minutes I cancelled the query (it normally
 would take ~100-500 milliseconds).

Try letting it run longer.  I think your expectation is tuned for the
broken implementation (which runs the subqueries only once instead of
26k times...)

The test case I developed for this failure in the regression database is

select count(*) from tenk1 a
where exists (select 1 from tenk1 b, tenk1 c
  where b.unique1=c.unique2 and
  b.hundred in (4,5) and c.hundred=a.hundred+99);

8.0 prefers a nestloop for the subquery, and that plan runs in about
600 ms on my machine.  If forced to a hash join, it takes about 2450 ms.
8.1 prefers the hash join to start with, but takes 11300 ms to run it :-(
(after the patch that is).

The reason for the differential is that 8.1 guesses wrong about which
subplan to cycle first: most of the time, the inner plan is empty and
so there's no need to pull any rows from the outer plan, but 8.1 pulls
the first row from the outer plan anyway, and doing that 10000 times is
what's eating the extra runtime.  It looks from your previous message
that similar things are happening with your data distribution, allowing
8.0 to run faster for you than 8.1 does.

Not sure if there's much we can do about this.  The presence of the
upper-query parameter in the subplan makes it difficult to derive any
stats at all, let alone guess how often the subplan will be completely
empty, so I'm not sure the planner can help.

For a query like this, where the hash join is being done repeatedly,
it might be useful for the executor itself to track how often each
subplan has been seen to be empty.  In particular, the executor knows
that the outer subplan is parameterless and therefore should deliver
the same results each time (modulo volatile functions of course), so
after the first cycle it could know that there's no point in trying
the early fetch on that side.  Dunno if this will be of wide enough
use to be worth implementing though --- in simple cases the join
won't be rescanned and so the executor can't help.

Anyone have any other ideas?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Anonymous CVS working?

2005-11-28 Thread Marc G. Fournier

On Mon, 28 Nov 2005, Alvaro Herrera wrote:


Tom Lane wrote:

Michael Fuhr [EMAIL PROTECTED] writes:

Any problems with CVS or anonymous CVS since the work last evening?


The master CVS is fine, but I agree that the anonymous mirror doesn't
seem to be tracking it ...


Apparently the CVSup server is down too.  I wonder if the anon CVS copy
is taken from CVSup.


Give me a couple of minutes ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: Hashjoin startup strategy (was Re: [HACKERS] Getting different number of results when using hashjoin on/off)

2005-11-28 Thread Mario Weilguni

If the query runs slowly it will not be such a problem, but I was very concerned about other queries having this problem too - without knowing it. I've already rewritten the query to use IN instead of exists.

I'll compile again and try it again.

Thanks a lot!

Best regards,
Mario Weilguni



-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Mon 28.11.2005 19:39
To: Mario Weilguni
Cc: pgsql-hackers@postgresql.org
Subject: Hashjoin startup strategy (was Re: [HACKERS] Getting different number of results when using hashjoin on/off)

Mario Weilguni [EMAIL PROTECTED] writes:
 Thanks for the quick response, I've tried the patch, but it did not work
 as expected. When I set enable_hashjoin to off, everything works as
 expected, but with hashjoin on I do not even get results anymore, CPU is
 going up to 100% and after 3 minutes I cancelled the query (it normally
 would take ~100-500 milliseconds).

Try letting it run longer. I think your expectation is tuned for the
broken implementation (which runs the subqueries only once instead of
26k times...)

The test case I developed for this failure in the regression database is

select count(*) from tenk1 a
where exists (select 1 from tenk1 b, tenk1 c
 where b.unique1=c.unique2 and
 b.hundred in (4,5) and c.hundred=a.hundred+99);

8.0 prefers a nestloop for the subquery, and that plan runs in about
600 ms on my machine. If forced to a hash join, it takes about 2450 ms.
8.1 prefers the hash join to start with, but takes 11300 ms to run it :-(
(after the patch that is).

The reason for the differential is that 8.1 guesses wrong about which
subplan to cycle first: most of the time, the inner plan is empty and
so there's no need to pull any rows from the outer plan, but 8.1 pulls
the first row from the outer plan anyway, and doing that 10000 times is
what's eating the extra runtime. It looks from your previous message
that similar things are happening with your data distribution, allowing
8.0 to run faster for you than 8.1 does.

Not sure if there's much we can do about this. The presence of the
upper-query parameter in the subplan makes it difficult to derive any
stats at all, let alone guess how often the subplan will be completely
empty, so I'm not sure the planner can help.

For a query like this, where the hash join is being done repeatedly,
it might be useful for the executor itself to track how often each
subplan has been seen to be empty. In particular, the executor knows
that the outer subplan is parameterless and therefore should deliver
the same results each time (modulo volatile functions of course), so
after the first cycle it could know that there's no point in trying
the early fetch on that side. Dunno if this will be of wide enough
use to be worth implementing though --- in simple cases the join
won't be rescanned and so the executor can't help.

Anyone have any other ideas?

   regards, tom lane







Re: [HACKERS] Help: 8.0.3 Vacuum of an empty table never completes ...

2005-11-28 Thread James Robinson


On Nov 28, 2005, at 4:13 PM, Tom Lane wrote:


Yeah, could be.  Anyway it doesn't seem like we can learn much more
today.  You might as well just zing the vacuumdb process and let
things get back to normal.  If it happens again, we'd have reason
to dig deeper.


Final report [ and apologies to hackers list in general -- sorry for  
the noise today ].


Killed the vacuumdb frontend. Then went off killing processes spawned  
by cron on Nov25th related to the cronjob. All of the related  
backends exited peacefully, and all is well. Manual vacuum verbose  
analyze completes successfully.


One possibly curious thing -- one final process remains on the backup  
box dated Nov25:


root 19912 3  0 Nov25 ?00:00:12 [pdflush]

Coincidence? This is some sort of kernel thread, right? Flushes dirty  
pages to disk? There are two on this machine:


root  9211 3  0 Nov22 ?00:02:56 [pdflush]
root 19912 3  0 Nov25 ?00:00:12 [pdflush]

The Nov25'ths pdflush's pid is suspiciously close to the pids which  
would be in use around the beginning of the cron'd process. [ checks / 
var/log/messages ... ] -- yep -- real close -- last known cross- 
referencable pid is:


Nov 25 04:59:01 db02 /usr/sbin/cron[20590]: (root) CMD ( rm -f /var/ 
spool/cron/lastrun/cron.hourly)


and the vacuumdb sshd connection on the production db box is logged  
at 05:02:22 AM, so that pdflush would have been started real close to  
the time which the remote backup + vacuum script would have been  
running.


Any Linux 2.6 gurus lurking? Under what circumstances do pdflush'es  
get spawned? The filesystem upon which the outputs were going is a  
software raid partition (raid-0? raid-1? Always confuse the two) --  
the interleaved one anyway, not mirrored -- formatted reiser3.


Neither pdflush instance on this machine was started anywhere near  
the boot time of the machine -- both much later. Whereas on the  
production box the two pdflush instances are both dated from machine   
boot time. Does this perchance indicate unhappiness afoot perhaps  
hardware-wise?



James Robinson
Socialserve.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] comment doesn't accept expressions returning text

2005-11-28 Thread Bruce Momjian

Added to TODO:

* Allow COMMENT ON to accept an expression rather than just a 
  string


---

Michael Glaesemann wrote:
 I've been trying to be better at documentation in general and have  
 been trying to take advantage of PostgreSQL's COMMENT ON  
 functionality to provide a little more information in the DDL itself.  
 I usually write my DDL in a text file and load it into the database  
 using psql. To make it (a little) easier to write comments, I'd like  
 to write the comment text as its own paragraph so I don't have to  
 worry about accidently deleting the opening and closing quotes.
 
 For example:
 
 create table foo (foo_id integer primary key);
 comment on table foo is $$
 This is a comment for table foo.
 $$;
 
 Of course, this comment includes a new line at the beginning and end  
 of the comment.
 
 test=# select relname, description
 test-# from pg_description
 test-# join pg_class on (pg_class.oid = pg_description.objoid)
 test-# where relname = 'foo';
  relname |            description
 ---------+-----------------------------------
  foo     |
  This is a comment for table foo.
 
 (1 row)
 
 It would be nice to be able to strip those out using TRIM (or some  
 other function). However, this doesn't appear to work, as COMMENT ON  
 throws a syntax error as soon as it runs into anything that isn't a  
 pure text string. Examples below.
 
 Would there be any objection to allowing any text-returning  
 expression in this case? If not, what would be involved in allowing  
 this? I'm interested in contributing the change if it's something  
 that's considered worthwhile.
 
 Michael Glaesemann
 grzm myrealbox com
 
 
 test=# select version();
                               version
 ------------------------------------------------------------------
  PostgreSQL 8.1.0 on powerpc-apple-darwin8.3.0, compiled by GCC
  powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc.
  build 5026)
 (1 row)
 
 test=# create table foo (foo_id integer primary key);
 NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index  
 foo_pkey for table foo
 CREATE TABLE
 test=# comment on table foo is trim('*', '***This is just an  
 example***');
 ERROR:  syntax error at or near "trim" at character 25
 LINE 1: comment on table foo is trim('*', '***This is just an exampl...
  ^
 test=# comment on table foo is (trim('*', '***This is just an  
 example***'));
 ERROR:  syntax error at or near "(" at character 25
 LINE 1: comment on table foo is (trim('*', '***This is just an examp...
  ^
 test=# comment on table foo is 'This is just' || ' an example';
 ERROR:  syntax error at or near "||" at character 40
 LINE 1: comment on table foo is 'This is just' || ' an example';
 ^
 test=# comment on table foo is ('This is just' || ' an example');
 ERROR:  syntax error at or near "(" at character 25
 LINE 1: comment on table foo is ('This is just' || ' an example');
 
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] POWER vs. POW ???

2005-11-28 Thread Bruce Momjian
Christopher Kings-Lynne wrote:
 Also, POW() is not documented here:

I think POW is just there for backward compatibility and people should
use POWER().

---


 
 http://www.postgresql.org/docs/8.1/interactive/functions-math.html
 
 Chris
 
 Christopher Kings-Lynne wrote:
  How come these give slightly different results?
  
  test=# SELECT POW(2,-2);
   pow  
  ------
   0.25
  (1 row)
  
  test=# SELECT POWER(2,-2);
   power 
  -------
    0.25
  (1 row)
  
  
  (Note width of result field.)
  
  Chris
  
  ---(end of broadcast)---
  TIP 4: Have you searched our list archives?
  
http://archives.postgresql.org
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] TRUNC vs. TRUNCATE

2005-11-28 Thread Bruce Momjian
Christopher Kings-Lynne wrote:
 Hi,
 
 I notice we added CEILING() as an alias to CEIL() for compatibility.  We 
 also have POWER() for POW().
 
 I notice that MySQL uses TRUNCATE() and we only have TRUNC().  Is 
 TRUNCATE actually spec compliant?  Should we add TRUNCATE anyway for 

I don't see either in the SQL99 standard.

 consistency and compatibility?

No, if we did that we would be carrying around a ton of stuff.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] NVL vs COALESCE

2005-11-28 Thread Bruce Momjian
Michael Glaesemann wrote:
 
 On Nov 24, 2005, at 21:21 , Marcus Engene wrote:
 
  When we're having an alias discussion, I'd really like to see NVL  
  in postgres. Not because of porting from oracle as much as just  
  spelling that without the reference manual is completely impossible.
 
 NVL: what a very unfortunate spelling. (NULL VaLue? NULL Valued  
 Logic? Named Very Loosely? Someone help me here :) ) AFAICT, COALESCE  
 is SQL standard, while NVL isn't. I think an index entry might be a  
 good idea.

Agreed, documentation patch applied to HEAD and 8.1.X.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.295
diff -c -c -r1.295 func.sgml
*** doc/src/sgml/func.sgml	19 Nov 2005 19:44:54 -0000	1.295
--- doc/src/sgml/func.sgml	28 Nov 2005 23:17:42 -0000
***************
*** 7227,7232 ****
--- 7227,7240 ----
     <primary>COALESCE</primary>
    </indexterm>
  
+   <indexterm>
+    <primary>NVL</primary>
+   </indexterm>
+ 
+   <indexterm>
+    <primary>IFNULL</primary>
+   </indexterm>
+ 
  <synopsis>
  <function>COALESCE</function>(<replaceable>value</replaceable> <optional>, ...</optional>)
  </synopsis>
***************
*** 7234,7242 ****
    <para>
     The <function>COALESCE</function> function returns the first of its
     arguments that is not null.  Null is returned only if all arguments
!    are null.  This is often useful to substitute a
!    default value for null values when data is retrieved for display,
!    for example:
  <programlisting>
  SELECT COALESCE(description, short_description, '(none)') ...
  </programlisting>
--- 7242,7249 ----
    <para>
     The <function>COALESCE</function> function returns the first of its
     arguments that is not null.  Null is returned only if all arguments
!    are null.  It is often used to substitute a default value for
!    null values when data is retrieved for display, for example:
  <programlisting>
  SELECT COALESCE(description, short_description, '(none)') ...
  </programlisting>
***************
*** 7246,7252 ****
      Like a <token>CASE</token> expression, <function>COALESCE</function> will
      not evaluate arguments that are not needed to determine the result;
      that is, arguments to the right of the first non-null argument are
!     not evaluated.
     </para>
    </sect2>
  
--- 7253,7261 ----
      Like a <token>CASE</token> expression, <function>COALESCE</function> will
      not evaluate arguments that are not needed to determine the result;
      that is, arguments to the right of the first non-null argument are
!     not evaluated.  This SQL-standard function provides capabilities similar
!     to <function>NVL</> and <function>IFNULL</>, which are used in some other
!     database systems.
     </para>
    </sect2>
  
***************
*** 7262,7277 ****
  </synopsis>
  
    <para>
!    The <function>NULLIF</function> function returns a null value if and only
!    if <replaceable>value1</replaceable> and
!    <replaceable>value2</replaceable> are equal.  Otherwise it returns
!    <replaceable>value1</replaceable>.  This can be used to perform the
!    inverse operation of the <function>COALESCE</function> example
!    given above:
  <programlisting>
  SELECT NULLIF(value, '(none)') ...
  </programlisting>
    </para>
  
   </sect2>
  
--- 7271,7289 ----
  </synopsis>
  
    <para>
!    The <function>NULLIF</function> function returns a null value if
!    <replaceable>value1</replaceable> and <replaceable>value2</replaceable>
!    are equal;  otherwise it returns <replaceable>value1</replaceable>.
!    This can be used to perform the inverse operation of the
!    <function>COALESCE</function> example given above:
  <programlisting>
  SELECT NULLIF(value, '(none)') ...
  </programlisting>
    </para>
+   <para>
+    If <replaceable>value1</replaceable> is <literal>(none)</>, return a null,
+    otherwise return <replaceable>value1</replaceable>.
+   </para>
  
   </sect2>
  
---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: Hashjoin startup strategy (was Re: [HACKERS] Getting different number of results when using hashjoin on/off)

2005-11-28 Thread Tom Lane
I wrote:
 For a query like this, where the hash join is being done repeatedly,
 it might be useful for the executor itself to track how often each
 subplan has been seen to be empty.

I implemented a simple form of this, and it made 8.1 faster than 8.0
on the test case I was using.  Give it a try ...

regards, tom lane


Index: src/backend/executor/nodeHashjoin.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.75.2.2
diff -c -r1.75.2.2 nodeHashjoin.c
*** src/backend/executor/nodeHashjoin.c	28 Nov 2005 17:14:47 -0000	1.75.2.2
--- src/backend/executor/nodeHashjoin.c	28 Nov 2005 23:41:28 -0000
***************
*** 120,135 ****
  		 * since we aren't going to be able to skip the join on the strength
  		 * of an empty inner relation anyway.)
  		 *
  		 * The only way to make the check is to try to fetch a tuple from the
  		 * outer plan node.  If we succeed, we have to stash it away for later
  		 * consumption by ExecHashJoinOuterGetTuple.
  		 */
! 		if (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost ||
! 			node->js.jointype == JOIN_LEFT)
  		{
  			node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
  			if (TupIsNull(node->hj_FirstOuterTupleSlot))
  				return NULL;
  		}
  		else
  			node->hj_FirstOuterTupleSlot = NULL;
--- 120,147 ----
  		 * since we aren't going to be able to skip the join on the strength
  		 * of an empty inner relation anyway.)
  		 *
+ 		 * If we are rescanning the join, we make use of information gained
+ 		 * on the previous scan: don't bother to try the prefetch if the
+ 		 * previous scan found the outer relation nonempty.  This is not
+ 		 * 100% reliable since with new parameters the outer relation might
+ 		 * yield different results, but it's a good heuristic.
+ 		 *
  		 * The only way to make the check is to try to fetch a tuple from the
  		 * outer plan node.  If we succeed, we have to stash it away for later
  		 * consumption by ExecHashJoinOuterGetTuple.
  		 */
! 		if (node->js.jointype == JOIN_LEFT ||
! 			(outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
! 			 !node->hj_OuterNotEmpty))
  		{
  			node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
  			if (TupIsNull(node->hj_FirstOuterTupleSlot))
+ 			{
+ 				node->hj_OuterNotEmpty = false;
  				return NULL;
+ 			}
+ 			else
+ 				node->hj_OuterNotEmpty = true;
  		}
  		else
  			node->hj_FirstOuterTupleSlot = NULL;
***************
*** 159,164 ****
--- 171,183 ----
  		 * scanning the outer relation
  		 */
  		hashtable->nbatch_outstart = hashtable->nbatch;
+ 
+ 		/*
+ 		 * Reset OuterNotEmpty for scan.  (It's OK if we fetched a tuple
+ 		 * above, because ExecHashJoinOuterGetTuple will immediately
+ 		 * set it again.)
+ 		 */
+ 		node->hj_OuterNotEmpty = false;
  	}
  
  	/*
***************
*** 454,459 ****
--- 473,479 ----
  	hjstate->js.ps.ps_TupFromTlist = false;
  	hjstate->hj_NeedNewOuter = true;
  	hjstate->hj_MatchedOuter = false;
+ 	hjstate->hj_OuterNotEmpty = false;
  
  	return hjstate;
  }
***************
*** 546,551 ****
--- 566,574 ----
  			*hashvalue = ExecHashGetHashValue(hashtable, econtext,
  											  hjstate->hj_OuterHashKeys);
  
+ 			/* remember outer relation is not empty for possible rescan */
+ 			hjstate->hj_OuterNotEmpty = true;
+ 
  			return slot;
  		}
  
***************
*** 810,816 ****
  	if (node->hj_HashTable->nbatch == 1 &&
  		((PlanState *) node)->righttree->chgParam == NULL)
  	{
! 		/* okay to reuse the hash table; needn't rescan inner, either */
  	}
  	else
  	{
--- 833,851 ----
  	if (node->hj_HashTable->nbatch == 1 &&
  		((PlanState *) node)->righttree->chgParam == NULL)
  	{
! 		/*
! 		 * okay to reuse the hash table; needn't rescan inner, either.
! 		 *
! 

Re: Hashjoin startup strategy (was Re: [HACKERS] Getting different number of results when using hashjoin on/off)

2005-11-28 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 I suspect this is obvious but since you asked, there isn't any way to keep
 around the hash table and just reuse it repeatedly instead of having to rescan
 the data over and over is there?

We already do that when possible --- which it's not in the particular
case at hand, because there's an outer-query parameter used in the
hashed subplan.

It occurs to me that the planner ought to favor putting parameterized
subplans on the outside of a hash join instead of the inside, so as to
make reuse more likely.  Not sure how to factor that into the cost
model though.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] gprof SELECT COUNT(*) results

2005-11-28 Thread Jim C. Nasby
On Fri, Nov 25, 2005 at 10:20:11AM -0500, Tom Lane wrote:
 Qingqing Zhou [EMAIL PROTECTED] writes:
  I can see your computer is really slow, so my theory is that since it is
  easier to hold a slowly-running horse than a fast one, my spinlock on a
  2.4G modern machine should take a relatively longer time to become effective.
  Just kidding.
 
 Is that modern machine a Xeon by any chance?  We know that Xeons have
 fairly awful concurrent performance, and the long latency for bus lock
 instructions may well be the reason why.  FWIW, the numbers I showed
 last night were for an HPPA machine, which I used just because I chanced
 to have CVS tip already built for profiling on it.  I've since
 reproduced the test on a spiffy new dual Xeon that Red Hat just bought
 me :-) ... and I get similar numbers to yours.  It'd be interesting to
 see the results from an SMP Opteron, if anyone's got one handy.

Is there still interest in this? I've got a dual Opteron running FBSD.
(What would be the profiler to use on FBSD?)
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] gprof SELECT COUNT(*) results

2005-11-28 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 (What would be the profiler to use on FBSD?)

gprof should work fine.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Using multi-row technique with COPY

2005-11-28 Thread Jim C. Nasby
On Sun, Nov 27, 2005 at 07:44:55PM +, Simon Riggs wrote:
 not have any unique indexes or row triggers. It should be possible to
 take advantage of this automatically when those requirements are met,
 without any new options. Just as it was with Seq Scans, this is worth
 about 10% reduction in CPU for a COPY FROM.
snip 
 FSM access would need to change slightly to allow for whole-block-only
 requests to be made for heaps, without damaging the average row length
 calculation. It might be simpler to ignore FSM entirely?

Does that mean that this fast copy would end up not re-using space on
pages that have space available? ISTM that's something users would want
to be able to over-ride. In fact, it seems like it shouldn't be a
default behavior...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: Hashjoin startup strategy (was Re: [HACKERS] Getting different number of results when using hashjoin on/off)

2005-11-28 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 In particular, the executor knows that the outer subplan is parameterless
 and therefore should deliver the same results each time (modulo volatile
 functions of course), so after the first cycle it could know that there's no
 point in trying the early fetch on that side.

 Anyone have any other ideas?

I suspect this is obvious but since you asked, there isn't any way to keep
around the hash table and just reuse it repeatedly instead of having to rescan
the data over and over is there?

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Allow an alias for the target table in UPDATE

2005-11-28 Thread Bruce Momjian
Atsushi Ogawa wrote:
 I am interested in a following item in TODO.
 
 o Allow an alias to be provided for the target table in
   UPDATE/DELETE
   This is not SQL-spec but many DBMSs allow it.
 
 I think that this functionality is useful for the migration from
 other RDBMS. However the SQL92 spec does not allow an alias for
 the target table in UPDATE.
 Is it still TODO? If it is TODO, I will try it.

Yes, this is still a valid TODO item.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Checking a heap page

2005-11-28 Thread Bruce Momjian
With the applied patch that checks an entire heap page with one lock, is
there any advantage of considering this for index pages?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Checking a heap page

2005-11-28 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 With the applied patch that checks an entire heap page with one lock, is
 there any advantage of considering this for index pages?

Done already (see bitmap index scans).

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] ice-broker scan thread

2005-11-28 Thread Qingqing Zhou

I am considering adding an ice-broker scan thread to accelerate PostgreSQL
sequential scan IO speed. The basic idea of this thread is just like the
read-ahead method, but the difference is this one does not read the data
into shared buffer pool directly, instead, it reads the data into file
system cache, which makes the integration easy and this is unique to
PostgreSQL.

What happens to the original sequential scan:
for (;;)
{
/*
 * a physical read may happen, due to current content of
 * file system cache and if the kernel is smart enough to
 * understand you want to do sequential scan
 */
physical or logical read a page;
process the page;
}

What happens to the sequential scan with ice-broker:
for (;;)
{
/* since the ice-broker has read the page in already */
logical read a page with big chance;
process the page;
}
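
For concreteness, here is a minimal sketch of what such an ice-broker
worker could look like with POSIX threads (illustrative code only - the
names are invented and this is not PostgreSQL internals): one thread reads
the file into a scratch buffer purely to populate the file system cache,
while the main thread follows behind with the real reads.

/* icebroker.c - toy sketch of a cache-warming read-ahead thread.
 * Compile with: gcc -O2 -Wall -pthread icebroker.c -o icebroker
 */
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

#define BLCKSZ 8192

struct prefetch_arg
{
	const char *path;		/* data file to warm up */
	volatile int stop;		/* set by the scan thread to stop us */
};

static void *
ice_broker(void *arg)
{
	struct prefetch_arg *pf = (struct prefetch_arg *) arg;
	char		buf[BLCKSZ];
	int			fd = open(pf->path, O_RDONLY);

	if (fd < 0)
		return NULL;
	/* each read() faults the next block into the OS cache, then discards it */
	while (!pf->stop && read(fd, buf, sizeof(buf)) > 0)
		;					/* could throttle here to stay only a bit ahead */
	close(fd);
	return NULL;
}

int
main(int argc, char **argv)
{
	pthread_t	tid;
	struct prefetch_arg pf;
	char		page[BLCKSZ];
	int			fd;

	if (argc < 2)
		return 1;
	pf.path = argv[1];
	pf.stop = 0;
	pthread_create(&tid, NULL, ice_broker, &pf);

	/* the "scan": mostly logical reads now, since the broker went first */
	if ((fd = open(pf.path, O_RDONLY)) < 0)
		return 1;
	while (read(fd, page, sizeof(page)) > 0)
		;					/* process the page here */
	close(fd);

	pf.stop = 1;
	pthread_join(tid, NULL);
	return 0;
}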

I wrote a program to simulate the sequential scan in PostgreSQL
with/without ice-broker. The results indicate this technique has the
following characters:
(1) The important factor in the speedup is how much CPU time PostgreSQL
spends on each data page. If PG is fast enough, then no speedup occurs; else
a 10% to 20% speedup is expected, according to my tests.
(2) It uses more CPU - this is easy to understand, since it does more
work;
(3) The benefits also depend on other factors, like how smart your file
system ...

Here is a test results on my machine:
---
$#uname -a
Linux josh.db 2.4.29-1 #2 Tue Jan 25 17:03:33 EST 2005 i686 unknown
$#cat /proc/meminfo | grep MemTotal
MemTotal:  1030988 kB
$#cat /proc/cpuinfo | grep CPU
model name  : Intel(R) Pentium(R) 4 CPU 2.40GHz
$#./seqscan 10 $HOME/pginstall/bin/data/base/10794/18986 50
PostgreSQL sequential scan simulator configuration:
Memory size: 943718400
CPU cost per page: 50
Scan thread read unit size: 4

With scan threads off - duration: 56862.738 ms
With scan threads on - duration: 40611.101 ms
With scan threads off - duration: 46859.207 ms
With scan threads on - duration: 38598.234 ms
With scan threads off - duration: 56919.572 ms
With scan threads on - duration: 47023.606 ms
With scan threads off - duration: 52976.825 ms
With scan threads on - duration: 43056.506 ms
With scan threads off - duration: 54292.979 ms
With scan threads on - duration: 42946.526 ms
With scan threads off - duration: 51893.590 ms
With scan threads on - duration: 42137.684 ms
With scan threads off - duration: 46552.571 ms
With scan threads on - duration: 41892.628 ms
With scan threads off - duration: 45107.800 ms
With scan threads on - duration: 38329.785 ms
With scan threads off - duration: 47527.787 ms
With scan threads on - duration: 38293.581 ms
With scan threads off - duration: 48810.656 ms
With scan threads on - duration: 39018.500 ms
---

Notice that in the above, cpu_cost=50 might look too big (if you look into
the code) - but in a concurrent situation, it is not that huge. Also, on my
windows box (PIII, 800), a cpu_cost=5 is enough to show the benefit of 10%.

So in general, it does help in some situations, but it is not rocket science,
since we can't predict the performance of the file system. It is fairly
easy to integrate, and we should add a GUC parameter to control it.

We need more tests, any comments and tests are welcome,

Regards,
Qingqing

---

/*
 * seqscan.c
 *	PostgreSQL sequential scan simulator with helper scan thread
 *
 * Note
 *	I wrote this simulator to see if there is any benefit for a
 *	sequential scan to do read-ahead by another thread. The only thing
 *	you may want to change in the source file is MEMSZ; make it big
 *	enough to thrash your file system cache.
 *
 *	Use the following command to compile:
 *		$gcc -O2 -Wall -pthread -lm seqscan.c -o seqscan
 *	To use it:
 *		$./seqscan <rounds> <datafile> <cpu_cost>
 *	In which <rounds> is how many times you want to run the test (notice
 *	each round includes two disk-burn tests), <datafile> is the path to
 *	any file (suggest size > 100M), and <cpu_cost> is the cost of
 *	processing each page of the file. Try different <cpu_cost> values.
 */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <memory.h>
#include <errno.h>
#include <math.h>

#ifdef WIN32
#include <io.h>
#include <windows.h>
#define PG_BINARY	O_BINARY
#else
#include <unistd.h>
#include <pthread.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/file.h>
#define PG_BINARY	0
#endif

typedef char bool;
#define true	((bool) 1)
#define false	((bool) 0)

#define BLCKSZ	8192
#define UNITSZ	4
#define MEMSZ	(950*1024*1024)

char	   *data_file;
int			cpu_cost;
volatile bool stop_scan;
char		thread_buffer[BLCKSZ*UNITSZ];

static void
cleanup_cache(void)
{
	char	   *p;

	if (NULL == (p = (char *) malloc(MEMSZ)))
	{
		fprintf(stderr,

Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread David Boreham

Qingqing Zhou wrote:


I am considering adding an ice-broker scan thread to accelerate PostgreSQL
sequential scan IO speed. The basic idea of this thread is just like the
read-ahead method, but the difference is this one does not read the data
into shared buffer pool directly, instead, it reads the data into file
system cache, which makes the integration easy and this is unique to
PostgreSQL.
 

Interesting, and I wondered about this too. But for my taste the
demonstrated benefit really isn't large enough to make it worthwhile.

BTW, I heard a long time ago that NTFS has quite fancy read-ahead, where
it attempts to detect the application's access pattern, including whether
it is reading sequentially and even whether there is a 'stride' to the
accesses when they're not contiguous. I would imagine that other
filesystems attempt similar tricks. So one might expect a simple linear
prefetch to not help much in the presence of such a filesystem.

Were you worried about the icebreaker thread getting too far ahead of
the scan? If it did, it might page out the data you're about to read, I
think. Of course this could be fixed by having the read-ahead thread
periodically check the current location being read by the query thread
and pausing if it's got too far ahead.

Anyway, the recent performance thread has been interesting to me because
in all my career I've never seen a database that scanned scads of data
from disk to process a query. Typically the problems I work on arrange to
read the entire database into memory.

I think I need to get out more... ;)




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Gavin Sherry
On Mon, 28 Nov 2005, Qingqing Zhou wrote:


 I am considering adding an ice-broker scan thread to accelerate PostgreSQL
 sequential scan IO speed. The basic idea of this thread is just like the
 read-ahead method, but the difference is this one does not read the data
 into shared buffer pool directly, instead, it reads the data into file
 system cache, which makes the integration easy and this is unique to
 PostgreSQL.


MySQL, Oracle and others implement read-ahead threads to simulate async IO
'pre-fetching'. I've been experimenting with two ideas. The first is to
increase the readahead when we're doing sequential scans (see prototype
patch using posix fadvise attached). I've not got any hardware at the
moment which I can test this patch on but I am waiting on some dbt-3
results which should indicate whether fadvise is a good idea or a bad one.
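
For reference, the syscall at the heart of that patch is easy to exercise
standalone; a minimal sketch (mine, not taken from the patch - only
posix_fadvise itself is the real interface) looks like:

/* fadvise.c - standalone illustration of hinting sequential access.
 * Compile with: gcc -O2 -Wall fadvise.c -o fadvise
 */
#define _XOPEN_SOURCE 600	/* for posix_fadvise */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	char	buf[8192];
	int		fd;

	if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
		return 1;

	/* offset 0, length 0 means "the whole file"; failure is harmless,
	 * since this is only a hint to enlarge the kernel readahead window */
	if (posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) != 0)
		perror("posix_fadvise");

	while (read(fd, buf, sizeof(buf)) > 0)
		;					/* the sequential scan proper goes here */
	close(fd);
	return 0;
}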

The second idea is using posix async IO at key points within the system
to better parallelise CPU and IO work. There areas I think we could use
async IO are: during sequential scans, use async IO to do pre-fetching of
blocks; inside WAL, begin flushing WAL buffers to disk before we commit;
and, inside the background writer/check point process, asynchronously
write out pages and, potentially, asynchronously build new checkpoint segments.

The motivation for using async IO is two fold: first, the results of this
paper[1] are compelling; second, modern OSs support async IO. I know that
Linux[2], Solaris[3], AIX and Windows all have async IO and I presume that
all their rivals have it as well.

The fundamental premise of the paper mentioned above is that if the
database is busy, IO should be busy. With our current block-at-a-time
processing, this isn't always the case. This is why Qingqing's read-ahead
thread makes sense. My reason for mailing is, however, that the async IO
results are more compelling than the read ahead thread.

I haven't had time to prototype whether we can easily implement async IO
but I am planning to work on it in December. The two main goals will be to
a) integrate and utilise async IO, at least within the executor context,
and b) build a primitive kind of scheduler so that we stop prefetching
when we know that there are a certain number of outstanding IOs for a
given device.
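
To make the pre-fetching idea concrete, here is a rough double-buffered
sketch using POSIX AIO (illustrative only, not Gavin's prototype; on Linux
it links with -lrt): the read for block n+1 is issued before block n is
processed, so CPU work overlaps the IO.

/* aioscan.c - double-buffered sequential read with POSIX AIO.
 * Compile with: gcc -O2 -Wall aioscan.c -o aioscan -lrt
 */
#include <aio.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192

int
main(int argc, char **argv)
{
	char		buf[2][BLCKSZ];
	struct aiocb cb;
	off_t		next = 0;
	int			fd, cur = 0;

	if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
		return 1;

	/* prime the pipeline: start the read of the first block */
	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = buf[cur];
	cb.aio_nbytes = BLCKSZ;
	cb.aio_offset = next;
	if (aio_read(&cb) != 0)
		return 1;

	for (;;)
	{
		const struct aiocb *list[1] = {&cb};

		/* wait for the outstanding read to finish */
		aio_suspend(list, 1, NULL);
		if (aio_error(&cb) != 0 || aio_return(&cb) <= 0)
			break;

		/* kick off the next block before touching the current one */
		next += BLCKSZ;
		cur = 1 - cur;
		memset(&cb, 0, sizeof(cb));
		cb.aio_fildes = fd;
		cb.aio_buf = buf[cur];
		cb.aio_nbytes = BLCKSZ;
		cb.aio_offset = next;
		if (aio_read(&cb) != 0)
			break;

		/* process buf[1 - cur] here while the next IO proceeds */
	}
	close(fd);
	return 0;
}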

Thanks,

Gavin



[1] http://www.vldb2005.org/program/paper/wed/p1116-hall.pdf
[2] http://lse.sourceforge.net/io/aionotes.txt
[3] http://developers.sun.com/solaris/articles/event_completion.html - I'm
fairly sure they have a posix AIO wrapper around these routines, but I
cannot see it documented anywhere :-(

Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.200
diff -c -p -r1.200 heapam.c
*** src/backend/access/heap/heapam.c	15 Oct 2005 02:49:08 -0000	1.200
--- src/backend/access/heap/heapam.c	18 Nov 2005 04:10:21 -0000
***************
*** 36,41 ****
--- 36,44 ----
   *
   *-------------------------------------------------------------------------
   */
+ 
+ #include <fcntl.h>
+ 
  #include "postgres.h"
  
  #include "access/heapam.h"
***************
*** 49,54 ****
--- 52,58 ----
  #include "miscadmin.h"
  #include "pgstat.h"
  #include "storage/procarray.h"
+ #include "storage/smgr.h"
  #include "utils/inval.h"
  #include "utils/relcache.h"
  
*************** heap_beginscan(Relation relation, Snapsh
*** 659,665 ****
  	pgstat_initstats(&scan->rs_pgstat_info, relation);
  
  	initscan(scan, key);
! 
  	return scan;
  }
  
--- 663,673 ----
  	pgstat_initstats(&scan->rs_pgstat_info, relation);
  
  	initscan(scan, key);
! 	if (!IsBootstrapProcessingMode())
! 	{
! 		RelationOpenSmgr(relation);
! 		RelationSetSmgrAdvice(relation, POSIX_FADV_SEQUENTIAL);
! 	}
  	return scan;
  }
  
*************** heap_rescan(HeapScanDesc scan,
*** 693,698 ****
--- 701,710 ----
  void
  heap_endscan(HeapScanDesc scan)
  {
+ /*	if (!IsBootstrapProcessingMode())
+ 		smgradvise(scan->rs_rd->rd_smgr, POSIX_FADV_NORMAL);
+ */
+ 
  	/* Note: no locking manipulations needed */
  
  	/*
Index: src/backend/access/index/indexam.c
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/backend/access/index/indexam.c,v
retrieving revision 1.86
diff -c -p -r1.86 indexam.c
*** src/backend/access/index/indexam.c	15 Oct 2005 02:49:09 -0000	1.86
--- src/backend/access/index/indexam.c	18 Nov 2005 03:13:01 -0000
***************
*** 61,73 ****
   *-------------------------------------------------------------------------
   */
  
  #include "postgres.h"
  
  #include "access/genam.h"
  #include "access/heapam.h"
! #include "pgstat.h"
  #include "utils/relcache.h"
  
  
  /* ----------------
   *	macros used in index_ routines
--- 61,77 ----
   

Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Christopher Kings-Lynne

Qingqing,


I am considering adding an ice-broker scan thread to accelerate PostgreSQL
sequential scan IO speed. The basic idea of this thread is just like the
read-ahead method, but the difference is this one does not read the 
data

into shared buffer pool directly, instead, it reads the data into file
system cache, which makes the integration easy and this is unique to
PostgreSQL.


You probably mean ice-breaker by the way :)

Chris


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread David Boreham

Gavin Sherry wrote:


MySQL, Oracle and others implement read-ahead threads to simulate async IO


I always believed that Oracle used async file I/O. Not that I've seen their
code, but I'm fairly sure they funded the addition of kernel aio to Linux
a few years back.

But Oracle comes from a time long ago when threads and decent
filesystems didn't exist, so some of the things they do may not be
appropriate to add to a product that doesn't have them today.

Now...network async I/O...that'd be really useful in my world...



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Gavin Sherry
On Mon, 28 Nov 2005, David Boreham wrote:

 Gavin Sherry wrote:

  MySQL, Oracle and others implement read-ahead threads to simulate async IO

 I always believed that Oracle used async file I/O. Not that I've seen their
 code, but I'm fairly sure they funded the addition of kernel aio to Linux
 a few years back.

That's right.


 ButOracle comes from a time long ago when threads and decent
 filesystems didn't exist, so some of the things they do may not be
 appropriate
 to add to a product that doesn't have them today.

The paper I linked to seemed to suggest that they weren't using async IO
in 9.2 -- which is fairly old. I'm not sure why the authors didn't test
10g.

Gavin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Mark Kirkwood

Tom Lane wrote:

Gavin Sherry [EMAIL PROTECTED] writes:


I haven't had time to prototype whether we can easily implement async IO



Just as with any suggestion to depend on threads, you are going to have
to show results that border on astounding to have any chance of getting
this in.  Otherwise the portability issues are just going to make it not
worth the trouble.


Do these ideas require threads in principle? ISTM that there could be 
(additional) process(es) waiting to perform pre-fetching or async io, 
and we could use the usual IPC machinary to talk between them...


cheers

Mark

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread David Boreham






  
The paper I linked to seemed to suggest that they weren't using async IO
in 9.2 -- which is fairly old. I'm not sure why the authors didn't test
10g.
  

...reads paper... ok, interesting. Did they say that Oracle isn't using aio?
I can't see that. They say that Oracle has no more than one outstanding I/O
operation in flight per concurrent query, and they appear to think that's a
bad thing. I'm not seeing that myself. Perhaps once I sleep on it, it'll
become clear what they're getting at.

One theory for the lack of aio in Oracle as tested in that paper would be
that they were testing on Linux. Since aio is relatively new in Linux, I
wouldn't be surprised if Oracle didn't actually use it until it's known to
be widely deployed in the field and to have proven reliability. Perhaps
we've reached that state around now, and so Oracle may not yet have
released an aio-capable Linux version of their RDBMS. Just a
theory...someone from those tubular towers lurking here could tell us for
sure I guess...








Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Gavin Sherry
On Mon, 28 Nov 2005, Tom Lane wrote:

 Gavin Sherry [EMAIL PROTECTED] writes:
  I haven't had time to prototype whether we can easily implement async IO

 Just as with any suggestion to depend on threads, you are going to have
 to show results that border on astounding to have any chance of getting
 this in.  Otherwise the portability issues are just going to make it not
 worth the trouble.

The architecture I am looking at would not rely on threads.

I didn't want to jump on list and waive my hands until I had something to
show, but since Qingqing is looking at the issue I thought I better raise
it.

Gavin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Mark Kirkwood

Gavin Sherry wrote:



The paper I linked to seemed to suggest that they weren't using async IO
in 9.2 -- which is fairly old. I'm not sure why the authors didn't test
10g.



There have been async io type parameters in Oracle's init.ora files from 
(at least) 8i (disk_async_io=true IIRC) - on Solaris anyway. Whether 
this enabled real or simulated async io is probably a good question - I 
recall during testing turning it off and seeing kio()? or similar type 
calls become write()/read() in truss output.


regards

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Qingqing Zhou


On Mon, 28 Nov 2005, Mark Kirkwood wrote:

 Do these ideas require threads in principle? ISTM that there could be
 (additional) process(es) waiting to perform pre-fetching or async io,
 and we could use the usual IPC machinary to talk between them...


Right. I use threads because it is easy to write simulation program :-)

Regards,
Qingqing

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Jonah H. Harris
FYI, I've personally used Oracle 9.2.0.4's async IO on Linux and have seen several installations which make use of it also.

On 11/28/05, Gavin Sherry [EMAIL PROTECTED] wrote:
 On Mon, 28 Nov 2005, Tom Lane wrote:
  Gavin Sherry [EMAIL PROTECTED] writes:
   I haven't had time to prototype whether we can easily implement async IO

  Just as with any suggestion to depend on threads, you are going to have
  to show results that border on astounding to have any chance of getting
  this in.  Otherwise the portability issues are just going to make it not
  worth the trouble.

 The architecture I am looking at would not rely on threads.

 I didn't want to jump on list and waive my hands until I had something to
 show, but since Qingqing is looking at the issue I thought I better raise
 it.

 Gavin

 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster



Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Qingqing Zhou


On Mon, 28 Nov 2005, Gavin Sherry wrote:

 MySQL, Oracle and others implement read-ahead threads to simulate async IO
 'pre-fetching'.

Due to my tests on Windows (using the attached program and change
enable_aio=true), seems aio doesn't help as a separate thread - but maybe
because my usage is wrong ...

Regards,
Qingqing

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Qingqing Zhou


On Mon, 28 Nov 2005, Gavin Sherry wrote:

 I didn't want to jump on list and waive my hands until I had something to
 show, but since Qingqing is looking at the issue I thought I better raise
 it.


Don't worry :-) I separated the logic into a standalone program in order to
let more people help on this issue.

Regards,
Qingqing

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Qingqing Zhou

David Boreham [EMAIL PROTECTED] wrote

 BTW, I heard a long time ago that NTFS has quite fancy read-ahead, where 
 it attempts to detect the application's access pattern including if it is
 reading sequentially and even if there is a 'stride' to the accesses when
 they're not contiguous. I would imagine that other filesystems attempt 
 similar tricks. So one might expect a simple linear prefectch
 to not help much in the presence of such a filesystem.


So we need more tests. I understand how smart current file systems are, and
it seems that depends on the interval at which you send the next file block
read request (decided by the cpu_cost parameter in my program).

I imagine that on a multi-way machine with a strong IO device, the
ice-breaker could do much better ...

 Were you worried about the icebreaker thread getting too far ahead of the 
 scan ? If it did it might page out the data you're about to read, I think. 
 Of course this could be fixed by having the read ahead thread perodically 
 check the current location being read by the query thread and pausing if 
 it's got too far ahead.


Right.

Regards,
Qingqing 



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread David Boreham




Qingqing Zhou wrote:

  
On Mon, 28 Nov 2005, Gavin Sherry wrote:
  
  
MySQL, Oracle and others implement read-ahead threads to simulate async IO
'pre-fetching'.

  
  
Due to my tests on Windows (using the attached program and change
enable_aio=true), seems aio doesn't help as a separate thread - but maybe
because my usage is wrong ...
  

I don't think your NT overlapped I/O code is quite right. At least
I think it will issue reads at a high rate without waiting for any of
them to complete. Beyond some point that has to give the kernel gut-rot.
But anyway, I wouldn't expect the use of aio to make any significant
difference in an already threaded test program. The point of aio is to
allow I/O concurrency _without_ the use of threads or multiple processes.
You could re-write your program to have a single thread but use aio.
In that case it should show the same read-ahead benefit that you see
with the thread.
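
For contrast, here is a minimal sketch of one NT overlapped read that does
wait for completion before reusing the buffer (illustrative code with
error handling trimmed; not a rewrite of the posted test program):

/* overlapped.c - issue one overlapped read and wait for it to finish. */
#include <windows.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	HANDLE		h;
	OVERLAPPED	ov;
	char		buf[8192];
	DWORD		got;

	if (argc < 2)
		return 1;
	h = CreateFile(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL,
				   OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
	if (h == INVALID_HANDLE_VALUE)
		return 1;

	ZeroMemory(&ov, sizeof(ov));
	ov.Offset = 0;				/* low 32 bits of the file offset */
	ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

	/* issue the read; it may complete asynchronously ... */
	if (!ReadFile(h, buf, sizeof(buf), NULL, &ov) &&
		GetLastError() != ERROR_IO_PENDING)
		return 1;

	/* ... so block here until it is really done before reusing buf */
	if (!GetOverlappedResult(h, &ov, &got, TRUE))
		return 1;
	printf("read %lu bytes\n", (unsigned long) got);

	CloseHandle(ov.hEvent);
	CloseHandle(h);
	return 0;
}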







Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Gavin Sherry
On Mon, 28 Nov 2005, Qingqing Zhou wrote:



 On Mon, 28 Nov 2005, Gavin Sherry wrote:
 
  MySQL, Oracle and others implement read-ahead threads to simulate async IO
  'pre-fetching'.

 Due to my tests on Windows (using the attached program and change
 enable_aio=true), seems aio doesn't help as a separate thread - but maybe
 because my usage is wrong ...

Right, I would imagine that it's very close. I intend to use kernel based
async IO so that we can have the prefetch effect of your sample program
without the need for threads.

Thanks,

Gavin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] ice-broker scan thread

2005-11-28 Thread Qingqing Zhou

David Boreham [EMAIL PROTECTED] wrote

 I don't think your NT overlapped I/O code is quite right. At least
 I think it will issue reads at a high rate without waiting for any of them
 to complete. Beyond some point that has to give the kernel gut-rot.


[also with reply to Gavin] looked up gut-rot in the dictionary, got it ... Uh,
this behavior is intended - I try to push enough requests to the kernel
quickly so that it understands that I am doing a sequential scan, and so it
will pull the data from disk to the file system cache more efficiently. Some
file systems may have a free-behind mechanism, but our main thread (which
really processes the query) should be fast enough to get there before the
data vanishes.


 You could re-write your program to have a single thread but use aio.
 In that case it should show the same read ahead benefit that you see
 with the thread.


I guess this is also Gavin's point - I understand those will be two different 
methodologies for handling read-ahead. If no other thread/process is involved, 
then the main thread is responsible for grabbing a free buffer page from the 
buffer pool and asking the kernel to put the data there via sync IO (as current 
PostgreSQL does) or async IO. And that's what I want to avoid. I'd like to 
use a dedicated thread/process to break the ice only, i.e., pull data from 
disk into the file system cache, so that the main thread will only issue 
*logical* reads.

Regards,
Qingqing 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq