[HACKERS] SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release Candidate 5)

2005-01-12 Thread Simon Riggs

Not sure what is going on here: why is SUSE not listed on the supported
platforms list? (still)

...is it because Reinhard seems resistant (after private conversation)
to the idea of submitting a formal port report via HACKERS, like
everybody else?

...or is it because his postings to ANNOUNCE that the port to SUSE have
gone unnoticed by those that compile the supported platforms list?

...or both?

There seems no reason for either to occur, but still - no port listed.

Please list the SUSE port, as reported by Reinhard Max.
Please can Reinhard (or another SUSE rep) submit port reports to 
[EMAIL PROTECTED]

Best Regards, Simon Riggs


On Tue, 2005-01-11 at 16:15 -0400, Marc G. Fournier wrote:
 Due to several small, and one fairly large, bugs that were found in 
 Release Candidate 4, we have been forced to release our 5th Release (and 
 hopefully last) Candidate so that we can get some proper testing in on the 
 changes before release.
 
 A current list of *known* supported platforms can be found at:
 
   http://developer.postgresql.org/supported-platforms.html
 
 We're always looking to improve that list, so we encourage anyone that is 
 running a platform not listed to please report on any success or failures 
 with 
 Release Candidate 4.
 
 Baring *any* coding changes (documentation != code) over the next week or so, 
 we *hope* that this will the final Release Candidate before Full Release, 
 with 
 that being aimed for the 19th of January.
 
 As always, this release is available on all mirrors, as listed at:
 
   http://wwwmaster.postgresql.org/download/mirrors-ftp
 
 For those using Bittorrent, David Fetter has updated the .torrents, 
 available at:
 
   http://bt.postgresql.org
 
 Please report any bug reports with this Release Candidate to:
 
   pgsql-bugs@postgresql.org
 
 
 Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
 Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release Candidate 5)

2005-01-12 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Not sure what is going on here: why is SUSE not listed on the supported
 platforms list? (still)

I haven't seen any reports of passes on SUSE.  I have zero doubt that PG
works on SUSE, since it's pretty much exactly like every other Linux,
but there's been no specific reports on the lists AFAIR.

 ...is it because Reinhard seems resistant (after private conversation)
 to the idea of submitting a formal port report via HACKERS, like
 everybody else?

See above.

 ...or is it because his postings to ANNOUNCE that the port to SUSE have
 gone unnoticed by those that compile the supported platforms list?

If he insists on posting such routine stuff to pgsql-announce, he should
not be too surprised that his postings do not get approved.  That isn't
the correct forum.  We don't peruse the New York Times classified ads
for such reports, either ...

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[HACKERS] Re: Re: SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release Candidate 5)

2005-01-12 Thread simon

Reinhard Max max@suse.de wrote on 12.01.2005, 11:17:52:
 On Wed, 12 Jan 2005 at 08:23, Simon Riggs wrote:
 
  Not sure what is going on here: why is SUSE not listed on the supported
  platforms list? (still)
  
  ...is it because Reinhard seems resistant
 
 why do you think so?
 
  (after private conversation) to the idea of submitting a formal port 
  report via HACKERS, like everybody else?
 
 I andwered you that I will do it, but last week was a short week for 
 me, and this Monday I had an email from Peter Eisentraut telling me 
 that he will add the SUSE ports to the list, so I didn't see a need to 
 send a report in addition.

Forgive my admittedly blunt tactics - I want to see SUSE on the list
before we release, and time was short - that's all. If it wasn't for
RC5, no port report would be listed in the docs in the final
release It's better to have a SUSE contact support the port than
for me to report it.

I should note here also that SGI have replied No to my request for
help (or access) with porting PostgreSQL onto the latest version of
IRIX...

Anyone got access to an HP/UX development system?

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: Re: [HACKERS] SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release Candidate 5)

2005-01-12 Thread simon

Reinhard Max max@suse.de wrote on 12.01.2005, 11:38:55:
 On Wed, 12 Jan 2005 at 03:53, Tom Lane wrote:
 
   ...or is it because his postings to ANNOUNCE that the port to SUSE 
   have gone unnoticed by those that compile the supported platforms 
   list?
  
  If he insists on posting such routine stuff to pgsql-announce, he 
  should not be too surprised that his postings do not get approved.
  That isn't the correct forum. We don't peruse the New York Times 
  classified ads for such reports, either ...
 
 no need to be rude to me for posting one single email to ANNOUNCE 
 after years of providing the SUSE RPMs silently. There were other 
 posts about RPM builds on ANNOUNCE, so I thought it would be the right 
 place to announce mine as well.

Please no more. I started this, so I apologise now to both of you, and
to the list and hope this ends here.

We all look forward to SUSE being a listed port.

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release Candidate 5)

2005-01-12 Thread Peter Eisentraut
Simon Riggs wrote:
 Not sure what is going on here: why is SUSE not listed on the
 supported platforms list? (still)

RC5 contains:

  SUSE Linux x86 8.0.0 Peter Eisentraut ([EMAIL PROTECTED]), 2005-01-10
  9.1

In the meantime I have received confirmation from Reinhard Max that his 
test methods match our requirements, so the list will be completed with 
the other platforms he reported in due time.

I'm sorry, but I won't just add it works notices if it's not clear 
what kind of testing was done.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Re: Re: SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release Candidate 5)

2005-01-12 Thread Peter Eisentraut
[EMAIL PROTECTED] wrote:
 I should note here also that SGI have replied No to my request for
 help (or access) with porting PostgreSQL onto the latest version of
 IRIX...

Another year, some result...

 Anyone got access to an HP/UX development system?

Check out HP's testdrive program.  Many of the port reports for 7.4 were 
done there, but I was too busy this time around to do that.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[HACKERS] port report: Linux SuSE 9.1

2005-01-12 Thread Mikhail Terekhov
Hello,
./configure --prefix=/tmp/pgsql --enable-thread-safety --with-pam 
--with-openssl --with-tcl --with-python --with-perl

==
All 96 tests passed.
==
Linux usenterekhovx2l 2.6.5-7.111-smp #1 SMP Wed Oct 13 15:45:13 UTC 
2004 i686 i686 i386 GNU/Linux

SuSE Linux 9.1 (i586)
Regards, 
Mikhail Terekhov


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] port report: Linux SuSE 9.1

2005-01-12 Thread Mikhail Terekhov
Sorry for the follow-up. That was with postgresql-8.0rc5.tar.bz2
Mikhail Terekhov wrote:
Hello,
./configure --prefix=/tmp/pgsql --enable-thread-safety --with-pam 
--with-openssl --with-tcl --with-python --with-perl

==
All 96 tests passed.
==
Linux usenterekhovx2l 2.6.5-7.111-smp #1 SMP Wed Oct 13 15:45:13 UTC 
2004 i686 i686 i386 GNU/Linux

SuSE Linux 9.1 (i586)
Regards, Mikhail Terekhov

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

--
Mikhail Terekhov, 176 South Street, Hopkinton, MA 01748
E-mail: [EMAIL PROTECTED], Ext. 44232, Tel: 1-508-249-4232

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release

2005-01-12 Thread Reinhard Max
On Wed, 12 Jan 2005 at 16:29, Peter Eisentraut wrote:

 In the meantime I have received confirmation from Reinhard Max that 
 his test methods match our requirements, so the list will be 
 completed with the other platforms he reported in due time.

Today I've updated the RPMs on the FTP server to RC5, which implicitly 
means that PostgreSQL passes the regression tests on the provided 
platforms. 
ftp://ftp.suse.com/pub/projects/postgresql/postgresql-8.0.0rc5

I have also successfully compiled and tested RC5 (but not yet created 
RPMs) on the following platforms:

  8.1-i386
  8.2-i386
  sles8-i386 
  sles8-ia64
  sles8-ppc
  sles8-ppc64
  sles8-s390
  sles8-s390x

The only failure I have to report is sles8-x86_64, where I am getting 
segfaults from psql during the regression tests. I am still 
investigating what's going on there

cu
Reinhard

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SUSE port (was [ANNOUNCE] PostgreSQL 8.0.0 Release

2005-01-12 Thread Reinhard Max
On Wed, 12 Jan 2005 at 17:29, Reinhard Max wrote:

 The only failure I have to report is sles8-x86_64, where I am 
 getting segfaults from psql during the regression tests.

The segfault in a call to snprintf somewhere in libpq's kerberos5 
code. So when I leave out --with-krb5 it compiles and passes the test 
suite without problems.

I am still not sure whether the kerberos library, glibc, or PostgreSQL 
is to blame, or if it's a combination of bugs in these components that 
triggers the segfault.

More details to follow...

cu
Reinhard

---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jonah H. Harris
Tom, Bruce, and others involved in this recurring TODO discussion
First, let me start by saying that I understand this has been discussed 
many times before; however, Id like to see what the current state of 
affairs is regarding the possibility of using a unique index scan to 
speed up the COUNT aggregate.

A few of my customers (some familiar with Oracle) are confused by the 
amount of time it takes PostgreSQL to come up with the result and are 
hesitating to use it because they think its too slow.  Ive tried to 
explain to them why it is slow, but in doing so Ive come to see that 
it may be worth working on.

I've reviewed the many messages regarding COUNT(*) and have looked 
through some of the source (8.0-RC4) and have arrived at the following 
questions:

1.  Is there any answer to Bruces last statement in the thread, Re: 
[PERFORM] COUNT(*) again (was Re: Index/Function organized 
(http://archives.postgresql.org/pgsql-hackers/2003-10/msg00245.php)

2.  What do you think about a separate plan type such as IndexOnlyScan? 
Good/stupid/what is he on?

3.  Assuming that Bruces aforementioned statement is correct, what 
hidden performance bottlenecks might there be?

4.  What is the consensus of updating a per-relation value containing 
the row counts?

Though not exactly like PostgreSQL, Oracle uses MVCC and performs an 
index scan on a unique value for all unqualified counts.  Admittedly, 
counts are faster than they used to be, but this is always a complaint 
I hear from open source users and professionals alike.

Ive been pretty busy, and I still need to get the user/group quota 
working with 8.0 and forward the diffs to you all, but I would be
willing to work on speeding up the count(*) if you guys give me
your input.

As always, keep up the good work!
Respectfully,
Jonah H. Harris, Senior Web Administrator
Albuquerque TVI
505.224.4814

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 Tom, Bruce, and others involved in this recurring TODO discussion…
 First, let me start by saying that I understand this has been discussed 
 many times before; however, I’d like to see what the current state of 
 affairs is regarding the possibility of using a unique index scan to 
 speed up the COUNT aggregate.

It's not happening, because no one has come up with a workable proposal.
In particular, we're not willing to slow down every other operation in
order to make COUNT-*-with-no-WHERE-clause faster.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[HACKERS] segfault caused by heimdal (was: SUSE port)

2005-01-12 Thread Reinhard Max
On Wed, 12 Jan 2005 at 18:20, Reinhard Max wrote:

 I am still not sure whether the kerberos library, glibc, or 
 PostgreSQL is to blame, or if it's a combination of bugs in these 
 components that triggers the segfault.

The problem is, that the heimdal implementation of kerberos5 used on 
sles8 needs an extra include statement for com_err.h in 
src/interfaces/libpq/fe-auth.c to get the prototype for 
error_message(), while on newer SUSE-releases using the MIT Kerberos5 
implementation this prototype is provided by krb5.h itself.

So I suspect this bug might hit everyone using heimdal, but it only 
gets triggered when one of the calls to kerberos in 
src/interfaces/libpq/fe-auth.c returns with an error.

I am still not sure why the crash only happened on x86_64.

cu
Reinhard

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Merlin Moncure
 Tom, Bruce, and others involved in this recurring TODO discussion...
 
 First, let me start by saying that I understand this has been
discussed
 many times before; however, I'd like to see what the current state of
 affairs is regarding the possibility of using a unique index scan to
 speed up the COUNT aggregate.

To sum up:
1.  There are good technical reasons why not to do this.  The pg
aggregate system is very elegant...not worth compromising it for a
specific case.
2.  postgresql can do many things faster than oracle.  If you prefer the
way oracle behaves, use oracle.
3.  workaround #1: just run analyze once in a while (you should do that
anyways) and query pg_Class for the #tuples in a relation.
4.  workaround #2: rig up a materialized view and query that.  This will
be faster than what oracle does, btw, at the price of some coherency.
5.  understand that count(*) from t, although frequently used, is of
dubious value in the general sense.  Sooner or later someone will
optimize this, but in light of the currently available workarounds it
doesn't seem that important.
6.  for large tables, you can get a pretty accurate count by doing:
select count(*) * 10 from t where random()  .9;
on my setup, this shaved about 15% off of the counting time...YMMV.

Merlin


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jonah H. Harris
Tom,
Thank you for your prompt response and I understand your statement 
completely.

My thinking is that we may be able to implement index usage for not only 
unqualified counts, but also on any query that can be satisfied by the 
index itself. Index usage seems to be a feature that could speed up 
PostgreSQL for many people. I'm working on a project right now that 
could actually take advantage of it.

Looking at the message boards, there is significant interest in the 
COUNT(*) aspect. However, rather than solely address the COUNT(*) TODO 
item, why not fix it and add additional functionality found in 
commercial databases as well? I believe Oracle has had this feature 
since 7.3 and I know people take advantage of it.

I understand that you guys have a lot more important stuff to do than 
work on something like this. Unlike other people posting the request and 
whining about the speed, I'm offering to take it on and fix it.

Take this mesage as my willingness to propose and implement this 
feature. Any details, pitfalls, or suggestions are appreciated.

Thanks again!
-Jonah
Tom Lane wrote:
Jonah H. Harris [EMAIL PROTECTED] writes:
 

Tom, Bruce, and others involved in this recurring TODO discussion
First, let me start by saying that I understand this has been discussed 
many times before; however, Id like to see what the current state of 
affairs is regarding the possibility of using a unique index scan to 
speed up the COUNT aggregate.
   

It's not happening, because no one has come up with a workable proposal.
In particular, we're not willing to slow down every other operation in
order to make COUNT-*-with-no-WHERE-clause faster.
			regards, tom lane
 


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] segfault caused by heimdal (was: SUSE port)

2005-01-12 Thread Reinhard Max

Sorry for following up to myself once more...

On Wed, 12 Jan 2005 at 19:36, Reinhard Max wrote:

 The problem is, that the heimdal implementation of kerberos5 used on 
 sles8 needs an extra include statement for com_err.h in 
 src/interfaces/libpq/fe-auth.c to get the prototype for 
 error_message(), while on newer SUSE-releases using the MIT 
 Kerberos5 implementation this prototype is provided by krb5.h 
 itself.

after finding and reading the thread on HACKERS about com_err.h from 
last December, I think either should configure check if including 
krb5.h is sufficient for getting the prototype of error_message(), or 
a conditional include for krb5.h should be added to 
src/interfaces/libpq/fe-auth.c.

A proposed patch to achieve the latter is attached to this mail.

Either way will lead to a build time error when error_message() isn't 
declared or com_err.h can't be found, which is better than the current 
situation where only a warning about a missing prototype is issued, 
but compilation continues resulting in a broken libpq.


cu
Reinhard--- src/interfaces/libpq/fe-auth.c
+++ src/interfaces/libpq/fe-auth.c
@@ -244,6 +244,11 @@
 
 #include krb5.h
 
+#if !defined(__COM_ERR_H)  !defined(__COM_ERR_H__)
+/* if krb5.h didn't include it already */
+#include com_err.h
+#endif
+
 /*
  * pg_an_to_ln -- return the local name corresponding to an authentication
  *   name

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Greg Stark
Jonah H. Harris [EMAIL PROTECTED] writes:

 Looking at the message boards, there is significant interest in the COUNT(*)
 aspect. However, rather than solely address the COUNT(*) TODO item, why not 
 fix
 it and add additional functionality found in commercial databases as well? I
 believe Oracle has had this feature since 7.3 and I know people take advantage
 of it.

I think part of the problem is that there's a bunch of features related to
these types of queries and the lines between them blur. 

You seem to be talking about putting visibility information inside indexes for
so index-only plans can be performed. But you're also talking about queries
like select count(*) from foo with no where clauses. Such a query wouldn't
be helped by index-only scans.

Perhaps you're thinking about caching the total number of records in a global
piece of state like a materialized view? That would be a nice feature but I
think it should done as a general materialized view implementation, not a
special case solution for just this one query.

Perhaps you're thinking of the min/max problem of being able to use indexes to
pick out just the tuples satisfying the min/max constraint. That seems to me
to be one of the more tractable problems in this area but it would still
require lots of work.

I suggest you post a specific query you find is slow. Then discuss how you
think it ought to be executed and why.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] segfault caused by heimdal (was: SUSE port)

2005-01-12 Thread Kurt Roeckx
On Wed, Jan 12, 2005 at 07:36:52PM +0100, Reinhard Max wrote:
 
 The problem is, that the heimdal implementation of kerberos5 used on 
 sles8 needs an extra include statement for com_err.h in 
 src/interfaces/libpq/fe-auth.c to get the prototype for 
 error_message(), while on newer SUSE-releases using the MIT Kerberos5 
 implementation this prototype is provided by krb5.h itself.
[...]
 I am still not sure why the crash only happened on x86_64.

This is because the proper prototype is:
extern char const *error_message (long);

And C automaticly generates a prototype with in int instead.

On 32 bit platforms this ussualy isn't a problem since both int
and long are ussualy both 32 bit, but on x86_64 a long is 64 bit
while an int is only 32.


Kurt


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] segfault caused by heimdal (was: SUSE port)

2005-01-12 Thread Reinhard Max
On Wed, 12 Jan 2005 at 20:28, Kurt Roeckx wrote:

 This is because the proper prototype is:
 extern char const *error_message (long);
 
 And C automaticly generates a prototype with in int instead.
 
 On 32 bit platforms this ussualy isn't a problem since both int and 
 long are ussualy both 32 bit, but on x86_64 a long is 64 bit while 
 an int is only 32.

It's actually not the long argument, but the returned pointer that 
caused the segfault.

But this only explains why it didn't crash on i386, but not why it 
also didn't crash on IA64, ppc64 and s390x which are also LP64 
platforms.

cu
Reinhard

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jonah H. Harris
Greg Stark wrote:
I think part of the problem is that there's a bunch of features related to
these types of queries and the lines between them blur. 

You seem to be talking about putting visibility information inside indexes for
so index-only plans can be performed. But you're also talking about queries
like select count(*) from foo with no where clauses. Such a query wouldn't
be helped by index-only scans.
Perhaps you're thinking about caching the total number of records in a global
piece of state like a materialized view? That would be a nice feature but I
think it should done as a general materialized view implementation, not a
special case solution for just this one query.
Perhaps you're thinking of the min/max problem of being able to use indexes to
pick out just the tuples satisfying the min/max constraint. That seems to me
to be one of the more tractable problems in this area but it would still
require lots of work.
I suggest you post a specific query you find is slow. Then discuss how you
think it ought to be executed and why.
 

You are correct, I am proposing to add visibility to the indexes.
As for unqualified counts, I believe that they could take advantage of 
an index-only scan as it requires much less I/O to perform an index scan 
than a sequential scan on large tables.

Min/Max would also take advantage of index only scans but say, for 
example, that someone has the following:

Relation SOME_USERS
user_id BIGINT PK
user_nm varchar(32) UNIQUE INDEX
some_other_attributes...
If an application needs the user names, it would run SELECT user_nm FROM 
SOME_USERS... in the current implementation this would require a 
sequential scan.  On a relation which contains 1M+ tuples, this requires 
either a lot of I/O or a lot of cache.  An index scan would immensely 
speed up this query.



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 My thinking is that we may be able to implement index usage for not only 
 unqualified counts, but also on any query that can be satisfied by the 
 index itself.

The fundamental problem is that you can't do it without adding at least
16 bytes, probably 20, to the size of an index tuple header.  That would
double the physical size of an index on a simple column (eg an integer
or timestamp).  The extra I/O costs and extra maintenance costs are
unattractive to say the least.  And it takes away some of the
justification for the whole thing, which is that reading an index is
much cheaper than reading the main table.  That's only true if the index
is much smaller than the main table ...

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Merlin Moncure
 Greg Stark wrote:
 
 I think part of the problem is that there's a bunch of features
related
 to
 these types of queries and the lines between them blur.
 
 You seem to be talking about putting visibility information inside
 indexes for
 so index-only plans can be performed. But you're also talking about
 queries
 like select count(*) from foo with no where clauses. Such a query
 wouldn't
 be helped by index-only scans.
 
 Perhaps you're thinking about caching the total number of records in
a
 global
 piece of state like a materialized view? That would be a nice feature
but
 I
 think it should done as a general materialized view implementation,
not a
 special case solution for just this one query.
 
 Perhaps you're thinking of the min/max problem of being able to use
 indexes to
 pick out just the tuples satisfying the min/max constraint. That
seems to
 me
 to be one of the more tractable problems in this area but it would
still
 require lots of work.
 
 I suggest you post a specific query you find is slow. Then discuss
how
 you
 think it ought to be executed and why.
 
 
 
 You are correct, I am proposing to add visibility to the indexes.
 
 As for unqualified counts, I believe that they could take advantage of
 an index-only scan as it requires much less I/O to perform an index
scan
 than a sequential scan on large tables.

I agree with Greg...I think the way to approach this is a general
materialized view solution.  This would solve a broad class of tricky
problems including count() and count(*)...you get to choice between the
pay now/pay later tradeoff, etc.

Merlin

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jonah H. Harris
Tom Lane wrote:
The fundamental problem is that you can't do it without adding at least
16 bytes, probably 20, to the size of an index tuple header.  That would
double the physical size of an index on a simple column (eg an integer
or timestamp).  The extra I/O costs and extra maintenance costs are
unattractive to say the least.  And it takes away some of the
justification for the whole thing, which is that reading an index is
much cheaper than reading the main table.  That's only true if the index
is much smaller than the main table ...
			regards, tom lane
 

I recognize the added cost of implementing index only scans.  As storage 
is relatively cheap these days, everyone I know is more concerned about 
faster access to data.  Similarly, it would still be faster to scan the 
indexes than to perform a sequential scan over the entire relation for 
this case.  I also acknowledge that it would be a negative impact to 
indexes where this type of acces isn't required, as you suggested and 
which is more than likely not the case.  I just wonder what more people 
would be happier with and whether the added 16-20 bytes would be 
extremely noticable considering most 1-3 year old hardware.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Dann Corbit
A notion for indices that are not unique... (won't help much on select
count(*) but might be helpful for other types of query optimization)

Put a count in the index for each distinct type.
In the worst case, the index is actually unique and you have 8 wasted
bytes per index entry and all the entries are in the leaves (perhaps it
could be an OPTION for some tables).  I don't know enough about the
structure of PostgreSQL's indexes to know if my suggestion is pure
hogwash, so don't laugh to hard if it is pure stupidity.

The most practical value of SELECT COUNT(*) is for updating statistics
(and looking good in phony-baloney benchmarks).  But the statistics only
need to be updated when you vacuum, so it hardly seems a crucial issue
to me.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
Sent: Wednesday, January 12, 2005 11:42 AM
To: Jonah H. Harris
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Much Ado About COUNT(*) 

Jonah H. Harris [EMAIL PROTECTED] writes:
 My thinking is that we may be able to implement index usage for not
only 
 unqualified counts, but also on any query that can be satisfied by the

 index itself.

The fundamental problem is that you can't do it without adding at least
16 bytes, probably 20, to the size of an index tuple header.  That would
double the physical size of an index on a simple column (eg an integer
or timestamp).  The extra I/O costs and extra maintenance costs are
unattractive to say the least.  And it takes away some of the
justification for the whole thing, which is that reading an index is
much cheaper than reading the main table.  That's only true if the index
is much smaller than the main table ...

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your
  joining column's datatypes do not match

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] segfault caused by heimdal (was: SUSE port)

2005-01-12 Thread Tom Lane
Reinhard Max max@suse.de writes:
 --- src/interfaces/libpq/fe-auth.c
 +++ src/interfaces/libpq/fe-auth.c
 @@ -244,6 +244,11 @@
  
  #include krb5.h
  
 +#if !defined(__COM_ERR_H)  !defined(__COM_ERR_H__)
 +/* if krb5.h didn't include it already */
 +#include com_err.h
 +#endif
 +
  /*
   * pg_an_to_ln -- return the local name corresponding to an authentication
   * name

That looks like a reasonable fix, but isn't it needed in
backend/libpq/auth.c as well?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] segfault caused by heimdal (was: SUSE port)

2005-01-12 Thread Reinhard Max
On Wed, 12 Jan 2005 at 14:59, Tom Lane wrote:

 That looks like a reasonable fix, but isn't it needed in 
 backend/libpq/auth.c as well?

Yes, indeed:

auth.c: In function `pg_krb5_init':
auth.c:202: warning: implicit declaration of function `com_err'

cu
Reinhard

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Rod Taylor
On Wed, 2005-01-12 at 12:52 -0700, Jonah H. Harris wrote:
 Tom Lane wrote:
 
 The fundamental problem is that you can't do it without adding at least
 16 bytes, probably 20, to the size of an index tuple header.  That would
 double the physical size of an index on a simple column (eg an integer
 or timestamp).  The extra I/O costs and extra maintenance costs are
 unattractive to say the least.  And it takes away some of the
 justification for the whole thing, which is that reading an index is
 much cheaper than reading the main table.  That's only true if the index
 is much smaller than the main table ...

 I recognize the added cost of implementing index only scans.  As storage 
 is relatively cheap these days, everyone I know is more concerned about 
 faster access to data.  Similarly, it would still be faster to scan the 
 indexes than to perform a sequential scan over the entire relation for 
 this case.  I also acknowledge that it would be a negative impact to 
 indexes where this type of acces isn't required, as you suggested and 
 which is more than likely not the case.  I just wonder what more people 
 would be happier with and whether the added 16-20 bytes would be 
 extremely noticable considering most 1-3 year old hardware.

I'm very much against this. After some quick math, my database would
grow by about 40GB if this was done. Storage isn't that cheap when you
include the hot-backup master, various slaves, RAM for caching of this
additional index space, backup storage unit on the SAN, tape backups,
additional spindles required to maintain same performance due to
increased IO because I don't very many queries which would receive an
advantage (big one for me -- we started buying spindles for performance
a long time ago), etc.

Make it a new index type if you like, but don't impose any new
performance constraints on folks who have little to no advantage from
the above proposal.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Andrew Dunstan
Jonah H. Harris said:
 Tom Lane wrote:

The fundamental problem is that you can't do it without adding at least
16 bytes, probably 20, to the size of an index tuple header.  That
would double the physical size of an index on a simple column (eg an
integer or timestamp).  The extra I/O costs and extra maintenance costs
are unattractive to say the least.  And it takes away some of the
justification for the whole thing, which is that reading an index is
much cheaper than reading the main table.  That's only true if the
index is much smaller than the main table ...

 I recognize the added cost of implementing index only scans.  As
 storage  is relatively cheap these days, everyone I know is more
 concerned about  faster access to data.  Similarly, it would still be
 faster to scan the  indexes than to perform a sequential scan over the
 entire relation for  this case.  I also acknowledge that it would be a
 negative impact to  indexes where this type of acces isn't required, as
 you suggested and  which is more than likely not the case.  I just
 wonder what more people  would be happier with and whether the added
 16-20 bytes would be
 extremely noticable considering most 1-3 year old hardware.




Monetary cost is not the issue - cost in time is the issue.

cheers

andrew



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Greg Stark

Jonah H. Harris [EMAIL PROTECTED] writes:

 You are correct, I am proposing to add visibility to the indexes.

Then I think the only way you'll get any support is if it's an option. Since
it would incur a performance penalty on updates and deletes.

 As for unqualified counts, I believe that they could take advantage of an
 index-only scan as it requires much less I/O to perform an index scan than a
 sequential scan on large tables.

No, sequential scans require slightly more i/o than index scans. More
importantly they require random access i/o instead of sequential i/o which is
much slower.

Though this depends. If the tuple is very wide then the index might be faster
to scan since it would only contain the data from the fields being indexed.

This brings to mind another approach. It might be handy to split the heap for
a table into multiple heaps. The visibility information would only be in one
of the heaps. This would be a big win if many of the fields were rarely used,
especially if they're rarely used by sequential scans.


 Relation SOME_USERS
 user_id BIGINT PK
 user_nm varchar(32) UNIQUE INDEX
 some_other_attributes...

What's with the fetish with unique indexes? None of this is any different for
unique indexes versus non-unique indexes.


-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jeff Davis


 No, sequential scans require slightly more i/o than index scans. More
 importantly they require random access i/o instead of sequential i/o which is
 much slower.
 

Just to clear it up, I think what you meant was the index requires
random i/o, not the table. And the word slightly depends on the size
of the table I suppose. And of course it also depends on how many tuples
you actually need to retrieve (in this case we're talking about
retrieving all the tuples ragardless).

 Though this depends. If the tuple is very wide then the index might be faster
 to scan since it would only contain the data from the fields being indexed.
 

That, and it seems strange on the surface to visit every entry in an
index, since normally indexes are used to find only a small fraction of
the tuples.

 This brings to mind another approach. It might be handy to split the heap for
 a table into multiple heaps. The visibility information would only be in one
 of the heaps. This would be a big win if many of the fields were rarely used,
 especially if they're rarely used by sequential scans.

Except then the two heaps would have to be joined somehow for every
operation. It makes sense some times to (if you have a very wide table)
split off the rarely-accessed attributes into a seperate table to be
joined one-to-one when those attributes are needed. To have the system
do that automatically would create problems if the attributes that are
split off are frequently accessed, right?

Perhaps you could optionally create a seperate copy of the same tuple
visibility information linked in a way similar to an index. It still
seems like you gain very little, and only in some very rare situation
that I've never encountered (I've never had the need to do frequent
unqualified count()s at the expense of other operations). 

Now, it seems like it might make a little more sense to use an index for
min()/max(), but that's a different story.

Regards,
Jeff Davis



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jonah H. Harris
Andrew Dunstan wrote:
Monetary cost is not the issue - cost in time is the issue.
cheers
andrew
 

We seem to be in agreement.  I'm looking for faster/smarter access to 
data, not the monetary cost of doing so.  Isn't it faster/smarter to 
satisfy a query with the index rather than sequentially scanning an 
entire relation if it is possible?

Replying to the list as a whole:
If this is such a bad idea, why do other database systems use it?  As a 
businessperson myself, it doesn't seem logical to me that commercial 
database companies would spend money on implementing this feature if it 
wouldn't be used.  Remember guys, I'm just trying to help.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[HACKERS] PANIC: right sibling's left-link doesn't match

2005-01-12 Thread Jim Buttafuoco
Postgres on one of my big database servers just crashed with the following 
message

PANIC:  right sibling's left-link doesn't match

Does any one have any idea's what might cause this.  


Some background.  This is a Debian Sarge system running PG 7.4.5 on i386 dual 
XEON system with 4G of memory.  I just 
rebooted the system because a fan had failed and was replaced. 

Thanks
Jim



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jonah H. Harris
Rod Taylor wrote:
grow by about 40GB if this was done. Storage isn't that cheap when you
include the hot-backup master, various slaves, RAM for caching of this
additional index space, backup storage unit on the SAN, tape backups,
additional spindles required to maintain same performance due to
increased IO because I don't very many queries which would receive an
advantage (big one for me -- we started buying spindles for performance
a long time ago), etc.
 

Thanks for the calculation and example.  This would be a hefty amount of 
overhead if none of your queries would benefit from this change.

Make it a new index type if you like, but don't impose any new
performance constraints on folks who have little to no advantage from
the above proposal.
 

I agree with you that some people may not see any benefit from this and 
that it may look worse performance/storage-wise.  I've considered this 
route, but it seems like more of a workaround than a solution.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jeff Davis

 
 I recognize the added cost of implementing index only scans.  As storage 
 is relatively cheap these days, everyone I know is more concerned about 
 faster access to data.  Similarly, it would still be faster to scan the 
 indexes than to perform a sequential scan over the entire relation for 
 this case.  I also acknowledge that it would be a negative impact to 
 indexes where this type of acces isn't required, as you suggested and 
 which is more than likely not the case.  I just wonder what more people 
 would be happier with and whether the added 16-20 bytes would be 
 extremely noticable considering most 1-3 year old hardware.

I think perhaps you missed the point: it's not about price. If an index
takes up more space, it will also take more time to read that index.
16-20 bytes per index entry is way too much extra overhead for most
people, no matter what hardware they have. That overhead also tightens
the performace at what is already the bottleneck for almost every DB:
i/o bandwidth.

The cost to count the tuples is the cost to read that visibility
information for each tuple in the table. A seqscan is the most efficient
way to do that since it's sequential i/o, rather than random i/o. The
only reason the word index even comes up is because it is inefficient
to retrieve a lot of extra attributes you don't need from a table.

You might be able to pack that visibility information a little bit more
densely in an index than a table, assuming that the table has more than
a couple columns. But if you shoehorn the visibility information into an
index, you destroy much of the value of an index to most people, who
require the index to be compact to be efficient.

An index isn't really the place for something when all you really want
to do is a sequential scan over a smaller amount of data (so that the
visibility information is more dense). Make a narrow table, and seqscan
over that. Then, if you need more attributes in the table, just do a
one-to-one join with a seperate table.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jon Jensen
On Wed, 12 Jan 2005, Jonah H. Harris wrote:
Andrew Dunstan wrote:
Monetary cost is not the issue - cost in time is the issue.
We seem to be in agreement.  I'm looking for faster/smarter access to data, 
not the monetary cost of doing so.  Isn't it faster/smarter to satisfy a 
query with the index rather than sequentially scanning an entire relation if 
it is possible?

Replying to the list as a whole:
If this is such a bad idea, why do other database systems use it?  As a 
businessperson myself, it doesn't seem logical to me that commercial database 
companies would spend money on implementing this feature if it wouldn't be 
used.  Remember guys, I'm just trying to help.
If you're willing to do the work, and have the motivation, probably the 
best thing to do is just do it. Then you can use empirical measurements of 
the effect on disk space, speed of various operations, etc. to discuss the 
merits/demerits of your particular implementation.

Then others don't need to feel so threatened by the potential change. 
Either it'll be (1) an obvious win, or (2) a mixed bag, where allowing the 
new way to be specified as an option is a possibility, or (3) you'll have 
to go back to the drawing board if it's an obvious loss.

This problem's been talked about a lot, but seeing some code and metrics 
from someone with a personal interest in solving it would really be 
progress IMHO.

Jon
--
Jon Jensen
End Point Corporation
http://www.endpoint.com/
Software development with Interchange, Perl, PostgreSQL, Apache, Linux, ...
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jonah H. Harris
Jon Jensen wrote:
If you're willing to do the work, and have the motivation, probably 
the best thing to do is just do it. Then you can use empirical 
measurements of the effect on disk space, speed of various operations, 
etc. to discuss the merits/demerits of your particular implementation.

Then others don't need to feel so threatened by the potential change. 
Either it'll be (1) an obvious win, or (2) a mixed bag, where allowing 
the new way to be specified as an option is a possibility, or (3) 
you'll have to go back to the drawing board if it's an obvious loss.

This problem's been talked about a lot, but seeing some code and 
metrics from someone with a personal interest in solving it would 
really be progress IMHO.
Jon,
This is pretty much where I'm coming from.  I'm looking at working on 
this and I'd rather discuss suggestions for how to implement this than 
whether we should have it, etc.  If it turns out better, great; if not, 
then I just wasted my time.  I really do appreciate everyone's comments 
and suggestions.

Thanks!
-Jonah
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Bruno Wolff III
On Wed, Jan 12, 2005 at 13:42:58 -0700,
  Jonah H. Harris [EMAIL PROTECTED] wrote:
 We seem to be in agreement.  I'm looking for faster/smarter access to 
 data, not the monetary cost of doing so.  Isn't it faster/smarter to 
 satisfy a query with the index rather than sequentially scanning an 
 entire relation if it is possible?

Not necessarily. Also note that Postgres will use an index scan for
count(*) if there is a relatively selective WHERE clause.

 Replying to the list as a whole:
 
 If this is such a bad idea, why do other database systems use it?  As a 
 businessperson myself, it doesn't seem logical to me that commercial 
 database companies would spend money on implementing this feature if it 
 wouldn't be used.  Remember guys, I'm just trying to help.

Other databases use different ways of handling tuples that are only visible
to some concurrent transactions.

Postgres is also flexible enough that you can make your own materialized
view (using triggers) to handle count(*) if that makes sense for you.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jeff Davis
 
 We seem to be in agreement.  I'm looking for faster/smarter access to 
 data, not the monetary cost of doing so.  Isn't it faster/smarter to 
 satisfy a query with the index rather than sequentially scanning an 
 entire relation if it is possible?
 

You have to scan every tuple's visibility information no matter what.
Sequential i/o is fastest (per byte), so the most efficient possible
method is seqscan. Unfortunately for count(*), the tables also store
columns, which are really just clutter as you're moving through the
table looking for visibility information.

Indexes are designed for searches, not exhaustive retrieval of all
records. If you can store that visibility information in a seperate
place so that it's not cluttered by unneeded attributes that could be
helpful, but an index is not the place for that. If you store that in
the index, you are really imposing a new cost on yourself: the cost to
do random i/o as you're jumping around the index trying to access every
entry, plus the index metadata.

You could make a narrow table and join with the other attributes. That
might be a good place that wouldn't clutter up the visibility
information much (it would just need a primary key). A seqscan over that
would be quite efficient.


 If this is such a bad idea, why do other database systems use it?  As a 
 businessperson myself, it doesn't seem logical to me that commercial 
 database companies would spend money on implementing this feature if it 
 wouldn't be used.  Remember guys, I'm just trying to help.
 

Some databases use an internal counter to count rows as they are
added/deleted. This does not give accurate results in a database that
supports ACID -- more specifically a database that implements the
isolation part of ACID. Two concurrent transactions, if the database
supports proper isolation, could have two different results for count(*)
and both would be correct. That makes all the optimized count(*)
databases really just give a close number, not the real number. If you
just want a close number, there are other ways of doing that in
PostgreSQL that people have already mentioned.

If you know of a database that supports proper isolation and also has a
faster count(*) I would be interested to know what it is. There may be a
way to do it without sacrificing in other areas, but I don't know what
it is. Does someone know exactly what oracle actually does?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jonah H. Harris
Jeff Davis wrote:
Does someone know exactly what oracle actually does?
 

some old info resides here, http://www.orsweb.com/techniques/fastfull.html
I'll try and find something more recent.
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Marek Mosiewicz
I agree with last statement. count(*) is not most important.
Most nice thing with index only scan is when it contains more than one
column.
When there is join among many tables where from each table only one or few
columns are taken
it take boost query incredibly.

For exmaple on when you have customer table and ID, NAME index on it then:

select c.name,i.* from customer c, invoice i where c.id=i.customer_id

then it is HUGE difference there. without index only scan you require to
make index io and
random table access (assuming no full scan). With index only scan you need
only
index scan and can skip expensive random table io.
It is very simple but powerful optmization in many cases to reduce join
expence on many
difficult queries.
You can have get some kind of index organized table (you use only index so
in fact it is
ordered table)

Selecting only few columns is quite often scenario in reporting.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Jonah H. Harris
Sent: Wednesday, January 12, 2005 8:36 PM
To: Greg Stark
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Much Ado About COUNT(*)


Greg Stark wrote:

I think part of the problem is that there's a bunch of features related to
these types of queries and the lines between them blur.

You seem to be talking about putting visibility information inside indexes
for
so index-only plans can be performed. But you're also talking about queries
like select count(*) from foo with no where clauses. Such a query
wouldn't
be helped by index-only scans.

Perhaps you're thinking about caching the total number of records in a
global
piece of state like a materialized view? That would be a nice feature but I
think it should done as a general materialized view implementation, not a
special case solution for just this one query.

Perhaps you're thinking of the min/max problem of being able to use indexes
to
pick out just the tuples satisfying the min/max constraint. That seems to
me
to be one of the more tractable problems in this area but it would still
require lots of work.

I suggest you post a specific query you find is slow. Then discuss how you
think it ought to be executed and why.



You are correct, I am proposing to add visibility to the indexes.

As for unqualified counts, I believe that they could take advantage of
an index-only scan as it requires much less I/O to perform an index scan
than a sequential scan on large tables.

Min/Max would also take advantage of index only scans but say, for
example, that someone has the following:

Relation SOME_USERS
user_id BIGINT PK
user_nm varchar(32) UNIQUE INDEX
some_other_attributes...

If an application needs the user names, it would run SELECT user_nm FROM
SOME_USERS... in the current implementation this would require a
sequential scan.  On a relation which contains 1M+ tuples, this requires
either a lot of I/O or a lot of cache.  An index scan would immensely
speed up this query.





---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Alvaro Herrera
On Wed, Jan 12, 2005 at 12:41:38PM -0800, Jeff Davis wrote:

 Except then the two heaps would have to be joined somehow for every
 operation. It makes sense some times to (if you have a very wide table)
 split off the rarely-accessed attributes into a seperate table to be
 joined one-to-one when those attributes are needed. To have the system
 do that automatically would create problems if the attributes that are
 split off are frequently accessed, right?

That mechanism exists right now, and it's called TOAST, dubbed the best
thing since sliced bread.  We even have documentation for it, new as of
our latest RC:

http://developer.postgresql.org/docs/postgres/storage-toast.html

-- 
Alvaro Herrera ([EMAIL PROTECTED])
El día que dejes de cambiar dejarás de vivir

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] PANIC: right sibling's left-link doesn't match

2005-01-12 Thread Tom Lane
Jim Buttafuoco [EMAIL PROTECTED] writes:
 Postgres on one of my big database servers just crashed with the following 
 message
 PANIC:  right sibling's left-link doesn't match

 Does any one have any idea's what might cause this.  

Corrupted btree index.  REINDEX should help, though I'm afraid the error
message isn't very helpful about identifying which index is busted.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Bruno Wolff III
On Wed, Jan 12, 2005 at 14:09:07 -0700,
  Jonah H. Harris [EMAIL PROTECTED] wrote:

Please keep stuff posted to the list so that other people can contribute
and learn from the discussion unless there is a particular reason to
limited who is involved in the discussion.

 Bruno,
 
 Thanks for the information.  I was told that PostgreSQL couldn't use 
 index scans for count(*) because of the visibility issue.  Has something 
 changed or was I told incorrectly?

It isn't that it can't, it is that for cases where you are counting more
than a few percent of a table, it will be faster to use a sequential
scan. Part of the reason is that for any hits you get in the index, you
have to check in the table to make sure the current transaction can see
the current tuple. Even if you could just get away with using just an
index scan you are only going to see a constant factor speed up with
probably not too big of a constant.

Perhaps you think that the count is somehow saved in the index so that
you don't have to scan through the whole index to get the number of
rows in a table? That isn't the case, but is what creating a materialized
view would effectively do for you.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] segfault caused by heimdal (was: SUSE port)

2005-01-12 Thread Tom Lane
Reinhard Max max@suse.de writes:
 On Wed, 12 Jan 2005 at 14:59, Tom Lane wrote:
 That looks like a reasonable fix, but isn't it needed in 
 backend/libpq/auth.c as well?

 Yes, indeed:
 auth.c: In function `pg_krb5_init':
 auth.c:202: warning: implicit declaration of function `com_err'

OK, patch applied in both files.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Simon Riggs
On Wed, 2005-01-12 at 15:09 -0500, Rod Taylor wrote:
 On Wed, 2005-01-12 at 12:52 -0700, Jonah H. Harris wrote:
  Tom Lane wrote:
  
  The fundamental problem is that you can't do it without adding at least
  16 bytes, probably 20, to the size of an index tuple header.  That would
  double the physical size of an index on a simple column (eg an integer
  or timestamp).  The extra I/O costs and extra maintenance costs are
  unattractive to say the least.  And it takes away some of the
  justification for the whole thing, which is that reading an index is
  much cheaper than reading the main table.  That's only true if the index
  is much smaller than the main table ...
 
  I recognize the added cost of implementing index only scans.  As storage 
  is relatively cheap these days, everyone I know is more concerned about 
  faster access to data.  Similarly, it would still be faster to scan the 
  indexes than to perform a sequential scan over the entire relation for 
  this case.  I also acknowledge that it would be a negative impact to 
  indexes where this type of acces isn't required, as you suggested and 
  which is more than likely not the case.  I just wonder what more people 
  would be happier with and whether the added 16-20 bytes would be 
  extremely noticable considering most 1-3 year old hardware.
 
 I'm very much against this. After some quick math, my database would
 grow by about 40GB if this was done. Storage isn't that cheap when you
 include the hot-backup master, various slaves, RAM for caching of this
 additional index space, backup storage unit on the SAN, tape backups,
 additional spindles required to maintain same performance due to
 increased IO because I don't very many queries which would receive an
 advantage (big one for me -- we started buying spindles for performance
 a long time ago), etc.
 
 Make it a new index type if you like, but don't impose any new
 performance constraints on folks who have little to no advantage from
 the above proposal.

Jonah,

People's objections are:
- this shouldn't be the system default, so would need to be implemented
as a non-default option on a b-tree index
- its a lot of code and if you want it, you gotta do it

Remember you'll need to
- agree all changes via the list and accept that redesigns may be
required, even at a late stage of coding
- write visibility code into the index
- write an additional node type to handle the new capability
- microarchitecture performance testing so you know whether its really
worthwhile, covering a range of cases
- add code to the optimiser to so it can estimate the cost of using this
and to know when to do this
- add a column to the catalog to record whether an index has the
visibility option
- add code to the parser to invoke the option
- update pg_dump so that it correctly dumps tables with that option
- copy and adapt all of the existing tests for the new mechanism
- document it

If you really want to do all of that, I'm sure you'd get help, but
mostly it will be you that has to drive the change through.

There are some other benefits of that implementation:
You'd be able to vacuum the index (only), allowing index access to
remain reasonably constant, even as the table itself grew from dead
rows.

The index could then make sensible the reasonably common practice of
using a covered index - i.e. putting additional columns into the index
to satisfy the whole query just from the index.

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jonah H. Harris
Bruno Wolff III wrote:
On Wed, Jan 12, 2005 at 14:09:07 -0700,
 Jonah H. Harris [EMAIL PROTECTED] wrote:
Please keep stuff posted to the list so that other people can contribute
and learn from the discussion unless there is a particular reason to
limited who is involved in the discussion.
 

not a problem.
Perhaps you think that the count is somehow saved in the index so that
you don't have to scan through the whole index to get the number of
rows in a table? That isn't the case, but is what creating a materialized
view would effectively do for you.
 

I understand that the count is not stored in the index.  I am saying 
that it may be faster to generate the count off the keys in the index.

I shouldn't have titled this message COUNT(*) as that isn't the only 
thing I'm trying to accomplish.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] PANIC: right sibling's left-link doesn't match

2005-01-12 Thread Jim Buttafuoco
It did print the query right after the PANIC message, so I do have the table 
name.  I just completed the reindex.

Thanks



-- Original Message ---
From: Tom Lane [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: pgsql-hackers pgsql-hackers@postgresql.org
Sent: Wed, 12 Jan 2005 16:45:11 -0500
Subject: Re: [HACKERS] PANIC: right sibling's left-link doesn't match 

 Jim Buttafuoco [EMAIL PROTECTED] writes:
  Postgres on one of my big database servers just crashed with the following 
  message
  PANIC:  right sibling's left-link doesn't match
 
  Does any one have any idea's what might cause this.
 
 Corrupted btree index.  REINDEX should help, though I'm afraid the error
 message isn't very helpful about identifying which index is busted.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
--- End of Original Message ---


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jonah H. Harris
Simon Riggs wrote:
Jonah,
People's objections are:
- this shouldn't be the system default, so would need to be implemented
as a non-default option on a b-tree index
- its a lot of code and if you want it, you gotta do it
Remember you'll need to
- agree all changes via the list and accept that redesigns may be
required, even at a late stage of coding
- write visibility code into the index
- write an additional node type to handle the new capability
- microarchitecture performance testing so you know whether its really
worthwhile, covering a range of cases
- add code to the optimiser to so it can estimate the cost of using this
and to know when to do this
- add a column to the catalog to record whether an index has the
visibility option
- add code to the parser to invoke the option
- update pg_dump so that it correctly dumps tables with that option
- copy and adapt all of the existing tests for the new mechanism
- document it
If you really want to do all of that, I'm sure you'd get help, but
mostly it will be you that has to drive the change through.
There are some other benefits of that implementation:
You'd be able to vacuum the index (only), allowing index access to
remain reasonably constant, even as the table itself grew from dead
rows.
The index could then make sensible the reasonably common practice of
using a covered index - i.e. putting additional columns into the index
to satisfy the whole query just from the index.
 

Simon,
I am willing to take it on and I understand that the workload is mine.  
As long as everyone gives me some suggestions, I'm good it being optional.

-Jonah
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Rod Taylor
 The index could then make sensible the reasonably common practice of
 using a covered index - i.e. putting additional columns into the index
 to satisfy the whole query just from the index.

 I am willing to take it on and I understand that the workload is mine.  
 As long as everyone gives me some suggestions, I'm good it being optional.

If nobody is working on it, you may find that the below TODO item might
accomplish most of what you're looking for as well as generally
improving performance. The count(*) on a where clause would result in
one index scan and one partial sequential heap scan. Not as fast for the
specific examples you've shown, but far better than today and covers
many other cases as well.

Fetch heap pages matching index entries in sequential order 

Rather than randomly accessing heap pages based on index
entries, mark heap pages needing access in a bitmap and do the
lookups in sequential order. Another method would be to sort
heap ctids matching the index before accessing the heap rows.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[HACKERS] sparse (static analyzer) report

2005-01-12 Thread Mark Wong
Hi,

Just wondering if anyone finds spare's analysis useful.  I ran it
against 8.0-rc5:
http://developer.osdl.org/markw/pgsql/sparse/pg-8.0rc5.txt

Sparse can be downloaded
http://www.codemonkey.org.uk/projects/bitkeeper/sparse/
or
bk://sparse.bkbits.net/sparse

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] looking for rh9 rpms for pgadmin v 1.2

2005-01-12 Thread Dave Cramer
Hi,
Is there any intent to build these rpm's ? Who is responsible for this ?
Dave
--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] pg_autovacuum w/ dbt2

2005-01-12 Thread Mark Wong
On Tue, Dec 21, 2004 at 05:56:47PM -0500, Tom Lane wrote:
 If you want to track it yourself, please change those elog(ERROR)s to
 elog(PANIC) so that they'll generate core dumps, then build with
 --enable-debug if you didn't already (--enable-cassert would be good too)
 and get a debugger stack trace from the core dump.

Ok, well I got a core dump with 8.0rc4, but I'm not sure if it's
exactly the same problem.  I have the postgres binary and the core
here:
http://developer.osdl.org/markw/pgsql/core/2files.tar.bz2

But it's for ia64, if you got one.  Otherwise, this is what gdb is
telling me with a bt:

(gdb) bt
#0  FunctionCall2 (flinfo=0x60187850, arg1=16, arg2=1043)
at fmgr.c:1141
#1  0x4007a320 in _bt_checkkeys (scan=0x600c2d80, 
tuple=0x2101f660, dir=ForwardScanDirection, 
continuescan=0x6fff8ae0 \001) at nbtutils.c:542
#2  0x40078eb0 in _bt_endpoint (scan=0x60187690, 
dir=ForwardScanDirection) at nbtsearch.c:1309
#3  0x400771e0 in _bt_first (scan=0x60187690, 
dir=ForwardScanDirection) at nbtsearch.c:482
#4  0x40074350 in btgettuple (fcinfo=0x1) at nbtree.c:265
#5  0x403bd430 in FunctionCall2 (flinfo=0x60187700, 
arg1=6917529027642685072, arg2=1) at fmgr.c:1141
#6  0x4006b3a0 in index_getnext (scan=0x60187690, 
direction=ForwardScanDirection) at indexam.c:429
#7  0x4006a1e0 in systable_getnext (sysscan=0x60187668)
at genam.c:253
#8  0x4039c970 in SearchCatCache (cache=0x20001f1e0140, v1=0, 
v2=6917529027641871376, v3=4294966252, v4=6917546619827097184)
at catcache.c:1217
#9  0x403a9ee0 in SearchSysCache (cacheId=33, key1=1043, key2=0, 
key3=0, key4=0) at syscache.c:524
#10 0x40049110 in TupleDescInitEntry (desc=0x601872c8, 
attributeNumber=4, attributeName=0x60187614 \023\004, 
oidtypeid=1043, typmod=28, attdim=0) at tupdesc.c:444
#11 0x401b5fc0 in ExecTypeFromTLInternal (
targetList=0x60135d40, hasoid=-64 'À', skipjunk=1 '\001')
at execTuples.c:570
#12 0x401a4a20 in ExecInitJunkFilter (targetList=0x60135b38, 
hasoid=-64 'À', slot=0x601258a0) at execJunk.c:76
#13 0x401a6890 in InitPlan (queryDesc=0x60177ed0, 
explainOnly=0 '\0') at execMain.c:456
#14 0x401a5800 in ExecutorStart (queryDesc=0x60177ed0, 
explainOnly=0 '\0') at execMain.c:160
#15 0x401d6ab0 in _SPI_pquery (queryDesc=0x60177ed0, tcount=0)
at spi.c:1521
#16 0x401d6390 in _SPI_execute_plan (plan=0x6fff9380, 
Values=0x0, Nulls=0x0, snapshot=0x0, crosscheck_snapshot=0x0, 
read_only=0 '\0', tcount=0) at spi.c:1452


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] pg_autovacuum w/ dbt2

2005-01-12 Thread Tom Lane
Mark Wong [EMAIL PROTECTED] writes:
 Ok, well I got a core dump with 8.0rc4, but I'm not sure if it's
 exactly the same problem.  I have the postgres binary and the core
 here:
   http://developer.osdl.org/markw/pgsql/core/2files.tar.bz2
 But it's for ia64, if you got one.

Poking around with gdb, it seems that the scankey structure being used
by SearchCatCache got clobbered; which is a bit surprising because
that's just a local variable in that function, and hence isn't really
very exposed.  The contents of cache-cc_skey are okay, but cur_skey[0]
and cur_skey[1] don't match, which implies the clobber happened
somewhere between lines 1110 and 1217 of catcache.c.

(gdb) f 8
#8  0x4039c970 in SearchCatCache (cache=0x20001f1e0140, v1=0,
v2=6917529027641871376, v3=4294966252, v4=6917546619827097184)
at catcache.c:1217
1217in catcache.c
(gdb) p cache-cc_skey
$7 = {{sk_flags = 0, sk_attno = -2, sk_strategy = 3, sk_subtype = 0,
sk_func = {fn_addr = 0x2003a9c8, fn_oid = 184, fn_nargs = 2,
  fn_strict = 1 '\001', fn_retset = 0 '\0', fn_extra = 0x0,
  fn_mcxt = 0x6009e550, fn_expr = 0x0}, sk_argument = 0}, {
sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {
  fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0',
  fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
sk_argument = 0}, {sk_flags = 0, sk_attno = 0, sk_strategy = 0,
sk_subtype = 0, sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0,
  fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0,
  fn_expr = 0x0}, sk_argument = 0}, {sk_flags = 0, sk_attno = 0,
sk_strategy = 0, sk_subtype = 0, sk_func = {fn_addr = 0, fn_oid = 0,
  fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0,
  fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}}
(gdb) p cur_skey
$8 = {{sk_flags = 0, sk_attno = 1, sk_strategy = 24932, sk_subtype = 24948,
sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0',
  fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
sk_argument = 1043}, {sk_flags = 0, sk_attno = 1043, sk_strategy = 0,
sk_subtype = 4294967295, sk_func = {fn_addr = 0, fn_oid = 0,
  fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0,
  fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}, {sk_flags = 0,
sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {fn_addr = 0,
  fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0',
  fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}, {
sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {
  fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0',
  fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
sk_argument = 0}}

The core dump happens because we eventually try to jump through the
zeroed-out fn_addr function pointer.

Not sure what to make of this.  That's extremely heavily used,
well-debugged code; it's hard to believe that there are any intermittent
bugs in it.

I notice that the backend seems to have been using some nonstandard C
code:

Error while reading shared library symbols:
/home/markw/dbt2/storedproc/pgsql/c/../../../storedproc/pgsql/c/funcs.so: No 
such file or directory.

What is that, and how much confidence have you got in it?

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Jeff Davis
 That mechanism exists right now, and it's called TOAST, dubbed the best
 thing since sliced bread.  We even have documentation for it, new as of
 our latest RC:
 
 http://developer.postgresql.org/docs/postgres/storage-toast.html
 

Thanks for the link. It looks like it breaks it up into chunks of about
2KB. I think the conversation was mostly assuming the tables were
somewhat closer to the size of an index. If you have more than 2KB per
tuple, pretty much anything you do with an index would be faster I would
think.

My original concern was if I had a table like (x int) and then postgres
broke the visibility information away form that, that would cause
serious performance problems if postgres had to do a join just to do
select ... where x = 5. Right?

But of course, we all love toast. Everyone needs to make those wide
tables once in a while, and toast does a great job of taking those
worries away in an efficient way. I am just saying that hopefully we
don't have to seqscan a table with wide tuples very often :)

Regards,
Jeff Davis




---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Greg Stark

Jeff Davis [EMAIL PROTECTED] writes:

 But of course, we all love toast. Everyone needs to make those wide
 tables once in a while, and toast does a great job of taking those
 worries away in an efficient way. I am just saying that hopefully we
 don't have to seqscan a table with wide tuples very often :)

I thought toast only handled having individual large columns. So if I have a
2kb text column it'll pull that out of the table for me. But if I have 20
columns each of which have 100 bytes will it still help me? Will it kick in if
I define a single column which stores a record type with 20 columns each of
which have a 100 byte string?

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 I thought toast only handled having individual large columns. So if I have a
 2kb text column it'll pull that out of the table for me. But if I have 20
 columns each of which have 100 bytes will it still help me? Will it kick in if
 I define a single column which stores a record type with 20 columns each of
 which have a 100 byte string?

Yes, and yes.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Bruce Momjian
Jonah H. Harris wrote:
 1.  Is there any answer to Bruce?s last statement in the thread, ?Re: 
 [PERFORM] COUNT(*) again (was Re: Index/Function organized? 
 (http://archives.postgresql.org/pgsql-hackers/2003-10/msg00245.php)

Let me give you my ideas in the above URL and why they are probably
wrong.

My basic idea was to keep a status bit on each index entry telling it if
a previous backend looked at the heap and determined it was valid.  Here
are the details:

Each heap tuple stores the creation and expire xid.  To determine if a
tuple is visible, you have to check the clog to see if the recorded
transaction ids were committed, in progress, or aborted.  When you do
the lookup the first time and the transaction isn't in progress, you can
update a bit to say that the tuple is visible or not.  In fact we have
several tuple bits:

#define HEAP_XMIN_COMMITTED 0x0100  /* t_xmin committed */
#define HEAP_XMIN_INVALID   0x0200  /* t_xmin invalid/aborted */
#define HEAP_XMAX_COMMITTED 0x0400  /* t_xmax committed */
#define HEAP_XMAX_INVALID   0x0800  /* t_xmax invalid/aborted */

Once set they allow a later backend access to know the visiblity of the
row without having to re-check clog.

The big overhead in index lookups is having to check the heap row for
visibility.  My idea is that once you check the visibility the first
time in the heap, why can't we set some bit in the index so that later
index lookups don't have to look up the heap anymore.  Over time most
index entries would have bits set and you wouldn't need to recheck the
heap.  (Oh, and you could only update the bit when all active
transactions are newer than the creation transaction so we know they
should all see it as visible.)

Now, this would work for telling us that the transaction that created
the index entry was committed or aborted.  Over time most index entries
would have that status.

I think the problem with this idea is expiration.  If a row is deleted
we never go and update the index pointing to that heap, so the creation
status isn't enough for us to know that the row is valid.

I can't think of a way to fix this.  I think it is expensive to clear
the status bits on a row delete because finding an index row that goes
with a particular heap is quite expensive.

So, those are my ideas.  If they could be made to work it would give us
most of the advantages of an index scan with _few_ heap lookups with
very little overhead.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 My basic idea was to keep a status bit on each index entry telling it if
 a previous backend looked at the heap and determined it was valid.

Even if you could track the tuple's committed-good status reliably, that
isn't enough under MVCC.  The tuple might be committed good, and seen
that way by some other backend that set the bit, and yet it's not supposed
to be visible to your older transaction.  Or the reverse at tuple
deletion.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  My basic idea was to keep a status bit on each index entry telling it if
  a previous backend looked at the heap and determined it was valid.
 
 Even if you could track the tuple's committed-good status reliably,
 that isn't enough under MVCC.  The tuple might be committed good, and
 seen that way by some other backend that set the bit, and yet it's not
 supposed to be visible to your older transaction.  Or the reverse at
 tuple deletion.

I mentioned that:

 (Oh, and you could only update the bit when all active transactions
 are newer than the creation transaction so we know they should all see
 it as visible.)

--
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Tom Lane wrote:
 Even if you could track the tuple's committed-good status reliably,
 that isn't enough under MVCC.

 I mentioned that:

 (Oh, and you could only update the bit when all active transactions
 are newer than the creation transaction so we know they should all see
 it as visible.)

Ah, right, I missed the connection.  Hmm ... that's sort of the inverse
of the killed tuple optimization we put in a release or two back,
where an index tuple is marked as definitely dead once it's committed
dead and the deletion is older than all active transactions.  Maybe that
would work.  You'd still have to visit the heap when a tuple is in the
uncertain states, but with luck that'd be only a small fraction of the
time.

I'm still concerned about the update costs of maintaining these bits,
but this would at least escape the index-bloat objection.  I think we
still have one free bit in index tuple headers...

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Tom Lane wrote:
  Even if you could track the tuple's committed-good status reliably,
  that isn't enough under MVCC.
 
  I mentioned that:
 
  (Oh, and you could only update the bit when all active transactions
  are newer than the creation transaction so we know they should all see
  it as visible.)
 
 Ah, right, I missed the connection.  Hmm ... that's sort of the inverse
 of the killed tuple optimization we put in a release or two back,
 where an index tuple is marked as definitely dead once it's committed
 dead and the deletion is older than all active transactions.  Maybe that
 would work.  You'd still have to visit the heap when a tuple is in the
 uncertain states, but with luck that'd be only a small fraction of the
 time.

Yes, it is sort of the reverse, but how do you get around the delete
case?  Even if the bit is set, how do you know it wasn't deleted since
you set the bit?  Seems you always have to still check the heap, no?

 I'm still concerned about the update costs of maintaining these bits,
 but this would at least escape the index-bloat objection.  I think we
 still have one free bit in index tuple headers...

You mean you are considering clearing the index bit when you delete the
row?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Much Ado About COUNT(*)

2005-01-12 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Ah, right, I missed the connection.  Hmm ... that's sort of the inverse
 of the killed tuple optimization we put in a release or two back,
 where an index tuple is marked as definitely dead once it's committed
 dead and the deletion is older than all active transactions.

 Yes, it is sort of the reverse, but how do you get around the delete
 case?

A would-be deleter of a tuple would have to go and clear the known
good bits on all the tuple's index entries before it could commit.
This would bring the tuple back into the uncertain status condition
where backends would have to visit the heap to find out what's up.
Eventually the state would become certain again (either dead to
everyone or live to everyone) and one or the other hint bit could be
set again.

The ugly part of this is that clearing the bit is not like setting a
hint bit, ie it's not okay if we lose that change.  Therefore, each
bit-clearing would have to be WAL-logged.  This is a big part of my
concern about the cost.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] pg_autovacuum w/ dbt2

2005-01-12 Thread Mark Wong
On Wed, Jan 12, 2005 at 09:17:33PM -0500, Tom Lane wrote:
 I notice that the backend seems to have been using some nonstandard C
 code:
 
 Error while reading shared library symbols:
 /home/markw/dbt2/storedproc/pgsql/c/../../../storedproc/pgsql/c/funcs.so: No 
 such file or directory.
 
 What is that, and how much confidence have you got in it?

That's my C stored function library.  I'll attached it if anyone wants
to take a persusal.  Well, it was my first attempt with C stored
functions and SPI calls, so it wouldn't surprise me if it was flawed.
Would supplying the .so help the debugging?

Mark
/*
 * This file is released under the terms of the Artistic License.  Please see
 * the file LICENSE, included in this package, for details.
 *
 * Copyright (C) 2003 Mark Wong  Open Source Development Lab, Inc.
 *
 * Based on TPC-C Standard Specification Revision 5.0 Clause 2.8.2.
 */

#include sys/types.h
#include unistd.h
#include postgres.h
#include fmgr.h
#include executor/spi.h

/*
#define DEBUG
*/

#define DELIVERY_1 \
SELECT no_o_id\n \
FROM new_order\n \
WHERE no_w_id = %d\n \
  AND no_d_id = %d

#define DELIVERY_2 \
DELETE FROM new_order\n \
WHERE no_o_id = %s\n \
  AND no_w_id = %d\n \
  AND no_d_id = %d

#define DELIVERY_3 \
SELECT o_c_id\n \
FROM orders\n \
WHERE o_id = %s\n \
  AND o_w_id = %d\n \
  AND o_d_id = %d

#define DELIVERY_4 \
UPDATE orders\n \
SET o_carrier_id = %d\n \
WHERE o_id = %s\n \
  AND o_w_id = %d\n \
  AND o_d_id = %d

#define DELIVERY_5 \
UPDATE order_line\n \
SET ol_delivery_d = current_timestamp\n \
WHERE ol_o_id = %s\n \
  AND ol_w_id = %d\n \
  AND ol_d_id = %d

#define DELIVERY_6 \
SELECT SUM(ol_amount * ol_quantity)\n \
FROM order_line\n \
WHERE ol_o_id = %s\n \
  AND ol_w_id = %d\n \
  AND ol_d_id = %d

#define DELIVERY_7 \
UPDATE customer\n \
SET c_delivery_cnt = c_delivery_cnt + 1,\n \
c_balance = c_balance + %s\n \
WHERE c_id = %s\n \
  AND c_w_id = %d\n \
  AND c_d_id = %d

#define NEW_ORDER_1 \
SELECT w_tax\n \
FROM warehouse\n \
WHERE w_id = %d

#define NEW_ORDER_2 \
SELECT d_tax, d_next_o_id\n \
FROM district \n \
WHERE d_w_id = %d\n \
  AND d_id = %d\n \
FOR UPDATE

#define NEW_ORDER_3 \
UPDATE district\n \
SET d_next_o_id = d_next_o_id + 1\n \
WHERE d_w_id = %d\n \
  AND d_id = %d

#define NEW_ORDER_4 \
SELECT c_discount, c_last, c_credit\n \
FROM customer\n \
WHERE c_w_id = %d\n \
  AND c_d_id = %d\n \
  AND c_id = %d

#define NEW_ORDER_5 \
INSERT INTO new_order (no_o_id, no_w_id, no_d_id)\n \
VALUES (%s, %d, %d)

#define NEW_ORDER_6 \
INSERT INTO orders (o_id, o_d_id, o_w_id, o_c_id, o_entry_d,\n \
o_carrier_id, o_ol_cnt, o_all_local)\n \
VALUES (%s, %d, %d, %d, current_timestamp, NULL, %d, %d)

#define NEW_ORDER_7 \
SELECT i_price, i_name, i_data\n \
FROM item\n \
WHERE i_id = %d

#define NEW_ORDER_8 \
SELECT s_quantity, %s, s_data\n \
FROM stock\n \
WHERE s_i_id = %d\n \
  AND s_w_id = %d

#define NEW_ORDER_9 \
UPDATE stock\n \
SET s_quantity = s_quantity - %d\n \
WHERE s_i_id = %d\n \
  AND s_w_id = %d

#define NEW_ORDER_10 \
INSERT INTO order_line (ol_o_id, ol_d_id, ol_w_id, ol_number,\n \
ol_i_id, ol_supply_w_id, ol_delivery_d,\n \
ol_quantity, ol_amount, ol_dist_info)\n \
VALUES (%s, %d, %d, %d, %d, %d, NULL, %d, %f, '%s')

#define ORDER_STATUS_1 \
SELECT c_id\n \
FROM customer\n \
WHERE c_w_id = %d\n \
  AND c_d_id = %d\n \
  AND c_last = '%s'\n \
ORDER BY c_first ASC

#define ORDER_STATUS_2 \
SELECT c_first, c_middle, c_last, c_balance\n \
FROM customer\n \
WHERE c_w_id = %d\n \
  AND c_d_id = %d\n \
  AND c_id = %d

#define ORDER_STATUS_3 \
SELECT o_id, o_carrier_id, o_entry_d, o_ol_cnt\n \
FROM orders\n \
WHERE o_w_id = %d\n \
  AND o_d_id = %d \n \
  AND o_c_id = %d\n \
ORDER BY o_id DESC

#define ORDER_STATUS_4 \
SELECT ol_i_id, ol_supply_w_id, ol_quantity, ol_amount,\n \
   ol_delivery_d\n \
FROM order_line\n \
WHERE ol_w_id = %d\n \
  AND ol_d_id = %d\n \
  AND ol_o_id = %s

#define PAYMENT_1 \
SELECT w_name, w_street_1, w_street_2, w_city, w_state, w_zip\n \
FROM warehouse\n \
WHERE w_id = %d

#define PAYMENT_2 \
UPDATE warehouse\n \
SET w_ytd = w_ytd + %f\n \
WHERE w_id = %d

#define