Re: [HACKERS] Indexes vs. cache flushes
On Thu, Jan 19, 2006 at 02:46:11AM -0500, Tom Lane wrote: The concerns that I find more interesting are changes in the underlying objects. We don't have an ALTER OPERATOR CLASS, much less an ALTER ACCESS METHOD, but it's certainly theoretically possible to change the definition of a support function used by an index. There isn't presently any mechanism to force timely propagation of such a change, and so you'd be largely on your own --- but realistically, wouldn't such a change require rebuilding the index anyway?

I wondered about the same problem when dealing with the collation stuff. If you change anything about a collation, you essentially have to invalidate any indexes, plans or views using it. Like you say, there isn't really a way of doing this.

This isn't the first time I've wondered about a flag on an index marking it broken, pending rebuild. If one of these details changes, we really need to stop using the indexes ASAP until they're back in a consistent state. The only question remaining is when to do the rebuild: you can either wait for manual intervention or (probably better) simply do the REINDEXes in the same transaction as the ALTER OPERATOR CLASS et al.

The locking would be painful; perhaps a better way would be to clone the opclass, clone the indexes with the new opclass, build them, and then drop the old indexes. Once you've rebuilt the views and indexes, simply delete the old opclass at the next database restart, since plans don't survive a DB restart, right? (Isn't this Read-Copy-Update style locking?)

Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them.
[HACKERS] restrict column-level GRANTs to a single relation?
Fellow hackers, I'm curious about the best way to handle something like this: GRANT SELECT (col1, col2, col3) ON table1, table2 TO grantee; Is it reasonable to restrict this to a single relation, and throw an error if multiple relations are specified? That would require the preceding grant to be specified as: GRANT SELECT (col1, col2, col3) ON table1 TO grantee; GRANT SELECT (col1, col2, col3) ON table2 TO grantee; The SQL standards don't seem to mandate the first form (unless I misread?). Do y'all think this is a reasonable compromise? -- kevin brintnall =~ [EMAIL PROTECTED] ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 12:06:41AM -0500, Tom Lane wrote: Michael Glaesemann [EMAIL PROTECTED] writes: As far as I can tell, the only difference between your position, Dann, and Date and Darwen's, is that you think no natural key is immutable. DD's examples of natural keys are worth a second look though: If a primary key exists for a collection that is known never to change, for example social security number, student registration number, or employee number, then no additional system-assigned UID is required.

The problem with SSN is that somebody other than you controls it.

No, that's not the big problem. The big problem is that it's very likely illegal for you to use it for anything unless you happen to be the Social Security Administration.

If you are the college registrar, then you control the student's registration number, and you don't have to change it. In fact, guess what: you probably generated it in the same way as a surrogate key.

True. I'd argue that all of these are in reality the exact same thing as a surrogate key --- from the point of view of the issuing authority.

But from anyone else's point of view, they are external data and you can't hang your own database design on the assumption that they won't change.

Right :) Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778 Remember to vote!
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 12:50:52AM -0800, David Fetter wrote: On Thu, Jan 19, 2006 at 12:06:41AM -0500, Tom Lane wrote: No, that's not the big problem. The big problem is that it's very likely illegal for you to use it for anything unless you happen to be the Social Security Administration.

Actually no. From reading here[1] it appears anyone is allowed to ask you for your SSN and they can do what they like with it. What you're describing is more like the TFN in Australia. Apart from the fact that you're not required to have one or provide it if asked, if you're not a bank, share registry, or some other such institution, you're not allowed to ask for it, let alone store it. Medicare numbers are the same: if you're not a health service provider, you can't ask for one. Anyway, this doesn't mean an SSN is a good key, for all sorts of other reasons people have already stated.

[1] http://www.cpsr.org/prevsite/cpsr/privacy/ssn/ssn.faq.html#IsItIllegalToAsk

Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
[HACKERS] Bogus path in postmaster.opts
Since 8.0, postmaster.opts has contained .../bin/postgres even though it was the postmaster that was started. This was evidently broken by some Windows-related reshuffling. Earlier, CreateOptsFile was called with argv; now it's passed the result of find_my_exec instead. I'm not sure whether that change was wrong to begin with or whether find_my_exec is misbehaving (it should return something containing postmaster, no?). -- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] pgxs/windows
Bruce Momjian wrote: OK, thanks. Next question --- are the installed file locations the same for a MinGW install and a pginstaller install? I don't think pginstaller does a MinGW install because it doesn't have the build environment in the tarball. However, the big difference seems to be that Magnus has -Llib and -Lbin, while you have only the -Lbin. I have MinGW and pginstaller installed here. How can I set things up to test this?

Now looking at the Makefile.global in the 8.1.2 pginstaller install, $libdir is set in a pgxs-specific block: libdir := $(shell pg_config --libdir) and that seems to work:

C:\Program Files\PostgreSQL\8.1\bin> pg_config --libdir
C:/PROGRA~1/POSTGR~1/8.1/lib

and that is added to LDFLAGS, which is later propagated to SHLIB_LINK, though SHLIB_LINK moves all the -L flags to the front, so what you see on the link line is not the ordering used to create the value. Andrew, can you try echoing $libdir and $SHLIB_LINK in the Makefile to find those values?

Here is a test case log (including a test makefile). I ran the tests, as you can see, with both installer 8.1 and MinGW-installed CVS tip, with the same results.

cheers andrew

$ touch foo.c
$ cat Makefile
MODULES = foo
SRCS += foo.c
OBJS = $(SRCS:.c=.o)
PGXS := $(shell pg_config --pgxs)
include $(PGXS)
override CFLAGS := $(filter-out -Wendif-labels -Wdeclaration-after-statement, $(shell pg_config --cflags))
showme:
	@echo libdir = $(libdir)
	@echo SHLIB_LINK = $(SHLIB_LINK)
$ which pg_config
/c/Program Files/PostgreSQL/8.1/bin/pg_config
$ rm -f foo.dll
$ make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -fno-strict-aliasing -I. -Ic:/PROGRA~1/POSTGR~1/8.1/include/server -Ic:/PROGRA~1/POSTGR~1/8.1/include/internal -I./src/include/port/win32 -DEXEC_BACKEND -I/mingw/include/krb5 -Ic:/PROGRA~1/POSTGR~1/8.1/lib/pgxs/src/MAKEFI~1/../../src/include/port/win32 -Ic:/PROGRA~1/POSTGR~1/8.1/include/server/port/win32 -c -o foo.o foo.c
dlltool --export-all --output-def foo.def foo.o
dllwrap -o foo.dll --def foo.def foo.o c:/PROGRA~1/POSTGR~1/8.1/lib/pgxs/src/MAKEFI~1/../../src/utils/dllinit.o -Lc:/PROGRA~1/POSTGR~1/8.1/bin -lpostgres
c:\mingw\bin\..\lib\gcc-lib\mingw32\3.2.3\..\..\..\..\mingw32\bin\ld.exe: cannot find -lpostgres
c:\mingw\bin\dllwrap.exe: c:\mingw\bin\gcc exited with status 1
make: *** [foo.dll] Error 1
rm foo.o
$ make showme
libdir = c:/PROGRA~1/POSTGR~1/8.1/lib
SHLIB_LINK = -Lc:/PROGRA~1/POSTGR~1/8.1/bin -lpostgres
$ export PATH=/usr/local/pgsql/bin:$PATH
$ which pg_config
/usr/local/pgsql/bin/pg_config
$ rm -f foo.dll
$ make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -fno-strict-aliasing -I. -IC:/msys/1.0/local/pgsql/include/server -IC:/msys/1.0/local/pgsql/include/internal -I./src/include/port/win32 -DEXEC_BACKEND -IC:/msys/1.0/local/pgsql/lib/pgxs/src/MAKEFI~1/../../src/include/port/win32 -IC:/msys/1.0/local/pgsql/include/server/port/win32 -c -o foo.o foo.c
dlltool --export-all --output-def foo.def foo.o
dllwrap -o foo.dll --def foo.def foo.o C:/msys/1.0/local/pgsql/lib/pgxs/src/MAKEFI~1/../../src/utils/dllinit.o -LC:/msys/1.0/local/pgsql/bin -lpostgres
c:\mingw\bin\..\lib\gcc-lib\mingw32\3.2.3\..\..\..\..\mingw32\bin\ld.exe: cannot find -lpostgres
c:\mingw\bin\dllwrap.exe: c:\mingw\bin\gcc exited with status 1
make: *** [foo.dll] Error 1
rm foo.o
$ make showme
libdir = C:/msys/1.0/local/pgsql/lib
SHLIB_LINK = -LC:/msys/1.0/local/pgsql/bin -lpostgres
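In both runs above, ld fails for the same reason: the only library search path on the link line is the install's bin directory, while MinGW's libpostgres import library lives under lib. A hedged, untested workaround sketch (not an official pgxs fix) is to append the pg_config libdir to SHLIB_LINK yourself after including PGXS:

```make
# Untested workaround sketch for the "cannot find -lpostgres" failure above:
# pgxs only passes -L<installdir>/bin, but the MinGW import library
# (libpostgres.a) is under lib/, so add pg_config --libdir to the search path.
MODULES = foo
PGXS := $(shell pg_config --pgxs)
include $(PGXS)
SHLIB_LINK += -L$(libdir)
```

This only papers over the symptom on the client side; the real question remains why pgxs emits -Lbin but not -Llib on Windows.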
Re: [HACKERS] Unique constraints for non-btree indexes
On Wed, Jan 18, 2006 at 05:27:29PM -0500, Tom Lane wrote: Right, the deadlock risk is exactly the reason you need some secret sauce or other. Btree's page-level lock ensures that two insertions of conflicting keys can't overlap (even if they ultimately get stored on different pages). That's not the only way to fix this but it's a pretty good way.

Ok, I'm not that great with locking issues, but how about this:

1. In the root page of the GiST index you store a counter, let's call it the Insertion ID (IID).
2. An index tuple has a state where it is not visible yet and contains an IID.
3. When you go to insert, you follow the normal GiST method all the way down to the leaf node.
4. Once you know where you're going to put it, take an exclusive lock on the page.
5. Now, take an exclusive lock on the root page, read the IID (this one is yours) and store the next one back in the root page.
6. Now write the index tuple to the leaf page, recording your IID.
7. Release the root page lock, then the leaf page lock. The thing to note is that once someone has gotten a particular IID, all IIDs less than it will already have been written to the index leaf pages. Also, there should be no risk of deadlock since we lock the pages in a consistent order.
8. Do a normal index scan with your ~ operator. Ignore any provisional tuples with an IID greater than yours. If a match has an IID less than yours you may block on it using the same logic as for b-tree indexes.
9. Once the scan is completed and you know that there are no duplicates with your IID or less, make the tuple fully visible (for your transaction anyway). At this point the IID is no longer relevant and you can forget it.

What I tried to achieve was avoiding holding any locks while doing scans, since for GiST indexes they may cover a lot of ground. Perhaps a better name for IID is Generation ID?
Storing the IID could take extra space, but you could probably overlap it with the ctid since no-one's going to look up the data tuple before it's fully visible. OTOH, if the backend dies before completing the process, how does one clean up the provisional index tuple? The ctid would provide a way for VACUUM to know when to remove it.

BTW, the deadlock risk also applies to deferred uniqueness checks. Again, in btree it's possible to avoid this if you do a fresh indexscan (and take a lock on the first scanned page while you do that). If you try to do it without consulting the index then you need some other way to break ties. I think the above should work for deferred checks as well, as long as you store the list of tuples to check for each inserted tuple. These lists should form a directed acyclic graph so there should be no risk of deadlock. Conflicts with VACUUM are still an issue. Any thoughts?

Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
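The ordering rule in the proposal above can be sketched in a few lines. This is a hypothetical, single-process simulation (no real page locking, and duplicates fail immediately instead of blocking); the point is only the IID discipline: an inserter compares itself against visible entries and against provisional entries whose IID is smaller than its own, so no two inserters can ever wait on each other.

```python
# Hypothetical simulation of the IID scheme sketched in the proposal above.
# Names (GistIndexSim, UniqueViolation) are invented for illustration.

class UniqueViolation(Exception):
    pass

class GistIndexSim:
    def __init__(self):
        self.next_iid = 0      # stands in for the counter on the root page
        self.entries = []      # each entry: {"key", "iid", "visible"}

    def insert(self, key):
        # Steps 4-7: take the "locks", grab an IID, write a provisional entry.
        iid = self.next_iid
        self.next_iid += 1
        entry = {"key": key, "iid": iid, "visible": False}
        self.entries.append(entry)
        # Step 8: scan; ignore provisional entries with an IID greater than ours.
        for other in self.entries:
            if other is entry or other["key"] != key:
                continue
            if other["visible"] or other["iid"] < iid:
                self.entries.remove(entry)   # back out our provisional entry
                raise UniqueViolation(key)
        # Step 9: no duplicate with IID <= ours; make the entry fully visible.
        entry["visible"] = True
        return iid

idx = GistIndexSim()
idx.insert("a")
idx.insert("b")
try:
    idx.insert("a")
except UniqueViolation:
    print("duplicate rejected")   # prints "duplicate rejected"
```

Because the comparison is only ever against strictly smaller IIDs, the wait-for relation is a DAG, matching the no-deadlock claim for the deferred-check case as well.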
Re: [HACKERS] No heap lookups on index
As an Oracle internals person myself, I don't see how making a comparison between the specifics of Oracle's MVCC and PostgreSQL's MVCC is relevant to this discussion. Like most other commercial databases, Oracle's storage manager performs update-in-place whereas PostgreSQL's (for the most part) does not. There are several ways to implement update-in-place, and Oracle has chosen its own rollback segment methodology, which has issues that, without tuning, are major hassles. I'm not saying that one is better than the other in ALL cases, but I and many other Oracle consultants have tuned Oracle installations to eliminate the headaches others on this list have mentioned. Any knowledgeable Oracle person evaluating PostgreSQL who may be reading this list is just going to see it as a lot of anti-Oracle discussion with no basis in fact. Regardless, there is NO WAY to perform an apples-to-apples comparison between the implementations, locking strategies, etc. as the MVCC implementations and goals are completely different.

In short, Oracle's implementation is not perfect; neither is ours. Oracle's initial design (as a commercial database) is much different from PostgreSQL's (as a research database). While I'm always game for looking at other implementations when designing and discussing new features for PostgreSQL, let's focus on making PostgreSQL better rather than spending time discussing unrealistic comparisons. If we want to do a comparison of how/why Oracle's index implementation is faster in the context of this situation and how we could make PostgreSQL's faster, let's stick to that.
On 1/19/06, Tom Lane [EMAIL PROTECTED] wrote: Greg Stark [EMAIL PROTECTED] writes: I wonder if the bitmap can actually be one bit per page actually.

Yeah, I think we'd agreed that per-page was the way to go. Per-tuple bitmaps are painful to manage because of the variable number of tuples per page. And really all you need to know is whether to read the page or not --- once you have, examining multiple tuples on it doesn't cost much. regards, tom lane
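The per-page idea can be sketched quickly. This is an illustrative toy (not PostgreSQL source, and the helper names are invented): TIDs are (block, line pointer) pairs, but the bitmap only records the block, since a page that matches must be read in full anyway and rechecking its tuples is cheap.

```python
# Illustrative sketch of a one-bit-per-heap-page bitmap for a bitmap
# index scan.  Only block numbers are recorded; line pointers are dropped.

def build_page_bitmap(tids, relation_pages):
    """Set one bit per heap page that contains at least one matching TID."""
    bitmap = bytearray((relation_pages + 7) // 8)
    for block, _offset in tids:
        bitmap[block // 8] |= 1 << (block % 8)
    return bitmap

def pages_to_read(bitmap):
    """Yield the block numbers whose bit is set, in heap order."""
    for block in range(len(bitmap) * 8):
        if bitmap[block // 8] & (1 << (block % 8)):
            yield block

# Four matching tuples, but only three distinct pages need to be read.
tids = [(0, 3), (0, 7), (5, 1), (9, 2)]
bitmap = build_page_bitmap(tids, relation_pages=10)
print(list(pages_to_read(bitmap)))   # prints [0, 5, 9]
```

Note the side benefit Tom mentions: the bitmap size is fixed by the relation's page count, not by the (variable) number of tuples per page.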
Re: [HACKERS] restrict column-level GRANTs to a single relation?
On Thursday, 19 January 2006 09:50, kevin brintnall wrote: GRANT SELECT (col1, col2, col3) ON table1, table2 TO grantee; Is it reasonable to restrict this to a single relation, and throw an error if multiple relations are specified?

Yes -- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] Surrogate keys (Was: enums)
Martijn van Oosterhout wrote: Please provide natural keys for any of the following: - A Person - A phone call: (from,to,date,time,duration) is not enough - A physical address - A phone line: (phone numbers aren't unique over time) - An internet account: (usernames not unique over time either)

Ahh, a challenge. Hmm, not sure about all of them, but here goes: A Person - well, you could use a bit map of their fingerprints, or maybe their retinal scan. Of course, that could change due to serious injury. Maybe some kind of representation of their DNA? A physical address - how about longitude/latitude/height from sea level?

The point here is two-fold. First, what we call 'natural' is frequently itself a surrogate key (yes, even your name is really just a surrogate key. As with all surrogate keys, it is a sequence of symbols that you use to represent yourself). The second point is that even when you find a truly 'natural' key (something not arbitrarily made up by anyone, and uniquely identifying the data in question), it may be completely and utterly inappropriate to use in a database.

What is 'natural' anyway? If someone phones in an order, we usually assign an order number to that request. This order number is not the actual order, and the customer couldn't care less what it is, but I've never heard a DBA argue we should get rid of it (well, to be fair, I've never discussed order numbers with a DBA at all). After all, would it make sense for the key for that order to be the customer's name, the date/time of the order, all the items ordered, and the address to ship the order to? That isn't a key, but it's the only 'natural' thing that identifies that order that immediately comes to my mind. On the other hand, would anyone argue that an order_item table should have a surrogate key? Well, I wouldn't. The key for the order_item table should be something like the order number and the inventory item number together (IMHO). The point?
Surrogate keys and natural keys are two tools in the database arsenal. Just as it is unwise to use a hammer to drive a screw just because you don't believe in screwdrivers, it is unwise to just offhand discard either method of specifying a key. Rather, use intelligence and education (one source of which is discussions such as this) in deciding how best to represent your data to aid performance, ease of use, and adaptability. Mike Pollard SUPRA Server SQL Engineering and Support Cincom Systems, Inc.
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 09:37:12AM -0500, Pollard, Mike wrote: Martijn van Oosterhout wrote: Please provide natural keys for any of the following: - A Person - A phone call: (from,to,date,time,duration) is not enough - A physical address - A phone line: (phone numbers aren't unique over time) - An internet account: (usernames not unique over time either) Ahh, a challenge. Hmm, not sure about all of them, but here goes: A Person - well, you could use a bit map of their fingerprints, or maybe their retinal scan. Of course, that could change due to serious injury. Maybe some kind of representation of their DNA?

Yes. Representation of the DNA is probably best. But - that's a lot of data to use as a key in multiple tables. :-)

A physical address - how about longitude/latitude/height from sea level?

Planet? Solar system? Galaxy? Universe? :-)

I agreed with what you had to say (the stuff I deleted). Just felt like being funny. Not sure if I'm successful. Hehe...

Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] | Neighbourhood Coder | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
Re: [HACKERS] restrict column-level GRANTs to a single relation?
kevin brintnall [EMAIL PROTECTED] writes: GRANT SELECT (col1, col2, col3) ON table1, table2 TO grantee; Is it reasonable to restrict this to a single relation, and throw an error if multiple relations are specified?

The SQL spec doesn't actually allow multiple things after GRANT ... ON --- that's a PG extension. So you could make the restriction and not violate the spec. OTOH it seems unlikely that this would save much. regards, tom lane
Re: [HACKERS] No heap lookups on index
On Thu, Jan 19, 2006 at 01:56:51AM -0500, Greg Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Christopher Kings-Lynne [EMAIL PROTECTED] writes: Oracle does, but you pay in other ways. Instead of keeping dead tuples in the main heap, they shuffle them off to an 'undo log'. This has some downsides: Rollbacks take *forever*, though this usually isn't much of an issue unless you need to abort a really big transaction. It's a good point though. Surely a database should be optimised for the most common operation - commits, rather than rollbacks? The shuffling off of the data is expensive in itself, so I'm not sure you can argue that the Oracle way is more optimal for commits either. You pay in Oracle when you read these records too. If there are pending updates you have to do a second read to the rollback segment to get the old record. This hits long-running batch queries especially hard since by the time they finish a large number of the records they're reading could have been updated and require a second read to the rollback segments.

You pay the same cost in PostgreSQL though... If you index-scan to a dead tuple, you get pointed to where the new one is. And if you're seqscanning, well, you'll be reading everything anyway.

You also pay if the new value is too big to fit in the same space as the old record. Then you get to have to follow a pointer to the new location. Oracle tries to minimize that by intentionally leaving extra free space but that has costs too.

Again, similar to the cost with our MVCC. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] 8.0.5 Bug in unique indexes?
Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Odd problem with unique indexes: What's the database's locale? This could be the same problem fixed in 8.0.6, if the locale has weird ideas about what string equality means.

lc_collate  | C
lc_ctype    | C
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric  | en_US.UTF-8
lc_time     | en_US.UTF-8

Sincerely, Joshua D. Drake -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: PLphp, PLperl - http://www.commandprompt.com/
Re: [HACKERS] 8.0.5 Bug in unique indexes?
Joshua D. Drake [EMAIL PROTECTED] writes: Tom Lane wrote: What's the database's locale? This could be the same problem fixed in 8.0.6, if the locale has weird ideas about what string equality means. lc_collate | C lc_ctype | C

OK, scratch that theory. Don't suppose you can create a reproducible test case ;-) regards, tom lane
Re: [HACKERS] 8.0.5 Bug in unique indexes?
lc_collate | C lc_ctype | C OK, scratch that theory. Don't suppose you can create a reproducible test case ;-)

That may be a bit tough... What really struck me is that the duplication only occurs in this set of 100 tables and the duplication is always violating the same index. We currently have 4-5 tables that are in violation. Let me see what I can do to duplicate this. Sincerely, Joshua D. Drake -- The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Surrogate keys
[EMAIL PROTECTED] writes: On Thu, Jan 19, 2006 at 09:37:12AM -0500, Pollard, Mike wrote: Martijn van Oosterhout wrote: Please provide natural keys for any of the following: - A Person - A phone call: (from,to,date,time,duration) is not enough - A physical address - A phone line: (phone numbers aren't unique over time) - An internet account: (usernames not unique over time either) Ahh, a challenge. Hmm, not sure about all of them, but here goes: A Person - well, you could use a bit map of their fingerprints, or maybe their retinal scan. Of course, that could change due to serious injury. Maybe some kind of representation of their DNA? Yes. Representation of the DNA is probably best. But - that's a lot of data to use as a key in multiple tables. :-)

That is arguably about the best choice possible, for a human being, as DNA isn't supposed to be able to change (much). Mind you, there do exist odd cases where a person might have two sets of DNA in different parts of their body. This commonly (well, it's not really very common...) happens when non-identical twins share a blood supply; that twins were involved may not be noticed if one does not survive to birth...

A physical address - how about longitude/latitude/height from sea level? Planet? Solar system? Galaxy? Universe? :-) I agreed with what you had to say (the stuff I deleted). Just felt like being funny. Not sure if I'm successful. Hehe...

Well, that's useful for representing a key for a piece of real estate. It's fairly interestingly useless for representing a human attribute, at least in terms of being a primary key... -- output = (cbbrowne @ acm.org) http://cbbrowne.com/info/linux.html It's a pretty rare beginner who isn't clueless. If beginners weren't clueless, the infamous Unix learning cliff wouldn't be a problem. -- david parsons
Re: [HACKERS] 8.0.5 Bug in unique indexes?
Joshua D. Drake wrote: Tom Lane wrote: What's the database's locale? This could be the same problem fixed in 8.0.6, if the locale has weird ideas about what string equality means. lc_collate | C lc_ctype | C

You don't use pl/perl, do you -- i.e. I guess you read the latest release notes and the thread here before that? Best Regards, Michael
Re: [HACKERS] No heap lookups on index
Jonah H. Harris [EMAIL PROTECTED] writes: As an Oracle internals person myself, I don't see how making a comparison between the specifics of Oracle's MVCC to PostgreSQL's MVCC is relevant to this discussion. As do most other commercial databases, Oracle's storage manager performs an update-in-place whereas PostgreSQL's (for the most part) does not. There are several ways to implement update-in-place, and Oracle has chosen their own rollback segment methodology which has issues that, without tuning, are major hassles. I'm not saying that one is better than the other in ALL cases, but I and many other Oracle consultants have tuned Oracle installations to eliminate the headaches others in this list have mentioned. Any knowledgeable Oracle person evaluating PostgreSQL that may be reading this list is just going to see it as a lot of anti-Oracle discussion with no basis in fact. Regardless, there is NO WAY to perform an apples-to-apples comparison between the implementations, locking strategies, etc. as the MVCC implementations and goals are completely different. ...

Well it seems there were lots of facts posted. Yes you can avoid headaches caused by these issues, but we're not really talking about the headaches. We're comparing the performance costs of the update-in-place and non-update-in-place approaches. All of the costs named so far are to some degree fundamental costs of update-in-place. All you can hope to do in tuning a system is make sure the costs are kept within manageable bounds.

There are fundamental costs to non-update-in-place as well. The table sizes are bloated by the amount of space used to store older versions and the dead tuples that haven't been reused yet. Whether this slows down Postgres as much as having to do a second (or third or fourth) read to a rollback segment is a valid area for discussion. It's especially interesting to discuss since the two costs hit different sets of queries unequally.
If we want to do a comparison on the how/why Oracle's index implementation is faster in the context of this situation and how we could make PostgreSQL's faster, let's stick to that.

Well the main difference is the MVCC implementation. Talking about Oracle's index implementation while avoiding mentioning the elephant in the room would be sort of pointless. -- greg
Re: [HACKERS] Bogus path in postmaster.opts
Peter Eisentraut [EMAIL PROTECTED] writes: Earlier, CreateOptsFile was called with argv, now it's passed the result of find_my_exec instead. I'm not sure whether that change was wrong to begin with or whether find_my_exec is misbehaving (it should return something containing postmaster, no?).

find_my_exec is not misbehaving: it's designed to expand symlinks, and would in fact be pretty useless if it did not. We could go back to storing the raw value of argv[0] in the opts file, but I rather like having the full path in there.

There is another possible answer, and it's something I've been meaning to bring up for awhile. Is there a good reason why postmaster is a symlink to postgres, rather than a hard link? Surely the symlink way is not any more efficient. The thing that ticks me off about this is that it confuses gdb: if you say gdb postmaster and then run, what gets passed as argv[0] is postgres, thus breaking your test. So I not infrequently find myself manually replacing the symlink with a hard link in order to be able to test. If it'd result in nicer output in the opts file too, then all the more reason to change. regards, tom lane
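The symlink/hard-link distinction Tom describes can be demonstrated with stand-in names (not an actual postgres binary; the file names here are invented for illustration). A hard link is just a second directory entry for the same inode, so there is nothing for readlink-style resolution to expand back to "postgres":

```shell
# Demonstration: a symlink resolves back to its target name, a hard link
# does not, because both hard-link names are equally "the" file.
set -e
cd "$(mktemp -d)"

printf '#!/bin/sh\necho "argv0=$0"\n' > postgres
chmod +x postgres

ln -s postgres postmaster.sym     # symlink: resolvable to "postgres"
ln postgres postmaster.hard       # hard link: same inode, no indirection

readlink postmaster.sym           # prints: postgres
readlink postmaster.hard || echo "not a symlink"
test postmaster.hard -ef postgres && echo "same inode"

./postmaster.sym                  # prints: argv0=./postmaster.sym
./postmaster.hard                 # prints: argv0=./postmaster.hard
```

The shell itself passes the invoked name as argv[0] either way; the difference only shows up for tools like gdb or find_my_exec that canonicalize the path, which is exactly why replacing the symlink with a hard link preserves the postmaster name.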
Re: [HACKERS] No heap lookups on index
On 19 Jan 2006 11:25:21 -0500, Greg Stark [EMAIL PROTECTED] wrote: Well it seems there were lots of facts posted. Yes you can avoid headachescaused by these issues, but we're not really talking about the headaches.Several were mentioned; some of which could generally be avoided by good tuning. We're comparing the performance costs of what are update-in-place andnon-update-in-place approach. As PostgreSQL is not an update-in-place system, what is the point in discussing the costs? How does this solve David's original problem? There are fundamental costs to non-update-in-place as well. The table sizesare bloated by the amount of space used to store older versions and the deadtuples that haven't been reused yet. Whether this slows down Postgres as much as having to do a second (or third or fourth) read to a rollback segment is avalid area for discussion. It's especially interesting to discuss since thetwo costs hit different sets of queries unequally. I agree, but again, we're not talking apples-to-apples. There's far too many variables to compare Oracle's speed to PostgreSQL's for most types of operations in the varying types of database deployments. Well the main difference is the MVCC implementation. Talking about Oracle'sindex implementation while avoiding mentioning the elephant in the room would be sort of pointless.I agree that Oracle's MVCC plays *a little* into this index discussion, but isn't it pointless to discuss the pitfalls of an MVCC implementation that PostgreSQL does not have? Similarly, how does it solve David's original question. Again, I'm fine with discussing these things, but let's keep on topic for David's sake. He posted a problem that we have discussed many times over. Let's focus on that problem and give him possible options. 
David has stated that the index to heap visibility check is slowing him down, so what are the possible options:
- Visibility in indexes (-hackers archives cover the pros/cons)
- True organized heaps
- Block level index (Tom/Simon's earlier discussion)
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 09:37:12AM -0500, Pollard, Mike wrote: The point? Surrogate keys and natural keys are two tools in the database arsenal. Just as it is unwise to use a hammer to drive a screw just because you don't believe in screwdrivers, it is unwise to just off hand discard either method of specifying a key. Rather, use intelligence and education (one of which is discussions such as this) in deciding how best to represent your data to aide in performance, ease of use, and adaptability. There is one thing to consider: consistency. If you mix and match 'natural' keys and surrogate keys as PK, then how do you know which one you're supposed to be joining on? How does everyone else on the team know? Sure, there's many examples where you don't really need a surrogate key. But there's just as many (if not more) where you want a surrogate key so that you don't have to deal with the pain of a multiple-field key. (Note that I don't consider simply defining a multiple-field key to be unique as painful). So ISTM it's much easier to just use surrogate keys and be done with it. Only deviate when you have a good reason to do so. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
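[Editorial note: Jim's "surrogate key plus unique natural key" approach can be sketched as below. The table and column names are invented for illustration, not taken from the thread; note the multiple-field natural key is still declared unique, which Jim says he does not consider painful.]

```sql
-- Sketch: surrogate primary key for joining, with the natural key still
-- enforced by a UNIQUE constraint so duplicates are rejected.
CREATE TABLE employee (
    employee_id serial PRIMARY KEY,            -- surrogate key: join on this
    first_name  text    NOT NULL,
    last_name   text    NOT NULL,
    extension   integer NOT NULL,
    UNIQUE (first_name, last_name, extension)  -- natural key still unique
);

-- Child tables then carry only the single-column surrogate key:
CREATE TABLE timesheet (
    timesheet_id serial PRIMARY KEY,
    employee_id  integer NOT NULL REFERENCES employee (employee_id),
    hours        numeric NOT NULL
);
```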
Re: [HACKERS] Bogus path in postmaster.opts
Tom Lane wrote: find_my_exec is not misbehaving: it's designed to expand symlinks, and would in fact be pretty useless if it did not. I don't want to contest that in certain cases this is required, but I can easily come up with scenarios (which perhaps no PostgreSQL user has encountered yet) where the current behavior is broken. One example is a GNU Stow like installation management where each package is installed in a private directory and the canonical locations in /usr/local are symlinks. (It's altogether strange that this would distinguish between symbolic and hard links anyway, except that of course it cannot actually resolve hard links, since many installation schemes that one needs to cope with work the same with hard and soft links.) There is another possible answer, and it's something I've been meaning to bring up for awhile. Is there a good reason why postmaster is a symlink to postgres, rather than a hard link? I don't know of one. Something I have thought of during the recent options reorganization is that we could do away with the postmaster/postgres dichotomy altogether. Just call the thing postmaster and give it a --single-user-mode option. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Surrogate keys (Was: enums)
Martijn, In any of these either misspellings, changes of names, ownership or even structure over time render the obvious useless as keys. There are techniques for detecting and reducing duplication but the point is that for any of these duplicates *can* be valid data. Please point me out where, in the writings of E.F. Codd or in the SQL Standard, it says that keys have to be immutable for the life of the row. Duplicate *values* can be valid data. Duplicate *tuples* show some serious flaws in your database design. If you have a personnel directory on which you've not bothered to define any unique constraints other than the ID column, then you can't match your data to reality. If you have two rows with the same first and last name, you don't know if they are two different people or the same person, duplicated. Which will be a big problem come paycheck time. Per E.F. Codd, each tuple is a *unique* predicate (or key) comprising a set of values defining a *unique* data entity. i.e. The employee named John Little at extension 4531. There is nothing anywhere said about keys never changing. This is Databases 101 material. Really! --Josh ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Surrogate keys (Was: enums)
Jim, So ISTM it's much easier to just use surrogate keys and be done with it. Only deviate when you have a good reason to do so. The lazy man's guide to SQL database design, by Jim Nasby. ;-) --Josh ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] No heap lookups on index
Jonah, David has stated that the index to heap visibility check is slowing him down, so what are the possible options: - Visibility in indexes (-hackers archives cover the pros/cons) - True organized heaps - Block level index (Tom/Simon's earlier discussion) also - Frozen relations This last solution was proposed as a possibility for the data warehousing case. For a time-partitioned table, we're going to know that all but one of the partitions has not been updated anywhere within visible transaction scope, and therefore index-only access is a possibility. also - join tables One of the other most valuable targets for index-only access is the many-to-many join table whose primary key consists of two (or more) foreign keys to two (or more) other tables. It's actually not necessary to check visibility on this kind of table as the visibility of tuples in the join table will be determined by the visibility of tuples in the two data tables. Since often join tables consist *only* of the join key, being able to do index-only access on them could dramatically speed up certain kinds of queries. Both of the above are corner cases but are very common ones and might be much easier to implement than the other solutions. --Josh ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
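[Editorial note: the "pure" many-to-many join table Josh describes, whose primary key consists entirely of foreign keys and which holds no other columns, might look like the following. Table and column names are invented for illustration; the referenced parent tables are assumed to exist.]

```sql
-- Sketch: a join table consisting *only* of the join key. Its composite
-- primary key index already contains every column a query would fetch,
-- which is what makes index-only access attractive here.
CREATE TABLE item_category (
    item_id     integer NOT NULL REFERENCES item (item_id),
    category_id integer NOT NULL REFERENCES category (category_id),
    PRIMARY KEY (item_id, category_id)
);
```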
Re: [HACKERS] 8.0.5 Bug in unique indexes?
Michael Paesold wrote: Joshua D. Drake wrote: Tom Lane wrote: What's the database's locale? This could be the same problem fixed in 8.0.6, if the locale has weird ideas about what string equality means. lc_collate | C lc_ctype | C You don't use pl/perl, do you -- i.e. I guess you read the latest release notes and the thread here before that? Yes I did. I didn't know that the person was running plPerl. I have verified that they are. We are now going to check if upgrading to 8.0.6 with a deletion of the duplicates and a reindex resolves the issue. Sincerely, Joshua D. Drake -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] No heap lookups on index
Thanks for all the help and thought to our problem. Jonah H. Harris wrote: David has stated that the index to heap visibility check is slowing him down, so what are the possible options: - Visibility in indexes (-hackers archives cover the pros/cons) - True organized heaps - Block level index (Tom/Simon's earlier discussion) Several people seem to have a problem with full index visibility, however both last thread and this thread people seem to agree that keeping page-level visibility would be an acceptable compromise. Perhaps we should implement a rough patch and run some time stats to post for everyone to weigh the pros and cons against hard figures? The discussion on block level indexing and true organized heaps, if I am interpreting this topic correctly, would essentially keep the table optimized for accessing the data through one index? This seems like a very interesting idea, but I don't know that it would really solve our current problem because we need to use more than one index to efficiently access a table. We have a very driving need to solve this problem and so don't mind doing legwork to do feasibility testing or stat collection to help the community decide the best solution for Postgres. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] No heap lookups on index
Tom Lane wrote: What sort of problems are you dealing with exactly? There has been some discussion of changes that would improve certain scenarios. For instance it might be plausible to do joins using index information and only go back to the heap for entries that appear to pass the join test. We tried that scenario, writing a dirty index hack to experiment with, that returned values whether they were valid or not. We saw some definite improvements inside of joins and sub queries, but we were still slowed down at the end because we still had to validate every row being returned. My hands are very tied as to what specific examples I can send, so I apologize for how long it took to get back to you on this. A simple (generalized) example of one of the types of queries we are running:

SELECT col1, col2,
       cool_func(stat_count,
                 COALESCE(raw_counts.raw_count,
                          (SELECT alt_count FROM alt_raw_table
                           WHERE alt_raw_table.pk = col2))) AS cool_func
FROM (SELECT col1, col2, stat_count
      FROM pair_table WHERE col1 = $1) pair_table
LEFT JOIN raw_counts ON pair_table.col2 = raw_counts.pk

We tried not validating the return of both of these as we only want to see the rows which have a high value for cool_func, but it was still necessary to validate the rows which did match the criteria. So we did see an improvement, but not enough. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 10:11:51AM -0800, Josh Berkus wrote: So ISTM it's much easier to just use surrogate keys and be done with it. Only deviate when you have a good reason to do so. The lazy man's guide to SQL database design, by Jim Nasby. ;-) Hehe... I was thinking the same thing. I've definitely seen cases where the use of surrogate keys verges on ridiculous. It hasn't harmed the application, except in terms of complexity. It still works. It still performs fine. The SQL queries are awful looking. :-) That's where I would tend to draw the line. For me, I find implementation and maintenance to be the most expensive part of my applications. My data hasn't yet become large enough to make disk space, compute resources, or I/O bandwidth a serious concern. If I think the use of surrogate keys may make my life harder, I'll try not to use them. If I think they may make my life easier, I'll use them without blinking an eye. Harder vs. easier = cost to implement. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] tsearchd (tsearch2 daemon) is available for playing
Hi there, we did a 8.1 compatible version of tsearchd, available from http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch2 tsearchd is our old experiment with inverted index. It's fully compatible with tsearch2, actually it's tsearch2+several functions and daemon. Very brief documentation is available http://www.sai.msu.su/~megera/oddmuse/index.cgi/tsearchd Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Surrogate keys (Was: enums)
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Josh Berkus Sent: Thursday, January 19, 2006 10:09 AM To: Martijn van Oosterhout Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Surrogate keys (Was: enums) Martijn, In any of these either misspellings, changes of names, ownership or even structure over time render the obvious useless as keys. There are techniques for detecting and reducing duplication but the point is that for any of these duplicates *can* be valid data. Please point me out where, in the writings of E.F. Codd or in the SQL Standard, it says that keys have to be immutable for the life of the row. Only do that for data that you care about. If you think that the data has no value, there is no need to have a way to identify a row. Duplicate *values* can be valid data. Duplicate *tuples* show some serious flaws in your database design. If you have a personnel directory on which you've not bothered to define any unique constraints other than the ID column, then you can't match your data to reality. If you have two rows with the same first and last name, you don't know if they are two different people or the same person, duplicated. Which will be a big problem come paycheck time. Per E.F. Codd, each tuple is a *unique* predicate (or key) comprising a set of values defining a *unique* data entity. i.e. The employee named John Little at extension 4531. There is nothing anywhere said about keys never changing. This is Databases 101 material. Really! I give it an 'F.' When the data changes, the problems generated are not just due to repercussions related to the child and parent tables related through the primary key. Someone has an invoice, and they call in with a question. A combination of their name and address was used as a primary key. They moved, and sent in a forwarding address. 
The DBA was smart enough to design the database to cascade updates, so that there are no orphan records and we have not compromised the structure of the database. The customer calls in with a question about an old invoice. We have no record of that transaction. I was a DBA for a database for a company with many millions of customers worldwide (e.g. the product registration table was 24 GB). Their design had natural keys in it. It caused dozens of problems, every single day. I contend that most people are not smart enough to decide when a natural key is a good idea. The engineers that designed the database were probably pretty smart, since it sort of worked and had thousands of tables and hundreds of millions of rows in it. But one bad decision on a natural key will cause literally millions of dollars of damage. The primary defense I have heard so far is that the Oids are hard to understand. They are nothing in comparison with understanding what to do when you have 25 changes to primary keys on various tables every single day. Once you get used to Oids, I find it hard to believe that any intelligent person finds them confusing. Confusion resulting from having primary keys that are a moving target? Now that's confusion for you. IMO-YMMV. I think it is time for me to give it a rest, though. My experience may be very atypical and I feel strangely passionate about it. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] un-vacuum?
On Thu, Jan 19, 2006 at 02:25:15PM -0500, uwcssa wrote: i have a table that is already vacuumed. for some reason i want to un-vacuum it instead of dropping the table and recreate the table and indexes on it. is there a existing command to do so? What effect do you want this un-vacuum to have? What problem are you trying to solve? -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] un-vacuum?
On 1/19/06, uwcssa [EMAIL PROTECTED] wrote: I have a simple question here, not sure if I should post here but if you have the quick answer, it helps a lot. I have a table that is already vacuumed. For some reason I want to un-vacuum it instead of dropping the table and recreating the table and indexes on it. Is there an existing command to do so? Can you explain yourself a bit better? Vacuum is good, why do you think you want to undo it? Why do you think that drop and create will undo vacuum? -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] un-vacuum?
On Thu, 2006-01-19 at 14:25, uwcssa wrote: I have a simple question here, not sure if I should post here but if you have the quick answer, it helps a lot. I have a table that is already vacuumed. For some reason I want to un-vacuum it instead of dropping the table and recreating the table and indexes on it. Is there an existing command to do so? What exactly are you trying to achieve? - Hannu ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 02:01:14PM -0500, [EMAIL PROTECTED] wrote: On Thu, Jan 19, 2006 at 10:11:51AM -0800, Josh Berkus wrote: So ISTM it's much easier to just use surrogate keys and be done with it. Only deviate when you have a good reason to do so. The lazy man's guide to SQL database design, by Jim Nasby. ;-) Hehe... I was thinking the same thing. I've definitely seen cases where the use of surrogate keys verges on ridiculous. It hasn't harmed the application, except in terms of complexity. It still works. It still performs fine. The SQL queries are awful looking. :-) Got an example? That's where I would tend to draw the line. For me, I find implementation and maintenance to be the most expensive part of my applications. My data hasn't yet become large enough to make disk space, compute resources, or I/O bandwidth a serious concern. Which is exactly what my thought process is. If you mix surrogate and non-surrogate keys, how do you know which table has which? Sure, while you're actively writing the code it's not an issue, but what about 6 months later? What about if someone else picks up the code? I know Josh was poking fun with his comment about me being lazy, but lazy can make for better code. I can go back to code I wrote 3 years ago and I know that 99% of tables will have something_id (where something is almost certain to be the table name) as a surrogate key to join on; there's no need for me to go and figure out what does and what doesn't have a surrogate key. The 1% that don't fall into that generally aren't an issue because they're normally very large tables that nothing joins to. There's actually an article floating around somewhere about how lazy coders are good coders... :) -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 11:22:24AM -0800, Dann Corbit wrote: In any of these either misspellings, changes of names, ownership or even structure over time render the obvious useless as keys. There are techniques for detecting and reducing duplication but the point is that for any of these duplicates *can* be valid data. Please point me out where, in the writings of E.F. Codd or in the SQL Standard, it says that keys have to be immutable for the life of the row. Only do that for data that you care about. If you think that the data has no value, there is no need to have a way to identify a row. Erm... if you don't care, why are you storing it? :) I was a DBA for a database for a company with many millions of customers worldwide (e.g. the product registration table was 24 GB). Their design had natural keys in it. It caused dozens of problems, every single day. I contend that most people are not smart enough to decide when a natural key is a good idea. The engineers that designed the database were probably pretty smart, since it sort of worked and had thousands of tables and hundreds of millions of rows in it. But one bad decision on a natural key will cause literally millions of dollars of damage. The primary defense I have heard so far is that the Oids are hard to understand. They are nothing in comparison with understanding what to do when you have 25 changes to primary keys on various tables every single day. Once you get used to Oids, I find it hard to believe that any intelligent person finds them confusing. Confusion resulting from having primary keys that are a moving target? Now that's confusion for you. Well, I wouldn't use OIDs as in the PostgreSQL OID, but I agree. If nothing else an ID gives you a fallback... if you absolutely can't find a customer (or whatever else) through natural keys, you ask them for their customer ID/number, which has no reason to ever change. 
BTW, if you want to see a mess*, take a look at the distributed.net stats code, which unfortunately uses email as the means to identify participants. It made perfect sense originally, anyone running the client was bound to have an email address, and they all had to be unique, right? Worked great until the first person contacted us wondering how to change his email address in stats because he'd changed ISPs. If you look at today's stats code (at least the database portion of it) approximately 50% of it is there to deal with people retiring one email address into another, and I'd say that 90%+ of the bugs are in this code. Had we just required new users to register to get a nice shiny unique numeric ID (or a unique username...), none of that code would exist. * note that I'm not trying to rag on any of the numerous people who've been involved in the stats code over the years, but it is insightful to look at some of the 'dumb mistakes' that have been made and the large amount of pain that it's caused. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
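[Editorial note: the design Jim wishes the stats code had used can be sketched as follows. Table and column names are invented for illustration; the point is that the email address becomes a mutable, unique attribute rather than the identifier other tables reference.]

```sql
-- Sketch: an immutable numeric ID identifies the participant; email is
-- just an attribute that stays unique but is free to change.
CREATE TABLE participant (
    participant_id serial PRIMARY KEY,  -- other tables reference this
    email          text NOT NULL UNIQUE
);

-- An ISP change is then a one-row update rather than an identity
-- migration through the rest of the schema:
UPDATE participant SET email = 'new@example.org' WHERE participant_id = 42;
```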
Re: [HACKERS] No heap lookups on index
On Thu, Jan 19, 2006 at 10:19:01AM -0800, Josh Berkus wrote: One of the other most valuable targets for index-only access is the many-to-many join table whose primary key consists of two (or more) foreign keys to two (or more) other tables. It's actually not necessary to check visibility on this kind of table as the visibility of tuples in the join table will be determined by the visibility of tuples in the two data tables. Since often join tables consist *only* of the join key, being able to do index-only access on them could dramatically speed up certian kinds of queries. How would that handle 'delinking' item A from foobaz 2? (IE: DELETE FROM join_table WHERE id1=231 and id2=24842) The only way I can see this working is if it is required that items in both tables as well as the link in the many-many table are only inserted and deleted in the same transaction, which seems to be really pushing this into corner-case territory. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] No heap lookups on index
On Thu, Jan 19, 2006 at 10:35:30AM -0800, David Scott wrote: Tom Lane wrote: What sort of problems are you dealing with exactly? There has been some discussion of changes that would improve certain scenarios. For instance it might be plausible to do joins using index information and only go back to the heap for entries that appear to pass the join test. Do you still have that patch that folks could look at? ISTM that this technique would be rather dependent on your actual workload, and as such could result in a big win for certain types of queries. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 01:48:18PM -0600, Jim C. Nasby wrote: On Thu, Jan 19, 2006 at 02:01:14PM -0500, [EMAIL PROTECTED] wrote: On Thu, Jan 19, 2006 at 10:11:51AM -0800, Josh Berkus wrote: So ISTM it's much easier to just use surrogate keys and be done with it. Only deviate when you have a good reason to do so. The lazy man's guide to SQL database design, by Jim Nasby. ;-) Hehe... I was thinking the same thing. I've definitely seen cases where the use of surrogate keys verges on ridiculous. It hasn't harmed the application, except in terms of complexity. It still works. It still performs fine. The SQL queries are awful looking. :-) Got an example? Sure, but I have to be vague, because it's a company DB. :-) The DB has a primary table, that maps the primary key to a surrogate key. The surrogate key is used in several 1:1 and 1:N relationships. Pretty straight forward. (I tend to put the primary key in the most primary table that would have a 1:1 relationship, of which there is one in this database - but whatever) The primary key is an identifier used for all inputs and outputs to the application. It is used by manual and automatic processes internal and external to the company. Definitely a primary key / surrogate key scenario. The problem here, is that the primary key *is* a natural key. It is generated to be unique, and it is immutable. There are no interfaces provided to allow the rename of the key. It is a short character string of 5 to 20 characters. All queries to the table are joined with this primary key/surrogate key table, to allow lookup by the primary key, for records only identified by the surrogate key. The database is only likely to have a few thousand records, with the 1:N relationships not exceeding 5 or 10, and not recursive. For performance, or disk space, it doesn't really matter which way they went. 
The confusion, though, of joining using a surrogate, that is intended to be opaque (the value is never queried), ensures that the program has no simple queries. All queries involve at least one join. I said almost ridiculous. It's not enough for me to complain, and request a re-design. I don't really care what it does, as long as it accepts my data, and allows me to query my data. But, it does seem silly to me. That's where I would tend to draw the line. For me, I find implementation and maintenance to be the most expensive part of my applications. My data hasn't yet become large enough to make disk space, compute resources, or I/O bandwidth a serious concern. Which is exactly what my thought process is. If you mix surrogate and non-surrogate keys, how do you know which table has which? Sure, while you're actively writing the code it's not an issue, but what about 6 months later? What about if someone else picks up the code? It's usually pretty obvious, looking at a database diagram. You look up the primary key, and see that it only shows up in one table. :-) I know Josh was poking fun with his comment about me being lazy, but lazy can make for better code. I can go back to code I wrote 3 years ago and I know that 99% of tables will have something_id (where something is almost certain to be the table name) as a surrogate key to join on; there's no need for me to go and figure out what does and what doesn't have a surrogate key. The 1% that don't fall into that generally aren't an issue because they're normally very large tables that nothing joins to. I don't disagree with you. I just don't mind deciding to use a surrogate key if I'm unsure, and not using a surrogate if it seems more effort than gain. There's actually an article floating around somewhere about how lazy coders are good coders... :) Dunno where it started, but that's one of the tenets of the developers of Perl. 
Of course, with Perl 6, they admitted to having made quite a few design errors with Perl 5 and earlier... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] 8.0.5 Bug in unique indexes?
Joshua D. Drake wrote: Michael Paesold wrote: You don't use pl/perl, do you -- i.e. I guess you read the latest release notes and the thread here before that? Yes I did. I didn't know that the person was running plPerl. I have verified that they are. We are now going to check if upgrading to 8.0.6 with a deletion of the duplicates and a reindex resolves the issue. I thought I'd ask because this sounded so familiar... Best Regards, Michael Paesold ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Indexes vs. cache flushes
On Thu, Jan 19, 2006 at 02:46:11AM -0500, Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: But you couldn't make any meaningful changes in the definition of an index, such as changing its column set, operator classes, partial-index predicate, etc, except by dropping and recreating it. The only example that comes to mind of something you might want to be able to twiddle and wouldn't expect to be a slow operation is making a unique index a non-unique index. I think actually that that would still work, so long as you acquired exclusive lock on the parent table first (which you'd have to do anyway, because this would constitute a significant change to the table's schema --- it could invalidate plans for example). The lock would guarantee that no one has the index open. It's only in the case of an opened index that I propose not flushing the index support info. The concerns that I find more interesting are changes in the underlying objects. We don't have an ALTER OPERATOR CLASS, much less an ALTER ACCESS METHOD, but it's certainly theoretically possible to change the definition of a support function used by an index. There isn't presently any mechanism to force timely propagation of such a change, and so you'd be largely on your own --- but realistically, wouldn't such a change require rebuilding the index anyway? How would this affect changing the type of a column? Specifically, I'm thinking of the case of domains, where it would be very handy if the type of a domain could be changed (possibly with some restrictions). IE: CREATE DOMAIN name varchar(50); is now too small and you want to up it to varchar(64). Granted, not supported now, though changing constraints is currently supported. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? 
http://www.postgresql.org/docs/faq
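[Editorial note: Jim's domain scenario, sketched below for reference. At the time of this thread, ALTER DOMAIN could change constraints but not the underlying type, so the type change he wishes for is shown only as a comment.]

```sql
-- The domain as originally declared:
CREATE DOMAIN name AS varchar(50);

-- Changing constraints on a domain is supported:
ALTER DOMAIN name SET NOT NULL;

-- Changing the underlying type is not; the wished-for operation would be
-- something like the following (hypothetical syntax, not implemented):
-- ALTER DOMAIN name TYPE varchar(64);
```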
Re: [HACKERS] Bogus path in postmaster.opts
Peter Eisentraut [EMAIL PROTECTED] writes: Tom Lane wrote: There is another possible answer, and it's something I've been meaning to bring up for awhile. Is there a good reason why postmaster is a symlink to postgres, rather than a hard link? I don't know of one. Something I have thought of during the recent options reorganization is that we could do away with the postmaster/postgres dichotomy altogether. Just call the thing postmaster and give it a --single-user-mode option. No strong objection here, though combining the two manpages is likely to be a mess :-( regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Surrogate keys (Was: enums)
On Thu, Jan 19, 2006 at 10:09:26AM -0800, Josh Berkus wrote: Martjin, In any of these either misspellings, changes of names, ownership or even structure over time render the obvious useless as keys. There are techniques for detecting and reducing duplication but the point is that for any of these duplicates *can* be valid data. Please point me out where, in the writings of E.F. Codd or in the SQL Standard, it says that keys have to be immutable for the life of the row. Possibly nowhere. But when you send invoices to customers, any details on there *are* immutable. Sure, in your database you don't care if things change, but then they don't match reality anymore do they? Duplicate *values* can be valid data. Duplicate *tuples* show some serious flaws in your database design. If you have a personnel directory on which you've not bothered to define any unique constraints other than the ID column, then you can't match your data to reality. If you have two rows with the same first and last name, you don't know if they are two different people or the same person, duplicated. Which will be a big problem come paycheck time. I never said there were duplicate tuples, just that the data has no natural keys. The tuples are unique because there's a surrogate key. It is entirely possible to have two people with the same first name, last name and date of birth. Rather uncommon, but the database must be able to support it. I don't understand your example though. If you have a personnel directory with two rows with the same first and last name, what does that tell you. Nothing. You have to go find out whether there really are two of those people or not. You can simplify the process by taking into account the fact that it's very unlikely, but a unique constraint is not the answer. Besides, it's far more likely the same person will appear twice with two different spellings of their name. :) Anyway, the discussion was about surrogate vs natural keys. 
Nothing here has convinced me that there are any useful natural keys to be found in the examples I gave. Most of the examples I gave come from a system I had to maintain where some designer had assumed there was some kind of natural key and in *each* and *every* case it caused problems... Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. signature.asc Description: Digital signature
Re: [HACKERS] suppress output for benchmarking
Ühel kenal päeval, K, 2006-01-18 kell 22:35, kirjutas uwcssa: I am testing the performance of postgresql on a set of workloads. However, the output significantly affects the performance evaluation. Is there a way to by-pass all output of select statements so the timing reflects only the query evaluation process? If you do EXPLAIN ANALYSE QUERY instead of just QUERY, then the backend discards all rows returned and just gives back performance data --- Hannu ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
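Hannu's trick can be applied directly; a minimal sketch, where `orders` is a hypothetical table standing in for the workload query being timed:

```sql
-- Timing this includes formatting and shipping every row to the client:
SELECT * FROM orders WHERE total > 100;

-- Timing this does not: the backend runs the same plan, discards the
-- rows, and returns only the plan tree annotated with actual timings.
EXPLAIN ANALYZE SELECT * FROM orders WHERE total > 100;
```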
[HACKERS] Cache-flush stress testing
I've completed a round of stress testing the system for vulnerabilities to unexpected cache flush events (relcache, catcache, or typcache entries disappearing while in use). I'm pleased to report that the 8.1 branch now passes all available regression tests (main, contrib, pl) with CLOBBER_CACHE_ALWAYS defined as per the attached patch. I have not had the patience to run a full regression cycle with CLOBBER_CACHE_RECURSIVELY (I estimate that would take over a week on the fastest machine I have) but I have gotten through the first dozen or so tests, and I doubt that completing the full set would find anything not found by CLOBBER_CACHE_ALWAYS. HEAD is still broken pending resolution of the lookup_rowtype_tupdesc() business. 8.0 should be OK but I haven't actually tested it. I'm still bothered by the likelihood that there are cache-flush bugs in code paths that are not exercised by the regression tests. The CLOBBER_CACHE patch is far too slow to consider enabling on any regular basis, but it seems that throwing in cache flushes at random intervals, as in the test program I posted here: http://archives.postgresql.org/pgsql-hackers/2006-01/msg00244.php doesn't provide very good test coverage. Has anyone got any ideas about better ways to locate such bugs? regards, tom lane

Index: inval.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/inval.c,v
retrieving revision 1.74
diff -c -r1.74 inval.c
*** inval.c	22 Nov 2005 18:17:24 -0000	1.74
--- inval.c	19 Jan 2006 21:47:07 -0000
***************
*** 625,630 ****
--- 625,660 ----
  {
  	ReceiveSharedInvalidMessages(LocalExecuteInvalidationMessage,
  								 InvalidateSystemCaches);
+ 
+ 	/*
+ 	 * Test code to force cache flushes anytime a flush could happen.
+ 	 *
+ 	 * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a
+ 	 * fairly thorough test that the system contains no cache-flush hazards.
+ 	 * However, it also makes the system unbelievably slow --- the regression
+ 	 * tests take about 100 times longer than normal.
+ 	 *
+ 	 * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY.
+ 	 * This slows things by at least a factor of 10000, so I wouldn't suggest
+ 	 * trying to run the entire regression tests that way.  It's useful to
+ 	 * try a few simple tests, to make sure that cache reload isn't subject
+ 	 * to internal cache-flush hazards, but after you've done a few thousand
+ 	 * recursive reloads it's unlikely you'll learn more.
+ 	 */
+ #if defined(CLOBBER_CACHE_ALWAYS)
+ 	{
+ 		static bool in_recursion = false;
+ 
+ 		if (!in_recursion)
+ 		{
+ 			in_recursion = true;
+ 			InvalidateSystemCaches();
+ 			in_recursion = false;
+ 		}
+ 	}
+ #elif defined(CLOBBER_CACHE_RECURSIVELY)
+ 	InvalidateSystemCaches();
+ #endif
  }

  /*

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] un-vacuum?
Michael Fuhr [EMAIL PROTECTED] writes: On Thu, Jan 19, 2006 at 03:54:33PM -0500, uwcssa wrote: I want to do this for repeating some experiment results, not for tuning the db (pretty much like using an old machine to find performance difference for an algorithm). so if i have a way of knowing which tables are storing the statistics, i guess i can delete all from that table to achieve this. pg_statistic stores statistics. I think it's safe to delete rows, DELETE FROM pg_statistic is safe enough, but it's more of an un-analyze than an un-vacuum. There is no un-vacuum. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Indexes vs. cache flushes
Jim C. Nasby [EMAIL PROTECTED] writes: How would this affect changing the type of a column? It doesn't, because we drop and rebuild indexes completely during ALTER COLUMN TYPE. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] un-vacuum?
I want to do this for repeating some experiment results, not for tuning the db (pretty much like using an old machine to find performance difference for an algorithm). so if i have a way of knowing which tables are storing the statistics, i guess i can delete all from that table to achieve this. On 1/19/06, Hannu Krosing [EMAIL PROTECTED] wrote: Ühel kenal päeval, N, 2006-01-19 kell 14:25, kirjutas uwcssa: I have a simple question here, not sure if i should post here but if you have the quick answer, it helps a lot i have a table that is already vacuumed. for some reason i want to un-vacuum it instead of dropping the table and recreating the table and indexes on it. is there an existing command to do so? What exactly are you trying to achieve ? - Hannu ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Bug: random() can return 1.0
Andrew - Supernews wrote: src/backend/utils/adt/float.c: /* *drandom - returns a random number */ Datum drandom(PG_FUNCTION_ARGS) { float8 result; /* result 0.0-1.0 */ result = ((double) random()) / ((double) MAX_RANDOM_VALUE); PG_RETURN_FLOAT8(result); } Whoever wrote this obviously did intend it to return values in [0.0,1.0] but this makes it totally useless for generating uniform random ranges in the usual way, since random() * N will return N with probability 2^-31. The documentation is sufficiently imprecise about this to cause confusion (seen in questions asked on the IRC channel), and the problem can't be worked around at the application level without knowing the value of MAX_RANDOM_VALUE in order to correct the range to [0.0,1.0). Because random returns a double, I think it is very possible that we could return 1 due to rounding, and I see no way to avoid that. I think re-running random if it returns 1 is likely to return even less random values. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] No heap lookups on index
On Thu, 19 Jan 2006, Jim C. Nasby wrote: Do you still have that patch that folks could look at? ISTM that this technique would be rather dependent on your actual workload, and as such could result in a big win for certain types of queries. It is not a patch, per se. It is a C-language function which calls some of the nbtree functions to return things from the index. The syntax for calling it is rather obtuse, since those of us who don't understand the parser are doomed to attempt circumventing it ;P. I tarred up the code and put it on a web server so that interested parties can play with it. The URL is http://linux.apptechsys.com/~jeremyd/postgresql/fakeidxscan.tar.gz It is very hackish, so definitely do not assume that it is in any way correct; rather, assume the opposite. I have run it on x86 and x86_64 boxes, and it compiles and runs on those. Here is an example of its usage, so you can see the nasty syntax required and perhaps grok how to use it better.

create table test_table (a integer, b integer);
create index test_table_a_b_idx on test_table (a, b);
insert into test_table (a, b)
    select a, b from generate_series(1,100) a, generate_series(1,100) b;

select * from fakeidxrowscomposite(
    'test_table',             -- relation
    'test_table_a_b_idx',     -- index
    1,                        -- number of scan keys
    ARRAY[1, 2]::smallint[],  -- numbers of the index attributes to return
    ARRAY[1]::smallint[],     -- numbers of the attrs the scankeys apply to
    ARRAY['=(integer,integer)'::regoperator],  -- operators for the scankeys
    ARRAY[3]::smallint[],     -- btree strategy for the scankeys
    (42,0)  -- values for the scankeys to compare against (if there is only
            -- one, you have to put a fake one in since otherwise the parser
            -- does not think it is a record)
) AS (a integer, b integer);  -- tell the parser what columns to expect

This example returns 100 rows in which the first column contains 42 and the second column contains the numbers between 1 and 100, in order.
Feel free to do whatever with this, it's pretty fast for tables where seeks to validate tuples would hurt, but you do get back dead things... -- When you know absolutely nothing about the topic, make your forecast by asking a carefully selected probability sample of 300 others who don't know the answer either. -- Edgar R. Fiedler ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?
Greetings, I've got a fast FreeBSD/amd64 server available to run Buildfarm on. However, I see we already have a couple of others running it. My questions are: 1) do we need another one? 2) if yes, what options need coverage? Thanks, LER -- Larry Rosenman Database Support Engineer PERVASIVE SOFTWARE. INC. 12365B RIATA TRACE PKWY 3015 AUSTIN TX 78727-6531 Tel: 512.231.6173 Fax: 512.231.6597 Email: [EMAIL PROTECTED] Web: www.pervasive.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] suppress output for benchmarking
On Thu, Jan 19, 2006 at 11:46:29PM +0200, Hannu Krosing wrote: Ühel kenal päeval, K, 2006-01-18 kell 22:35, kirjutas uwcssa: I am testing the performance of postgresql on a set of workloads. However, the output significantly affects the performance evaluation. Is there a way to by-pass all output of select statements so the timing reflects only the query evaluation process? If you do EXPLAIN ANALYSE QUERY instead of just QUERY, then the backend discards all rows returned and just gives back performance data The flipside is that EXPLAIN ANALYZE adds its own (non-trivial) overhead to the query. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] [PATCH] Better way to check for getaddrinfo function.
Where are we on this? Rajesh, I think we are waiting for more information from you. --- R, Rajesh (STSD) wrote: That was very much situation specific. But the bottom line is that the default test does not include netdb.h in the test code. So pg uses getaddrinfo.c, and the getaddrinfo.c does not work for me. IPv6 client authentication fails. I have modified the patch.

$ diff -r configure.in configure.in.new
918a919
> AC_MSG_CHECKING([for getaddrinfo])
920c921,926
< AC_REPLACE_FUNCS([getaddrinfo])
---
> AC_TRY_LINK([#include <netdb.h>
> #include <assert.h>],
> [char (*f)();f=getaddrinfo;], ac_cv_func_getaddrinfo=yes, ac_cv_func_getaddrinfo=no)
> if test x$ac_cv_func_getaddrinfo = xyes; then
> AC_DEFINE(HAVE_GETADDRINFO,1,[Define if you have the getaddrinfo function])
> fi
923a930
> AC_MSG_RESULT([$ac_cv_func_getaddrinfo])

Rajesh R -- This space intentionally left non-blank. -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Monday, January 16, 2006 11:28 PM To: R, Rajesh (STSD) Cc: pgsql-hackers@postgresql.org; pgsql-general@postgresql.org Subject: Re: [GENERAL] [PATCH] Better way to check for getaddrinfo function. R, Rajesh (STSD) [EMAIL PROTECTED] writes: Just thought that the following patch might improve checking for getaddrinfo function (in configure.in) Since AC_TRY_RUN tests cannot work in cross-compilation scenarios, you need an *extremely* good reason to put one in. I thought this might improve things doesn't qualify. Exactly what problem are you trying to solve and why is a run-time test necessary? Why doesn't the existing coding work for you? regards, tom lane configure-in.patch Content-Description: configure-in.patch [ Attachment, skipping...
] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] questions about varlist and Query
Fellow hacker, I am a newcomer to the postgres development community. Currently, I am implementing tightly coupled machine classifiers within postgres. The grammar looks like Train Parameter_list (class1,class2,class3...). I have two major problems right now. 1. Train is a statement and it is supposed to return some parameters in the form of a query. (To be used by a classifier later.) How can I return a Query with self-specified column names and data vectors? 2. class1, class2 are all supposed to be relations/query results. But what is a proper container to hold them? I found var_list to be a good candidate as it can contain an unconstrained number of var_values. But var_values are of constant types. Can I just add a query as a constant? Has anyone had this kind of implementation experience? Thanks, John ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] un-vacuum?
On Thu, Jan 19, 2006 at 03:54:33PM -0500, uwcssa wrote: I want to do this for repeating some experiment results, not for tuning the db (pretty much like using an old machine to find performance difference for an algorithm). so if i have a way of knowing which tables are storing the statistics, i guess i can delete all from that table to achieve this. pg_statistic stores statistics. I think it's safe to delete rows, but you might want to wait for one of the developers to comment before mucking around with the stored values, especially if you're not familiar with reading the pg_stats view. http://www.postgresql.org/docs/8.1/interactive/catalog-pg-statistic.html -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] un-vacuum?
On Thu, 2006-01-19 at 14:25 -0500, uwcssa wrote: I have a simple question here, not sure if i should posted here but if you have the quick answer, it helps a lot i have a table that is already vacuumed. for some reason i want to un-vacuum it instead of dropping the table and recreate the table and indexes on it. is there a existing command to do so? I think your best route to experimentation is to stick to executing real commands in as a very similar environment to actual usage. I would never trust experimental results derived from the use of such a command, should such a thing ever exist. Good testing takes time and care; there are few shortcuts to good experimental results in any scientific endeavour. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Cache-flush stress testing
On Thu, Jan 19, 2006 at 05:03:20PM -0500, Tom Lane wrote: I'm still bothered by the likelihood that there are cache-flush bugs in code paths that are not exercised by the regression tests. The CLOBBER_CACHE patch is far too slow to consider enabling on any regular basis, but it seems that throwing in cache flushes at random intervals, as in the test program I posted here: http://archives.postgresql.org/pgsql-hackers/2006-01/msg00244.php doesn't provide very good test coverage. Has anyone got any ideas about better ways to locate such bugs? Some of the machines in the buildfarm do nothing else useful, if this was turned into a configure option it would be trivial to setup some of those machines to just hammer away at this. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Bug: random() can return 1.0
src/backend/utils/adt/float.c: /* * drandom - returns a random number */ Datum drandom(PG_FUNCTION_ARGS) { float8 result; /* result 0.0-1.0 */ result = ((double) random()) / ((double) MAX_RANDOM_VALUE); PG_RETURN_FLOAT8(result); } Whoever wrote this obviously did intend it to return values in [0.0,1.0] but this makes it totally useless for generating uniform random ranges in the usual way, since random() * N will return N with probability 2^-31. The documentation is sufficiently imprecise about this to cause confusion (seen in questions asked on the IRC channel), and the problem can't be worked around at the application level without knowing the value of MAX_RANDOM_VALUE in order to correct the range to [0.0,1.0). -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] No heap lookups on index
On Wed, 2006-01-18 at 20:13 -0500, Tom Lane wrote: Come to think of it, the idea also seems to map nicely into bitmap index scans: the index will directly hand back a list of potential pages to look at, but they are all marked lossy because the index doesn't know exactly which tuple(s) on the target pages match the query. The existing bitmap-heap-scan code can take it from there. Yes, I've privately suggested this solution in that context. I think there is enough meat there to make this topic worth discussing further, but not on list again just yet. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Bug: random() can return 1.0
Bruce Momjian pgman@candle.pha.pa.us writes: Because random returns a double, I think it is very possible that we could return 1 due to rounding, Not unless your machine has a double type with less than 32 bits of precision, which seems pretty unlikely. It'd be sufficient to do

/* result 0.0 <= x < 1.0 */
result = ((double) random()) / ((double) MAX_RANDOM_VALUE + 1.0);

regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] PostgreSQL and shared memory.
Hi, I'd like to get a general understanding of what kind of structures PostgreSQL puts in shared memory and how they are managed. I'd like some hints on where to start looking. Source, docs, prior discussions, anything is considered helpful. TIA, Thomas Hallgren ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PostgreSQL and shared memory.
Thomas Hallgren wrote: Hi, I'd like to get a general understanding of what kind of structures PostgreSQL puts in shared memory and how they are managed. I'd like some hints on where to start looking. Source, docs, prior discussions, anything is considered helpful. Have you looked on the developers page. My internals through pictures talk should help too. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] un-vacuum?
You could also do this by doing a filesystem copy of $PG_DATA (with postgresql shut down), and then restoring that copy after your test. If you used rsync (or something that allowed filesystem snapshots) this probably wouldn't be very painful. On Thu, Jan 19, 2006 at 03:54:33PM -0500, uwcssa wrote: I want to do this for repeating some experiment results, not for tuning the db (pretty much like using an old machine to find performance difference for an algorithm). so if i have a way of knowing which tables are storing the statistics, i guess i can delete all from that table to achieve this. On 1/19/06, Hannu Krosing [EMAIL PROTECTED] wrote: Ühel kenal päeval, N, 2006-01-19 kell 14:25, kirjutas uwcssa: I have a simple question here, not sure if i should post here but if you have the quick answer, it helps a lot i have a table that is already vacuumed. for some reason i want to un-vacuum it instead of dropping the table and recreating the table and indexes on it. is there an existing command to do so? What exactly are you trying to achieve ? - Hannu ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] No heap lookups on index
On Thu, Jan 19, 2006 at 02:50:39PM -0800, Jeremy Drake wrote: On Thu, 19 Jan 2006, Jim C. Nasby wrote: Do you still have that patch that folks could look at? ISTM that this technique would be rather dependant on your actual workload, and as such could result in a big win for certain types of queries. ... Feel free to do whatever with this, it's pretty fast for tables where seeks to validate tuples would hurt, but you do get back dead things... How'd you then weed out the dead tuples? Basically, numbers talk. If there were convincing numbers for something that wasn't a corner-case that showed a marked improvement then there'd be much more interest in getting this into the backend in some fashion. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)
Simon Riggs [EMAIL PROTECTED] writes: This seems to lead to a super-geometric progression in the number of files required, But we double the number of batches at each step, so there are going to be at most 20 or so levels, and that's only assuming a *horridly* wrong initial guess by the planner. In practice I think it's reasonable to assume at most a couple rounds of doubling. If you have more than that, the extra data-shuffling is going to exhaust your patience anyway. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] No heap lookups on index
Jim C. Nasby [EMAIL PROTECTED] writes: Basically, numbers talk. If there were convincing numbers for something that wasn't a corner-case that showed a marked improvement then there'd be much more interest in getting this into the backend in some fashion. I've got no doubt that there are *some* non corner cases for which this would be a win. The lower the update load on the database, the better it's going to look. The issue really is where does it start being a loss, and can you convince us that those cases are all corner ones? (The subtext here, of course, is the assumption that it's an all-or-nothing choice. I'm of the opinion that supporting both options would be infeasible from a code complexity and maintenance standpoint; but a simple patch that did both would of course prove that opinion wrong ...) regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)
On Tue, 2006-01-17 at 21:43 +0000, Simon Riggs wrote: On Tue, 2006-01-17 at 09:52 -0500, Tom Lane wrote: I was thinking along the lines of having multiple temp files per hash bucket. If you have a tuple that needs to migrate from bucket M to bucket N, you know that it arrived before every tuple that was assigned to bucket N originally, so put such tuples into a separate temp file and process them before the main bucket-N temp file. This might get a little tricky to manage after multiple hash resizings, but in principle it seems doable. You can manage that with file naming. Rows moved from batch N to batch M would be renamed N.M, so you'd be able to use file ordering to retrieve all files for *.M That scheme would work for multiple splits too, so that filenames could grow yet retain their sort order and final target batch properties. This seems to lead to a super-geometric progression in the number of files required, if we assume that the current batch could be redistributed to all future batches, each of which could be similarly redistributed:

batches   files
1         no files
2         1 file
4         7 files
8         64 files
16        64,000 files
32        4 billion files ish

So it does seem important whether we demand sorted input or not. Or at least requires us to provide the executor with a starting point for the number of batches, so we could manage that. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] un-vacuum?
On Thu, Jan 19, 2006 at 04:54:21PM -0600, Jim C. Nasby wrote: You could also do this by doing a filesystem copy of $PG_DATA (with postgresql shut down), and then restoring that copy after your test. If you used rsync (or something that allowed filesystem snapshots) this probably wouldn't be very painful. Hmmm...wouldn't using a template database work the same way? Doesn't CREATE DATABASE simply do a recursive copy of the template database's directory? I'm thinking you could 1. Set up the initial test conditions in some database. This could include creating unanalyzed tables in dire need of vacuuming. 2. Use createdb or CREATE DATABASE to create a new database using the database in (1) as the template. 3. Run tests in the new database. 4. Repeat (2) and (3) as necessary. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
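Michael's steps, written out as a sketch (`experiment` and `run1` are hypothetical database names; the template clone is a file-level copy, so it preserves the unvacuumed state of the tables):

```sql
-- 1. Set up the initial test conditions once, in a dedicated database.
CREATE DATABASE experiment;
-- ... connect to it and load the never-vacuumed, never-analyzed tables ...

-- 2. Clone it for a test run (the template must have no active connections).
CREATE DATABASE run1 TEMPLATE experiment;

-- 3. Run the tests in run1, vacuuming and analyzing as the experiment demands.

-- 4. Discard the run and repeat from step 2 for the next one.
DROP DATABASE run1;
```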
Re: [HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?
Larry Rosenman [EMAIL PROTECTED] writes: I've got a fast FreeBSD/amd64 server available to run Buildfarm on. However, I see we already have a couple of others running it. My questions are: 1) do we need another one? 2) if yes, what options need coverage? Looks like we're fairly well covered on freebsd already. Are you willing to consider running some less-popular OS on it? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Surrogate keys (Was: enums)
Yes. Representation of the DNA is probably best. But - that's a lot of data to use as a key in multiple tables. :-) No then you have problems with identical twins :) Chris ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] No heap lookups on index
On Thu, 19 Jan 2006, Jim C. Nasby wrote: Feel free to do whatever with this, it's pretty fast for tables where seeks to validate tuples would hurt, but you do get back dead things... How'd you then weed out the dead tuples? I didn't get that far with it. The purpose of this function was to quickly put something together to demonstrate that the overhead of seeking to the proper tuples in the heap to determine their visibility was the main component of the time being spent to satisfy our queries. Basically, numbers talk. If there were convincing numbers for something that wasn't a corner-case that showed a marked improvement then there'd be much more interest in getting this into the backend in some fashion. I could get some numbers of how much time validating tuples adds to a query, but I don't think that that would be horribly novel. BTW, hopefully I did not make you think that I intended to get this into the official backend. This function was only meant to demonstrate to the people around here that the visibility check was the bottleneck we were seeing. The function may also be interesting as a demonstration of how indexes are handled in postgres, as you can see when tuples are flagged as no longer valid and when they are not. I have put xmin into an index so that I could use this function to better visualize when index tuples are left behind (I tried to put xmax in there too, but I never saw them change, after checking the code it turns out that the index is never told about changes in xmax). We were seeing this case: All rows in our table are visible (we are the only transaction on the machine and we did a VACUUM FULL ANALYZE before). We rebooted to ensure no caching. We were seeing times which, upon division by the number of rows returned by the index scan, were remarkably close to the average seek time listed on the specs for the hard drive in the testing box. 
This was about 5 ms, which doesn't sound like much, but given a large enough number of rows and a few joins, 5 ms per tuple adds up quickly. This implies that we were seeing approximately the worst case as far as the distribution of the relevant tuples on pages, i.e. each tuple we wanted was on a different heap page.

Digging back through some times we had collected from this experiment, apparently we were taking about 15 to 20 seconds to run a particular query, and when we used the function I previously posted those times were reduced to 5 seconds. This was a while ago, however, so these times are probably not very accurate, and we have probably made other tweaks to speed things up since then. But it gives an idea. We could come up with more absolute numbers, but I think people already know what they would look like.
Re: [HACKERS] Surrogate keys (Was: enums)
On Jan 20, 2006, at 10:50, Christopher Kings-Lynne wrote:
>> Yes. Representation of the DNA is probably best. But - that's a lot of
>> data to use as a key in multiple tables. :-)
>
> No, then you have problems with identical twins :)

And, looking forward, clones.

Michael Glaesemann
grzm myrealbox com
[HACKERS] Questions about varlist and Query
Fellow hackers,

I am a newcomer to the postgres development community. Currently, I am implementing tightly coupled machine classifiers within postgres. The grammar looks like:

    Train Parameter_list (class1, class2, class3, ...)

I have two major problems right now.

1. Train is a statement, and it is supposed to return some parameters in the form of a query (to be used by a classifier later). How can I return a Query with self-specified column names and data vectors?

2. class1, class2, etc. are all supposed to be relations/query results, but what is a proper container to hold them? I found var_list to be a good candidate, as it can contain an unconstrained number of var_values. But var_values are of constant types. Can I just add a query as a constant?

Has anyone had this kind of implementation experience?

Thanks,
John
Re: [HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?
Tom Lane wrote:
> Larry Rosenman [EMAIL PROTECTED] writes:
>> I've got a fast FreeBSD/amd64 server available to run Buildfarm on.
>> However, I see we already have a couple of others running it. My
>> questions are: 1) do we need another one? 2) if yes, what options
>> need coverage?
>
> Looks like we're fairly well covered on freebsd already. Are you
> willing to consider running some less-popular OS on it?

Not particularly, as I want it to do other things. However, I might be willing to get VMware running on it and run something else in a VM. What were you thinking?

LER

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683             E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3683 US
Re: [HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?
Larry Rosenman ler@lerctr.org writes:
> Tom Lane wrote:
>> Looks like we're fairly well covered on freebsd already. Are you
>> willing to consider running some less-popular OS on it?
>
> What were you thinking?

[ shrug... ] Anything you don't see paired with amd64 on the buildfarm roster is OK by me ...

regards, tom lane