Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-05-27 Thread Dennis Bjorklund
On Wed, 26 May 2004, Tom Lane wrote:

 if you have to do an UPDATE that affects every row of a large table
 
   UPDATE tab SET col = col + 1
 
 which leaves you with N live rows, N dead rows, and lots of pain to get
 back down to a less-than-twice-normal-size table.  (Traditional way is
 VACUUM FULL; CLUSTER is better, but still painful.)  As of 7.5 you could
 hack this with
 
   ALTER TABLE tab ALTER COLUMN col TYPE same-type USING col + 1
 
 which will have the effect of rewriting a fresh table file containing
 the updated rows, and dropping the old file at commit.

 I'm not real sure where to document this
 trick but it seems like we ought to mention it someplace.

Isn't it better to detect an UPDATE without a WHERE clause and do that update 
the same way as the ALTER TABLE above? Then we don't need to document and 
learn a new non-standard way of doing an update.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Nested xacts: looking for testers and review

2004-05-27 Thread Stephan Szabo
On Thu, 27 May 2004, Alvaro Herrera wrote:

 On Wed, May 26, 2004 at 04:35:52PM -0700, Stephan Szabo wrote:

  On Wed, 26 May 2004, Alvaro Herrera wrote:
 
   I'm missing one item: deferred triggers.  The problem with this is that
   the deftrig queue is not implemented using normal Lists, so there's no
   efficient way to reassign to the parent when the subtransaction commits.
   Also I'm not sure what should happen to the immediate pointer --- should a
   subtransaction have its own private copy, or should it inherit
   the parent's?  Please whoever implemented this speak up (Stephan
   Szabo?), as I'm not sure of the semantics.
 
  The immediate pointer basically points at the end of the queue from the
  last scanning of the trigger queue (since any immediate triggers from
  before that would have been run at that time there's no reason to scan
  from the beginning).

 Hmm.  You assume correctly that a subtransaction has (or will have) a
 private queue.  But we do not consider a subtransaction to be somewhat a
 separate entity -- the principle is that the transaction has to feel
 just like the BEGIN wasn't there.  So

 BEGIN;
   UPDATE foo ...
   UPDATE bar ...
 COMMIT

 has to be exactly the same as

 BEGIN;
   BEGIN;
   UPDATE foo ...
   COMMIT;
   BEGIN;
   UPDATE bar ...
   COMMIT;
 COMMIT;

 Now, with that in mind: is it correct that the immediate pointer
 points to the beginning of the subtransaction's private deferred trigger
 queue, at subtransaction's start?

AFAIR you can set it to NULL because that means scan the entire list.

 Now, at subtransaction end, let's suppose I concatenate the list the
 original transaction had with the subtransaction's private list.  What
 should the immediate pointer be?

I'd say pointing at the last item that was added to the queue.

 When is the immediate pointer advanced?  I know it's during scanning of
 the list, but when is this?  At the end of each query?

At the point where triggers fire after a statement.  It's the actual
scanning of the trigger queue to see what to run that changes it, unless
I'm misremembering.

  If one sets a constraint to immediate in a subtransaction, does/should
  it cause the immediate check of pending events from its parent?  And
  does that revert when the transaction ends?

 Yes, I think it should fire all events, including the parent's.  Good
 point; it means there has to be a way of getting the whole list, from
 the topmost transaction, in order :-(

Yeah...  Right now we don't need to do something special because resetting
the immediate pointer basically does what we want (re-scan the entire set
looking for non-run things that are now immediate).

 I'm not sure what you mean by reverting though.

The state of whether a trigger is actually deferred or immediate.  I
believe it basically works as:
begin;
 set constraints all immediate;
 -- here any deferrable constraints are treated as immediate
end;
begin;
 -- here any deferrable constraints are in their default state
end;

So, if you have
begin;
 -- 1
 begin;
  set constraints all immediate;
 end;
 -- 2
end;

Do 1 and 2 see the same constraint checking mode or is 2 at immediate?


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] How can I have 2 completely seperated databases in

2004-05-27 Thread Rod Taylor
On Wed, 2004-02-11 at 16:36, [EMAIL PROTECTED] wrote:
 Hi, all
 
 What should I do if I want to have 2 completely separated databases in
 PostgreSQL? I want each database to have its own data, log and
 everything needed to access that database. I don't want them to share
 anything. Has anyone done this before? Or
 could anyone give me some clue of how to do this?

You will need to initdb 2 locations and start 2 instances on separate ports.
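
A minimal sketch of that two-instance setup (the data directory paths and the
second port number here are illustrative assumptions, not fixed values):

```shell
# Create two independent clusters, each with its own data directory,
# WAL, and configuration files -- nothing is shared between them.
initdb -D /srv/pgsql/cluster1
initdb -D /srv/pgsql/cluster2

# Start each instance on its own port.
pg_ctl -D /srv/pgsql/cluster1 -o "-p 5432" -l cluster1.log start
pg_ctl -D /srv/pgsql/cluster2 -o "-p 5433" -l cluster2.log start

# Connect to either one independently.
psql -p 5432 -d postgres
psql -p 5433 -d postgres
```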


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-05-27 Thread Zeugswetter Andreas SB SD

 What would be useful for that (as of 7.5) is a dummy ALTER TABLE TYPE,
 viz
 
   ALTER TABLE tab ALTER COLUMN col TYPE same-type-it-already-has

I think we should add special syntax for this purpose, since I (or someone 
else later on) would like to see all possible cases of ALTER COLUMN 
short-circuited.

The syntax I would see fit is something like:
ALTER TABLE tab [MOVE] TABLESPACE xy;

For the above special case the tablespace would be the same as before.

Andreas

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread Andreas Pflug
James Robinson wrote:

Users are primarily, if not stupid, ignorant. They will read the absolute
minimum needed to achieve a goal and little else. I say this with the
utmost respect, because I and probably everyone else on this group is
guilty of the same thing. So, the preferred installation procedure, i.e.
the one with the easy to follow directions, should showcase features the
user should know, and leave the user in a good place. IMHO, the user's
database on one volume and pg_xlog on another is a better starting place.

Yes, that is generally the case (prefer pg_xlog on separate spindle), but no
need to *forcibly* overcomplicate things if the box has only one spindle,
or if they have only one single RAID'd partition configured. We should
continue to err on the side of keeping the path to a functional system
nice and simple, yet still offering superb functionality. Oracle gets this
wrong. pg_autovacuum is another good step in this direction.

In the age of inexpensive RAID, tablespaces have more or less lost their 
relevance regarding performance. pgsql's philosophy respects this by 
leaving the storage work up to the OS and disk subsystem. Even having 
the xlog on a different spindle won't help too much; you'll probably be 
better off if you stuff all your spindles in one RAID on most systems. 
Worse, splitting two disks into separate storage areas to have xlog 
separated would degrade safety for very little performance gain. So the 
advice is: one disk, no alternative. 2 to 20 disks: use a single RAID. 
More disks: examine your access patterns carefully before you believe 
you can do the job better than your RAID controller.

This leaves tablespaces as a mere administrative feature; many (most) 
installations will happily live without them.

Regards,
Andreas

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread pgsql
 Yes, that is generally the case (prefer pg_xlog on separate spindle), but no
 need to *forcibly* overcomplicate things if the box has only one spindle,
 or if they have only one single RAID'd partition configured. We should
 continue to err on the side of keeping the path to a functional system
 nice and simple, yet still offering superb functionality. Oracle gets this
 wrong. pg_autovacuum is another good step in this direction.

 
 James Robinson
 Socialserve.com

I wouldn't say forcibly.

While I think everyone would agree Oracle is the most arcane and
difficult system ever devised by man, one can look at it and take away
the far too few good things it does do. Similarly, MSSQL likes to keep user
data and system data separated. Setting user expectations is important;
setting baseline usage by recommendation is important.

One of the places PostgreSQL errs is its default configuration. It is far
too conservative. The default setup is typically how most users operate
their software. Oracle, while being the worst, has one benefit: it has no
recognizable default, so every DBA must slog through endless options to set
it up. In the process they are made aware of the various tuning options
and the flexibility of the system.

PostgreSQL, on the other hand, installs easily and can be used almost
immediately. This is not a problem, per se, but the user is not exposed
to how to make the system perform well unless they read the manual beyond
the installation (which we know they won't).  Compounding the problem,
various parameters in the default configuration file are too small for
serious servers.

I would love to re-write the installation chapter and perhaps add a
section on generic database tuning. Specifically, like my PostgreSQL for
Windows CD a couple of years ago, I'd like to see three sections: Personal,
Workgroup, and Server configuration tips. I think we've all seen bad
reviews because people used PostgreSQL's default configuration.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread pgsql
 James Robinson wrote:



 Users are primarily, if not stupid, ignorant. They will read the
 absolute
 minimum needed to achieve a goal and little else. I say this with the
 utmost respect, because I and probably everyone else on this group is
 guilty of the same thing. So, the preferred installation procedure,
 i.e.
 the one with the easy to follow directions, should showcase features
 the
 user should know, and leave the user in a good place. IMHO, the user's
 database on one volume and pg_xlog on another is a better starting
 place.


 Yes, that is generally the case (prefer pg_xlog on separate spindle),
 but no
 need to *forcibly* overcomplicate things if the box has only one
 spindle,
 or if they have only one single RAID'd partition configured. We should
 continue to err on the side of keeping the path to a functional system
 nice and simple, yet still offering superb functionality. Oracle gets
 this
 wrong. pg_autovacuum is another good step in this direction.


 In the age of inexpensive RAID, tablespaces have more or less lost their
 relevance regarding performance. pgsql's philosophy respects this by
 leaving the storage work up to the OS and disk subsystem. Even having
 the xlog on a different spindle won't help too much; you'll probably be
 better off if you stuff all your spindles in one raid on most systems.
 For worse, splitting two disks into separate storage areas to have xlog
 separated would degrade safety for very little performance gain. So the
 advise is: one disk, no alternative. 2 to 20 disks: use a single raid.
 more disks: examine your access patterns carefully before you believe
 you can do the job better than your raid controller.

 This leaves table spaces as a mere administrative feature, many (most)
 installations will happily live without that.

 Regards,
 Andreas

I would say that this is almost completely misinformed. Depending on the
OS and the hardware, of course, a write on one spindle may not affect the
performance of another.

There are so many great things that happen when you have separate
spindles. The OS manages the file systems separately, the device drivers
may be separate, and if the low-level I/O device driver is even different,
then you get your own bus mastering I/O buffers. All around good things
happen when you have separate spindles.

A single postgresql process may not see much benefit, because it does not
do background I/O, but multiple postgresql processes will perform better
because multiple I/O requests can be issued and processed simultaneously.
If you got good SMP in your kernel, even better.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-05-27 Thread Tom Lane
Dennis Bjorklund [EMAIL PROTECTED] writes:
 On Wed, 26 May 2004, Tom Lane wrote:
 I'm not real sure where to document this
 trick but it seems like we ought to mention it someplace.

 Isn't it better to detect a UPDATE without a where and do that update in 
 the same way as the alter table above? Then we don't need to document and 
 learn a new non standard way of doing an update.

No, because the locking implications are completely different.  I don't
want UPDATE to suddenly decide it needs an exclusive lock on the table
based on the shape of the WHERE clause.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-05-27 Thread David Blasby
I did a CLUSTER - it took just under 12 hours.  The original load of the 
table (including parsing all the geometries) took about 4 hours (+4 to 
build the indexes and vacuum analyse).  I don't know why the cluster is 
so much slower...

My SELECT * FROM table LIMIT 1; is now fast.
Thanks for the help!
As a side note, I think the behavior of the database is a bit 
counter-intuitive.  If I were to do a:

UPDATE table SET i = i + 1;

my table will leak 10GB of disk space, make queries extremely slow, 
and require taking the database off-line for 12 hours to fix it!

Both the VACUUM and the UPDATE manual pages should be updated to say that 
this is what's happening.  If I do my above query 100 times, will I be 
leaking a terabyte of disk space?

Will increasing my max_fsm_pages to 2,000,000 solve my problem?
dave
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread Josh Berkus
Mohawksoft:

 I forgot to specify that tablespaces should be on separate volumes. (sorry)
 If all they have is one volume, no worries, but instructing the use of
 alternate volumes for system and data will improve performance by
 separating WAL and data operations.

... and the place for this is the documentation, maybe with a nice script to 
help automate it.   Twisting users' arms will just get us a lot of angry 
e-mail.  Plus, forced separation of tablespaces is not appropriate for many 
kinds of installations:
-- the 1MB 2-table database of someone's DVD collection;
-- the 700GB database running off a $75,000 NAS (which appears to the OS as a 
single volume)

Also, you're getting confused here ... tablespaces have nothing to do with the 
location of pg_xlog.

 Tablespaces are a familiar construct to experienced DBAs who may not be
 familiar with PostgreSQL. PostgreSQL being similar to other databases will
 help it make better sense to new users.

I'll have to disagree with you there.  I used to be a certified MSSQL admin, 
and I can tell you that not one in 25 members of MSDN Database forum had any 
idea how to use the analogous feature on MSSQL -- despite it being operative 
since 1998.   So forcing new users to deal with tablespaces, even if they 
don't need them, is a great way to get new users to adopt MySQL.

 So, the preferred installation procedure, i.e.
 the one with the easy to follow directions, should showcase features the
 user should know, and leave the user in a good place. 

No, the preferred newbie installation is the one that gets them up and 
running and playing with PostgreSQL in the minimum amount of time.  Setup is 
boring, confusing, and often frustrating, and each step we add to the 
required minimum setup loses us another dozen newbies who weren't sure if 
they are ready to upgrade from MS Access or MySQL.  Heck, for the CDs we're 
making to hand out at conventions, we're running PostgreSQL on Knoppix so 
that users don't have to install *anything*.

Now, if you want to add a power user setup to the Tutorial in our official 
docs, please do!   We could use more guides.   But don't force the guy with 
the personal DVD database to set things up like he's running Ameritrade.

Also, consider that about half our users install from packages: RPMs and Deb 
packages (and soon MSI as well).   Those users aren't going to be going 
through a manual installation procedure at all, so your efforts to educate 
them through proper database setup won't get very far.

 IMHO, the user's
 database on one volume and pg_xlog on another is a better starting place.

For some setups, yes.   For others, no.  It depends on your hardware and 
application.   And, as I said above, Tablespaces will not determine the 
location of pg_xlog AFAIK.

 BTW: Is there a public spec on what will be tablespace compatible and how?
 For instance: will it be possible to create a table on a separate
 tablespace than the DB? Will it be possible to create an index on a
 separate tablespace than the table?

This is in the archives of this list. The whole point of tablespaces is to 
allow placing individual tables and indexes on separate volumes.   AFAIR, 
we're not that concerned about whole databases, which have always been easily 
re-locatable via symlinks and/or mounts.

P.S. if you signed your e-mails, I'd stop calling you mohawksoft.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread Andreas Pflug
[EMAIL PROTECTED] wrote:
James Robinson wrote:
   

Users are primarily, if not stupid, ignorant. They will read the
absolute
minimum needed to achieve a goal and little else. I say this with the
utmost respect, because I and probably everyone else on this group is
guilty of the same thing. So, the preferred installation procedure,
i.e.
the one with the easy to follow directions, should showcase features
the
user should know, and leave the user in a good place. IMHO, the user's
database on one volume and pg_xlog on another is a better starting
place.
   

Yes, that is generally the case (prefer pg_xlog on separate spindle),
but no
need to *forcibly* overcomplicate things if the box has only one
spindle,
or if they have only one single RAID'd partition configured. We should
continue to err on the side of keeping the path to a functional system
nice and simple, yet still offering superb functionality. Oracle gets
this
wrong. pg_autovacuum is another good step in this direction.
 

In the age of inexpensive RAID, tablespaces have more or less lost their
relevance regarding performance. pgsql's philosophy respects this by
leaving the storage work up to the OS and disk subsystem. Even having
the xlog on a different spindle won't help too much; you'll probably be
better off if you stuff all your spindles in one raid on most systems.
For worse, splitting two disks into separate storage areas to have xlog
separated would degrade safety for very little performance gain. So the
advise is: one disk, no alternative. 2 to 20 disks: use a single raid.
more disks: examine your access patterns carefully before you believe
you can do the job better than your raid controller.
This leaves table spaces as a mere administrative feature, many (most)
installations will happily live without that.
Regards,
Andreas
   

I would say that this is almost completely misinformed. Depending on the
OS and the hardware, of course, a write on one spindle may not affect the
performance of another.
There are so many great things that happen when you have separate
spindles. The OS manages the file systems separately, the device drivers
may be separate, and if the low-level I/O device driver is even different,
then you get your own bus mastering I/O buffers. All around good things
happen when you have separate spindles.
A single postgresql process may not see much benefit, because it does not
do background I/O, but multiple postgresql processes will perform better
because multiple I/O requests can be issued and processed simultaneously.
If you got good SMP in your kernel, even better.
 

There are good white papers about DB I/O performance, e.g. from Microsoft. 
They are not read very often...
If you dedicate drives to services, it's your responsibility to size 
everything to have a balanced load. You'll probably end up with some drives 
being the bottleneck, while others are still almost idle. That's why 
RAID should be used in the first and second place; it will distribute the 
workload on all spindles equally until saturated. The recommendation to 
use separate disks for this and that originates from ancient days when 
performance had to be achieved by application-level programming and 
configuration, implementing one's own file systems on raw devices. pgsql 
deliberately doesn't work like this.

If you have, let's say, 10 disks, you'd probably do something like:
2x RAID1 for /
2x RAID1 for /tmp + swap
2x RAID1 for xlog
4x RAID5 for data
I bet you get better performance with all disks in one RAID5, because 
now the system disks not only have no negative impact on DB transfer 
performance, but add additional seek bandwidth to DB traffic.

Regards,
Andreas


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread pgsql
 In the age of inexpensive RAID, tablespaces have more or less lost their
 relevance regarding performance. pgsql's philosophy respects this by
 leaving the storage work up to the OS and disk subsystem. Even having
 the xlog on a different spindle won't help too much; you'll probably be
 better off if you stuff all your spindles in one raid on most systems.
 For worse, splitting two disks into separate storage areas to have xlog
 separated would degrade safety for very little performance gain. So the
 advise is: one disk, no alternative. 2 to 20 disks: use a single raid.
 more disks: examine your access patterns carefully before you believe
 you can do the job better than your raid controller.

 This leaves table spaces as a mere administrative feature, many (most)
 installations will happily live without that.

 Regards,
 Andreas

Sorry, I just can't leave this one alone. Having multiple spindles, i.e.
separate data paths to separate disks and disk systems, makes a big
difference. Take this simple program:

---- testio.c ----
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
/* gcc testio.c -o testio */
int main(int argc, char **argv)
{
    int i;
    int blocks;
    FILE *files[16];
    int block[512];
    int foo[512];
    int nblocks = atoi(argv[1]);

    printf("Using %d 2K blocks, total file size %d\n",
           nblocks, (int)(sizeof(block) * nblocks));
    for (i = 2; i < argc; i++)
        files[i] = fopen(argv[i], "w+b");
    for (blocks = 0; blocks < nblocks; blocks++)
        for (i = 2; i < argc; i++)
            fwrite(block, 1, sizeof(block), files[i]);
    for (i = 2; i < argc; i++)
        fseek(files[i], 0, SEEK_SET);
    for (blocks = 0; blocks < nblocks; blocks++)
        for (i = 2; i < argc; i++)
            fread(foo, 1, sizeof(foo), files[i]);
    return 0;
}


The purpose of the program is to write out files, and read them back in.
It is crude, obviously, and not a good general test, but it does show the
effect of which I am writing.

On my test system, I have a bunch of disks, but I'll use /home and
/vol01 as examples: vol01 is ext2 and home is reiserfs, and both are IBM
ultra SCSI 10K RPM disks.

[EMAIL PROTECTED] markw]# time ./testio 10 /home/tmp/test.dat
Using 10 2K blocks, total file size 20480

real0m6.790s
user0m0.290s
sys 0m4.120s
[EMAIL PROTECTED] markw]# time ./testio 10 /vol01/tmp/test.dat
Using 10 2K blocks, total file size 20480

real0m7.274s
user0m0.210s
sys 0m1.940s

As you can see, they are fairly well matched, +/- filesystem issues. Now,
let's run the program across two disks:

[EMAIL PROTECTED] markw]# time ./testio 10 /vol01/tmp/test.dat
/home/tmp/test.dat
Using 10 2K blocks, total file size 20480

real0m12.012s
user0m0.610s
sys 0m6.820s

As you can see, it looks like almost double the time, and you might be
looking at this as proof that you are right. Actually, it is mostly an
artifact of the program.

Now, look at the results if I write two files to the same volume:

[EMAIL PROTECTED] markw]# time ./testio 10 /vol01/tmp/test.dat
/vol01/tmp/test.dat1
Using 10 2K blocks, total file size 20480

real0m19.298s
user0m0.680s
sys 0m3.990s

As you can see, it takes a LOT longer. That's because the disk has to do a
lot more work, the hardware data path has to carry twice as much data, and
you have to manage multiple accesses to a single object, the disk.

It doesn't take much imagination to see what this would mean for pg_xlog.

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread pgsql
 [EMAIL PROTECTED] wrote:

James Robinson wrote:



Users are primarily, if not stupid, ignorant. They will read the
absolute
minimum needed to achieve a goal and little else. I say this with the
utmost respect, because I and probably everyone else on this group is
guilty of the same thing. So, the preferred installation procedure,
i.e.
the one with the easy to follow directions, should showcase features
the
user should know, and leave the user in a good place. IMHO, the user's
database on one volume and pg_xlog on another is a better starting
place.


Yes, that is generally the case (prefer pg_xlog on separate spindle),
but no
need to *forcibly* overcomplicate things if the box has only one
spindle,
or if they have only one single RAID'd partition configured. We should
continue to err on the side of keeping the path to a functional system
nice and simple, yet still offering superb functionality. Oracle gets
this
wrong. pg_autovacuum is another good step in this direction.


In the age of inexpensive RAID, tablespaces have more or less lost their
relevance regarding performance. pgsql's philosophy respects this by
leaving the storage work up to the OS and disk subsystem. Even having
the xlog on a different spindle won't help too much; you'll probably be
better off if you stuff all your spindles in one raid on most systems.
For worse, splitting two disks into separate storage areas to have xlog
separated would degrade safety for very little performance gain. So the
advise is: one disk, no alternative. 2 to 20 disks: use a single raid.
more disks: examine your access patterns carefully before you believe
you can do the job better than your raid controller.

This leaves table spaces as a mere administrative feature, many (most)
installations will happily live without that.

Regards,
Andreas



I would say that this is almost completely misinformed. Depending on the
OS and the hardware, of course, a write on one spindle may not affect the
performance of another.

There are so many great things that happen when you have separate
spindles. The OS manages the file systems separately, the device drivers
may be separate, and if the low-level I/O device driver is even
 different,
then you get your own bus mastering I/O buffers. All around good things
happen when you have separate spindles.

A single postgresql process may not see much benefit, because it does not
do background I/O, but multiple postgresql processes will perform better
because multiple I/O requests can be issued and processed simultaneously.
If you got good SMP in your kernel, even better.



 There are good white papers about DB IO performance, e.g from Microsoft.

Do not trust *anything* from Microsoft, they have very dubious motives and
research practices. They are compulsive liars and everything published by
them is specifically edited to present their agenda.


 They are not read very often...
  If you dedicate drives to services, it's your responsibility to size
 everything to have a balanced load. You'll probably end with some drives
 being the bottleneck, while others are still almost idle. That's why
 RAID shoud be used in the first and second place, it will distribute the
 workload on all spindles equally until saturated. The recommendation to
 use separate disks for this and that originates from ancient days where
 performance had to be achieved by application level programming and
 configuration, implementing own file systems on raw devices. pgsql
 deliberately doesn't work like this.

 If you may use lets say 10 disks, you'd probably something similar like
 2x RAID1 for /
 2x RAID1 for /tmp + swap
 2x RAID1 for xlog
 4x RAID5 for data

 I bet you get better performance with all disks in one RAID5, because
 now the system disks not only have no negative impact on DB transfer
 performance, but add additional seek bandwidth to DB traffic.

 Regards,
 Andreas

You are absolutely wrong on all accounts here. A RAID5 system is slower
than a single spindle, as it is only as fast as the slowest disk in the
stripe plus the overhead of the RAID. RAID[3,5] are slower on writes
because they have to calculate the parity and write it to the parity disk.

One of the things you are failing to note is that different disks can
operate in parallel on an SMP box with good CPU interrupt management. Two
writes to two different disks can take place at the same time. Two writes
to a single disk (or disk system) must happen serially.
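
The point about concurrent requests can be sketched with a pair of background
writers (file paths and sizes here are illustrative; on a one-disk box both
files of course land on the same device, which is exactly the contended case):

```shell
# Launch two writers at once; with separate spindles the kernel can keep
# both streams in flight simultaneously, while a single spindle must
# interleave their requests.
dd if=/dev/zero of=/tmp/io_a.dat bs=4096 count=256 2>/dev/null &
dd if=/dev/zero of=/tmp/io_b.dat bs=4096 count=256 2>/dev/null &
wait   # both writers ran in parallel; wait for them to finish

# Each file should be 1 MiB (256 * 4096 bytes).
wc -c < /tmp/io_a.dat
wc -c < /tmp/io_b.dat
```

Timing the same pair of writes with both files forced onto one disk versus
two is essentially what the testio program above measures.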


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread pgsql
 Mohawksoft:

 I forgot to specify that tablepaces should be on separate volumes.
 (sorry)
 If all they have is one volume, no worries, but instructing the use of
 alternate volumes for system and data will improve performance by
 separating WAL and data operations.

 ... and the place for this is the documentation, maybe with a nice script
 to
 help automate it.   Twisting users' arms will just get us a lot of angry
 e-mail.  Plus force separation of tablespaces is not appropriate for many
 kinds of installations:
 -- the 1MB 2-table database of someone's DVD collection;
 -- the 700GB database running off a $75,000 NAS (which appears to the OS
 as a
 single volume)

 Also, you're getting confused here ... Tablespaces has nothing to do with
 the
 location of pg_xlog.

I'm not confused; it is an inverse-logic thing. By persuading users
to create databases on a separate tablespace from the system (which
contains pg_xlog), you are by definition separating the database and pg_xlog.



 Tablespaces are a familiar construct to experienced DBAs who may not be
 familiar with PostgreSQL. PostgreSQL being similar to other databases
 will
 have it better make sense to new users.

 I'll have to disagree with you there.  I used to be a certified MSSQL
 admin,
 and I can tell you that not one in 25 members of MSDN Database forum had
 any
 idea how to use the analogous feature on MSSQL -- despite it being
 operative
 since 1998.   So forcing new users to deal with tablespaces, even if they
 don't need them, is a great way to get new users to adopt MySQL.

That's food for thought. That's different from my experience. I've set up
a few MSSQL systems and recall guidance saying that you should create the
database files separate from the system, but if you say that it will
confuse new users, that is something that should be considered.



 So, the preferred installation procedure, i.e.
 the one with the easy to follow directions, should showcase features the
 user should know, and leave the user in a good place.

 No, the preferred newbie installation is the one that gets them up and
 running and playing with PostgreSQL in the minimum amount of time.  Setup is
 boring, confusing, and often frustrating, and each step we add to the
 required minimum setup loses us another dozen newbies who weren't sure if
 they are ready to upgrade from MS Access or MySQL.  Heck, for the CDs we're
 making to hand out at conventions, we're running PostgreSQL on Knoppix so
 that users don't have to install *anything*.

That's cool: PostgreSQL on Knoppix? How do you do that? A RAM disk?


 Now, if you want to add a power user setup to the Tutorial in our official
 docs, please do!  We could use more guides.  But don't force the guy with
 the personal DVD database to set things up like he's running Ameritrade.

 Also, consider that about half our users install from packages: RPMs and Deb
 packages (and soon MSI as well).  Those users aren't going to be going
 through a manual installation procedure at all, so your efforts to educate
 them through proper database setup won't get very far.

I agree that there is a section missing in the manual for this sort of
information.


 IMHO, the user's
 database on one volume and pg_xlog on another is a better starting
 place.

 For some setups, yes.   For others, no.  It depends on your hardware and
 application.   And, as I said above, Tablespaces will not determine the
 location of pg_xlog AFAIK.

See above.


 BTW: Is there a public spec on what will be tablespace compatible and how?
 For instance: will it be possible to create a table on a separate
 tablespace from the DB? Will it be possible to create an index on a
 separate tablespace from the table?

 This is in the archives of this list.  The whole point of tablespaces is to
 allow placing individual tables and indexes on separate volumes.  AFAIR,
 we're not that concerned about whole databases, which have always been easily
 re-locatable via symlinks and/or mounts.

 P.S. if you signed your e-mails, I'd stop calling you mohawksoft.

 --
 Josh Berkus
 Aglio Database Solutions
 San Francisco





Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread Andreas Pflug
[EMAIL PROTECTED] wrote:
As you can see, it takes a LOT longer. That's because the disk has to do a
lot more work, the hardware data path has to carry twice as much data, and
you have to manage multiple accesses to a single object, the disk.
It doesn't take much imagination to see what this would mean for pg_xlog.
 

Well, writing off-reality benchmarks on hardware setups I'd never 
suggest for db server usage proves... what?
Additionally, do you care about safety?

Regards,
Andreas


Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-05-27 Thread Manfred Koizar
On Wed, 26 May 2004 18:17:55 -0400, Tom Lane [EMAIL PROTECTED] wrote:
(Some days I think we should dump
VACUUM FULL, because it's optimized for a case that's no longer very
interesting...)

So we still have to stick with VACUUM FULL for some time, right?

The next set of compatibility breakers I'm currently working on requires
a change in VACUUM FULL behaviour.  I would only move tuples that are
visible to all running transactions.  OTOH I wouldn't stop at the first
unmovable tuple.

With
X  active tuple
.  free space or dead tuple
y  new tuple, not yet visible to a running transaction
z  deleted tuple, still visible to a running transaction

the current implementation transforms this relation

    XXyX  XzXX 

into

 XzXX  XXyX

The new implementation would leave it as

   XX.. ..y.  .z..

If there are concurrent long-running transactions, the new VACUUM FULL
wouldn't truncate the relation as aggressively as it does now.  It could
leave the relation with lots of free space near the end.  This was
absolutely unacceptable at the time when VACUUM FULL was designed.  But
now we could use lazy VACUUM as an excuse for VACUUM FULL not working so
hard.  After the transaction still seeing z terminates, VACUUM (without
FULL) can truncate the relation to

   XX.. ..y.

and when y is updated the new version will be stored in a lower block
and plain VACUUM can truncate the relation again:

   XXY.

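A toy sketch of that rule (Python, purely illustrative, not PostgreSQL code: 'X' is a tuple visible to all and therefore movable, 'y'/'z' are unmovable, '.' is free space):

```python
def vacuum_full_new(rel: str) -> str:
    """Compact per the proposed rule: only all-visible tuples ('X')
    may move; 'y'/'z' stay put; trailing free space is truncated."""
    cells = list(rel)
    # Walk from the end, moving each movable tuple into the earliest
    # free slot that lies before it.
    for src in range(len(cells) - 1, -1, -1):
        if cells[src] != 'X':
            continue
        try:
            dst = cells.index('.', 0, src)   # earliest free slot before src
        except ValueError:
            break                            # no free slot remains before here
        cells[dst], cells[src] = 'X', '.'
    while cells and cells[-1] == '.':        # truncate trailing free space
        cells.pop()
    return ''.join(cells)
```

For example, vacuum_full_new("XX..y.Xz.X") yields "XXXXy..z": the unmovable y and z keep their blocks, and only the trailing free space is given back.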
AFAICS this would make vacuum.c much simpler (no more chain moves).
Clearly this change alone doesn't have any merit.  But would such a
patch have any chance of being accepted, if it facilitates improvements
in other areas?

Servus
 Manfred



Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-05-27 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes:
    XX.. ..y.

 and when y is updated the new version will be stored in a lower block

Oh?  What makes you think that?  I see no guarantee of it.

 AFAICS this would make vacuum.c much simpler (no more chain moves).

How will it do that?  I think you'd have to not move *any* updated
tuples to be sure you don't need the chain-move mechanism.  Not moving
the outdated tuple isn't sufficient, you can't move the one it points at
either.

 Clearly this change alone doesn't have any merit.  But would such a
 patch have any chance of being accepted, if it facilitates improvements
 in other areas?

I'm disinclined to mess with VACUUM FULL without a clearer explanation
of where you're headed.  So far as I can see, you're simply going to
make VACUUM FULL less effective with no stated benefit.

(BTW, it now occurs to me that CLUSTER and ALTER TABLE in their present
forms may be broken, because they only copy rows that are valid
according to SnapshotNow; this means that rows that are still visible to
old transactions could be lost.  The lack of any attempt to preserve
update chain relationships seems ungood as well, if we might have old
transactions come along and try to update the table later.)

regards, tom lane



Re: [HACKERS] list rewrite committed

2004-05-27 Thread Neil Conway
Jeff wrote:
 Do we have any numbers as to how much this will help things?

No, I haven't done any benchmarking yet (I might do some before I
leave for the summer, but it's not a priority...)

FWIW, the performance improvement from this patch won't be as large 
as it might be, since Tom already replaced some lappend() hot spots 
with the FastList code. The new list API makes that optimization 
global, so we'll fix anywhere that fell through the cracks.

 If not, would something like a pg_bench exercise the new code enough to
 see the results?

Something like TPC-H would be better, I'd think.
-Neil


Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-05-27 Thread Manfred Koizar
On Thu, 27 May 2004 14:23:07 -0400, Tom Lane [EMAIL PROTECTED] wrote:
 and when y is updated the new version will be stored in a lower block

Oh?  What makes you think that?  I see no guarantee of it.

You're right, I see only a tendency, because the majority of free space
is before the last block (obviously).  But don't we try to store the new
version on the same block as the old version?  That'd weaken my argument
a bit.

I think you'd have to not move *any* updated
tuples to be sure you don't need the chain-move mechanism.

Yes, per definitionem (move only tuples that are visible to all).

I'm disinclined to mess with VACUUM FULL without a clearer explanation
of where you're headed.

Have no fear.  I won't change anything in the near term.  As you were
talking about the future of VACUUM FULL,  I thought this might be a good
opportunity to ask.  The fact that you didn't outright reject the idea
is good enough for now.

I have no clear explanation at the moment, just some fuzzy ideas that
are beginning to crystallise.  I'm messing around with heap tuple
headers again, and the Xvac field seems to get in the way, unless I can
cut down the number of different scenarios where it is needed.

Servus
 Manfred



[HACKERS] How to get the Oid of the Primary key relation ??

2004-05-27 Thread Ramanujam H S Iyengar
Can someone help me with this question?

Given the Oid of a relation, how do I get the Oid of the primary key index
on that relation?
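(For reference, one catalog-based way, a sketch only; `pg_index.indisprimary` marks primary-key indexes, and `mytable` is a placeholder name:)

```sql
-- Given a relation's OID (or its name via ::regclass),
-- fetch the OID of its primary key index.
SELECT indexrelid
  FROM pg_index
 WHERE indrelid = 'mytable'::regclass  -- the table's OID
   AND indisprimary;
```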

Thanks in adv,
-Ramu
_
Post Classifieds on MSN classifieds. http://go.msnserver.com/IN/44045.asp 
Buy and Sell on MSN Classifieds.



Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread pgsql
 [EMAIL PROTECTED] wrote:


As you can see, it takes a LOT longer. That's because the disk has to do
 a
lot more work, the hardware data path has to carry twice as much data,
 and
you have to manage multiple accesses to a single object, the disk.

It doesn't take much imagination to see what this would mean for pg_xlog.



 Well, writing off-reality benchmarks on hardware setups I'd never
 suggest for db server usage proves... what?
 Additionally, do you care about safety?

 Regards,
 Andreas


It was a demonstration of the effect that multiple drives have. There is a
very real and measurable I/O bandwidth advantage to be gained by putting
concurrently accessed data on separate data channels. Any test that is
capable of utilizing multiple I/O channels will show it.

This is not the place to really discuss this, and if you want to pursue
this discussion, let's take it offline.

As for RAID5, it has its advantages, as do RAID0 and RAID3, and yes,
RAID1 and higher are pretty safe. A *good* RAID system goes for thousands
or tens of thousands of dollars. A 24x7 redundant storage system may not
be required by everyone. With the new IDE cards with DMA interfaces, it should
be possible to create a very high performance system. A system which will
perform better if the pg_xlog is on a different disk than the data.

As for "do I care about safety?" Yes, yes I do, but "safe" and "paranoid"
are two different things. There is always a cost/benefit analysis for any
system. RAID5 and RAID3 are calculated risks. The bet is that no more than
one drive will fail at any one time. It is a good bet, but not 100%. I've
seen RAID systems have two drive failures at the same time: during the
auto-rebuild to the spare, a second drive dies. Game over.

Since RAID5 is a risk, maybe we should mirror, RAID0+1. That's a risk too,
what if the two mirrored drives die? Yikes, then you're screwed again.

If you want to be safe, you may want RAID5+1, where you mirror two RAID5
systems. That's really safe. You should have each RAID5 system on its own
controller with its own independent battery-backed redundant power
supply. That will be safe. Wait, what about earthquakes? Shock-mounted
racks. Lightning? Surge suppressors on the SCSI cables. Flood? Wrap it in
a zip-lock bag. Al Qaeda? Wrap it in Kevlar!

There is a cost/benefit analysis for everything. RAIDs are very good and
reasonably safe devices, but the argument that their performance will be the
same as multiple individual disks (equivalent to the ones in the
RAID), preferably on different I/O channels, is provable nonsense, and you
should know that.



[HACKERS] Win32, PITR, nested transactions, tablespaces

2004-05-27 Thread Bruce Momjian
Nearing our June 1 feature freeze, we now have patches for all our major
outstanding features.  Their status is:

Win32 - almost done, fsync needed, bug fixes, etc
PITR - Simon is away until Sunday and we are working on it
Nested transactions - patch needs review, trigger issues
Tablespaces - patch needs review

Congratulations to all the developers who finished up some very complex
patches.  I realize the patches will need adjustments, but it is amazing
they all got done so quickly.

Tom and I will be focused on Win32 and PITR for the next few days; it
would help if others could review the other two patches for adjustments.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-05-27 Thread Alvaro Herrera
On Thu, May 27, 2004 at 09:52:30PM +0200, Manfred Koizar wrote:

 I have no clear explanation at the moment, just some fuzzy ideas that
 are beginning to crystallise.  I'm messing around with heap tuple
 headers again, and the Xvac field seems to get in the way, unless I can
 cut down the number of different scenarios where it is needed.

Now that you're on the subject, can I ask you to take a peek at what I
did regarding tuple headers?

At first I thought I'd have to add Xmax back as a field of its own, but
later (in a chat with Bruce) I arrived at the conclusion that it isn't
really necessary, and I only added a bit to the infomask to flag when
the Cmin is overridden with Xmax.

However, I'm not convinced that this is enough --- is there a situation
in which we would need to peek at Cmin after setting Xmax for a
particular tuple?

The problem was

BEGIN;
insert into foo values (1);
begin;
delete from foo;
rollback;
-- At this point the tuple should be visible,
-- but it has my Xid as Xmin, and Cmin was
-- overridden with Xmax.
commit;
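The overlay can be modeled in a few lines (an illustrative sketch only, not PostgreSQL's actual header layout; the field names here are made up):

```python
from dataclasses import dataclass

@dataclass
class TupleHeader:
    xmin: int                          # inserting transaction
    cmin_or_xmax: int                  # one field shared by Cmin and Xmax
    xmax_overrides_cmin: bool = False  # the proposed infomask bit

# Transaction 100, command 5, inserts the tuple:
t = TupleHeader(xmin=100, cmin_or_xmax=5)

# A subtransaction (xid 101) deletes it: Xmax overwrites Cmin,
# and the new infomask bit records that fact.
t.cmin_or_xmax = 101
t.xmax_overrides_cmin = True

# The subtransaction rolls back.  The tuple must become visible again,
# but the original Cmin (5) is unrecoverable: any visibility check in
# transaction 100 that still needs Cmin is now out of luck.
cmin_lost = t.xmax_overrides_cmin  # True: exactly the open question above
```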

I'd appreciate your (Manfred's and Tom's) comments on the topic.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
And as an added bonus, now my computer goes to the toilet for me, leaving me
free to spend time on more useful activities! yay slug codefests! (C. Parker)




Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread pgsql
 [EMAIL PROTECTED] wrote:

You are absolutely wrong on all accounts here. A RAID5 system is slower
than a single spindle as it is only as fast as the slowest disk in the
stripe and the overhead of the RAID.

 Huh, what kind of controller do you use... Sounds like some value IDE
 one. I'd never suggest IDE raid5 for DBMS purposes anyway.

Actually, my RAID system, currently on my test system, is fully UWLVD SCSI
with fast spindles.

Here is a logical, factual question for you to answer: how can a set of
disks, let's say 7 (6 data drives with one parity), deliver results faster
than the slowest drive in the stripe?

If you say predictive and intelligent caching, yeah, maybe, but *all* disks
today have caching, and the initial request still has to wait for the
longest seek time across all spindles and the slowest spindle position.
I've been dealing with RAID systems for almost a decade now, and they are
not a magic bullet.

RAID systems are always slower than their component disks. This is the
drawback to using them and a fundamental limitation. A single disk will
average 1/2 spindle seek, assuming its initial head placement is random,
and 1/2 spindle revolution to track, assuming no out-of-order
sector access. A RAID system has to wait for the slowest disk; thus, while
a single disk averages 1/2 seek and rotation, two disks will not. So
your RAID disk access will ALWAYS be as slow as or slower than a
single-disk access, not including the additional RAID processing.
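The "wait for the slowest disk" claim is just order statistics: if each drive's positioning time is uniform on [0, 1) (in units of a worst-case seek), one drive averages 1/2, while a stripe that must wait for all n drives averages n/(n+1). A quick check (an illustrative model only, not a disk benchmark):

```python
import random

def expected_stripe_wait(n_disks: int, trials: int = 200_000) -> float:
    """Mean positioning time when a request must wait for the slowest
    of n_disks drives, each drive's time uniform on [0, 1)."""
    random.seed(42)  # deterministic, for repeatability
    total = 0.0
    for _ in range(trials):
        total += max(random.random() for _ in range(n_disks))
    return total / trials

# Closed form for comparison: E[max of n uniforms] = n / (n + 1).
# One disk averages 1/2; a 7-spindle stripe averages 7/8.
```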

The advantage of a RAID is that a number of smaller disks can look like a
big disk with some redundancy. The advantage of a RAID controller is that
the RAID processing and parity-generation overhead is done on an external
device. Using a RAID controller that presents a SCSI LUN is great because
you don't need to trust third-party drivers. All in all, RAID is a good
idea, but it isn't faster.

As for IDE RAID, IDE RAID is an awesome idea. SCSI disks are just too
expensive. Infortrend has a cool IDE to SCSI or Fibre RAID system that
rocks.






Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread Darcy Buskermolen
On May 27, 2004 01:38 pm, [EMAIL PROTECTED] wrote:
  [EMAIL PROTECTED] wrote:
 As you can see, it takes a LOT longer. That's because the disk has to do
  a
 lot more work, the hardware data path has to carry twice as much data,
  and
 you have to manage multiple accesses to a single object, the disk.
 
 It doesn't take much imagination to see what this would mean for pg_xlog.
 
  Well, writing off-reality benchmarks on hardware setups I'd never
  suggest for db server usage proves... what?
  Additionally, do you care about safety?
 
  Regards,
  Andreas

 It was a demostration of the effect that multiple drives has. There is a
 very real and measurable I/O bandwidth advantage to be gained by putting
 concurrently accessed data on separate data channels. Any test that is
 capable fo utilizing multiple I/O channels will show it.

 This is not the place to really discuss this, and if you want to persue
 this discussion, lets take it off line.

 As for RAID5, it has its advantages, as does RAID0 and RAID3, and yes,
 RAID1 and higher are pretty safe. A *good* RAID system goes for thousands
 or tens of thousands of dollars. A 24x7 redundant storage system may not
 be required by everyone. With new IDE with DMA interface cards, it should
 be possible to create a very high performance system. A system which will
 perform better if the pg_xlog is on a different disk than the data.

 As for do I care about safety? Yes, yes I do, but safe and paranoid
 are two different things. There is always a cost/benefit analysis for any
 system. RAID5 and RAID3 are calculated risks. The bet is that no more than
 one drive will fail at any one time. It is a good bet, but not 100%. I've
 seen RAID systems have two drive failures at the same time, during the
 auto-rebuild to the spare, a second drive dies. Game over.

 Since RAID5 is a risk, maybe we should mirror, RAID0+1. That's a risk too,
 what if the two mirrored drives die? Yikes, then you're screwed again.

 If you want to be safe, you may want RAID5+1, where you mirror two RAID5
 systems. That's really safe. You should have each RAID5 system on its own
 controller with its own independent battery backed up redundant power
 supply. That will be safe. Wait.. What about earth quakes? Shock mounted
 racks. Lightning? Surge suppressors on the SCSI cables. Flood? Wrap it in
 a zip-lock bag. Al Qaeda? Wrap it in a kevlar!!

 There is a cost/benefit analysis for everything. RAIDs are very good and
 reasonably safe devices, but the argument that the performance will be the
 same as multiple individual disks (which are equivilent to the ones in the
 RAID) preferably on different I/O channels is proveable nonsense and you
 should know that.

One other huge advantage that tablespaces will bring is the ability to place
data based on cost, i.e., you can put your 10 most-used tables on fast disk
(or perhaps solid-state devices), and move the seldom-used data off onto the
slower (lower-cost) disks/storage array.



-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com



Re: [HACKERS] How to get the Oid of the Primary key relation ??

2004-05-27 Thread John Hansen
On Fri, 2004-05-28 at 06:07, Ramanujam H S Iyengar wrote:
 Can some one help me in the question !!!
 
 Given the Oid of a relation .. how to get the Oid of the Primary key index 
 on that relation ?
 
Attached are a couple of functions I wrote that might do what you're after.
 
 Thanks in adv,
 
 -Ramu
 
 _
 Post Classifieds on MSN classifieds. http://go.msnserver.com/IN/44045.asp
 Buy and Sell on MSN Classifieds.
 
 

Regards,

John
--
-- PostgreSQL database dump
--

SET SESSION AUTHORIZATION 'postgres';

--
-- select pk_column('tablename');
-- returns the name of the primary key column
--
CREATE FUNCTION pk_column(text) RETURNS text
AS '
select attname::text
	from pg_class, pg_constraint, pg_attribute
	where pg_class.oid = conrelid
	and contype = ''p''
	and attrelid = pg_class.oid
	and attnum = ANY (conkey) -- conkey is the array of PK column numbers
	and relname = $1;
'
LANGUAGE sql STABLE;


--
-- select last_insert_id('tablename','columnname');
-- returns currval of the sequence assigned as default to columnname
--
CREATE FUNCTION last_insert_id(text, text) RETURNS bigint
AS '
select currval(
(select
split_part(adsrc, chr(39), 2) as seq -- chr(39) is a single quote; adsrc looks like nextval(...)
from pg_class
join pg_attribute on (pg_class.oid = pg_attribute.attrelid)
join pg_attrdef
on (pg_attrdef.adnum = pg_attribute.attnum
and pg_attrdef.adrelid = pg_attribute.attrelid)
where pg_class.relname = $1
and pg_attribute.attname = $2)
);
'
LANGUAGE sql;


--
-- select last_insert_id('tablename');
-- returns currval of the sequence assigned as default to the primary key column
--
CREATE FUNCTION last_insert_id(text) RETURNS bigint
AS '
select last_insert_id($1,pk_column($1));
'
LANGUAGE sql;



[HACKERS] On query rewrite

2004-05-27 Thread Sailesh Krishnamurthy

Hackers

Are there any default query rewrite rules that kick in, in the absence
of any user-defined RULE or VIEW ?

Also, is there any place that lists the interesting query rewrites
that PG does on queries for performance improvement?

For instance, in the presence of a view or a subquery, does PG do a
subquery to join transformation ?

Thanks ! 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh





Re: [HACKERS] On query rewrite

2004-05-27 Thread Alvaro Herrera
On Thu, May 27, 2004 at 05:14:48PM -0700, Sailesh Krishnamurthy wrote:

 Are there any default query rewrite rules that kick in, in the absence
 of any user-defined RULE or VIEW ?
 
 Also, is there any place that lists any interesting query rewrite
 that PG does on queries for perf. improvement ? 
 
 For instance, in the presence of a view or a subquery, does PG do a
 subquery to join transformation ?

Yes, there are transformations of this sort, but they are not called
query rewrite in the code's terminology, but optimization: rewrite
(rules and views) happens to the parsed statement, and the optimizer
works on the output of rewriting.  So the optimizations actually happen
whether or not there were rules or views.

The query's path is
SQL -> parse -> rewrite -> optimize -> execute

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"The number of UNIX installations has risen to 10,
and this number is expected to grow" (UPM, 1972)




Re: [HACKERS] tablespaces and DB administration

2004-05-27 Thread pgsql

 One other huge advantage that tablespaces will bring is the ability to place
 data based on cost, i.e., you can put your 10 most used tables on fast disk
 (or perhaps solid state devices), and move the seldom used data off onto the
 slower (lower cost) disks/storage array.

Great idea.



Re: [HACKERS] Win32, PITR, nested transactions, tablespaces

2004-05-27 Thread Tatsuo Ishii
Hi Bruce,

Is it core's decision that:

o all four major features below will be incorporated into 7.5
o the beta freeze will be June 1

At the upcoming big event Linux World (in Tokyo, from June 2 to 4) I
need to present the current status of PostgreSQL development. Even if
core has not decided yet, information on what is likely to happen
would be very valuable for the audience. Suggestions?
--
Tatsuo Ishii

From: Bruce Momjian [EMAIL PROTECTED]
Subject: [HACKERS] Win32, PITR, nested transactions, tablespaces
Date: Thu, 27 May 2004 16:31:57 -0400 (EDT)
Message-ID: [EMAIL PROTECTED]

 Nearing our June 1 feature freeze, we now have patches for all our major
 outstanding features.  Their status is:
 
   Win32 - almost done, fsync needed, bug fixes, etc
   PITR - Simon is away until Sunday and we are working on it
   Nested transactions - patch needs review, trigger issues
   Tablespaces - patch needs review
 
 Congratulations to all the developers who finished up some very complex
 patches.  I realize the patches will need adjustments, but it is amazing
 they all got done so quickly.
 
 Tom and I will be focused on Win32 and PITR for the next few days it
 would help if others could review the other two patches for adjustments.
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
 
 



Re: [HACKERS] On query rewrite

2004-05-27 Thread Sailesh Krishnamurthy
 Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes:

 For instance, in the presence of a view or a subquery, does PG
 do a subquery to join transformation ?

Alvaro Yes, there are transformations of this sort, but they are
Alvaro not called query rewrite in the code's terminology, but
Alvaro optimization -- rewrite (rules and views) happens to the
Alvaro parsed statement, and the optimizer works on the output of
Alvaro rewriting.  So actually the optimizations happen whether
Alvaro there were or not rules or views.

Interesting .. so these are rule-based then ? Not cost-based ?

I understand that there is a cost-based optimizer anyway that does the
planning and selects the right plan .. but does this come _after_ all
these transformations ? Or does it happen along with the
transformations ? 

Alvaro The query's path is SQL - parse - rewrite - optimize -
Alvaro execute

Can you please point me to the code that indeed does such
transformations ? 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh





Re: [HACKERS] Win32, PITR, nested transactions, tablespaces

2004-05-27 Thread Christopher Kings-Lynne
Nearing our June 1 feature freeze, we now have patches for all our major
outstanding features.  Their status is:
Win32 - almost done, fsync needed, bug fixes, etc
PITR - Simon is away until Sunday and we are working on it
Nested transactions - patch needs review, trigger issues
Tablespaces - patch needs review
Congratulations to all the developers who finished up some very complex
patches.  I realize the patches will need adjustments, but it is amazing
they all got done so quickly.
I should point out to interested testers that the tablespace patch is 
the most mature out of the above list.  It's basically finished.

Chris


[HACKERS] pg_dump --comment?

2004-05-27 Thread Chris Campbell
I've encountered a situation where I'd like to store some information 
about the database when I do a pg_dump. For instance, the timestamp of 
the dump. And some other information that I pull from the database.

If pg_dump had a --comment flag that allowed me to pass a string that 
would be stored inside the dumpfile, that I could then retrieve in some 
way (without doing a full restore of the dump), that would meet my 
needs. In discussing this idea with other people, it sounded like a 
general-use feature that mankind as a whole could benefit from. :)

Here's what I'm envisioning:

   pg_dump --comment 'This is a comment' [more pg_dump args]

That would store the comment ('This is a comment') in the dump file
somehow. The definition of "somehow" would vary depending on the output
format (text, tar, or custom). Initially, since I only use the custom
format, I would only focus on getting it to work with that. But for the
text format, there could be a SQL comment at the top of the file:

-- COMMENT: This is a comment

or something. In the tar format, there could be a comment file in the
archive that contains the text "This is a comment".

For the custom format...I haven't looked at the format specification, so 
I don't know exactly where the comment would go. It could go at the very 
top of the file, and have a special delimiter after it. pg_restore would 
just skim over the file until the delimiter is reached, and then go on 
about its business. The benefit of this scheme is that any program could 
read the comment -- just open a file and read the bytes until the delimiter.

There could also be a pg_dump or pg_restore option that prints out the 
comment stored in a given dump file, or another binary (pg_comment?) 
that does that.
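The "read bytes until the delimiter" part is trivial for any program; a sketch (assuming a hypothetical layout with the comment first, terminated by a NUL byte; this is not the actual custom-format specification):

```python
def read_dump_comment(path: str, delimiter: bytes = b"\x00") -> str:
    """Read a dump file's leading comment up to (not including) the
    delimiter byte; a sketch of the proposed layout only."""
    comment = bytearray()
    with open(path, "rb") as f:
        while True:
            ch = f.read(1)
            if not ch or ch == delimiter:
                break          # EOF or delimiter reached
            comment.extend(ch)
    return comment.decode("utf-8", errors="replace")
```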

Is this a desirable feature? Should I work it up like described and 
submit a patch? Any comments/suggestions?

Thanks!
- Chris




[HACKERS] Dynamic fieldname with NEW structure in trigger function?

2004-05-27 Thread Erwin Brandstetter
Hi List!

I have discussed the following problem with pl/pgsql on irc://freenode/postgresql.
Christopher Kings-Lynne has asked me to post the problem here as a feature request, so
here goes. I think it is best formulated in the plpgsql code below.
Kris Jurka has mentioned that something like this might work with pl/tcl and pl/python.
I am not experienced with those, however, so I can't comment on that.

CREATE FUNCTION trg_log_up_other() RETURNS trigger
  AS '
DECLARE
  sql_txt text;
  up_table text;
  up_field text;
BEGIN
  up_table := TG_ARGV[0];
  up_field := TG_ARGV[1];
  sql_txt := ''UPDATE ''
  || quote_ident(up_table)
  || '' SET somefield = somevalue''
  || '' WHERE ''
  ||  quote_ident(up_field)
  || '' =  ''
  || NEW.staticname_id;
-- ^ !!! last line = EVIL HACK !!!
-- I want a dynamic name passed by TG_ARGV[2] instead of staticname_id.
-- But NEW is not known inside EXECUTE, so it throws an error and does not evaluate.
-- But how can I evaluate a dynamic field if not with EXECUTE?
-- How do I get at the value of the field?
  EXECUTE sql_txt;
  return NEW;
END;
'  LANGUAGE plpgsql;
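For reference, the pl/python route Kris Jurka mentioned can index the new row by a name taken from the trigger arguments. A sketch only (untested; assumes plpythonu is installed, and it reuses the same somefield/somevalue placeholders as above):

```sql
CREATE FUNCTION trg_log_up_other_py() RETURNS trigger AS '
up_table = TD["args"][0]
up_field = TD["args"][1]
key_field = TD["args"][2]         # the dynamic field name
key_value = TD["new"][key_field]  # NEW indexed by a runtime name
plpy.execute("UPDATE %s SET somefield = somevalue WHERE %s = %s"
             % (up_table, up_field, key_value))
return None
' LANGUAGE plpythonu;
```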

Thanx for considering.
Please cc me on replies.
Regards
Erwin Brandstetter


Re: [HACKERS] On query rewrite

2004-05-27 Thread Alvaro Herrera
On Thu, May 27, 2004 at 06:27:47PM -0700, Sailesh Krishnamurthy wrote:
  Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes:
 
  For instance, in the presence of a view or a subquery, does PG
  do a subquery to join transformation ?
 
 Alvaro Yes, there are transformations of this sort, but they are
 Alvaro not called query rewrite in the code's terminology, but
 Alvaro optimization -- rewrite (rules and views) happens to the
 Alvaro parsed statement, and the optimizer works on the output of
 Alvaro rewriting.  So actually the optimizations happen whether
 Alvaro there were or not rules or views.
 
 Interesting .. so these are rule-based then ? Not cost-based ?

 I understand that there is a cost-based optimizer anyway that does the
 planning and selects the right plan .. but does this come _after_ all
 these transformations ? Or does it happen along with the
 transformations ? 

No, there's no rules optimizer, only the cost-based one you already know
of.

 Alvaro The query's path is SQL - parse - rewrite - optimize -
 Alvaro execute
 
 Can you please point me to the code that indeed does such
 transformations ? 

Sorry, I don't know the optimizer code.  You can find a lot of detail in
backend/optimizer/README.  Probably you want to look at what happens to
JOIN_IN nodes, for example, regarding the conversion of a 

WHERE foo IN (SELECT bar FROM ...)

into some kind of join.
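Roughly speaking (this is just a hand-written illustration with made-up tables t and u, not actual planner output), the idea is that

```sql
-- The sublink form ...
SELECT * FROM t WHERE foo IN (SELECT bar FROM u);

-- ... is conceptually equivalent to a join that matches each t row at
-- most once (the planner handles the uniqueness internally rather than
-- with an explicit DISTINCT):
SELECT t.* FROM t JOIN (SELECT DISTINCT bar FROM u) AS sub ON t.foo = sub.bar;
```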

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
The thorn pricks from the moment it is born (African proverb)




Re: [HACKERS] On query rewrite

2004-05-27 Thread Sailesh Krishnamurthy
 Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes:

 I understand that there is a cost-based optimizer anyway that
 does the planning and selects the right plan .. but does this
 come _after_ all these transformations ? Or does it happen
 along with the transformations ?

Alvaro No, there's no rules optimizer, only the cost-based one
Alvaro you already know of.

Okay ...

Alvaro The query's path is SQL - parse - rewrite - optimize -
Alvaro execute
  Can you please point me to the code that indeed does such
 transformations ?

Alvaro Sorry, I don't know the optimizer code.  You can find a
Alvaro lot of detail in backend/optimizer/README.  Probably you
Alvaro want to look at what happens to JOIN_IN nodes, for
Alvaro example, regarding the conversion of a

Couldn't find the README, but I'm operating on an older source base. 

Anyway, thanks for the tips. I think I found what I'm looking for: the
function is probably pull_up_subqueries .. and what it tries to do is
check whether a subquery is simple or not .. simple means not having
Aggs or anything more complicated. If that's the case, and if certain
NULLability conditions are safe, then the subquery gets pulled up -
essentially, a subquery-to-join transformation.
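To make that concrete with a made-up example (my own reading of the code, so take it with a grain of salt):

```sql
-- A "simple" subquery (no aggregates, no DISTINCT, etc.) ...
SELECT x FROM (SELECT a AS x FROM t WHERE a > 0) AS sub WHERE x < 10;

-- ... can be flattened into its parent before planning proper:
SELECT a AS x FROM t WHERE a > 0 AND a < 10;
```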


Now my next question is more subtle .. 

Are these alternatives (pulling up vs not pulling up subqueries)
considered in different plans ? 

Because, if not, it seems that this is really just a query rewrite
.. it's just that it happens as part of the optimizer component. In
fact, there is an implicit rule here in operation (by rule, I don't
mean a pgsql RULE).

Are more such transformations spread around the optimizer component ?
Is there any reason to have it integrated with the planner as opposed
to having it be part of the rewrite component (apart from historical
.. plus the code is solid and works etc.) ? 

Sorry for all the questions .. as I stated before I'm working on a
chapter of a text book that is a case-study of pgsql (the 4th ed
contains case studies of Oracle, DB2 and MSSQL) and the 5th ed is
gonna have pgsql. 

Another question about regular RULE processing .. suppose after
applying a rule the resultant query tree is eligible for another rule,
does pgsql's rule system keep iterating over and over until it reaches
a fixed point or is there some heuristic in operation (just apply the
rules twice ..) ? From my cursory inspection of the code it looks like
the latter, but I'd like to know for sure. 

Thanks ! 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh





Re: [HACKERS] Win32, PITR, nested transactions, tablespaces

2004-05-27 Thread Bruce Momjian
Tatsuo Ishii wrote:
 Hi Bruce,
 
 Is that core's decision that:
 
 o below all four major features will be incorporated into 7.5

We don't know.  If they can be added to CVS without major changes, they
will be in 7.5.  As far as I can tell:

Win32 has 98% of its code in CVS, so it will make it
Tablespaces - Christopher says it is ready, and has run tests
PITR - some functionality might be in 7.5, but we aren't sure
Nested transactions - Alvaro thinks it is close, but we don't know
 
 o beta freeze will be June 1

Yes, I think so.  That was Marc's decision actually.

 In the upcoming big event Linux World (in Tokyo, from June 2 to 4) I
 need to present on the current status of PostgreSQL development. Even
 if core has not decided yet, information on what is likely to happen
 would be very valuable for the audience. Suggestions?

I think some of them will make it, not sure about all.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] pg_dump --comment?

2004-05-27 Thread Christopher Kings-Lynne
I've encountered a situation where I'd like to store some information 
about the database when I do a pg_dump. For instance, the timestamp of 
the dump. And some other information that I pull from the database.
I think every dump should dump the timestamp regardless...
Chris


[HACKERS] hash, rtree and gist ScanLists

2004-05-27 Thread Alvaro Herrera
Hackers,

I am updating the ScanLists so they can be dropped at subtransaction
abort.  Currently they are stored with custom next pointers; however
it would be much cleaner to deal with deleting single items if they were
Lists.  Is it OK if I change them to be so?

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
One fights when it is necessary... not when one is in the mood!
Mood is for cattle, or for making love, or for playing the
baliset.  Not for fighting.  (Gurney Halleck)




Re: [HACKERS] Win32, PITR, nested transactions, tablespaces

2004-05-27 Thread Bruce Momjian
Tatsuo Ishii wrote:
 Ok, BTW have we decided that the next version is marked as 7.5 or 8.0?

No. Some argue that an 8.0 should only be done if we break backward
compatibility, while others feel major features should cause an 8.0.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] hash, rtree and gist ScanLists

2004-05-27 Thread Christopher Kings-Lynne
I am updating the ScanLists so they can be dropped at subtransaction
abort.  Currently they are stored with custom next pointers; however
it would be much cleaner to deal with deleting single items if they were
Lists.  Is it OK if I change them to be so?
Alvaro,
Have I mentioned that you're a total legend recently!?!  Many thanks for 
working on all this stuff - I'm sure the other hackers would agree!

Chris


Re: [HACKERS] On query rewrite

2004-05-27 Thread Alvaro Herrera
On Thu, May 27, 2004 at 07:35:56PM -0700, Sailesh Krishnamurthy wrote:

 Anyway, thanks for the tips. I think I found what I'm looking for: the
 function is probably pull_up_subqueries .. and what it tries to do is
 check if a subquery is simple or not .. simple means not having
 Aggs or something more complicated. If that's the case, and if some
 NULLable conditions are safe then, the subquery gets pulled up -
 essentially, a subquery to join transformation.

Hmm, this code has been heavily hacked during the last few versions so
if you want to know the state of the art be sure to check a recent
version (for example if you don't see pull_up_IN_clauses() you are
missing some of the fun.)

 Are these alternatives (pulling up vs not pulling up subqueries)
 considered in different plans ? 

Yes, AFAIU.

 Are more such transformations spread around the optimizer component ?

Yes, AFAIU.

 Is there any reason to have it integrated with the planner as opposed
 to having it be part of the rewrite component (apart from historical
 .. plus the code is solid and works etc.) ? 

The current code uses cost estimation to determine whether to apply them
or not; in some situations they would lead to a more expensive (==
slower) plan, or to using huge amounts of memory (for example hash based
aggregation).

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
God made Adam, but it was Eve who made him a man.




Re: [HACKERS] On query rewrite

2004-05-27 Thread Tom Lane
Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
 Now my next question is more subtle .. 

 Are these alternatives (pulling up vs not pulling up subqueries)
 considered in different plans ? 

That particular choice is not --- we do it if we can, else not.
Comparisons between different alternatives are always handled by
computing cost estimates and choosing the lowest.  In most cases
that's mechanized as generating Paths for all alternatives and letting
the fittest Path survive.  I think there are one or two places where
there's a more hard-wired cost comparison, because there are only a
couple of possibilities.

 Is there any reason to have it integrated with the planner as opposed
 to having it be part of the rewrite component (apart from historical

Yes --- the rewriter generally does not have as much semantic or
statistical information available as the planner does.

 Another question about regular RULE processing .. suppose after
 applying a rule the resultant query tree is eligible for another rule,
 does pgsql's rule system keep iterating over and over until it reaches
 a fixed point or is there some heuristic in operation (just apply the
 rules twice ..) ?

As of recent releases it expands till it runs out of rules or detects
infinite recursion.  You may be looking at a version that had a
give-up-after-N-levels heuristic instead of actual recursion detection.
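For instance (a made-up pair of mutually recursive rules; the table and rule names are arbitrary):

```sql
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
-- Each rule rewrites an insert on one table into an insert on the other:
CREATE RULE r1 AS ON INSERT TO t1 DO INSTEAD INSERT INTO t2 VALUES (NEW.a);
CREATE RULE r2 AS ON INSERT TO t2 DO INSTEAD INSERT INTO t1 VALUES (NEW.a);
-- An INSERT INTO t1 would now expand forever; recent releases detect
-- the recursion and raise an error instead of looping.
```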

regards, tom lane



Re: [HACKERS] Win32, PITR, nested transactions, tablespaces

2004-05-27 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tatsuo Ishii wrote:
 Ok, BTW have we decided that the next version is marked as 7.5 or 8.0?

 No. Some argue that an 8.0 should only be done if we break backward
 compatibility, while others feel major features should cause an 8.0.

In any case, it's premature to have that discussion until we know for
sure which of these features will make it into the release.  (If they
all do, I think there'd be a case for calling it 8.0.)

regards, tom lane
