Re: [pgsql-advocacy] [HACKERS] Changing the default configuration

2003-02-14 Thread Daniel Kalchev
Jason Hihn said:
  Pardon my ignorance, but there's no way to auto-tune? Ship it with a thread
  that gathers statistics and periodically re-tunes the database parameters.
  Of course, be able to turn it off. People who actually take the time to
  tune manually will turn it off so as not to have the overhead or interruption.
  Those that don't care about pg_tune shouldn't care about having a thread
  around retuning. Those that will care will tune manually.

This is related to my proposition, but the trouble is, there is no such thing as 
a 'well-tuned database' that will suit all queries. You can tune the database to 
the hardware, for example (remember that old argument about random access 
and fast disks).

It seems the system could 'self-tune' on minor choices. I believe it 
does this today for a number of things already. More significant changes 
require the DBA's consent and choice - but the DBA needs to be well informed of the 
current usage statistics when making the choice.

Daniel


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [pgsql-advocacy] [HACKERS] Changing the default configuration

2003-02-14 Thread Daniel Kalchev
Josh Berkus said:
  How about we take this discussion to the Performance List, where it belongs?

I believe the design and addition of code that collects and outputs the usage patterns 
of the database (statistics) belongs here.

If we take the approach of providing information to tune PostgreSQL based on 
real-world usage, I guess we need at least the following:

- Usage statistics on a per-database or even per-table level. I believe we already 
collect some (see the sketch below);
- A statistics analysis tool/routine/process to produce suggestions;
- Of course, lots of real-world data to justify the suggestions;

- Can we provide more knobs for tunable parameters that can be applied on a 
per-database or even per-table level? One first candidate might be the FSM.
- Can some of these parameters (when available) auto-tune?
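
For example (a rough sketch only - the database name is illustrative, and it 
assumes the statistics collector is enabled in postgresql.conf), the per-table 
counters we already collect can be pulled out with:

    psql -d mydb -c "SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
                     FROM pg_stat_user_tables ORDER BY seq_scan DESC;"

A suggestion tool could start from numbers like these.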

Of course, this could move out of pgsql-hackers :)

Daniel


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Michael Paesold
Hannu Krosing [EMAIL PROTECTED] wrote:

 could we just not lock (for more than just to ensure atomic writes) the
 page but instead increment a page version on each write to detect
 changes?

Sounds like Index MVCC..., very nice. ;-)
(Of course I have no clue about feasibility, just liked the idea)

Best Regards,
Michael Paesold

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Tom Lane wrote:
 Kevin Brown [EMAIL PROTECTED] writes:
  I agree with your assessment for the most part, except for PGDATA.
  There's no good reason I can think of for the postmaster to look at
  it.
 
 The other side of that coin is, what's the good reason to remove it?
 There's a long way between "I don't want my setup to depend on PGDATA"
 and "I don't think your setup should be allowed to depend on PGDATA."
 If you don't want to use it, then don't use it.  Why do you need to
 tell me how I'm allowed to run my installation?

I'm not talking about getting rid of ALL dependency on PGDATA in our
entire distribution, only postmaster's.

Recall that the main purpose of making any of these changes at all is
to make life easier for the guys who have to manage the systems that
will be running PostgreSQL.  Agreed?

So: imagine you're the newly-hired DBA and your boss points you to the
system and says "administrate the database on that."  You go over to
the computer and start looking around.

You do a ps and see a postmaster process running.  You know that
it's the process that is listening for connections.  The ps listing
only says /usr/bin/postmaster.  No arguments to clue you in,
nothing.  Where do you look to figure out where the data is?  How do
you figure out what port it's listening on?

Well, we're already agreed on how to deal with that question: you look
in /etc/postgresql, and because this is a relatively new install (and
the PostgreSQL maintainers, who are very wise and benevolent, made
that the default location for configs :-), it has a postgresql.conf
file with a line that says data_directory = /var/lib/pgsql.  It
doesn't mention a port to listen to so you know that it's listening on
port 5432.  As a DBA, you're all set.
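
For instance (assuming the proposed data_directory option, path purely 
illustrative), the whole investigation is one command:

    $ grep -E '^(data_directory|port)' /etc/postgresql/postgresql.conf
    data_directory = '/var/lib/pgsql'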

Now let's repeat that scenario, except that instead of seeing one
postmaster process, you see five.  And they all say
/usr/bin/postmaster in the ps listing.  No arguments to clue you
in or anything, as before.  You might be able to figure out where one
of them is going by looking at /etc/postgresql, but what about the
rest?  Now you're stuck unless you want to do a find (time consuming
and I/O intensive -- a good way to slow the production database down a
bit), or you're knowledgeable enough to use 'lsof' or black magic like
digging into kernel memory to figure out where the config files and
data directories are, or you have enough knowledge to pore through the
startup scripts and understand what they're doing.

Lest you think that this is an unlikely scenario, keep in mind that
most startup scripts, including pg_ctl, currently start the postmaster
without arguments and rely on PGDATA, so a shop that hasn't already
been bitten by this *will* be.  Right now shops that wish to avoid the
trap I described have to go to *extra* lengths: they have to make
exactly the same kinds of changes to the scripts that I'm talking
about us making (putting an explicit '-D $PGDATA' where none now
exists) or they have to resort to tricks like renaming the postmaster
executable and creating a shell script in its place that will invoke
the (renamed) postmaster with '-D $PGDATA'.
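
A rough sketch of that last trick (paths illustrative only, with the real 
binary renamed to postmaster.bin):

    #!/bin/sh
    # wrapper installed as /usr/bin/postmaster in place of the renamed binary
    PGDATA=${PGDATA:-/var/lib/pgsql}
    exec /usr/bin/postmaster.bin -D "$PGDATA" "$@"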

It's not so bad if only a few shops have to make those changes.  But
what if it's thousands?  Yeah, the distribution guys can patch the
scripts to do this, but why should they have to?  They, and the shops
that run PostgreSQL, are our customers.


All of that is made possible because the postmaster can use an
inherited PGDATA for the location of the config files and (if the
config files don't say differently in our new scheme) the data
directory, and pg_ctl takes advantage of that fact (as do most startup
scripts that I've seen, that don't just invoke pg_ctl).

I'm not arguing that we should remove the use of PGDATA *everywhere*,
only in postmaster (and then, only postmaster's use of an *inherited*
PGDATA.  It should still set PGDATA so its children can use it).  It
means changing pg_ctl and the startup scripts we ship.  The earlier we
make these changes, the less overall pain there will be in the long
run.


  ... people who are starting things by hand hopefully aren't so
  inflexible as to demand that PGDATA remain treated as-is.
 
 Yes, I could reconfigure my scripts to not depend on this.  You have
 not given me an adequate argument why I should have to.

[By this I'm assuming you're referring to the scripts you use for
testing, and not the ones that ship with the distribution]

I'm not arguing that you should get rid of all the references to
PGDATA in your scripts or anything crazy like that.  The changes I'm
talking about are minor: where you see postmaster without any -D
arguments, you simply add '-D $PGDATA' to it, before any other
arguments that you might also be passing.  That's it.  Nothing else
should be needed.

The reason for removing postmaster's use of an inherited PGDATA is the
same as the reason for making the other changes we already agree
should be made: to make things easier for the guys in the field who
have to 

Re: [HACKERS] Offering tuned config files

2003-02-14 Thread Manfred Koizar
On Fri, 14 Feb 2003 14:12:50 +0800, Christopher Kings-Lynne
[EMAIL PROTECTED] wrote:
Here's a stab at some extra conf files.  Feel free to shoot them down.

No intent to shoot anything down, just random thoughts:

effective_cache_size = 20000 (~ 160 MB) should be more adequate for a
256 MB machine than the extremely conservative default of 1000.  I
admit that the effect of this change is hard to benchmark.  A way too
low (or too high) setting may lead the planner to wrong conclusions.
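
(The arithmetic, assuming the usual 8 KB blocks:

    $ echo $((20000 * 8 / 1024))
    156

i.e. roughly 160 MB.)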

More parameters affecting the planner:
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.001
#cpu_operator_cost = 0.0025

Are these still good defaults?  I have no hard facts, but ISTM that
CPU speed is increasing more rapidly than disk access speed.

In postgresql.conf.sample-writeheavy you have:
commit_delay = 1

Is this still needed with ganged WAL writes?  Tom?

Servus
 Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Offering tuned config files

2003-02-14 Thread Daniel Kalchev
Manfred Koizar said:
  effective_cache_size = 20000 (~ 160 MB) should be more adequate for a
  256 MB machine than the extremely conservative default of 1000.  I
  admit that the effect of this change is hard to benchmark.  A way too
  low (or too high) setting may lead the planner to wrong conclusions.

The default on BSD systems is 10% of the total RAM, so on a 256MB machine this 
would be ~26MB or effective_cache_size = 32000.

One could always modify the kernel to support a much larger value, but I doubt 
this is done in many cases, and the usefulness of a larger buffer cache is not 
obvious in the presence of many fsync calls (which might be typical). I could 
be wrong, of course :)

In any case, the default is indeed low and would prevent using indexes on 
larger tables, where they are most useful.

Daniel


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [MLIST] Re: [HACKERS] location of the configuration files

2003-02-14 Thread David Walker
In reference to determining what port postgres (or any program) is listening on:
on my Red Hat Linux machines,
netstat --inet -nlp
when run as root
produces a nice list of all programs listening on the network, with the IP and port 
number each process is listening on, the name of the process, and the pid.

The environment used to start each of these postmasters can be found with
cat /proc/${POSTMASTER_PID}/environ | tr '\000' '\n'
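
Putting the two together (a quick sketch; it assumes a Linux /proc and that 
pidof finds the postmasters):

    for pid in $(pidof postmaster); do
        echo "== postmaster $pid =="
        tr '\000' '\n' < /proc/$pid/environ | grep '^PGDATA='
    done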

I'm not arguing one way or the other on your issue, just hope these tips make 
the black magic a little easier to use.

On Friday 14 February 2003 04:58 am, Kevin Brown wrote:
 Now let's repeat that scenario, except that instead of seeing one
 postmaster process, you see five.  And they all say
 /usr/bin/postmaster in the ps listing.  No arguments to clue you
 in or anything, as before.  You might be able to figure out where one
 of them is going by looking at /etc/postgresql, but what about the
 rest?  Now you're stuck unless you want to do a find (time consuming
 and I/O intensive -- a good way to slow the production database down a
 bit), or you're knowledgeable enough to use 'lsof' or black magic like
 digging into kernel memory to figure out where the config files and
 data directories are, or you have enough knowledge to pore through the
 startup scripts and understand what they're doing.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Offering tuned config files

2003-02-14 Thread Manfred Koizar
On Fri, 14 Feb 2003 14:24:23 +0200, Daniel Kalchev [EMAIL PROTECTED]
wrote:
The default [cache] on BSD systems is 10% of the total RAM, so on a 256MB machine
this would be ~26MB or effective_cache_size = 32000.

I was a bit too Linux-minded, where every piece of memory not needed
for anything else can be used as cache.  Thanks for the clarification.
And sorry for my ignorance.

BTW 26MB ~ effective_cache_size = 3200.
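
(Assuming 8 KB pages, 26 MB is

    $ echo $((26 * 1024 / 8))
    3328

pages, i.e. about 3200.)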

Servus
 Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Bruce Momjian wrote:
 If you want ps to display the data dir, you should use -D.  Remember, it
 is mostly important for multiple postmaster, so if you are doing that,
 just use -D, but don't prevent single-postmaster folks from using
 PGDATA.

Perhaps the best compromise would be to change pg_ctl so that it uses
-D explicitly when invoking postmaster.  That's an easy change.

Could you describe how you and other developers use PGDATA?  I'm quite
interested in knowing why there seems to be so much resistance to
removing the potential_DataDir = getenv("PGDATA"); line from
postmaster.c.



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Bruce Momjian wrote:
 If you want ps to display the data dir, you should use -D.  Remember, it
 is mostly important for multiple postmaster, so if you are doing that,
 just use -D, but don't prevent single-postmaster folks from using
 PGDATA.

Perhaps another reasonable approach would be to put an #ifdef/#endif
around the potential_DataDir = getenv("PGDATA"); line in postmaster.c
and create a configure option to enable it.  That way you guys get the
behavior you want for testing but production builds could disable it
if that's viewed as desirable.  You'd want to make the error message
that's produced when no data directory is specified depend on the same
#ifdef variable, of course.

Then the group would get to fight it out over whether the configure
default should be enable or disable.  :-)


-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Martin Coxall

  If you are interested in reading a contrary position, you can read
  Bernstein's arguments for his recommended way to install services at:
  http://cr.yp.to/unix.html

But since DJB is a class-A monomaniac, he may not be the best person to
listen to. /var/qmail/control for qmail configuration files? Yeah, good
one, DJB.

-- 
Martin Coxall [EMAIL PROTECTED]


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Martin Coxall

 Generally things that live in /etc are owned and operated by the OS.  
 Postgresql, by its definition, is a userspace program, not an OS-owned 
 one.

Partially true. The FHS specifies that the /etc top layer is for system-owned 
stuff, but the subdirectories off it are explicitly used for user space programs
and, well, everything. (/etc/apache, /etc/postgres, /etc/tomcat3, /etc/tomcat4...)

Martin Coxall


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Martin Coxall
On Thu, 2003-02-13 at 20:28, Steve Crawford wrote:
 I don't see why we can't keep everyone happy and let the users choose the 
 setup they want. To wit, make the following, probably simple, changes:
 
 1) Have postgresql default to using /etc/postgresql.conf

/etc/postgres/postgresql.conf, if we want to be proper FHS-bitches.

 2) Add a setting in postgresql.conf specifying the data directory
 3) Change the meaning of -D to mean use this config file
 4) In the absence of a specified data directory in postgresql.conf, use the 
 location of the postgresql.conf file as the data directory

Shouldn't it in that case default to, say /var/lib/postgres?

-- 
Martin Coxall [EMAIL PROTECTED]


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Bruce Momjian wrote:
 I just set PGDATA in my login and I don't have to deal with it
 again.

Hmm...you don't use pg_ctl to start/stop/whatever the database?  You
invoke the postmaster directly (I can easily see that you would, just
asking if you do)?



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[HACKERS] unsubscribe

2003-02-14 Thread John Liu
unsubscribe


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Bruce Momjian wrote:
 I just set PGDATA in my login and I don't have to deal with it
 again.

Duh... I just realized a reason you guys might care about this so
much.

It's because you want to be able to start the postmaster from within a
debugger (or profiler, or whatever), and you don't want to have to
mess with command line options from there, right?


Sounds like fixing pg_ctl to use -D explicitly when invoking the
postmaster is the right change to make here, since that's probably how
the majority of the production shops are going to be starting the
database anyway.  Takes care of the majority of the visibility
problem and leaves PGDATA intact.  Thoughts?



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Vince Vielhaber
On 14 Feb 2003, Martin Coxall wrote:


   If you are interested in reading a contrary position, you can read
   Bernstein's arguments for his recommended way to install services at:
   http://cr.yp.to/unix.html

 But since DJB is a class-A monomaniac, he may not be the best person to
 listen to. /var/qmail/control for qmail configuration files? Yeah, good
 one, DJB.

I'm guessing that rather than reading the above-mentioned link you
chose to waste our time with this instead.  Good one, MC.

Vince.
-- 
 Fast, inexpensive internet service 56k and beyond!  http://www.pop4.net/
   http://www.meanstreamradio.com   http://www.unknown-artists.com
 Internet radio: It's not file sharing, it's just radio.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Martin Coxall
On Fri, 2003-02-14 at 14:21, Vince Vielhaber wrote:
 On 14 Feb 2003, Martin Coxall wrote:
 
 
If you are interested in reading a contrary position, you can read
Bernstein's arguments for his recommended way to install services at:
http://cr.yp.to/unix.html
 
  But since DJB is a class-A monomaniac, he may not be the best person to
  listen to. /var/qmail/control for qmail configuration files? Yeah, good
  one, DJB.
 
  I'm guessing that rather than reading the above-mentioned link you
 chose to waste our time with this instead.  Good one, MC.

Yeah, I've read it several times, and have often linked to it as an
example of why one should be wary of DJB's software. It seems to me that
since DJB doesn't follow his own advice regarding the filesystem
hierarchy (see both qmail and djbdns), it'd be odd for him to expect
anyone else to. *Especially* seeing as he's a bit mental. (I'm not going
to take this any more. I demand cross-platform compatibility!)

-- 
Martin Coxall [EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Bruce Momjian wrote:
 OK, here is an updated proposal.  I think we have decided:
 
   Moving postmaster.pid and postmaster.opts isn't worth it.
 
   We don't want per-file GUC variables, but assume it is in
   the same config directory as postgresql.conf.  I don't
   see any valid reason they would want to put them somewhere
   different than postgresql.conf.
 
 So, we add data_dir to postgresql.conf, and add -C/PGCONFIG to
 postmaster.

Agreed.  One additional thing: when pg_ctl invokes the postmaster, it
should explicitly specify -C on the postmaster command line, and if it
doesn't find a data_dir in $PGCONFIG/postgresql.conf then it should
explicitly specify a -D as well.  Pg_ctl is going to have to be
modified to take a -C argument anyway, so we may as well go all the
way to do the right thing here.
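
Roughly this, inside pg_ctl (a sketch only: -C and data_dir are the options 
proposed above, and $bindir / $POSTMASTER_OPTS stand in for whatever pg_ctl 
already uses internally):

    if grep -q '^data_dir' "$PGCONFIG/postgresql.conf"; then
        "$bindir/postmaster" -C "$PGCONFIG" $POSTMASTER_OPTS &
    else
        "$bindir/postmaster" -C "$PGCONFIG" -D "$PGDATA" $POSTMASTER_OPTS &
    fi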

This way, people who start the database using the standard tools we
supply will know exactly what's going on when they get a ps listing.



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Bruce Momjian wrote:
 
 Give it up.  As long as we have -D, we will allow PGDATA. If you don't
 want to use it, don't use it.

Agreed.

I'm not sure I see how this diminishes the argument for fixing pg_ctl
so that it passes an explicit -D option to the postmaster when
invoking it...



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Bruce Momjian wrote:
  This way, people who start the database using the standard tools we
  supply will know exactly what's going on when they get a ps listing.
 
 No.  If you want ps to display, don't use environment variables. Many
 don't care --- especially those with only one postmaster.

You know that the code in pg_ctl doesn't send an explicit -D to the
postmaster even if pg_ctl itself is invoked with a -D argument, right?
The only way to make pg_ctl do that is by using the -o option.

A typical vendor-supplied install is going to invoke pg_ctl to do the
dirty work.  That's why I'm focusing on pg_ctl.
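
For example (path illustrative), today you have to repeat the directory through 
-o to get it onto the postmaster command line:

    pg_ctl -D /var/lib/pgsql -o "-D /var/lib/pgsql" start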

I completely understand your need for keeping PGDATA in postmaster.  I
don't understand why pg_ctl *shouldn't* be changed to invoke
postmaster with an explicit -D option.  It might be desirable for ps
to not show any arguments to postmaster in some circumstances (I have
no idea what those would be), but why in the world would you want that
to be the *default*?  Why would we want the default behavior to make
things harder on administrators and not easier?


-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Tom Lane
Kevin Brown [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 The other side of that coin is, what's the good reason to remove it?
 There's a long way between "I don't want my setup to depend on PGDATA"
 and "I don't think your setup should be allowed to depend on PGDATA."
 If you don't want to use it, then don't use it.  Why do you need to
 tell me how I'm allowed to run my installation?

 I'm not talking about getting rid of ALL dependency on PGDATA in our
 entire distribution, only postmaster's.

We're obviously talking past each other.  You are arguing that under
circumstances X, Y, or Z, depending on a PGDATA setting is a bad idea.
You are then drawing the conclusion that I should not be allowed to
depend on PGDATA, whether or not I care about X, Y, or Z.

I am happy to design an arrangement that allows you not to depend on
PGDATA if you don't want to.  But I don't see why you need to break
my configuration procedures in order to fix yours.  As I outlined last
night, it's possible to do what you want without breaking backwards
compatibility for those that like PGDATA.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Vince Vielhaber
On 14 Feb 2003, Martin Coxall wrote:

 On Fri, 2003-02-14 at 14:21, Vince Vielhaber wrote:
  On 14 Feb 2003, Martin Coxall wrote:
 
  
 If you are interested in reading a contrary position, you can read
  Bernstein's arguments for his recommended way to install services at:
 http://cr.yp.to/unix.html
  
   But since DJB is a class-A monomaniac, he may not be the best person to
   listen to. /var/qmail/control for qmail configuration files? Yeah, good
   one, DJB.
 
   I'm guessing that rather than reading the above-mentioned link you
  chose to waste our time with this instead.  Good one, MC.

 Yeah, I've read it several times, and have often linked to it as an
 example of why one should be wary of DJB's software. It seems to me that
 since DJB doesn't follow his own advice regarding the filesystem
 hierarchy (see both qmail and djbdns), it'd be odd for him to expect
 anyone else to. *Especially* seeing as he's a bit mental. (I'm not going
 to take this any more. I demand cross-platform compatibility!)

I seriously doubt your ability to judge anyone's mental stability.
I can also see that you prefer cross-platform INcompatibility.  Your
position and mindset are now crystal clear.

Vince.
-- 
 Fast, inexpensive internet service 56k and beyond!  http://www.pop4.net/
   http://www.meanstreamradio.com   http://www.unknown-artists.com
 Internet radio: It's not file sharing, it's just radio.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Offering tuned config files

2003-02-14 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes:
 In postgresql.conf.sample-writeheavy you have:
   commit_delay = 1
 Is this still needed with ganged WAL writes?  Tom?

I doubt that the current options for grouped commits are worth anything
at the moment.  Chris, do you have any evidence backing up using
commit_delay with 7.3?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Tom Lane wrote:
 I am happy to design an arrangement that allows you not to depend on
 PGDATA if you don't want to.  But I don't see why you need to break
 my configuration procedures in order to fix yours.  As I outlined last
 night, it's possible to do what you want without breaking backwards
 compatibility for those that like PGDATA.

Yes, I agree.  I hadn't really thought of all the possible benefits of
PGDATA.  Sorry.  :-(

Would you agree that it would be a beneficial change to have pg_ctl
pass explicit arguments to postmaster?  It would go a long way towards
eliminating most of the situations I described.

A warning in the documentation about the consequences of using PGDATA
might not be a bad idea, either...


-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Tom Lane
Kevin Brown [EMAIL PROTECTED] writes:
 I'm quite interested in knowing why there seems to be so much resistance to
 removing the potential_DataDir = getenv("PGDATA"); line from
 postmaster.c.

Backwards compatibility.  Also, you still haven't explained why
"I don't want to use PGDATA" should translate to "no one should
be allowed to use PGDATA."  If you don't set PGDATA, what problem
is there for you in that line being there?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Incremental backup

2003-02-14 Thread Kevin Brown
Bruce Momjian wrote:
 
 OK, once we have PITR, will anyone want incremental backups?

None of my database references (Date's "Introduction to Database
Systems" and Garcia-Molina's "Database Systems: The Complete Book",
in particular) seem to talk about PITR at all.  At least, there's no
index entry for it.  And a google search for "point in time recovery"
yields mostly marketing fluff.

Is there a good reference for this that someone can point me to?  I'm
interested in exactly how it'll work, especially in terms of how logs
are stored versus the main data store, effects on performance, etc.

Thanks, and sorry for the newbie question.  :-(


-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Tom Lane
Martin Coxall [EMAIL PROTECTED] writes:
 Partially true. The FHS specifies that the /etc top layer is for system-owned
 stuff, but the subdirectories off it are explicitly used for user space programs
 and, well, everything. (/etc/apache, /etc/postgres, /etc/tomcat3,
 /etc/tomcat4...)

FHS or no FHS, I would think that the preferred arrangement would be to
keep Postgres' config files in a postgres-owned subdirectory, not
directly in /etc.  That way you need not be root to edit them.  (My idea
of an editor, Emacs, always wants to write a backup file, so I dislike
having to edit files that live in directories I can't write.)
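
(Something like

    install -d -m 755 -o postgres -g postgres /etc/postgres

at install time would do; exact ownership and mode are a matter of taste.)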

Here's a pretty topic for a flamewar: should it be /etc/postgres/ or
/etc/postgresql/ ?

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Robert Treat
On Thu, 2003-02-13 at 19:22, Adam Haberlach wrote:
 On Thu, Feb 13, 2003 at 05:59:17PM -0500, Robert Treat wrote:
  On Thu, 2003-02-13 at 15:08, mlw wrote:
   Stephan Szabo wrote:
   
   On Thu, 13 Feb 2003, mlw wrote:
  Personally I think a postgresql installation is much more like an apache
  installation, which generally contains all of the files (data and
  config) under /usr/local/apache. Maybe someone can dig more to see if
  that system is more appropriate a comparison than something like bind.
 
   I think you are making a pretty uninformed, if not just plain wrong 
 generalization.  I've run exactly one system with apache configuration 
 files in /usr/local/apache, and even then, the data was not there.

Uh... the last time I built apache from source, it stuck everything
under /usr/local/apache. It uses a conf directory for the config files,
and htdocs for the data files...  That is its default configuration.

snip stories of all the different ways people run apache

You know, this is why I actually suggested looking closer at apache. By
default, everything is crammed in one directory, but if you want to, you
can configure it six different ways to Sunday. That seems to be a big 
plus, IMO.

 
 What does this mean?
 
 People will put things in different places, and there are typically
 very good reasons for this.  This is ESPECIALLY true when one wants to
 have configuration files, at least the base ones in a common place such
 as /etc or /usr/local/etc in order to make backup of configuration easy
 and clean, while leaving data somewhere else for performance or magnitude
 of partition reasons.  It just makes sense to ME to have postgresql.conf
 reside in /etc, yet put my data in /var/data/postgresql, yet retain the
 option to put my data in /raid/data/postgresql at a later date, when the
 new hardware comes in.

Is anyone arguing against this? I'm certainly not. But maybe my needs
are more varied than yours. On my local development box, I run multiple
versions of apache, compiled with different versions of php. It really
helps to keep all of apache's stuff centralized, and using things like
rpms actually overly complicates this. Now sure, that's a development
machine, but on the phppgadmin demo server, which is essentially a
production system, I run three different versions of postgresql. In
fact, I need to upgrade two of those (to 7.2.4 and 7.3.2); I shudder to
think about doing that if postgresql forced me to use the /etc/
directory for all of my config files. Now sure, this probably isn't
typical use, but I would say that when it comes time to upgrade major
versions, unless you're running an operation where you can have large
amounts of downtime, postgresql needs to have the ability to have
multiple versions installed that don't conflict with each other, and it
needs to do this easily. The upgrade process is hard enough already.

snip 
 However, this seems, to me, to be a very small addition that has some real-world
 (and yes, we need to start paying attention to the real world) advantages.
 
 And finally, don't go telling me that I'm wrong to put my data and config files
 where I am.  You can offer advice, but I'm probably going to ignore it because
 I like where they are and don't need to explain why.
 

Have I wronged you in some former life? I've very little concern for
where you put your data files, and have no idea why you'd think I'd
criticize your setup. 

Robert Treat


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Curtis Faith
tom lane wrote:

 How does that help?  The left-moving indexscan still has no 
 way to recover.  It can't go back to the page it was on 
 before and try to determine which entries you added there, 
 because it has no reliable reference point to do so.  The 
 entry it previously returned might not be there anymore, and 
 in a non-unique index key comparisons won't help. And even 
 more to the point, how would it know you've changed the left 
 page?  It has no idea what the previous page version on the 
 left page was, because it was never there before.

Another way this could be handled is by not merging onto one of the
existing pages but to a brand new page, a kind of special case shadow
index page. That way the sibling pointers, and leaf page pointer in the
parent could all be updated atomically to point to the new page. 

In-process index scans would still reference the merged pages which
would not be deleted but marked as dead using a mechanism like you
proposed for marking empty pages dead with the next-transaction-ID
counter. 

Merging could be done after a VACUUM pass that performs deletion of
empty pages in order to provide a pool of empty pages to use for the
merge. This would keep the index from temporarily growing during the
merge process.


A similar approach could be used to reorder the index pages in the
background. An index that was reordered to fairly closely reflect the
tree as a breadth first traversal would provide much faster index scans
if the file is not heavily fragmented.

- Curtis



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 My point is that folks with multiple postmasters may not want to use
 PGDATA, but for folks who have single postmasters, it makes things
 easier and less error-prone.

Actually, for multi postmasters too.  I have little shell-environment
config files that switch my entire world view between different source
trees and installation trees, for example this one sets me up to mess
with the 7.2 branch:

STDPATH=${STDPATH:-$PATH}
STDMANPATH=${STDMANPATH:-$MANPATH}

PGSRCROOT=$HOME/REL7_2/pgsql
PGINSTROOT=$HOME/version72
PATH=$PGINSTROOT/bin:/opt/perl5.6.1/bin:$STDPATH
MANPATH=$PGINSTROOT/man:$STDMANPATH
PGLIB=$PGINSTROOT/lib
PGDATA=$PGINSTROOT/data
PMOPTIONS="-p 5472 -i -F"
PMLOGFILE=server72.log

export PGSRCROOT PGINSTROOT PATH MANPATH PGLIB PGDATA STDPATH STDMANPATH
export PMOPTIONS PMLOGFILE

After sourcing one of these, I can use pg_ctl as well as a half dozen
other convenient little scripts that do things like remake and reinstall
the backend:

#!/bin/sh

pg_ctl -w stop

cd $PGSRCROOT/src/backend

make install-bin

startpg

or this one that fires up gdb on a crashed backend:

#!/bin/sh

# Usage: gdbcore

cd $HOME

CORES=`find $PGDATA/base -name core -type f -print`

if [ "x$CORES" != x ]
then
ls -l $CORES
fi

if [ `echo $CORES | wc -w` -eq 1 ]
then
exec gdb $PGINSTROOT/bin/postgres $CORES
else
exec gdb $PGINSTROOT/bin/postgres
fi

This is vastly less error-prone than keeping track of the various
related elements in my head.

Now, it's certainly true that I could still make this work if I had
to explicitly say -D $PGDATA to the postmaster.  But that would clutter
my ps display.  I am happy with -p as the ps indicator of which
postmaster is which; I don't want more stuff in there.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Manfred Koizar
On Wed, 12 Feb 2003 17:42:44 -0500, Tom Lane [EMAIL PROTECTED]
wrote:
Instead of an actively maintained freelist on disk as per Alvaro Herrera's
patch, I plan to use the FSM to remember where recyclable pages are, much
as we do for tables.

Given that we have a mostly empty metapage per index, and the metapage
is in memory most of the time, using it for the freelist looks almost
like a free lunch.

I've picked up Alvaro's patch and played around a bit.  Reviewing my
changes it turns out that most of them deal with the freelist.  If you
are interested I can send my version of the patch to you or to the
list.

Servus
 Manfred

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [pgsql-advocacy] [HACKERS] Changing the default configuration

2003-02-14 Thread Robert Treat
On Fri, 2003-02-14 at 03:00, Daniel Kalchev wrote:
 Jason Hihn said:
   Pardon my ignorance, but there's no way to auto-tune? Ship it with a thread
   that gathers statistics and periodically re-tunes the database parameters.
    Of course, be able to turn it off. People who actually take the time to
    tune manually will turn it off so as not to have the overhead or interruption.
   Those that don't care about pg_tune shouldn't care about having a thread
   around retuning. Those that will care will tune manually.
 
  This is related to my proposition, but the trouble is, there is no such thing as 
  a 'well-tuned database' that will suit all queries. You can tune the database to 
  the hardware, for example (remember that old argument about random access 
  and fast disks).
  
  It seems the system could 'self-tune' on minor choices. I believe it 
  does this today for a number of things already. More significant changes 
  require the DBA's consent and choice - but the DBA needs to be well informed of the 
  current usage statistics when making the choice.
 

I agree. Given that we don't have solid explanations telling people
how to tune the different parameters, nor do we have enough mechanisms
for actually giving people the information they need to determine the
changes they need, a complete auto-tune seems premature.

Robert Treat



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Shridhar Daithankar [EMAIL PROTECTED]
On Friday 14 Feb 2003 9:05 pm, you wrote:
 Martin Coxall [EMAIL PROTECTED] writes:
 Here's a pretty topic for a flamewar: should it be /etc/postgres/ or
 /etc/postgresql/ ?

I vote for /etc/pgsql. Keeping in line with the unix philosophy of cryptic and short 
names. Who wants descriptive names anyway..:-)

Seriously, the traffic over the last three days had a very high noise ratio. 
Especially the whole discussion of PGDATA stuff fails to register as 
significant IMO. Right now, I can do things the way I want to, and I guess 
it is pretty much the same with everyone else. Is this the last topic left to improve?

Keep it simple and on topic, guys. This is hackers. Keep it low volume; 
otherwise, two years down the line, the archives will be unsearchable..

 Shridhar

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread scott.marlowe
On Thu, 13 Feb 2003, mlw wrote:

 
 
 scott.marlowe wrote:
 
 These are not issues at all. You could put the configuration file 
 anywhere, just as you can for any UNIX service.
 
 postmaster --config=/home/myhome/mydb.conf
 
 I deal with a number of PG databases on a number of sites, and it is a 
 real pain in the ass to get to a PG box and hunt around for data 
 directory so as to be able to administer the system. What's really 
 annoying is when you have to find the data directory when someone else 
 set up the system.
 
 
 
 Really?  I would think it's easier to do this:
 
 su - pgsuper
 cd $PGDATA
 pwd
 
 Than to try to figure out what someone entered when they ran ./configure 
 --config=...
   
 
 Why do you think PGDATA would be set for root?

Did you not notice the su - pgsuper line above?  You know, the one where 
you become the account that runs that instance of the database.  Again, I 
ask you, isn't that easier than trying to find out what someone typed when 
they typed ./configure --config=?

 Configuring postgresql via a configuration file which specifies all the 
 data, i.e. data directory, name of other configuration files, etc. is 
 the right way to do it. Even if you have reasons against it, even if you 
 think it is a bad idea, a bad standard is almost always a better 
 solution than an arcane work of perfection.
 
 
 
 Wrong, I strongly disagree with this sentiment.  Conformity to standards 
 for simple conformity's sake is as wrong as sticking to the old way 
 because it's what we're all comfy with.  
 
 It isn't conformity for conformity's sake. It is following an established 
 practice, like driving on the same side of the road or stopping at red 
 lights.

But this isn't the same thing at all.  Apache, when built from a tar ball, 
goes into /usr/local/apache/ and ALL its configuration files are there.  
When installed as a package, my OS manufacturer decides where that goes.  
Those are the two standard ways of doing things.  I like that postgresql 
installs into the /usr/local/pgsql directory from a tar ball.  I like the 
fact that it uses $PGDATA to tell it where the cluster is, so that all my 
scripts, like pg_ctl, just know where it is without a -D switch each time.



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes:
 Given that we have a mostly empty metapage per index, and the metapage
 is in memory most of the time, using it for the freelist looks almost
 like a free lunch.

No, because of locking.  Every time you write-lock the metapage to add
or remove freelist entries, you are denying all other processes the
ability to start an index scan.  Check the btree literature ---
exclusive locks near the root of the tree are death for concurrent
performance, and must be avoided as much as possible.

If I were planning to use a freelist I would keep it in a different page
so as not to need to lock the metapage for freelist manipulations.  But
I don't see the value of having one at all.  It just adds that much more
disk traffic (and WAL traffic) for each page split or merge.  There are
also atomicity concerns --- is addition/removal of a freelist entry an
atomic part of the page merge or split operation, or is it a separate
atomic operation with its own WAL record?  If the former, you have
deadlocking problems; if the latter, post-crash-consistency problems.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Offering tuned config files

2003-02-14 Thread Rod Taylor
On Fri, 2003-02-14 at 07:41, Manfred Koizar wrote:
 On Fri, 14 Feb 2003 14:24:23 +0200, Daniel Kalchev [EMAIL PROTECTED]
 wrote:
 The default [cache] on BSD systems is 10% of the total RAM, so on a 256MB machine
 this would be ~26MB or effective_cache_size = 32000.
 
 I was a bit too Linux-minded, where every piece of memory not needed
 for anything else can be used as cache.  Thanks for the clarification.
 And sorry for my ignorance.

I think you're getting the two confused. I'm not sure about linux, but
on BSD (FreeBSD) the cache and buffer are mostly unrelated.

   Cache: number of pages used for VM-level disk caching
 
   Buf:   number of pages used for BIO-level disk caching
-- 
Rod Taylor [EMAIL PROTECTED]

PGP Key: http://www.rbt.ca/rbtpub.asc





Re: [HACKERS] location of the configuration files

2003-02-14 Thread scott.marlowe
On Fri, 14 Feb 2003, Curt Sampson wrote:

 On Thu, 13 Feb 2003, scott.marlowe wrote:
 
  If I do a .tar.gz install of apache, I get /usr/local/apache/conf, which
  is not the standard way you're listing.
 
 I'm going to stay out of this argument from now on, but this struck a sore
 point.
 
 /usr is designed to be a filesystem that can be shared. Is the stuff in
 /usr/local/apache/conf really supposed to be shared amongst all machines
 of that architecture on your site that run apache?

Interesting.  I've always viewed /usr, EXCEPT for local, this way.  In 
fact, on most of my boxes I create a separate mount point for /usr/local 
so it's easier to back up and maintain, and it doesn't fill up the /usr 
directory.

Asking for everything in a directory with the name "local" in it to be 
shared is kind of counterintuitive to me.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] postgresql 7.3 versus 7.2

2003-02-14 Thread Ulf Rehmann

Hi!

Is there any way to adjust dumps from postgresql 7.3 for use by
version 7.2 and vice versa?

Or is there a way to transport a full database via csv or in any other
form to 7.2 and vice versa?

We are running a couple of servers/mirrors at various locations in the
world, handling data based on postgresql, and now we have trouble
transporting data from and to sites with different versions.

(E.g. http://genealogy.math.ndsu.nodak.edu/
  http://genealogy.mathematik.uni-bielefeld.de/ )

Some have upgraded and others cannot immediately follow, since an
upgrade from 7.2 to 7.3 seems to require an upgrade of php4 and maybe
even of apache, which takes time etc.

So it would be helpful to have an easy tool to convert between data
for both versions.
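
The obvious thing to try (host and database names are just placeholders) is a 
plain SQL dump loaded into the other server:

    pg_dump -h server73 mydb > mydb.sql
    psql -h server72 -d mydb -f mydb.sql

but presumably going from 7.3 back to 7.2 still needs hand-editing for anything 
7.3-specific, hence the question.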

Any help or recommendation is very much appreciated!

Best regards,
Ulf Rehmann

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] Problems with insert rule called from plpython

2003-02-14 Thread Pavel Hanak
Hello,

I've noticed one problem after upgrading from postgresql-7.3.1 to 7.3.2.
The example of this problem looks like this:

   create table test (a int, b text);
   create view testview as select * from test;

   create or replace rule testview_ins as
   on insert to testview do instead
   (
   insert into test values (1, 'a');
   insert into test values (2, 'b');
   );

   create or replace function testfun() returns text as
   '
   plpy.execute("insert into testview values (3, ''c'')")
   return "test"
   ' language plpython;

Now calling select testfun() shows this fatal error:

   FATAL:  SPI: improper call to spi_dest_setup
   server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
   The connection to the server was lost. Attempting reset: Succeeded.


This example worked in postgresql-7.3.1. In postgresql-7.3.2, only a rule
with a single command in the "instead" action works:

   create or replace rule testview_ins as
   on insert to testview do instead
   (
   insert into test values (1, 'a');
   );

Can you explain this problem to me? Could you please also reply to my
email address, since I am not a member of any postgresql mailing list.
Thanks.

-- 
Pavel Hanak


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] Views and unique indicies optimisation

2003-02-14 Thread Robert Osowiecki
Here's an example:

create view master_detail
as select * from master 
left join detail on (master.detail_key=detail.key);

I noticed that PostgreSQL performs the join even when I execute

select some_master_fields from master_detail;

but there exists a unique index on detail.key, so the above query
gives the same result as

select some_master_fields from master;

but with much loss in performance, of course.

This problem emerged when I replaced a redundant table in a poorly
designed database with two tables, a view, and a set of rules.
I thought that I was very clever, because I didn't have to 
replace any code in applications using this database, but
it seems that there's always a catch...

May I hope that such an optimisation will be included in
the TODO list in the near future?

-- 
   
 [EMAIL PROTECTED]  
 http://rainbow.mimuw.edu.pl/~robson 
   

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Changing the default configuration (was Re: [pgsql-advocacy]

2003-02-14 Thread Jason Hihn
Nutshell:
   "Easy to install but is horribly slow."

   or

   "Took a couple of minutes to configure and it rocks!"

Since when is it easy to install on win32?
The easiest way I know of is through Cygwin, and then you have to worry about
installing the IPC service (and getting the right version too!). I've
installed versions 6.1 to 7.1, but I almost gave up on the windows install.
At least in 6.x you had a very comprehensive installation guide with a TOC.

Versus the competition, which are you going to choose if you're a wanna-be
DBA? The one with all the hoops to jump through, or the one that comes with a
setup.exe?

Now I actually am in support of making it more aggressive, but it should
wait until we too have a setup.exe for the native windows port. (Changing it
on *n*x platforms is of little benefit because most benchmarks seem to run
it on w32 anyway :-( )

Just my $.02. I reserve the right to be wrong.
-J


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Incremental backup

2003-02-14 Thread Christopher Browne
A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Kevin Brown) wrote:
 Bruce Momjian wrote:
 
 OK, once we have PITR, will anyone want incremental backups?

 None of my database references (Date's Introduction to Database
 Systems and Garcia-Molina's Database Systems - The Complete Book,
 in particular) seem to talk about PITR at all.  At least, there's no
 index entry for it.  And a google search for point in time recovery
 yields mostly marketing fluff.

Well, from an academic DBMS standpoint, it isn't terribly
interesting, since it involves assumptions of messy imperfection that
academics prefer to avoid.  

And that's not intended to insult the academics; it is often
reasonable to leave that out of scope much as an academic OS
researcher might prefer to try to avoid putting attention on things
like binary linkers, text file editors, and SCM systems like CVS,
which, while terribly important from a practical standpoint, don't
make for interesting OS research.

 Is there a good reference for this that someone can point me to?
 I'm interested in exactly how it'll work, especially in terms of how
 logs are stored versus the main data store, effects on performance,
 etc.

 Thanks, and sorry for the newbie question.  :-(

Unfortunately, the best sources I can think of are in the O-Word
literature, and the /practical/ answers require digging into really
messy bits of the documentation.

What it amounts to is that anyone that isn't a near-O*-guru would be
strongly advised not to engage in PITR activity.  

It doesn't surprise me overly that the documentation is poor: those
that can't figure it out despite the challenges almost surely
shouldn't be using the functionality...

What PITR generally consists of is the notion that you want to recover
to the state at a particular moment in time.

In O*-nomenclature, this means that you recover as at some earlier
moment for which you have a good backup, and then re-apply changes,
which, in their terms, are kept in "archive logs", which are somewhat
analogous to WAL files.
-- 
If this was helpful, http://svcs.affero.net/rm.php?r=cbbrowne rate me
http://www3.sympatico.ca/cbbrowne/x.html
We blew it -- too big, too slow... - Bill Gates talking about NT, as
noted by Steven McGeady of Intel during a meeting with Gates

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Bruce Momjian wrote:
 The big question is whether PGDATA is still our driving config variable,
 and PGCONFIG/-C is just an additional option, or whether we are moving
 in a direction where PGCONFIG/-C is going to be the driving value, and
 data_dir is going to be read as part of that.

I'm actually leaning towards PGCONFIG + PGDATA.

Yeah, it may be a surprise given my previous arguments, but I can't
help but think that the advantages you get with PGDATA will also exist
for PGCONFIG.

My previous arguments for removing PGDATA from postmaster can be dealt
with by fixing pg_ctl to use explicit command line directives when
invoking postmaster -- no changes to postmaster needed.  PGCONFIG
would be no different in that regard.


Sorry if I seem a bit gung-ho about the administrator point of view, but
as a system administrator myself I understand and feel their pain.
:-)



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruno Wolff III
On Thu, Feb 13, 2003 at 11:53:26 -0500,
  mlw [EMAIL PROTECTED] wrote:
 
 Where, specifically, are his arguments against a configuration file 
 methodology?

I don't think he is arguing against a configuration methodology, but
rather against the methodology being used in Unix distributions.
In particular, he doesn't like the Linux File Standard because it
puts the same software in different places depending on whether the
vendor or the user installed it.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Greg Stark

scott.marlowe [EMAIL PROTECTED] writes:

 But this isn't the same thing at all.  Apache, when built from a tar ball, 
 goes into /usr/local/apache/ and ALL its configuration files are there.  

Two comments:

1) Even in that case the config files go into /usr/local/apache/conf and the
   other kinds of files like data logs and cache files, all go in other
   subdirectories.

2) What you describe is only true if you configure with the default
   --with-layout=Apache. The naming should perhaps be a clue that this isn't
   a conventional layout. If you configure with --with-layout=GNU you get the
   conventional Unix layout in /usr/local, If you use --with-layout=RedHat you
   get the conventional layout in /usr directly which is mainly useful for
   distribution packagers.

Putting stuff in a subdirectory like /usr/local/apache or /usr/local/pgsql is
unfortunately a widespread practice. It does have some advantages over the
conventional layout in /usr/local/{etc,bin,...} directly. But the major
disadvantage is that users can't run programs without adding dozens of entries
to their paths, can't compile programs without dozens of -L and -I lines, etc.

GNU autoconf script makes it pretty easy to configure packages to work either
though, and /usr/local is the purview of the local admin. As long as it's easy
to configure postgres to install properly with --prefix=/usr/local it won't
be any more of an offender than lots of other packages like apache, kde, etc.
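
For instance (standard autoconf switches; whether every package honours all of 
them is a separate question):

    ./configure --prefix=/usr/local --sysconfdir=/usr/local/etc --localstatedir=/var
    make && make install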

Though I'll mention, please make it $prefix/etc not $prefix/conf. No need to
be gratuitously non-standard on an arbitrary name, and no need to pollute
/usr/local with multiple redundant directories.
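
For the record, both layouts are just a configure invocation away (a sketch
using the stock autoconf switches; whether the server actually reads anything
from sysconfdir is exactly what this thread is debating):

    # conventional /usr/local layout, config under $prefix/etc
    ./configure --prefix=/usr/local --sysconfdir=/usr/local/etc

    # the all-in-one tree most people build today
    ./configure --prefix=/usr/local/pgsql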

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Do we always need the socket file?

2003-02-14 Thread Greg Stark
Michael Brusser [EMAIL PROTECTED] writes:

 We're trying to avoid creating any unnecessary files, especially outside
 of the product installation tree. Look at this as a policy.
 Experience shows that sooner or later some of your customers ask you:
 what is this /tmp/.s.PGSQL.xxx file? What do I need it for?
 Also some admins are known to periodically clean out /tmp, /var/run, etc.

Well if you clean out files programs are using you should expect those
programs to break. Other programs that create sockets in /tmp include screen
and X for example.

Unix domain sockets are sometimes more efficient and sometimes more secure
than TCP/IP connections. So preferring TCP/IP just to avoid /tmp pollution
would be trading a real benefit for an aesthetic one.
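
For what it's worth, the socket does not have to live in /tmp at all; its
location is already a settable parameter (a sketch, parameter name as of 7.3):

    # postgresql.conf
    unix_socket_directory = '/var/run/postgresql'

    # clients then name the directory instead of a host:
    psql -h /var/run/postgresql mydb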

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruno Wolff III
On Fri, Feb 14, 2003 at 02:58:49 -0800,
  Kevin Brown [EMAIL PROTECTED] wrote:
 
 Lest you think that this is an unlikely scenario, keep in mind that
 most startup scripts, including pg_ctl, currently start the postmaster
 without arguments and rely on PGDATA, so a shop that hasn't already
 been bitten by this *will* be.  Right now shops that wish to avoid the
 trap I described have to go to *extra* lengths: they have to make
 exactly the same kinds of changes to the scripts that I'm talking
 about us making (putting an explicit '-D $PGDATA' where none now
 exists) or they have to resort to tricks like renaming the postmaster
 executable and creating a shell script in its place that will invoke
 the (renamed) postmaster with '-D $PGDATA'.

On at least some systems ps will dump process' environment and could be
easily used to check PGDATA.
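
For example (exact flags vary by platform; these are only illustrations):

    # BSD-style/procps ps, with the environment appended to the command line
    ps auxeww | grep postmaster

    # or on Linux, straight from /proc
    tr '\0' '\n' < /proc/<postmaster-pid>/environ | grep PGDATA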

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Incremental backup

2003-02-14 Thread Kevin Brown
Christopher Browne wrote:
 What PITR generally consists of is the notion that you want to recover
 to the state at a particular moment in time.
 
 In O*-nomenclature, this means that you recover as at some earlier
 moment for which you have a good backup, and then re-apply changes,
 which in their terms, are kept in archive logs, which are somewhat
 analogous to WAL files.

Yeah, that's pretty much what I figured.

Oracle has something they call rollback segments which I assume are
separate bits of data that have enough information to reverse changes
that were made to the database during a transaction, and I figured
PITR would (or could) apply particular saved rollback segments to the
current state in order to roll back a table, tablespace, or database
to the state it was in at a particular point in time.

As it is, it sounds like PITR is a bit less refined than I expected.


So the relevant question is: how is *our* PITR going to work?  In
particular, how is it going to interact with our WAL files and the
table store?  If I'm not mistaken, right now (well, as of 7.2 anyway)
we round robin through a fixed set of WAL files.  For PITR, I assume
we'd need an archivelog function that would copy the WAL files as
they're checkpointed to some other location (with destination names
that reflect their order in time), just for starters.
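
Nothing like that exists yet, but as a sketch of the idea the hook could be
as simple as a script that stamps each completed segment with an ordering key
(names and paths here are made up):

    #!/bin/sh
    # hypothetical archive hook: $1 = path of a just-completed WAL segment
    SEG="$1"
    ARCHIVE=/var/backups/pg_wal
    cp "$SEG" "$ARCHIVE/$(date +%Y%m%d%H%M%S).$(basename "$SEG")" || exit 1

The real thing would of course have to be careful about partial copies and
crashes mid-archive.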


It'd be *awfully* nice if you could issue a command to roll a table
(or, perhaps, a tablespace, if you've got a bunch of foreign keys and
such) back to a particular point in time, from the command line, with
no significant advance preparation (so long as the required files are
still around, and if they're not then abort the operation with the
appropriate error message).  But it doesn't sound like that's what
we're talking about when we talk about PITR...



I wouldn't expect the O* docs to be particularly revealing about
how the database manages PITR at the file level, but if it does, would
you happen to know where so I can look at it?  What I've seen so far
is very basic and not very revealing at all...



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Kevin Brown
Bruce Momjian wrote:
 I see your point --- pg_ctl does a PGDATA trick when passed -D:
 
 -D)
 shift
 # pass environment into new postmaster
 PGDATA=$1
 export PGDATA
 
 It should pass -D just like it was given.

Yes, exactly.

Now, the more interesting question in my mind is: if pg_ctl isn't
passed -D but inherits PGDATA, should it nonetheless pass -D
explicitly to the postmaster?  We can make it do that, and it would
have the benefit of making transparent what would otherwise be opaque.
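
Concretely, something along these lines in pg_ctl (a sketch of the idea, not
the actual patch; the variable names are made up):

    -D)
            shift
            # keep the switch explicit instead of smuggling it in via PGDATA
            DATADIR="$1"
            POSTOPTS="$POSTOPTS -D $DATADIR"
            ;;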

I'm inclined to answer yes to that question, but only because
someone who *really* doesn't want the postmaster to show up with a -D
argument in ps can start the postmaster directly without using
pg_ctl at all.  Tom made a good argument for sometimes wanting to keep
the ps output clean, but it's not clear to me that it should
necessarily apply to pg_ctl.

But you guys might have a different perspective on that.  :-)



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] help me!!

2003-02-14 Thread XIE, Rong



Hello,

My name is Rong Xie. I am a student at TU-Munich.
I have a question about PostgreSQL on Linux.

For example, with IBM DB2 I can write a test.sql file:

--test.sql
connect to database1;
set schema xie;
select * from table1;
insert into table1 values('rong','xie',22);

select * from table1;
terminate;

Then I can execute the file "test.sql" from the terminal:
db2 -tvf test.sql

In MySQL I can execute it like this:
mysql < test.sql

How can I execute it in PostgreSQL?
Is there a similar command on Linux for PostgreSQL?

Thank you very much!

Best wishes!

Rong Xie
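
For reference, a minimal sketch of the psql equivalent, assuming the database
is named database1 and the schema xie:

    psql -d database1 -f test.sql
    # or, in the same spirit as the mysql form:
    psql database1 < test.sql

Inside the script, SET search_path TO xie; takes the place of DB2's
"set schema".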


Re: [HACKERS] location of the configuration files

2003-02-14 Thread Steve Crawford
On Friday 14 February 2003 6:07 am, Martin Coxall wrote:
 On Thu, 2003-02-13 at 20:28, Steve Crawford wrote:
  I don't see why we can't keep everyone happy and let the users choose the
  setup they want. To wit, make the following, probably simple, changes:
 
  1) Have postgresql default to using /etc/postgresql.conf

 /etc/postgres/postgresql.conf, if we want to be proper FHS-bitches.

  2) Add a setting in postgresql.conf specifying the data directory
  3) Change the meaning of -D to mean use this config file
  4) In the absence of a specified data directory in postgresql.conf, use
  the location of the postgresql.conf file as the data directory

 Shouldn't it in that case default to, say /var/lib/postgres?

Idea 4 was just a way to preserve the current behaviour for those who want it. 
Moving postgresql.conf requires adding the data directory info to 
postgresql.conf or specifying it in some other way. If, in the absence of any 
specification in postgresql.conf, postgres just looks in the same directory 
as postgresql.conf, then the result is almost identical to the current setup.

Cheers,
Steve

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Curtis Faith
tom lane wrote:
 Hmmm ... that might be made to work, but it would complicate 
 inserts. By the time an insert navigates to the page it 
 should insert on, it might find the page is dead, and then it 
 would have no easy way to get to the replacement page (unless 
 you want to dedicate another link field in every index page 
 for that one use).  I suppose it could recover by starting 
 the search over again.

Inserts could reread just the parent page if they encountered a dead
leaf since the parent would have been correctly updated.

 Another problem is that the two dead pages are now outside of 
 the btree structure and so their left and right links won't 
 get updated in the face of subsequent splits and merges of 
 the nearby pages.

That seems like a show stopper that just defers the problem. A split
of the left sibling would still screw up a scan that was using the old
left leaf page and wanted to move left.

Oh, well, the idea seemed worth exploring.

- Curtis




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] stats_command_string default?

2003-02-14 Thread Kevin Brown

One of the functions of the DBA is to monitor what people are doing to
the database.  My experience is that ps is often sorely lacking in
this regard: its output is somewhat limited, from what I've seen, and
in any case the DBA's domain is the database itself: that's the
environment he's going to be most proficient in.

The ability to select the list of current connections from
pg_stat_activity is really handy for monitoring the database, but the
default configuration disables stats_command_string -- so the
current_query winds up being blank by default.

That's not exactly the most useful configuration for the DBA.

Would it make more sense to enable stats_command_string by default?
It could be a problem if doing so would have a significant impact on
performance, but that's the only reason I can think of for not doing
it.  Are there others?
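
For anyone who wants it today, it is a one-line change plus a query (column
names as of 7.3):

    # postgresql.conf
    stats_command_string = true

    -- then, from psql:
    SELECT datname, usename, current_query FROM pg_stat_activity;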


It would also be handy if users could see their own queries while the
rest remain blank.  That would require changing
pg_stat_get_backend_activity() so that it returns a value if the user
is the superuser or if the user asking for the answer is the same as
the user who owns the backend entry being looked up.  Are there any
pitfalls to implementing that?




-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] stats_command_string default?

2003-02-14 Thread Tom Lane
Kevin Brown [EMAIL PROTECTED] writes:
 Would it make more sense to enable stats_command_string by default?

I'd vote against it.  If we turn it on by default, people are paying
for a feature they may not even know exists.  Once they find out about
it and decide they want it, they can turn it on easily enough.

If you can show that the overhead is unmeasurable, that'd indicate that
this argument is bogus; but I suspect it's not negligible, at least on
simple queries.

 It would also be handy if users could see their own queries while the
 rest remain blank.  That would require changing
 pg_stat_get_backend_activity() so that it returns a value if the user
 is the superuser or if the user asking for the answer is the same as
 the user who owns the backend entry being looked up.  Are there any
 pitfalls to implementing that?

Seems reasonable offhand ...

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] client_encoding directive is ignored in postgresql.conf

2003-02-14 Thread Bruce Momjian
Tom Lane wrote:
 Tatsuo Ishii [EMAIL PROTECTED] writes:
  + /* Flag to we need to initialize client encoding info */
  + static bool need_to_init_client_encoding = -1;
 
 Surely that should be int, not bool.
 
 !   if (!PQsendQuery(conn, "begin; select pg_client_encoding(); commit"))
 
 Doesn't this break compatibility with pre-7.2 databases?  AFAICT that
 function was introduced in 7.2.

I haven't seen this patch applied.  If this is the best way to fix the
bug, we may as well break libpq for pre-7.2 clients.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] plpython trigger code is wrong (Re: [GENERAL] Potential bug -- script that drops postgres server)

2003-02-14 Thread Bradley McLean

* Tom Lane ([EMAIL PROTECTED]) [030214 19:35]:
 Lance Thomas [EMAIL PROTECTED] writes:
  Below is something that may be of interest -- a short, 7-statement script
  that seems to drop my postgres server.
 
 It appears that the plpython trigger implementation assumes that any
 given procedure will be used as a trigger for only one relation.  The
 reason it crashes is it's trying to use the rowtype info of the relation
 it was first compiled for with the other relation.
 
 Probably the easiest fix is to include the relation OID as part of the
 Python name of a trigger procedure, so that a separate copy is compiled
 for each relation the procedure is used with.

Interesting idea.  I had been taking the approach of applying the cache
to just the python compilation, and not the rowtype info.  This has a
substantial performance penalty, which I'd been addressing by eliminating
some unneeded parameter processing that doesn't apply in the trigger
case, and considering a separate cache for each rowtype.

 Any plpython users want to step forward and fix this?  I have other
 things on my plate ...

I'm looking at the bug right now.  Patch in a day or so.

-Brad


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Tom Lane
Curtis Faith [EMAIL PROTECTED] writes:
 Stored in the index?  And how will you do that portably?  

 Sorry for the lack of rigorous language. I meant that there would be one
 mutex per index stored in the header or internal data structures
 associated with each index somewhere. Probably in the same structure the
 root node reference for each btree is stored.

Hm.  A single lock that must be grabbed for operations anywhere in the
index is a concurrency bottleneck.  But maybe we could avoid that.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian

I don't want to over-engineer this.  Propagating -D into postmaster
makes sense, but grabbing PGDATA doesn't to me.

---

Kevin Brown wrote:
 Bruce Momjian wrote:
  I see your point --- pg_ctl does a PGDATA trick when passed -D:
  
  -D)
  shift
  # pass environment into new postmaster
  PGDATA=$1
  export PGDATA
  
  It should pass -D just like it was given.
 
 Yes, exactly.
 
 Now, the more interesting question in my mind is: if pg_ctl isn't
 passed -D but inherits PGDATA, should it nonetheless pass -D
 explicitly to the postmaster?  We can make it do that, and it would
 have the benefit of making transparent what would otherwise be opaque.
 
 I'm inclined to answer yes to that question, but only because
 someone who *really* doesn't want the postmaster to show up with a -D
 argument in ps can start the postmaster directly without using
 pg_ctl at all.  Tom made a good argument for sometimes wanting to keep
 the ps output clean, but it's not clear to me that it should
 necessarily apply to pg_ctl.
 
 But you guys might have a different perspective on that.  :-)
 
 
 
 -- 
 Kevin Brown [EMAIL PROTECTED]
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Curt Sampson
On Fri, 14 Feb 2003, scott.marlowe wrote:

 Asking for everything in a directory with the name local in it to be
 shared is kind of counter intuitive to me.

Not really. If you install a particular program that doesn't come with
the OS on one machine on your site, why would you not want to install it
separately on all of the others?

Typically, I want my favourite non-OS utilities on all machines, not
just one. (Even if I don't use them on all machines.) Thus /usr/local is
for site-local stuff.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Berkeley and CMU classes adopt/extend PostgreSQL

2003-02-14 Thread Bruce Momjian

Any chance of giving them all separate TODO items?  That way, we would
get more items completed;  greedy request, I know.  ;-)

---

Anastassia Ailamaki wrote:
 Hi everyone,
 
   with some frequency.  We have the usual 180 students  we get every 
   semester (yep: 180!), but this year we've instituted 2 changes:
 
 We're looking at 100 students taking the class here every year.
 
  Double cool.  I'm just down the road, if Natassa needs a visiting
  lecturer.
 
 Tom - that's really super-cool! Tom, let's take it offline to schedule a
 visit.
 We will be delighted to have you lecture.
 
  Yes.  As of CVS tip, we have hash-based grouping but it doesn't spill
  to disk.  Want to ask them to start from CVS tip and fix that little
  detail?  Or fix the various other loose ends that have been mentioned
  lately?  (make it work with DISTINCT, improve the estimation logic,
  some other things I'm forgetting)
 
 As Joe said, this is what we are doing. We intend to use your todo-list to 
 design projects for future semesters... so all such suggestions
 are greatly appreciated.
  
 Natassa
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Curtis Faith
I previously wrote:
 5) A mutex/spinlock that was stored in the index could be 
 acquired by the scan code like this:
 
   buf = ReadBuffer(rel, blkno);              /* pin next page */
 
   SpinLockAcquire( indexSpecificMutex );     /* lock the index reorg mutex */
 
   LockBuffer(buf, BUFFER_LOCK_UNLOCK);       /* release lock on current page */
   LockBuffer(buf, BT_READ);                  /* lock next page */
 
   SpinLockRelease( indexSpecificMutex );     /* unlock the index reorg mutex */
 
   ReleaseBuffer(buf);                        /* now release pin on previously current page */
 
 6) The same index specific mutex/spinlock could be used for 
 the merge code surrounding only the acquisition of the four 
 page locks. This would obviate any problems with scans and 
 page merges, since the lock acquisition for the merge could 
 never occur while a scan was between pages.
 
 Further, with the reordering, the spinlock for the scan code 
 doesn't seem particularly onerous since it would be 
 surrounding only two LWLock calls.  To reduce the overhead to 
 an absolute minimum for the scan case these could be pushed 
 down into a new IW call (probably necessary since the 
 LockBuffer, ReleaseBuffer code does some error checking and 
 such that one wouldn't want in code guarded by a mutex.

I forgot to mention that the mutex would have to be released, in the event
the next page lock could not be immediately acquired, just after the scan
process is added to the lock waiters list, so as not to block all scans and
probably cause severe deadlock problems.

- Curtis



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] pg_dump and inserts

2003-02-14 Thread Bruce Momjian

Attached is a patch that more clearly handles autocommit in pg_dump.  I
had already fixed pg_dump for doing autocommit while dumping, but didn't
handle setting autocommit on restore.

I focused on the initial script file startup, every \\connect,
pg_restore, and pg_dumpall.  I think I got them all.

New pg_dump output is:

--
-- PostgreSQL database dump
--

SET autocommit TO 'on';

\connect - postgres

SET autocommit TO 'on';

---

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  I just checked and pg_dump -d _doesn't_ place the INSERT's in a
  transsaction.  Seems it should,
 
 I think this is a bad idea.  If one were after speed, one would have
 used the COPY format in the first place.  If one uses INSERTs, there
 may be a reason for it --- like, say, wanting each row insertion to
 succeed or fail independently.  Put a begin/end around it, and you
 lose that.
 
  and perhaps add a:
  SET autocommit TO 'on'
  as well.
 
 This is probably a good idea, since pg_dump scripts effectively assume
 that anyway.
 
  Of course, that SET would fail when restoring to prior
  releases,
 
 Irrelevant; current pg_dump scripts already issue a SET that pre-7.3
 servers won't recognize (search_path).  A failed SET is harmless anyway,
 or should be.  (What we really need is for someone to fix pg_restore to
 not abort on SQL errors...)
 
   regards, tom lane
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

Index: src/bin/pg_dump/pg_backup_archiver.c
===
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.67
diff -c -c -r1.67 pg_backup_archiver.c
*** src/bin/pg_dump/pg_backup_archiver.c1 Feb 2003 22:06:59 -   1.67
--- src/bin/pg_dump/pg_backup_archiver.c14 Feb 2003 19:39:08 -
***
*** 206,212 
sav = SetOutput(AH, ropt-filename, ropt-compression);
  
ahprintf(AH, --\n-- PostgreSQL database dump\n--\n\n);
! 
/*
 * Drop the items at the start, in reverse order
 */
--- 206,213 
sav = SetOutput(AH, ropt-filename, ropt-compression);
  
ahprintf(AH, --\n-- PostgreSQL database dump\n--\n\n);
!   ahprintf(AH, SET autocommit TO 'on';\n\n);
!   
/*
 * Drop the items at the start, in reverse order
 */
***
*** 2109,2115 
  dbname ? fmtId(dbname) : -);
appendPQExpBuffer(qry,  %s\n\n,
  fmtId(user));
! 
ahprintf(AH, qry-data);
  
destroyPQExpBuffer(qry);
--- 2110,2117 
  dbname ? fmtId(dbname) : -);
appendPQExpBuffer(qry,  %s\n\n,
  fmtId(user));
!   appendPQExpBuffer(qry, SET autocommit TO 'on';\n\n);
!   
ahprintf(AH, qry-data);
  
destroyPQExpBuffer(qry);
Index: src/bin/pg_dump/pg_backup_db.c
===
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_db.c,v
retrieving revision 1.45
diff -c -c -r1.45 pg_backup_db.c
*** src/bin/pg_dump/pg_backup_db.c  13 Feb 2003 04:54:15 -  1.45
--- src/bin/pg_dump/pg_backup_db.c  14 Feb 2003 19:39:09 -
***
*** 213,218 
--- 213,233 
if (password)
free(password);
  
+   /* check for version mismatch */
+   _check_database_version(AH, true);
+ 
+   /* Turn autocommit on */
+   if (AH-public.remoteVersion = 70300)
+   {
+   PGresult   *res;
+ 
+   res = PQexec(AH-connection, SET autocommit TO 'on');
+   if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
+   die_horribly(AH, NULL, SET autocommit TO 'on' failed: %s,
+ PQerrorMessage(AH-connection));
+   PQclear(res);
+   }
+ 
PQsetNoticeProcessor(newConn, notice_processor, NULL);
  
return newConn;
Index: src/bin/pg_dump/pg_dumpall.c
===
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_dumpall.c,v
retrieving revision 1.13
diff -c -c -r1.13 pg_dumpall.c
*** src/bin/pg_dump/pg_dumpall.c16 Jan 2003 15:27:59 -  1.13
--- src/bin/pg_dump/pg_dumpall.c14 Feb 2003 19:39:09 -
***
*** 190,195 
--- 190,196 

Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Tom Lane
Curtis Faith [EMAIL PROTECTED] writes:
 4) This could easily be reordered into:

   buf = ReadBuffer(rel, blkno);              /* pin next page */
   LockBuffer(buf, BUFFER_LOCK_UNLOCK);       /* release lock on current page */
   LockBuffer(buf, BT_READ);                  /* lock next page */
   ReleaseBuffer(buf);                        /* now release pin on previously current page */

without affecting the logic of the code or causing any deadlock
 problems since the release still occurs before the lock of the next
 page.

Sorry, that *does* create deadlocks.  Remember the deletion process is
going to need superexclusive lock (not only a BT_WRITE buffer lock,
but no concurrent pins) in order to be sure there are no scans stopped
on the page it wants to delete.  (In the above pseudocode, the fact that
you still hold a pin on the previously-current page makes you look
exactly like someone who's in the middle of scanning that page, rather
than trying to leave it.)  The same would be true of both pages
if it's trying to merge.

 5) A mutex/spinlock that was stored in the index could be acquired by
 the scan code like this:

Stored in the index?  And how will you do that portably?  But it
doesn't matter, because the approach deadlocks.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Tom Lane writes:
 I would favor a setup that allows a -C *directory* (not file) to be
 specified as a postmaster parameter separately from the -D directory;

 A directory is not going to satisfy people.

Why not?  Who won't it satisfy, and what's their objection?

AFAICS, you can either set -C to /etc if you want your PG config files
loose in /etc, or you can set it to /etc/postgresql/ if you want them
in a privately-owned directory.  Which other arrangements are needed?
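
That is, under this proposal the two arrangements would look like (the -C
switch does not exist yet, so these invocations are hypothetical):

    postmaster -C /etc -D /var/lib/postgres/data
    postmaster -C /etc/postgresql -D /var/lib/postgres/data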

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Peter Eisentraut
Tom Lane writes:

 I would favor a setup that allows a -C *directory* (not file) to be
 specified as a postmaster parameter separately from the -D directory;

A directory is not going to satisfy people.

 I don't see any great value in a separate postgresql.conf parameter for
 each secondary config file; that just means clutter to me,

Not to other people.

 1. No -C switch, no -D switch, no PGDATA found in environment: seek
 postgresql.conf in the default -C directory established at configure
 time.  Use the 'datadir' specified therein as -D.  Fail if postgresql.conf
 doesn't define a datadir value.

OK.

 2. No -C switch, no -D switch, PGDATA found in environment: use $PGDATA
 as both -C and -D.

This behavior would be pretty inconsistent.  But maybe it's the best we
can do.

 3. No -C switch, -D switch on command line: use -D value as both -C and -D,
 proceed as in case 2.

Same as above.

 4. -C switch, no -D switch on command line: seek postgresql.conf in
 -C directory, use the datadir it specifies.

OK.

 5. -C and -D on command line: seek postgresql.conf in -C directory,
 use -D as datadir overriding what is in postgresql.conf (this is just
 the usual rule that command line switches override postgresql.conf).

But that usual rule seems to be in conflict with cases 2 and 3 above.
(The usual rule is that a command-line option overrides a postgresql.conf
parameter.  The rule in 3, for example is, that a command-line option (the
same one!) overrides where postgresql.conf is in the first place.)

 I would venture that the configure-time-default for -C should be
 ${prefixdir}/etc if configure is not told differently,

Yeah, we already have that as --sysconfdir.

-- 
Peter Eisentraut   [EMAIL PROTECTED]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  I think the big question is whether we want the default to install the
  configs in a separate directory, pgsql/etc, or just allow the
  specification of a separate location.  Advantages of pgsql/etc are
  initdb-safe, and easier backups.
 
 I don't see why we don't just let initdb install suggested config files
 into the new $PGDATA directory, same as it ever did.  Then (as long as
 we don't use relative paths in the config files) people can move them
 somewhere else if they like, or not if they prefer not to.  Adding more
 mechanism than that just adds complexity without buying much (except the
 possibility of initdb overwriting your old config files, which is
 exactly what I thought we wanted to avoid).
 
  The big question is whether PGDATA is still our driving config variable,
  and PGCONFIG/-C is just an additional option, or whether we are moving
  in a direction where PGCONFIG/-C is going to be the driving value, and
  data_dir is going to be read as part of that.
 
 I thought the idea was to allow both approaches.  We are not moving in
 the direction of one or the other, we are giving people a choice of how
 they want to drive it.

That's where I am unsure.  Is the initdb-safe and backup advantages
enough to start to migrate those out to data/?  I need to hear
comments on that.

One new idea is to move the config files into data/etc.  That makes it
clear which are config files, and makes backup a little easier.  It
would make -C more logical because you are not moving a clear directory.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] psql and readline

2003-02-14 Thread Patrick Welche
On Thu, Feb 13, 2003 at 10:25:52AM -0500, Tom Lane wrote:
 Patrick Welche [EMAIL PROTECTED] writes:
  The sad thing is that my readline wrapper for libedit doesn't wrap
  replace_history_entry,
 
 Well, is that a bug in your wrapper?  Or must we add a configure test
 for the presence of replace_history_entry()?

Good question. Easiest for now for me would be to add a configure test. Long
term libedit needs tweaking... In fact for now, I just comment out the call
to replace_history_entry, as I am more than happy with the rest of the
readline behaviour (as implemented in libedit).
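
The configure side is presumably just the standard function check, something
like this (a sketch; it assumes the readline/libedit library is already in
LIBS when the test runs):

    AC_CHECK_FUNCS([replace_history_entry])

with the call site wrapped in #ifdef HAVE_REPLACE_HISTORY_ENTRY.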

Patrick

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Tom Lane
Curtis Faith [EMAIL PROTECTED] writes:
 Another way this could be handled is by not merging onto one of the
 existing pages but to a brand new page, a kind of special case shadow
 index page.

Hmmm ... that might be made to work, but it would complicate inserts.
By the time an insert navigates to the page it should insert on, it
might find the page is dead, and then it would have no easy way to get
to the replacement page (unless you want to dedicate another link field
in every index page for that one use).  I suppose it could recover by
starting the search over again.

Another problem is that the two dead pages are now outside of the btree
structure and so their left and right links won't get updated in the
face of subsequent splits and merges of the nearby pages.  I spent quite
a bit of time sweating over the recovery navigation details in my
original proposal; I can assure you it's not easy to get right.  You can
chain right from a dead page with some reliability, but left is another
matter.  There's no good way to know, when following a left link,
whether you've arrived at the proper place or need to chain right to
make up for a subsequent split.  The recovery procedure I proposed works
for removing single pages, but it won't work with substituted pages.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [pgsql-advocacy] [HACKERS] Changing the default configuration

2003-02-14 Thread Curt Sampson
On Thu, 13 Feb 2003, Bruce Momjian wrote:

 To auto-tune, you would need to monitor swap usage and other stuff that
 may vary too much based on load from other systems.  Only the admin
 knows how to answer some of those questions.

No, to auto-tune many parameters that currently require manual
configuration, you merely need to change things so that you work with
the OS, rather than beside it.

There are a lot of things we could mmap that would remove the need for
tuning certain things altogether. The buffer cache is the biggest one;
mmaping that would let the OS take care of it all, and a few hundred KB
of shared memory would be all postgres would need. (Or none at all, if
you used mmap'd memory regions for that part of your IPC as well.)
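
A minimal sketch of what "let the OS take care of it" would mean, using plain
POSIX mmap (this is not how the buffer manager works today, just the shape of
the idea):

    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Map a relation file and let the kernel own caching and write-back. */
    static void *
    map_relation(const char *path, size_t *len)
    {
        struct stat st;
        void       *p;
        int         fd = open(path, O_RDWR);

        if (fd < 0)
            return NULL;
        fstat(fd, &st);
        *len = (size_t) st.st_size;
        p = mmap(NULL, *len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);              /* the mapping survives the close */
        return (p == MAP_FAILED) ? NULL : p;
    }

Dirty pages would then be flushed with msync() rather than write()+fsync().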

You could probably also make sort_mem need a lot less tuning if you
sorted in mmap'd memory and let the OS deal with paging only when and
if it needed it (as well as asking the OS not to page memory you're
randomly accessing, since it really destroys your performance when you do
that).

I'm not sure if you could get rid of semaphores or not, but if you
could somehow do that, that would be another limited resource that you
wouldn't have to deal with.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] psql and readline

2003-02-14 Thread Tom Lane
Patrick Welche [EMAIL PROTECTED] writes:
 On Thu, Feb 13, 2003 at 10:25:52AM -0500, Tom Lane wrote:
 Well, is that a bug in your wrapper?  Or must we add a configure test
 for the presence of replace_history_entry()?

 Good question. Easiest for now for me would be add a configure test.

Okay with me --- Ross, can you handle that?

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Incremental backup

2003-02-14 Thread Martin Marques
On Jue 13 Feb 2003 16:38, Bruce Momjian wrote:
 Patrick Macdonald wrote:
  Bruce Momjian wrote:
   Someone at Red Hat is working on point-in-time recovery, also known as
   incremental backups.
 
  PITR and incremental backup are different beasts.  PITR deals with a
  backup + logs.  Incremental backup deals with a full backup + X
  smaller/incremental backups.
 
  So... it doesn't look like anyone is working on incremental backup at the
  moment.

 But why would someone want incremental backups compared to PITR?  The
 backup would be mixture of INSERTS, UPDATES, and DELETES, right?  Seems
 pretty weird.  :-)

Good backup systems, such as Informix (it's the one I used), don't do a query 
backup but a page backup. What I mean is that they look for pages in the 
system that have changed since the last full backup and back them up.

That's how an incremental backup works. PITR is another thing, which is even 
more important. :-)

-- 
Why use just any old relational database,
when you can use PostgreSQL?
-
Martín Marqués  |[EMAIL PROTECTED]
Programador, Administrador, DBA |   Centro de Telematica
   Universidad Nacional
del Litoral
-


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Incremental backup

2003-02-14 Thread Manfred Koizar
On Thu, 13 Feb 2003 19:24:13 -0500, Patrick Macdonald
[EMAIL PROTECTED] wrote:
I know Oracle and DB2 have incremental backup in their arsenal (and iirc,
SQL Server has something called differential backup).  Whatever the name,
it's a win at the enterprise level. 

A differential backup copies only the database pages that have been
modified after the last full database backup.

This could be doable using XLogRecPtr pd_lsn in the page headers, but
I don't see an easy way to do it on a live database.

Servus
 Manfred

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian
Kevin Brown wrote:
 Tom Lane wrote:
  Kevin Brown [EMAIL PROTECTED] writes:
   I agree with your assessment for the most part, except for PGDATA.
   There's no good reason I can think of for the postmaster to look at
   it.
  
  The other side of that coin is, what's the good reason to remove it?
  There's a long way between I don't want my setup to depend on PGDATA
  and I don't think your setup should be allowed to depend on PGDATA.
  If you don't want to use it, then don't use it.  Why do you need to
  tell me how I'm allowed to run my installation?
 
 I'm not talking about getting rid of ALL dependency on PGDATA in our
 entire distribution, only postmaster's.
 
 Recall that the main purpose of making any of these changes at all is
 to make life easier for the guys who have to manage the systems that
 will be running PostgreSQL.  Agreed?
 
 So: imagine you're the newly-hired DBA and your boss points you to the
 system and says administrate the database on that.  You go over to
 the computer and start looking around.
 
 You do a ps and see a postmaster process running.  You know that
 it's the process that is listening for connections.  The ps listing
 only says /usr/bin/postmaster.  No arguments to clue you in,
 nothing.  Where do you look to figure out where the data is?  How do
 you figure out what port it's listening on?

If you want ps to display the data dir, you should use -D.  Remember, it
is mostly important for multiple postmaster, so if you are doing that,
just use -D, but don't prevent single-postmaster folks from using
PGDATA.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian

OK, here is an updated proposal.  I think we have decided:

Moving postmaster.pid and postmaster.opts isn't worth it.

We don't want per-file GUC variables, but assume it is in
the same config directory as postgresql.conf.  I don't
see any valid reason they would want to put them somewhere
different than postgresql.conf.

So, we add data_dir to postgresql.conf, and add -C/PGCONFIG to
postmaster.

Regarding Tom's idea of replacing data_dir with a full path during
initdb, I think we are better off having it be relative to the config
directory; that way, if they move pgsql/, the system still works.
However, if the config directory is under a different leading directory
path, we should replace it with the full path.  For example,
/usr/local/pgsql/etc pointing at /usr/local/pgsql/data uses the relative
path ../data, while /etc/postgresql pointing at /usr/local/pgsql/data
gets an absolute path.
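
In postgresql.conf terms (data_dir is the parameter proposed above, so these
lines are illustrative):

    # /usr/local/pgsql/etc/postgresql.conf -- config alongside the data tree
    data_dir = '../data'

    # /etc/postgresql/postgresql.conf -- config outside the installation tree
    data_dir = '/usr/local/pgsql/data'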

My idea is to introduce the above capabilities in 7.4, and keep the
config files in /data.  This will allow package folks to move the config
files in 7.4.

I also think we should start telling people to use PGCONFIG rather than
PGDATA.  Then, in 7.5, we move the default config file location to
pgsql/etc, and tell folks to point there rather than /data.  I think
there is major value to getting those config files out of the initdb
creation area and for backups.

I am now wondering if we should add PGCONFIG and move them out of data
all in the same release.  Not sure if delaying the split is valuable.

---

Tom Lane wrote:
 Lamar Owen [EMAIL PROTECTED] writes:
  Ok, if we're going this far already, tell me exactly why we have three config
  files.  Why not really Unify things and fulfil the full promise of Grand 
  Unified Configuration by rolling hba and ident into postgresql.conf.  Is 
  there a compelling reason not to do so?
 
 Lack of backwards compatibility; unnecessary complexity.  Unifying those
 files would create a big headache in terms of having to unify their
 syntax.  And there are some basic semantic differences too.  For
 instance, order matters in pg_hba.conf, but not in postgresql.conf.
 
 Another reason not to do it is that there are differences in the
 security requirements of these files.  postgresql.conf probably doesn't
 contain anything that needs to be hidden from prying eyes, but I'd be
 inclined to want to keep the other two mode 600.
 
 ---
 
 Okay, I've been laying low all day, but here are my thoughts on the
 discussion:
 
 I do see the value in being able to (as opposed to being forced to,
 please) keep hand-edited config files in a separate location from
 the machine-processed data files.  We have already gone some distance
 in that direction over the past few releases --- there's much less in
 the top $PGDATA directory than there once was.  It makes sense to let
 people keep hand-edited files away from what initdb will overwrite.
 
 I would favor a setup that allows a -C *directory* (not file) to be
 specified as a postmaster parameter separately from the -D directory;
 then the hand-editable config files would be sought in -C not -D.  In
 the absence of -C the config files should be sought in -D, same as they
 ever were (thus simplifying life for people like me who run many
 postmasters and don't give a darn about FHS ;-)).
 
 I don't see any great value in a separate postgresql.conf parameter for
 each secondary config file; that just means clutter to me, especially
 if we add more such files in future.  I am also distinctly not in favor
 of eliminating the PGDATA environment variable; that reads to me as
 we are going to force you to do it our way rather than the way you've
 always done it, even if you like the old way.
 
 To make the RPM packagers happy, I guess that the default -C directory
 has to be settable via configure.  We do not currently have a default
 -D directory, and I didn't hear anyone arguing in favor of adding one.
 So that leaves the following possible combinations that the postmaster
 might see at startup, for which I propose the following behaviors:
 
 1. No -C switch, no -D switch, no PGDATA found in environment: seek
 postgresql.conf in the default -C directory established at configure
 time.  Use the 'datadir' specified therein as -D.  Fail if postgresql.conf
 doesn't define a datadir value.
 
 2. No -C switch, no -D switch, PGDATA found in environment: use $PGDATA
 as both -C and -D.  (Minor detail: if the postgresql.conf in the $PGDATA
 directory specifies a different directory as datadir, do we follow that
 or raise an error?  I'd be inclined to say follow it but maybe there
 is an argument for erroring out.)
 
 (In all the following cases, any environment PGDATA value is ignored.)
 
 3. No -C switch, -D switch on command line: use -D value as both -C and -D,
 proceed as in case 2.
 
 4. -C switch, no -D switch on command line: seek postgresql.conf in
 -C directory, use the datadir it 

Re: [HACKERS] PG_TEMP_FILES_DIR

2003-02-14 Thread Bruce Momjian

It is under each database directory, per-database.

---

Sailesh Krishnamurthy wrote:
 
 Quick question for the group 
 
 I'm assuming that the PG_TEMP_FILES_DIR for BufFile temps is
 automatically under the PGDATA directory. Is that correct ?
 
 -- 
 Pip-pip
 Sailesh
 http://www.cs.berkeley.edu/~sailesh
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Incremental backup

2003-02-14 Thread Bruce Momjian

OK, once we have PITR, will anyone want incremental backups?

---

Martin Marques wrote:
 On Jue 13 Feb 2003 16:38, Bruce Momjian wrote:
  Patrick Macdonald wrote:
   Bruce Momjian wrote:
Someone at Red Hat is working on point-in-time recovery, also known as
incremental backups.
  
   PITR and incremental backup are different beasts.  PITR deals with a
   backup + logs.  Incremental backup deals with a full backup + X
   smaller/incremental backups.
  
   So... it doesn't look like anyone is working on incremental backup at the
   moment.
 
  But why would someone want incremental backups compared to PITR?  The
  backup would be mixture of INSERTS, UPDATES, and DELETES, right?  Seems
  pretty weird.  :-)
 
 Good backup systems, such as Informix (it's the one I used) doesn't do a query 
 backup, but a pages backup. What I mean is that it looks for pages in the 
 system that has changed from the las full backup and backs them up.
 
 That's how an incremental backup works. PITR is another thing, which is even 
 more important. :-)
 
 -- 
 Why use just any old relational database,
 when you can use PostgreSQL?
 -
 Martín Marqués  |[EMAIL PROTECTED]
 Programador, Administrador, DBA |   Centro de Telematica
Universidad Nacional
 del Litoral
 -
 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/users-lounge/docs/faq.html
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Terrible performance on wide selects

2003-02-14 Thread Bruce Momjian

Added to TODO:

* Cache last known per-tuple offsets to speed long tuple access


---

Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  i.e. for tuple with 100 cols, allocate an array of 100 pointers, plus
  keep count of how many are actually valid,
 
  Additionally, this should also make repeted determining of NULL fields
  faster - just put a NULL-pointer in and voila - no more bit-shifting and
  AND-ing to find out if the field is null.
 
 Right, the output of the operation would be a pair of arrays: Datum
 values and is-null flags.  (NULL pointers don't work for pass-by-value
 datatypes.)
 
 I like the idea of keeping track of a last-known-column position and
 incrementally extending that as needed.
 
 I think the way to manage this is to add the overhead data (the output
 arrays and last-column state) to TupleTableSlots.  Then we'd have
 a routine similar to heap_getattr except that it takes a TupleTableSlot
 and makes use of the extra state data.  The infrastructure to manage
 the state data is already in place: for example, ExecStoreTuple would
 reset the last-known-column to 0, ExecSetSlotDescriptor would be
 responsible for allocating the output arrays using the natts value from
 the provided tupdesc, etc.
 
 This wouldn't help for accesses that are not in the context of a slot,
 but certainly all the ones from ExecEvalVar are.  The executor always
 works with tuples stored in slots, so I think we could fix all the
 high-traffic cases this way.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
 http://archives.postgresql.org
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Oliver Elphick
On Fri, 2003-02-14 at 12:17, Bruce Momjian wrote:
 If you want ps to display the data dir, you should use -D.  Remember, it
 is mostly important for multiple postmaster, so if you are doing that,
 just use -D, but don't prevent single-postmaster folks from using
 PGDATA.

Could not the ps line be rewritten to show this, as the backend's ps
lines are rewritten?

-- 
Oliver Elphick[EMAIL PROTECTED]
Isle of Wight, UK http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
 
 God be merciful unto us, and bless us; and cause his 
  face to shine upon us.  Psalms 67:1 


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian

I am not sure it is a good idea to be mucking with it.  For the backends,
we rewrite the entire ps line, so it is clear we modified something.

---

Oliver Elphick wrote:
 On Fri, 2003-02-14 at 12:17, Bruce Momjian wrote:
  If you want ps to display the data dir, you should use -D.  Remember, it
  is mostly important for multiple postmaster, so if you are doing that,
  just use -D, but don't prevent single-postmaster folks from using
  PGDATA.
 
 Could not the ps line be rewritten to show this, as the backend's ps
 lines are rewritten?
 
 -- 
 Oliver Elphick[EMAIL PROTECTED]
 Isle of Wight, UK http://www.lfix.co.uk/oliver
 GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
  
  God be merciful unto us, and bless us; and cause his 
   face to shine upon us.  Psalms 67:1 
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Incremental backup

2003-02-14 Thread Martin Marques
On Vie 14 Feb 2003 09:52, Bruce Momjian wrote:
 OK, once we have PITR, will anyone want incremental backups?

I will probably not need it, but I know of people who have databases whose 
dumps come to more than 20GB.
They are interested in live incremental backups.

-- 
Why use just any old relational database,
when you can use PostgreSQL?
-
Martín Marqués  |[EMAIL PROTECTED]
Programador, Administrador, DBA |   Centro de Telematica
   Universidad Nacional
del Litoral
-


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian
Kevin Brown wrote:
 Bruce Momjian wrote:
  If you want ps to display the data dir, you should use -D.  Remember, it
  is mostly important for multiple postmaster, so if you are doing that,
  just use -D, but don't prevent single-postmaster folks from using
  PGDATA.
 
 Perhaps the best compromise would be to change pg_ctl so that it uses
 -D explicitly when invoking postmaster.  That's an easy change.
 
 Could you describe how you and other developers use PGDATA?  I'm quite
 interested in knowing why there seems to be so much resistance to
 removing the potential_DataDir = getenv(PGDATA); line from
 postmaster.c.

I just set PGDATA in my login and I don't have to deal with it again.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian
Kevin Brown wrote:
 Bruce Momjian wrote:
  I just set PGDATA in my login and I don't have to deal with it
  again.
 
 Hmm...you don't use pg_ctl to start/stop/whatever the database?  You
 invoke the postmaster directly (I can easily see that you would, just
 asking if you do)?

I can use either to start/stop it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian

Give it up.  As long as we have -D, we will allow PGDATA. If you don't
want to use it, don't use it.

---

Kevin Brown wrote:
 Bruce Momjian wrote:
  I just set PGDATA in my login and I don't have to deal with it
  again.
 
  Duh... I just realized a reason you guys might care about this so
 much.
 
 It's because you want to be able to start the postmaster from within a
 debugger (or profiler, or whatever), and you don't want to have to
 mess with command line options from there, right?
 
 
 Sounds like fixing pg_ctl to use -D explicitly when invoking the
 postmaster is the right change to make here, since that's probably how
 the majority of the production shops are going to be starting the
 database anyway.  Takes care of the majority of the visibility
 problem and leaves PGDATA intact.  Thoughts?
 
 
 
 -- 
 Kevin Brown [EMAIL PROTECTED]
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian

My point is that folks with multiple postmasters may not want to use
PGDATA, but for folks who have single postmasters, it makes things
easier and less error-prone.

---

Bruce Momjian wrote:
 
 Give it up.  As long as we have -D, we will allow PGDATA. If you don't
 want to use it, don't use it.
 
 ---
 
 Kevin Brown wrote:
  Bruce Momjian wrote:
   I just set PGDATA in my login and I don't have to deal with it
   again.
  
  DuhI just realized a reason you guys might care about this so
  much.
  
  It's because you want to be able to start the postmaster from within a
  debugger (or profiler, or whatever), and you don't want to have to
  mess with command line options from there, right?
  
  
  Sounds like fixing pg_ctl to use -D explicitly when invoking the
  postmaster is the right change to make here, since that's probably how
  the majority of the production shops are going to be starting the
  database anyway.  Takes care of the majority of the visibility
  problem and leaves PGDATA intact.  Thoughts?
  
  
  
  -- 
  Kevin Brown   [EMAIL PROTECTED]
  
  ---(end of broadcast)---
  TIP 2: you can get off all lists at once with the unregister command
  (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
  
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/users-lounge/docs/faq.html
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] WAL replay logic (was Re: [PERFORM] Mount options for Ext3?)

2003-02-14 Thread Bruce Momjian

Is there a TODO here, like "Allow recovery from corrupt pg_control via
WAL"?

---

Kevin Brown wrote:
 Tom Lane wrote:
  Kevin Brown [EMAIL PROTECTED] writes:
   One question I have is: in the event of a crash, why not simply replay
   all the transactions found in the WAL?  Is the startup time of the
   database that badly affected if pg_control is ignored?
  
  Interesting thought, indeed.  Since we truncate the WAL after each
  checkpoint, seems like this approach would no more than double the time
  for restart.  
 
 Hmm...truncating the WAL after each checkpoint minimizes the amount of
 disk space eaten by the WAL, but on the other hand keeping older
 segments around buys you some safety in the event that things get
 really hosed.  But your later comments make it sound like the older
 WAL segments are kept around anyway, just rotated.
 
  The win is it'd eliminate pg_control as a single point of
  failure.  It's always bothered me that we have to update pg_control on
  every checkpoint --- it should be a write-pretty-darn-seldom file,
  considering how critical it is.
  
  I think we'd have to make some changes in the code for deleting old
  WAL segments --- right now it's not careful to delete them in order.
  But surely that can be coped with.
 
 Even that might not be necessary.  See below.
 
  OTOH, this might just move the locus for fatal failures out of
  pg_control and into the OS' algorithms for writing directory updates.
  We would have no cross-check that the set of WAL file names visible in
  pg_xlog is sensible or aligned with the true state of the datafile
  area.
 
 Well, what we somehow need to guarantee is that there is always WAL
 data that is older than the newest consistent data in the datafile
 area, right?  Meaning that if the datafile area gets scribbled on in
 an inconsistent manner, you always have WAL data to fill in the gaps.
 
 Right now we do that by using fsync() and sync().  But I think it
 would be highly desirable to be able to more or less guarantee
 database consistency even if fsync were turned off.  The price for
 that might be too high, though.
 
  We'd have to take it on faith that we should replay the visible files
  in their name order.  This might mean we'd have to abandon the current
  hack of recycling xlog segments by renaming them --- which would be a
  nontrivial performance hit.
 
 It's probably a bad idea for the replay to be based on the filenames.
 Instead, it should probably be based strictly on the contents of the
 xlog segment files.  Seems to me the beginning of each segment file
 should have some kind of header information that makes it clear where
 in the scheme of things it belongs.  Additionally, writing some sort
 of checksum, either at the beginning or the end, might not be a bad
 idea either (doesn't have to be a strict checksum, but it needs to be
 something that's reasonably likely to catch corruption within a
 segment).
 
 Do that, and you don't have to worry about renaming xlog segments at
 all: you simply move on to the next logical segment in the list (a
 replay just reads the header info for all the segments and orders the
 list as it sees fit, and discards all segments prior to any gap it
 finds.  It may be that you simply have to bail out if you find a gap,
 though).  As long as the xlog segment checksum information is
 consistent with the contents of the segment and as long as its
 transactions pick up where the previous segment's left off (assuming
 it's not the first segment, of course), you can safely replay the
 transactions it contains.
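 
 For what it's worth, here is a minimal sketch of the kind of self-describing
 segment header I have in mind (all names and fields are hypothetical, not the
 existing on-disk format; the point is only that ordering and gap detection
 can be driven entirely by segment contents):
 
#include <stdint.h>

/* Hypothetical per-segment header written at the start of each xlog file */
typedef struct XlogSegHeaderSketch
{
    uint32_t    magic;          /* identifies the file as a WAL segment */
    uint32_t    version;        /* header format version */
    uint64_t    seg_seqno;      /* position of this segment in the log sequence */
    uint64_t    first_rec_id;   /* id of the first record stored in this segment */
    uint64_t    last_rec_id;    /* id of the last record written so far */
    uint32_t    payload_crc;    /* checksum over the segment's record data */
} XlogSegHeaderSketch;

/*
 * Replay would sort the headers by seg_seqno and then verify continuity;
 * the first segment that fails this test marks a gap, and everything
 * after the gap is discarded (or replay bails out, as discussed above).
 */
static int
segments_are_contiguous(const XlogSegHeaderSketch *a, const XlogSegHeaderSketch *b)
{
    return b->seg_seqno == a->seg_seqno + 1 &&
           b->first_rec_id == a->last_rec_id + 1;
}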
 
 I presume we're recycling xlog segments in order to avoid file
 creation and unlink overhead?  Otherwise you can simply create new
 segments as needed and unlink old segments as policy dictates.
 
  Comments anyone?
  
   If there exists somewhere a reasonably succinct description of the
   reasoning behind the current transaction management scheme (including
   an analysis of the pros and cons), I'd love to read it and quit
   bugging you.  :-)
  
  Not that I know of.  Would you care to prepare such a writeup?  There
  is a lot of material in the source-code comments, but no coherent
  presentation.
 
 Be happy to.  Just point me to any non-obvious source files.
 
 Thus far on my plate:
 
 1.  PID file locking for postmaster startup (doesn't strictly need
   to be the PID file but it may as well be, since we're already
   messing with it anyway).  I'm currently looking at how to do
   the autoconf tests, since I've never developed using autoconf
   before.
 
 2.  Documenting the transaction management scheme.
 
 I was initially interested in implementing the explicit JOIN
 reordering but based on your recent comments I think you have a much
 better handle on that than I.  I'll be very interested to see what you
 do, to see if it's anything close to what I figure has to happen...
 
 
 -- 
 Kevin 

Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 Tom Lane wrote on Fri, 14.02.2003 at 01:13:
 How is returning the same data twice not an ill effect?

 From earlier discussions I understood that there had been some work done
 on using btrees for indexing arrays by storing each separate element in
 a leaf node. Surely that work must deal with not returning the same
 tuple twice.

The only mechanism that exists for that is to discard tuples that meet
the qualification tests of previous indexscans.  This cannot prevent
returning the same tuple twice in one scan, if the index is so
ill-behaved as to return the same pointer twice.  I don't know what
Vadim had in mind to support multiple index entries per tuple, but
it's certainly not in the code yet.

 How will you wait for scans that you know nothing of to go past?
 Especially when they are going to be blocked by your own write lock
 on the left page?

 could we just not lock (for more than just to ensure atomic writes) the
 page but instead increment a page version on each write to detect
 changes?

How does that help?  The left-moving indexscan still has no way to
recover.  It can't go back to the page it was on before and try to
determine which entries you added there, because it has no reliable
reference point to do so.  The entry it previously returned might not be
there anymore, and in a non-unique index key comparisons won't help.
And even more to the point, how would it know you've changed the left
page?  It has no idea what the previous page version on the left page
was, because it was never there before.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian
Kevin Brown wrote:
 Bruce Momjian wrote:
  OK, here is an updated proposal.  I think we have decided:
  
  Moving postmaster.pid and postmaster.opts isn't worth it.
  
  We don't want per-file GUC variables, but assume they are in
  the same config directory as postgresql.conf.  I don't
  see any valid reason they would want to put them somewhere
  different than postgresql.conf.
  
  So, we add data_dir to postgresql.conf, and add -C/PGCONFIG to
  postmaster.
 
 Agreed.  One additional thing: when pg_ctl invokes the postmaster, it
 should explicitly specify -C on the postmaster command line, and if it
 doesn't find a data_dir in $PGCONFIG/postgresql.conf then it should
 explicitly specify a -D as well.  Pg_ctl is going to have to be
 modified to take a -C argument anyway, so we may as well go all the
 way to do the right thing here.
 
 This way, people who start the database using the standard tools we
 supply will know exactly what's going on when they get a ps listing.

No.  If you want ps to display the data directory, don't use environment
variables.  Many don't care --- especially those with only one postmaster.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Incremental backup

2003-02-14 Thread Greg Copeland
On Fri, 2003-02-14 at 06:52, Bruce Momjian wrote:
 OK, once we have PITR, will anyone want incremental backups?
 
 ---
 
 Martin Marques wrote:
  On Jue 13 Feb 2003 16:38, Bruce Momjian wrote:
   Patrick Macdonald wrote:
Bruce Momjian wrote:
 Someone at Red Hat is working on point-in-time recovery, also known as
 incremental backups.
   
PITR and incremental backup are different beasts.  PITR deals with a
backup + logs.  Incremental backup deals with a full backup + X
smaller/incremental backups.
   
So... it doesn't look like anyone is working on incremental backup at the
moment.
  
   But why would someone want incremental backups compared to PITR?  The
   backup would be a mixture of INSERTS, UPDATES, and DELETES, right?  Seems
   pretty weird.  :-)
  
  Good backup systems, such as Informix's (it's the one I used), don't do a query-level 
  backup, but a page-level backup. What I mean is that it looks for pages in the 
  system that have changed since the last full backup and backs them up.
  
  That's how an incremental backup works. PITR is another thing, which is even 
  more important. :-)
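  
  In outline, a page-level incremental backup does something like the sketch
  below (a generic illustration only -- the fixed block size and the idea of a
  per-page modification stamp at the start of each page are assumptions, not
  how Informix or PostgreSQL actually does it):
  
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BACKUP_BLCKSZ 8192          /* assumed page size */

/*
 * Copy to the backup file only those pages whose modification stamp is
 * newer than the stamp recorded at the time of the last full backup.
 */
static void
incremental_backup(FILE *datafile, FILE *backup, uint64_t last_full_stamp)
{
    char        page[BACKUP_BLCKSZ];
    uint64_t    blkno = 0;

    while (fread(page, 1, BACKUP_BLCKSZ, datafile) == BACKUP_BLCKSZ)
    {
        uint64_t    page_stamp;

        /* assume each page begins with a monotonically increasing stamp */
        memcpy(&page_stamp, page, sizeof(page_stamp));

        if (page_stamp > last_full_stamp)
        {
            fwrite(&blkno, sizeof(blkno), 1, backup);   /* which block */
            fwrite(page, 1, BACKUP_BLCKSZ, backup);     /* the page image */
        }
        blkno++;
    }
}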

I do imagine for some people it will register high on their list.

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 So, we add data_dir to postgresql.conf, and add -C/PGCONFIG to
 postmaster.

Wait one second.  You are blithely throwing in a PGCONFIG variable
without any detailed proposal of exactly how it will work.  Does
that override a PGDATA environment variable?  How do they interact?

Also, please note Kevin Brown's nearby arguments against using PGDATA
at all, which surely apply with equal force to a PGCONFIG variable.
Now, I don't buy that Kevin's arguments are enough reason to break
backwards compatibility by removing PGDATA --- but I think they are
enough reason not to introduce a new environment variable.  PGCONFIG
wouldn't offer any backwards-compatibility value, and that tilts the
scales against it.

 Regarding Tom's idea of replacing data_dir with a full path during
 initdb, I think we are better having it be relative to the config
 directory, that way if they move pgsql/, the system still works.

Good thought, but you're assuming that initdb knows where the config
files will eventually live.  If we do that, then moving the config
files breaks the installation.  I think it will be fairly common to
let initdb drop its proposed config files into $PGDATA, and then
manually place them where they should go (or even more likely,
manually merge them with a prior version).  Probably better to force
datadir to be an absolute path in the config file.  (In fact, on safety
grounds I'd argue in favor of rejecting a datadir value taken from the
config file that wasn't absolute.)
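
Something as simple as the following sketch would do (placement and wording
are illustrative only, not a patch):

#include "postgres.h"

/*
 * Sketch: refuse a data directory value read from the config file unless
 * it is an absolute path (Unix-style test shown for brevity).
 */
static void
check_absolute_datadir(const char *datadir)
{
    if (datadir == NULL || datadir[0] != '/')
        elog(ERROR, "data_dir in the configuration file must be an absolute path");
}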

 I also think we should start telling people to use PGCONFIG rather than
 PGDATA.  Then, in 7.5, we move the default config file location to
 pgsql/etc, and tell folks to point there rather than /data.

I agree with none of this.  This is not improvement, this is only change
for the sake of change.  The packagers will do what they want to do
(and are already doing, mostly) regardless.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Tom Lane
Oliver Elphick [EMAIL PROTECTED] writes:
 On Fri, 2003-02-14 at 12:17, Bruce Momjian wrote:
 If you want ps to display the data dir, you should use -D.  Remember, it
 is mostly important for multiple postmasters, so if you are doing that,
 just use -D, but don't prevent single-postmaster folks from using
 PGDATA.

 Could not the ps line be rewritten to show this, as the backend's ps
 lines are rewritten?

I for one would rather it didn't do that.  I already set my postmaster
command lines the way I want 'em, and I don't want the code overriding
that.  (I prefer to use explicit -p arguments to distinguish the various
postmasters I have running --- shorter and easier to read than explicit
-D would be.  At least for me.)

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Someone's broken psql's connection-failure error reporting

2003-02-14 Thread Kurt Roeckx
On Thu, Feb 13, 2003 at 08:55:23PM -0500, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  I've done a little bit of cleanup, but that code is still a mess...
  someone should rewrite these routines.
 
  Yes, I looked at it and struggled to get both IPv4 and IPv6 cleanly
  working.  Any ideas on how to improve it?
 
 The major problem is the huge amount of #ifdefs, most of which seem to
 come from the fact that we deal with a list of possible addresses in
 one case and not the other.  It would help a lot if we fixed things so
 that we dealt with a list in either case --- only a one-element list,
 if we don't have getaddrinfo, but getaddrinfo2 could hide that and
 provide a uniform API regardless.
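 
 For illustration, such a wrapper might look roughly like this (the fallback
 details here are assumptions sketched for the sake of the argument, not the
 code that will actually go in; HAVE_GETADDRINFO stands for whatever configure
 test we end up with):
 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/* Always hand the caller an addrinfo list, even without getaddrinfo(). */
int
getaddrinfo2(const char *host, unsigned short port,
             const struct addrinfo *hint, struct addrinfo **result)
{
#ifdef HAVE_GETADDRINFO
    char        portstr[16];

    snprintf(portstr, sizeof(portstr), "%u", (unsigned) port);
    return getaddrinfo(host, portstr, hint, result);
#else
    /* Fallback: build a one-element list from gethostbyname(). */
    struct hostent *hp = gethostbyname(host);
    struct addrinfo *ai;
    struct sockaddr_in *sin;

    if (hp == NULL || hp->h_addrtype != AF_INET)
        return -1;

    ai = (struct addrinfo *) calloc(1, sizeof(struct addrinfo));
    sin = (struct sockaddr_in *) calloc(1, sizeof(struct sockaddr_in));
    if (ai == NULL || sin == NULL)
        return -1;

    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    memcpy(&sin->sin_addr, hp->h_addr_list[0], hp->h_length);

    ai->ai_family = AF_INET;
    ai->ai_socktype = hint ? hint->ai_socktype : SOCK_STREAM;
    ai->ai_addrlen = sizeof(struct sockaddr_in);
    ai->ai_addr = (struct sockaddr *) sin;
    ai->ai_next = NULL;         /* one-element list */

    *result = ai;
    return 0;
#endif
}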

I'm actually working on getting rid of all those #ifdef's, but
it's going slowly.  (I have very little free time.)


Kurt


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  So, we add data_dir to postgresql.conf, and add -C/PGCONFIG to
  postmaster.
 
 Wait one second.  You are blithely throwing in a PGCONFIG variable
 without any detailed proposal of exactly how it will work.  Does
 that override a PGDATA environment variable?  How do they interact?

I am just throwing out ideas. I don't think we are near interaction
issues yet.

I think the big question is whether we want the default to install the
configs in a separate directory, pgsql/etc, or just allow the
specification of a separate location.  Advantages of pgsql/etc are that
it is initdb-safe and makes backups easier.

I do think PGCONFIG would be helpful for the same reason that PGDATA is.
However, there is clearly a question of how data_dir interacts with
PGDATA.

The big question is whether PGDATA is still our driving config variable,
and PGCONFIG/-C is just an additional option, or whether we are moving
in a direction where PGCONFIG/-C is going to be the driving value, and
data_dir is going to be read as part of that.


 Also, please note Kevin Brown's nearby arguments against using PGDATA
 at all, which surely apply with equal force to a PGCONFIG variable.
 Now, I don't buy that Kevin's arguments are enough reason to break
 backwards compatibility by removing PGDATA --- but I think they are
 enough reason not to introduce a new environment variable.  PGCONFIG
 wouldn't offer any backwards-compatibility value, and that tilts the
 scales against it.

Weren't you just showing how you set environment variables to easily
configure stuff?  If you use a separate config dir, isn't PGCONFIG
part of that?

  Regarding Tom's idea of replacing data_dir with a full path during
  initdb, I think we are better having it be relative to the config
  directory, that way if they move pgsql/, the system still works.
 
 Good thought, but you're assuming that initdb knows where the config
 files will eventually live.  If we do that, then moving the config
 files breaks the installation.  I think it will be fairly common to
 let initdb drop its proposed config files into $PGDATA, and then
 manually place them where they should go (or even more likely,
 manually merge them with a prior version).  Probably better to force
 datadir to be an absolute path in the config file.  (In fact, on safety
 grounds I'd argue in favor of rejecting a datadir value taken from the
 config file that wasn't absolute.)

Maybe.  Not sure.

  I also think we should start telling people to use PGCONFIG rather than
  PGDATA.  Then, in 7.5, we move the default config file location to
  pgsql/etc, and tell folks to point there rather than /data.
 
 I agree with none of this.  This is not improvement, this is only change
 for the sake of change.  The packagers will do what they want to do
 (and are already doing, mostly) regardless.

Well, it is a step forward in terms of being initdb-safe and making backups easier.
Several people said they liked that.  I thought you were one of them.

This is back to the big question: who drives things in the default
install, the config file or pgdata?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Brain dump: btree collapsing

2003-02-14 Thread Curtis Faith
tom lane wrote:
 How does that help?  The left-moving indexscan still has no 
 way to recover.  It can't go back to the page it was on 
 before and try to determine which entries you added there, 
 because it has no reliable reference point to do so.  The 
 entry it previously returned might not be there anymore, and 
 in a non-unique index key comparisons won't help. And even 
 more to the point, how would it know you've changed the left 
 page?  It has no idea what the previous page version on the 
 left page was, because it was never there before.

Revisiting the idea I proposed in a previous email after more research,
I believe it is definitely possible to implement a mutex based scheme
that will prevent scans from being polluted by merges of index pages and
that does not result in the mutex being held for any significant
duration.

1) Current scan code does this:

bool
_bt_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
{
... definitions go here...

if (ScanDirectionIsForward(dir))
{
if (!PageIsEmpty(page) && offnum < maxoff)
offnum = OffsetNumberNext(offnum);
else
{
/* walk right to the next page with data */
for (;;)
{
/* if we're at end of scan, release the buffer and return */
... skip code here...

/* step right one page */
blkno = opaque->btpo_next;
_bt_relbuf(rel, *bufP);
*bufP = _bt_getbuf(rel, blkno, BT_READ);

... skip rest of code...

3) Note that the calls

_bt_relbuf(rel, *bufP);
*bufP = _bt_getbuf(rel, blkno, BT_READ);

   a) appear adjacent to each other
   b) relbuf calls:

  LockBuffer(buf, BUFFER_LOCK_UNLOCK);
  ReleaseBuffer(buf);

   c) getbuf only calls:

  buf = ReadBuffer(rel, blkno);
LockBuffer(buf, access);
  
  in the case of an existing buffer, rather than a new one.

4) This could easily be reordered into:

  buf = ReadBuffer(rel, blkno);           /* pin next page */
  LockBuffer(*bufP, BUFFER_LOCK_UNLOCK);  /* release lock on current page */
  LockBuffer(buf, BT_READ);               /* lock next page */
  ReleaseBuffer(*bufP);                   /* now release pin on previously current page */
  *bufP = buf;                            /* next page becomes the current page */

   without affecting the logic of the code or causing any deadlock
problems since the release still occurs before the lock of the next
page.

5) A mutex/spinlock that was stored in the index could be acquired by
the scan code like this:

  buf = ReadBuffer(rel, blkno);           /* pin next page */

  SpinLockAcquire( indexSpecificMutex );  /* lock the index reorg mutex */

  LockBuffer(*bufP, BUFFER_LOCK_UNLOCK);  /* release lock on current page */
  LockBuffer(buf, BT_READ);               /* lock next page */

  SpinLockRelease( indexSpecificMutex );  /* unlock the index reorg mutex */

  ReleaseBuffer(*bufP);                   /* now release pin on previously current page */
  *bufP = buf;                            /* next page becomes the current page */

6) The same index specific mutex/spinlock could be used for the merge
code surrounding only the acquisition of the four page locks. This would
obviate any problems with scans and page merges, since the lock
acquisition for the merge could never occur while a scan was between
pages.
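
For concreteness, the merge side could be wrapped up roughly like this (the
function name and the particular buffer variables are hypothetical):

/*
 * Acquire the four page locks needed for a merge while holding the index's
 * reorg mutex, so a scan stepping between pages can never interleave with
 * the start of a merge.
 */
static void
_bt_lock_pages_for_merge(Buffer leftbuf, Buffer targetbuf,
                         Buffer rightbuf, Buffer parentbuf)
{
    SpinLockAcquire( indexSpecificMutex );   /* block scans stepping between pages */

    LockBuffer(leftbuf, BT_WRITE);           /* left sibling */
    LockBuffer(targetbuf, BT_WRITE);         /* page being emptied */
    LockBuffer(rightbuf, BT_WRITE);          /* right sibling */
    LockBuffer(parentbuf, BT_WRITE);         /* parent, to remove the downlink */

    SpinLockRelease( indexSpecificMutex );   /* page locks held; scans may resume */
}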

Further, with the reordering, the spinlock for the scan code doesn't
seem particularly onerous since it would be surrounding only two LWLock
calls.  To reduce the overhead to an absolute minimum for the scan case
these could be pushed down into a new LWLock call (probably necessary since
the LockBuffer/ReleaseBuffer code does some error checking and such
that one wouldn't want in code guarded by a mutex).

- Curtis



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] Berkeley and CMU classes adopt/extend PostgreSQL

2003-02-14 Thread Joe Hellerstein
Hi all:
	I emailed Marc Fournier on this topic some weeks back, but haven't 
heard from him.

I am teaching the undergrad DB course at UC Berkeley, something I do 
with some frequency.  We have the usual 180 students we get every 
semester (yep: 180!), but this year we've instituted 2 changes:

1) We changed the course projects to make the students hack PostgreSQL 
internals, rather than the minibase eduware
2) We are coordinating the class with a class at CMU being taught by 
Prof. Anastassia (Natassa) Ailamaki

Our Homework 2, which is being passed out this week, will ask the 
students to implement a hash-based grouping that spills to disk.  I 
understand this topic has been batted about the pgsql-hackers list 
recently.  The TAs who've prepared the assignment (Sailesh 
Krishnamurthy at Berkeley and Spiros Papadimitriou at CMU) have also 
implemented a reference solution to the assignment.  Once we've got the 
students' projects all turned in, we'll be very happy to contribute our 
code back to the PostgreSQL project.
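
(For readers curious what "hash-based grouping that spills to disk" means,
here is a generic sketch of the textbook technique -- partition by hash to
temp files, then aggregate each partition in memory.  It is only an
illustration, not our reference solution; the constants and the linear group
lookup are simplifications.)

#include <stdio.h>
#include <stdlib.h>

#define NUM_PARTITIONS 8
#define MAX_GROUPS     1024         /* assumed per-partition memory budget */

typedef struct { long key; long count; } Group;

/* Count occurrences of each key without keeping every group in memory. */
static void
count_groups(const long *keys, size_t nkeys)
{
    FILE   *part[NUM_PARTITIONS];
    size_t  i;
    int     p;

    /* Phase 1: route each input key to a partition temp file by hash */
    for (p = 0; p < NUM_PARTITIONS; p++)
        part[p] = tmpfile();
    for (i = 0; i < nkeys; i++)
    {
        p = (int) ((unsigned long) keys[i] % NUM_PARTITIONS);
        fwrite(&keys[i], sizeof(long), 1, part[p]);
    }

    /* Phase 2: aggregate one partition at a time in memory */
    for (p = 0; p < NUM_PARTITIONS; p++)
    {
        Group   groups[MAX_GROUPS];
        size_t  ngroups = 0;
        long    key;

        rewind(part[p]);
        while (fread(&key, sizeof(long), 1, part[p]) == 1)
        {
            size_t  g;

            for (g = 0; g < ngroups; g++)      /* linear lookup, for brevity */
                if (groups[g].key == key)
                    break;
            if (g == ngroups)
            {
                if (ngroups == MAX_GROUPS)
                    abort();    /* a real implementation re-partitions here */
                groups[ngroups].key = key;
                groups[ngroups].count = 0;
                ngroups++;
            }
            groups[g].count++;
        }
        fclose(part[p]);

        for (i = 0; i < ngroups; i++)
            printf("key=%ld count=%ld\n", groups[i].key, groups[i].count);
    }
}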

I'm hopeful this will lead to many good things:

1) Each year we can pick another feature to assign in class, and 
contribute back.  We'll need to come up with well-scoped engine 
features that exercise concepts from the class -- eventually we'll run 
out of tractable things that PGSQL needs, but not in the next couple 
years I bet.

2) We'll raise a crop of good students who know Postgres internals.  
Roughly half the Berkeley EECS undergrads take the DB class, and all of 
them will be post-hackers!  (Again, I don't know the stats at CMU.)

So consider this a heads up on the hash-agg front, and on the future 
contributions front.   I'll follow up with another email on 
PostgreSQL-centered research in our group at Berkeley as well.

Another favor I'd ask is that people on the list be a bit hesitant 
about helping our students with their homework!  We would like them to 
do it themselves, more or less :-)

Regards,
Joe Hellerstein

--

Joseph M. Hellerstein
Professor, EECS Computer Science Division
UC Berkeley
http://www.cs.berkeley.edu/~jmh


On Tuesday, February 11, 2003, at 06:54  PM, Sailesh Krishnamurthy 
wrote:

From: Hannu Krosing [EMAIL PROTECTED]
Date: Tue Feb 11, 2003  12:21:26  PM US/Pacific
To: Tom Lane [EMAIL PROTECTED]
Cc: Bruno Wolff III [EMAIL PROTECTED], Greg Stark [EMAIL PROTECTED], 
[EMAIL PROTECTED]
Subject: Re: [HACKERS] Hash grouping, aggregates


Tom Lane wrote on Tue, 11.02.2003 at 18:39:
Bruno Wolff III [EMAIL PROTECTED] writes:

  Tom Lane [EMAIL PROTECTED] wrote:

Greg Stark [EMAIL PROTECTED] writes:

The neat thing is that hash aggregates would allow grouping on data types
that have = operators but no useful < operator.

Hm.  Right now I think that would barf on you, because the parser wants
to find the '<' operator to label the grouping column with, even if the
planner later decides not to use it.  It'd take some redesign of the
query data structure (specifically SortClause/GroupClause) to avoid that.


I think another issue is that for some = operators you still might not
be able to use a hash. I would expect the discussion for hash joins in
http://developer.postgresql.org/docs/postgres/xoper-optimization.html
would apply to hash aggregates as well.

Right, the = operator must be hashable or you're out of luck.  But we
could imagine tweaking the parser to allow GROUP BY if it finds a
hashable = operator and no sort operator.  The only objection I can see
to this is that it means the planner *must* use hash aggregation, which
might be a bad move if there are too many distinct groups.

If we run out of sort memory, we can always bail out later, preferably
with a descriptive error message. It is not as elegant as erring out at
parse (or even plan/optimise) time, but the result is /almost/ the same.

Relying on hash aggregation will become essential if we are ever going
to implement the other groupings (CUBE, ROLLUP, (), ...), so it would
be nice if hash aggregation could also overflow to disk - I suspect that
this will still be faster than running an independent scan for each
GROUP BY grouping and merging the results.

-
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]




--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Berkeley and CMU classes adopt/extend PostgreSQL

2003-02-14 Thread Anastassia Ailamaki
Hi everyone,

  with some frequency.  We have the usual 180 students we get every 
  semester (yep: 180!), but this year we've instituted 2 changes:

We're looking at 100 students taking the class here every year.

 Double cool.  I'm just down the road, if Natassa needs a visiting
 lecturer.

Tom - that's really super-cool! Tom, let's take it offline to schedule a
visit.
We will be delighted to have you lecture.

 Yes.  As of CVS tip, we have hash-based grouping but it doesn't spill
 to disk.  Want to ask them to start from CVS tip and fix that little
 detail?  Or fix the various other loose ends that have been mentioned
 lately?  (make it work with DISTINCT, improve the estimation logic,
 some other things I'm forgetting)

As Joe said, this is what we are doing. We intend to use your todo-list to 
design projects for future semesters... so all such suggestions
are greatly appreciated.
 
Natassa

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Bruce Momjian
Kevin Brown wrote:
 Bruce Momjian wrote:
  The big question is whether PGDATA is still our driving config variable,
  and PGCONFIG/-C is just an additional option, or whether we are moving
  in a direction where PGCONFIG/-C is going to be the driving value, and
  data_dir is going to be read as part of that.
 
 I'm actually leaning towards PGCONFIG + PGDATA.
 
 Yeah, it may be a surprise given my previous arguments, but I can't
 help but think that the advantages you get with PGDATA will also exist
 for PGCONFIG.
 
 My previous arguments for removing PGDATA from postmaster can be dealt
 with by fixing pg_ctl to use explicit command line directives when
 invoking postmaster -- no changes to postmaster needed.  PGCONFIG
 would be no different in that regard.

I see your point --- pg_ctl does a PGDATA trick when passed -D:

-D)
shift
# pass environment into new postmaster
PGDATA=$1
export PGDATA

It should pass -D just like it was given.

 Sorry if I seem a bit gung-ho on the administrator point of view, but
 as a system administrator myself I understand and feel their pain.

Making things easy for sysadmins is an important feature of PostgreSQL.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] location of the configuration files

2003-02-14 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 I think the big question is whether we want the default to install the
 configs in a separate directory, pgsql/etc, or just allow the
 specification of a separate location.  Advantages of pgsql/etc are that
 it is initdb-safe and makes backups easier.

I don't see why we don't just let initdb install suggested config files
into the new $PGDATA directory, same as it ever did.  Then (as long as
we don't use relative paths in the config files) people can move them
somewhere else if they like, or not if they prefer not to.  Adding more
mechanism than that just adds complexity without buying much (except the
possibility of initdb overwriting your old config files, which is
exactly what I thought we wanted to avoid).

 The big question is whether PGDATA is still our driving config variable,
 and PGCONFIG/-C is just an additional option, or whether we are moving
 in a direction where PGCONFIG/-C is going to be the driving value, and
 data_dir is going to be read as part of that.

I thought the idea was to allow both approaches.  We are not moving in
the direction of one or the other, we are giving people a choice of how
they want to drive it.

 Weren't you just showing how you set environment variables to easily
 configure stuff?  If you use a separate config dir, isn't PGCONFIG
 part of that?

I'm just pointing out that there's no backward-compatibility argument
for PGCONFIG.  It should only be put in if the people who want to use
the -C-is-driver approach want it.  Kevin clearly doesn't ;-).

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] client_encoding directive is ignored in

2003-02-14 Thread Bruce Momjian
Tatsuo Ishii wrote:
  Tatsuo Ishii [EMAIL PROTECTED] writes:
   + /* Flag to indicate we need to initialize client encoding info */
   + static bool need_to_init_client_encoding = -1;
  
  Surely that should be int, not bool.
 
 Oops. Will fix.
 
  ! if (!PQsendQuery(conn, "begin; select pg_client_encoding(); commit"))
  
  Doesn't this break compatibility with pre-7.2 databases?  AFAICT that
  function was introduced in 7.2.
 
 Yes, but there seems no other way to solve the problem, and I thought we
 do not guarantee compatibility between a 7.3 frontend and pre-7.3 servers.

Yep.  Tatsuo, you should apply the patch to fix the problem.  Shame this
didn't make it into 7.3.2.  Should we backpatch?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Linux.conf.au 2003 Report

2003-02-14 Thread Bruce Momjian

Is Linux.conf.au the event PostgreSQL should use for coverage in
Australia next year?

---

Christopher Kings-Lynne wrote:
 Linux.conf.au Report
 
 
 Linux.conf.au is an international Linux/Open Source event that attracts
 lots of international speakers.  Total conf attendance was around 360, maybe
 even 400 I think.
 
 Gavin Sherry was speaking at this particular conf, and I attended as a
 hobbyist.
 
 PostgreSQL got a reasonable amount of attention, particularly since there
 were no representatives from other database products there.
 
 Some pics of our PostgreSQL BOF and the Perth Bell Tower:
 http://www.zip.com.au/~swm/lca2003
 (Gavin is the beardy looking dude 3rd from the left :)  I'm taking the
 photo.)
 
 These are the main questions we were asked, or features that were
 requested:
 
 * Replication, replication, replication!
 
 - We told them that there are a few solutions, none of them particularly
 great.  Gavin got all sorts of ideas about log shipping.
 
 * IPV6 data types
 
 - Apparently there are some ISPs in some countries that have started to bill
 people for IPV6 bandwidth, and the lack of IPV6 address types is hurting
 them.
 
 * Collisions in auto-generated names.
 
 - The standard table modification tactic (that I also use) of renaming the table
 to *_old and creating a new one breaks because the primary key of the new
 table is assigned the same name as the PK of the old, causing CREATE TABLE
 to fail.  This is really annoying.  I think that auto-generated names should
 never collide.
 
 * Problem:  person has large database with 4 or 5 humungous tables that they
 aren't interested in backing up.  However, they want to back up the rest.
 
 - I suggested that if pg_dump could dump individual schemas, then they could
 move their 'don't backup' tables to another schema, and just dump the other
 one.
 
 We found out all sorts of interesting places that PostgreSQL is being used:
 a large Australian Telco, several restaurants in the Perth area, the Debian
 inventory system and the Katie revision control system.  It is also being
 evaluated for process control analysis at a steel plant.  Maybe we should
 chase some people for case studies?
 
 Chris Kings-Lynne
 
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


