Re: [HACKERS] RPM building fun

2003-11-20 Thread David Fetter
On Thu, Nov 20, 2003 at 11:45:06AM +0530, Shridhar Daithankar wrote:
 Joshua D. Drake wrote:
 Is there some way to remove this piece of sh^H^Hlegacy from the
 configure script?  Does anybody actually use info?
 All of GNU.
 
 Additionally it is very good resource when you use Konqueror to
 browse it as html..

Roight.  How to tell it to get the right prefix, then?

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100    cell: +1 415 235 3778

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] RPM building fun

2003-11-20 Thread Shridhar Daithankar
On Thursday 20 November 2003 13:09, David Fetter wrote:
 On Thu, Nov 20, 2003 at 11:45:06AM +0530, Shridhar Daithankar wrote:
  Joshua D. Drake wrote:
  Is there some way to remove this piece of sh^H^Hlegacy from the
  configure script?  Does anybody actually use info?
  
  All of GNU.
 
  Additionally it is very good resource when you use Konqueror to
  browse it as html..

 Roight.  How to tell it to get the right prefix, then?

I didn't get you. I type info:/autoconf and it shows me the autoconf
page.

You mean the extension of actual info files? No idea. Never seen them in
the wild... :-)

 Shridhar
 




Re: [HACKERS] ALTER COLUMN/logical column position

2003-11-20 Thread Hannu Krosing
Andreas Pflug kirjutas N, 20.11.2003 kell 01:38:

 Second, column type changes needing a nontrivial cast function should be 
 implemented in a way that preserve attnum. This could be done like this:
 - decompile dependent objects, and memorize them for later recreation
 - ADD tmpCol, UPDATE tmpCol=col::newtype, DROP old column, cascading to 
 dependent objects, RENAME tmpCol (known stuff)
 - restore old attnum, which is a simple UPDATE to pg_attribute at this stage

and suddenly your table is broken, as you can't retrieve the tmpCol when
the attnum points to the dropped old column, which has data in the format
of the old type ...

the whole point of separating attnum and attpos is that attnum is used
internally to retrieve the data and you can't change it by just
UPDATEing pg_attribute.

---
Hannu




Re: [HACKERS] ALTER COLUMN/logical column position

2003-11-20 Thread Andreas Pflug
Hannu Krosing wrote:

attnum is used internally to retrieve the data 

 

Oops...

So if an additional column number is invented, it should not be a 
logical column number, but a physical storage number for internal data 
retrieval. This way, the user interface doesn't change, and all those 
SELECT ... FROM pg_attribute ORDER BY attnum queries continue delivering 
the expected result.

Regards,
Andreas


Re: [HACKERS] PG7.4 ordering operator

2003-11-20 Thread strk
tgl wrote:
 strk [EMAIL PROTECTED] writes:
  Testing postgis support in PG7.4 (2003-11-11)
  I've encountered to this problem:
  ERROR:  could not identify an ordering operator for type geometry
  Previous PG versions did not show this problem.
  Any hint on what might be missing ?
 
 A default btree operator class for type geometry.  PG 7.4 no longer
 uses assumptions about operator names to determine sorting/grouping
 behavior.  If you have some operators that provide a scalar sort
 ordering on your datatype, then make a btree opclass to show that.
 See
 http://www.postgresql.org/docs/7.4/static/xindex.html#XINDEX-OPCLASS-DEPENDENCIES
 
   regards, tom lane

Thanks for the answer; I have one more question:
I've provided a default btree operator class, but I often 
run out of memory when using DISTINCT or UNION clauses.

How can I reduce memory usage in these cases?
The passed arguments are TOASTed, but I use only a small
initial portion of them for the computation. Can I 
avoid detoasting them and still reach that initial part?

The information I need is stored at offset 40 into the detoasted data
and is 6 doubles long. I cannot find TOAST documentation.

thanks.

--strk;



[HACKERS] tsearch2 patch for 7.4.1

2003-11-20 Thread Oleg Bartunov
Hi there,

we have a patch for contrib/tsearch2 we'd like to commit for 7.4.1.
Is it ok ?

We added support for compound words using ispell dictionaries.
It's a rather important feature for agglutinative languages like
German and Norwegian (tested). This work was sponsored by ABC Startsiden
(www.startsiden.no).

Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: [HACKERS] 4 Clause license?

2003-11-20 Thread Andrew Dunstan
This whole thing is starting to make my head hurt. There has been more 
effort spent over this license issue than I would have spent if I hadn't 
taken the shortcut of using the FreeBSD code.

I think maybe the simplest thing is for me to prepare a patch that rips 
that code out and replaces it with a (slightly simpler - less umask 
hacking required, I think) piece of code that I will write.

I won't make the mistake of borrowing code again like this. (I naively 
thought using FreeBSD code would be safe, and was careful not to use any 
GPL code.)

cheers

andrew

Terry Lambert wrote:

Erik Trulsson wrote:
 

On Mon, Nov 17, 2003 at 02:48:08PM -0500, Rod Taylor wrote:
   

The PostgreSQL group has recently had a patch submitted with a snippet
of code from FreeBSDs src/bin/mkdir/mkdir.c.
http://www.freebsd.org/cgi/cvsweb.cgi/src/bin/mkdir/mkdir.c?annotate=1.27

Is this intentionally under the 4 clause license or does the copyright
from the website (2 clause) applied to everything that is non-contrib?
http://www.freebsd.org/copyright/freebsd-license.html
 

That copyright notice on the website should apply to everything that is
not under some other license.  Different parts of the system are under
different licenses and copyrights depending on who wrote them.
The mkdir.c *was* under the 4 clause license. However, all material that
was part of the original BSDs, and thus was copyrighted by The Regents
of the University of California, has had its license changed such that
clause 3 (the advertising clause) no longer applies.
   

People seem to frequently misunderstand what a license is, and
more specifically, what the conversion from a 4 clause to a 3
clause license meant, in the case of the UCB License.

This change does not apply to derivative works, only to the
original code itself.

So if you went back and grabbed the mkdir.c code off the BSD
4.4-Lite2 CDROM, and used that, fine.

If you grabbed the mkdir.c off the FreeBSD sources, and even one
line was modified by someone, then it's a derivative work, and,
unless you can also get written permission from the contributor,
it stays under the license from which it was derived.

The announcement by the University only permits the change, it
does not mandate the change, for this very reason: otherwise
third party redistributed code would have suddenly become
legally questionable.

By the same token, if you dual-license some code under the GPL
and another license, and someone gets the GPL'ed version, and
makes changes, unless they specifically permit it, the code
contributed back is only licensed under the GPL.  This is why
SGI licensing the XFS code under the GPL was a stupid move: a
contributor contributing code back results in an improved code
base that can only be used under the terms of the GPL, and not
in SGI's commercial product offerings.  I believe that SGI did
not actually expect any significant or worthwhile bug fixes or
enhancements to come from the GPL'ed code using community.

In terms of getting written approval for the license change
from other contributors, this is basically the role that the
Regents of the University of California and the UCB CSRG were
fulfilling: a legal entity to whom such representations could
be made by contributors, and who could then legally forward
those representations to another.

FreeBSD has no such legal entity, at present.  The closest you
could come is perhaps the FreeBSD Foundation.  Had there been
a FreeBSD Foundation from day one, to whom rights could have
been assigned by contributors (turning it into The FreeBSD
Foundation and its Contributors), then the license would be
capable of being modified after the fact.

Without that, however, you must track down all of the individual
contributors to get the license changed.

My recommendation is to use the code off the 4.4 BSD-Lite2 CDROM,
if you can; live with the 4 clause license if the code contains
changes you need, if you can; or contact the contributors, if it
is a small enough job.  If none of those things will work for you,
then start with the 4.4 BSD-Lite2 CDROM code, convert to the 3
clause license, as permitted by the university, and then hack out
whatever modifications you need on top of that for yourself.
-- Terry

 





Re: [HACKERS] ALTER COLUMN/logical column position

2003-11-20 Thread Dave Cramer
Which is what started the whole discussion.

Dave
On Thu, 2003-11-20 at 04:40, Andreas Pflug wrote:
 Hannu Krosing wrote:
 
 
 attnum is used internally to retrieve the data 
 
   
 
 Oops...
 
 So if an additional column number is invented, it should not be a 
 logical column number, but a physical storage number for internal data 
 retrieval. This way, the user interface doesn't change, and all those 
 SELECT ... FROM pg_attribute ORDER BY attnum continue delivering the 
 expected result.
 
 Regards,
 Andreas
 
 




Re: [HACKERS] ALTER COLUMN/logical column position

2003-11-20 Thread Hannu Krosing
Andreas Pflug kirjutas N, 20.11.2003 kell 11:40:
 Hannu Krosing wrote:
 
 
 attnum is used internally to retrieve the data 
 
   
 
 Oops...
 
 So if an additional column number is invented, it should not be a 
 logical column number, but a physical storage number for internal data 
 retrieval. 

You are just shifting the interface problems to a place needing way more
changes in the backend. There will be some problems either way.

Also, tools needing internal knowledge should start using the information
schema as much as they can, making internal reshufflings less of a problem.

 This way, the user interface doesn't change, and all those 
 SELECT ... FROM pg_attribute ORDER BY attnum continue delivering the 
 expected result.

Depending on what you expect ;)

If you expect the above to give you all active columns ordered as they
are stored, then it does not give you what you expect.

Btw, most of these concerns (and more) were already raised when DROP
COLUMN was done, causing gaps in attnum. There were a lot of doomsday
prophecies, but in the end it went quite smoothly. The tools needing
internal knowledge about storage (meaning any tool doing select .. from
pg_...) have always needed some upgrades for new versions.

IMHO, the only behaviour visible to the common user we should worry about
is SELECT *, and a special column for solving this is _the_ easiest way to
do it.
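A sketch of that idea (an illustrative Python model; the attlognum column is the thread's proposal, not an existing catalog column):

```python
# Hypothetical pg_attribute rows: attnum is physical (used for storage),
# attlognum is the proposed logical position (used only for display order).
pg_attribute = [
    {"attname": "id",   "attnum": 1, "attlognum": 2, "attisdropped": False},
    {"attname": "name", "attnum": 2, "attlognum": 1, "attisdropped": False},
    {"attname": "junk", "attnum": 3, "attlognum": 3, "attisdropped": True},
]

def expand_star(atts):
    """Expand SELECT *: skip dropped columns, order by logical position."""
    live = (a for a in atts if not a["attisdropped"])
    return [a["attname"] for a in sorted(live, key=lambda a: a["attlognum"])]

print(expand_star(pg_attribute))   # ['name', 'id']
```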

-
Hannu




Re: [HACKERS] 4 Clause license?

2003-11-20 Thread Rod Taylor
 I think maybe the simplest thing is for me to prepare a patch that rips 
 that code out and replaces it with a (slightly simpler - less umask 
 hacking required, I think) piece of code that I will write.

The FreeBSD folks sorted it out for us.

Everyone's names should be in the copyright for the file. The licence
portion should be the 3 clause version -- no advertising clause.

I think borrowing should be encouraged, and now that we know what
license / copyright we need to carry over, this can be done without
worry.




Re: [HACKERS] logical column position

2003-11-20 Thread Manfred Koizar
On Wed, 19 Nov 2003 19:07:23 +0100, Andreas Pflug
[EMAIL PROTECTED] wrote:
is there 
any DB system out there that allows reshuffling the column ordering?

Firebird:
ALTER TABLE tname ALTER COLUMN cname POSITION 7;

Servus
 Manfred



Re: [HACKERS] question about fixes in v7.4...

2003-11-20 Thread Robert Treat
On Wed, 2003-11-19 at 22:50, Bruce Momjian wrote:
 Don Sceifers wrote:
  My company is fairly new at Postgresql, but we have hit a problem, where
  we modify a table using ALTER, and our stored procedures stop working. We
  have a grasp as to why this happens, but I was wondering if this v7.4
  upgrade fixes this issue?
 
 This is a known issue.  There is no way for us to know what stored
 tables/columns you are using in your function.  You might find that
 reconnecting to the database after the ALTER will fix the problem.  We
 do cache function code, so reconnecting will erase that cache.
 

Would it be worth putting in a partial implementation? I'm thinking
specifically about functions that return table types. Could they be
hooked up with dependency information and made to complain? 

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL




Re: [HACKERS] question about fixes in v7.4...

2003-11-20 Thread Glenn Wiorek
  Don Sceifers wrote:
   My company is fairly new at Postgresql, but we have hit a problem,
where
   we modify a table using ALTER, and our stored procedures stop working.
We
   have a grasp as to why this happens, but I was wondering if this v7.4
   upgrade fixes this issue?

Not sure if this applies to Postgres, but did you also recompile the stored
procedures?  If I remember correctly, Sybase and Informix (?) required stored
procedures to be recompiled after DDL on tables they used, and Oracle often
invalidates any functions, views, packages and procedures if they have any
relation to the modified table.

- Original Message - 
From: Robert Treat [EMAIL PROTECTED]
To: Bruce Momjian [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, November 20, 2003 7:52 AM
Subject: Re: [HACKERS] question about fixes in v7.4...


 On Wed, 2003-11-19 at 22:50, Bruce Momjian wrote:
  Don Sceifers wrote:
   My company is fairly new at Postgresql, but we have hit a problem,
where
   we modify a table using ALTER, and our stored procedures stop working.
We
   have a grasp as to why this happens, but I was wondering if this v7.4
   upgrade fixes this issue?
 
  This is a known issue.  There is no way for us to know what stored
  tables/columns you are using in your function.  You might find that
  reconnecting to the database after the ALTER will fix the problem.  We
  do cache function code, so reconnecting will erase that cache.
 

 Would it be worth putting in a partial implementation? I'm thinking
 specifically about functions that return table types. Could they be
 hooked up with dependency information and made to complain?

 Robert Treat
 -- 
 Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL









Re: [HACKERS] PG7.4 ordering operator

2003-11-20 Thread Tom Lane
strk [EMAIL PROTECTED] writes:
 I've provided a default btree operator class, but I often 
 run out of memory when using DISTINCT or UNION clauses.

 How can I reduce memory usage in these cases?
 The passed arguments are TOASTed, but I use only a small
 initial portion of them for the computation. Can I 
 avoid detoasting them and still reach that initial part?

No, I don't think so, but see PG_FREE_IF_COPY.  Operators used
in btree indexes are expected not to leak memory.

regards, tom lane



Re: [HACKERS] Build farm

2003-11-20 Thread Andrew Dunstan
Peter Eisentraut wrote:

Andrew Dunstan writes:

 

Essentially what I have is something like this pseudocode:

 cvs update
   

Be sure check past branches as well.

 

 check if there really was an update and if not exit
   

OK.

 

 configure; get config.log
   

Ideally, you'd try all possible option combinations for configure.  Or at
least enable everything.
I have had in mind from the start doing multiple configurations and 
multiple branches.

Right now I'm working only with everything/head, but will make provision 
for multiple sets of both.

How many branches back do you think we should go? Right now I'd be 
inclined only to do REL7_4_STABLE and HEAD as a default. Maybe we could 
set the default to be gettable from the web server so that as new 
releases come along build farm members using the default wouldn't need 
to make any changes.

However, everything would also be settable locally on each build farm 
member in an options file.

 

 make 2>&1 | make-filter > makelog
 make check 2>&1 | check-filter > checklog
   

You could also try out make distcheck.  It tries out the complete build,
installation, uninstallation, regression test, and distribution building.
 

OK.

 

 (TBD) send config status, make status, check status, logfiles
   

OK.

 

 make distclean
   

When I played around with this, I always copied the CVS tree to a new
directory and deleted that one at the end.  That way, bugs in the clean
procedure (known to happen) don't trip up the whole process.
 

OK. We've also seen odd problems with cvs update, I seem to recall, 
but I'd rather avoid having to fetch the entire tree for each run, to 
keep bandwidth use down. (I believe cvs update should be fairly 
reliable if there are no local changes, which would be true in this 
instance).

 

The send piece will probably be a perl script using LWP and talking to a
CGI script.
   

That will be the difficult part to organize, if it's supposed to be
distributed and autonomous.
 

Sending the results won't be a huge problem - storing and displaying 
them nicely will be a bit more fun :-)

Upload of results would be over authenticated SSL to prevent spurious 
results being fed to us - all you would need to join the build farm 
would be a username/password from the buildfarm admin.

Thanks for your input

cheers

andrew





Re: [HACKERS] 4 Clause license?

2003-11-20 Thread Andrew Dunstan
Rod Taylor wrote:

I think maybe the simplest thing is for me to prepare a patch that rips 
that code out and replaces it with a (slightly simpler - less umask 
hacking required, I think) piece of code that I will write.
   

The FreeBSD folks sorted it out for us.

Everyones names should be in the copyright for the file. The licence
portion should be the 3 clause version -- no advertising clause.


Whose names? It's not easily discoverable from browsing the CVS tree. I 
would probably spend as much time, if not more, finding out as in 
rewriting the 40 or so lines of code required.

I think borrowing should be encouraged, and now that we know what
license / copyright we need to carry over, this can be done without
worry.
 

I'm a fan of borrowing, but not if it causes headaches.

I'll hold off for a bit in case I've misunderstood something.

<troll>Of course, now that SCO is claiming ownership of BSD code ...
</troll>

cheers

andrew



Re: [HACKERS] tsearch2 patch for 7.4.1

2003-11-20 Thread Tom Lane
Oleg Bartunov [EMAIL PROTECTED] writes:
 we have a patch for contrib/tsearch2 we'd like to commit for 7.4.1.
 Is it ok ?

 We added support for compound words using ispell dictionaries.
 It's rather important feature for agglutinative languages like
 german, norwegian (tested). This work was sponsored by ABC Startsiden
 (www.startsiden.no).

This strikes me as a feature addition, and therefore not appropriate for
the 7.4 branch.  You should only commit bug fixes into stable branches,
not feature additions.

regards, tom lane



Re: [HACKERS] ALTER COLUMN/logical column position

2003-11-20 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 You are just shifting the interface problems to a place needing way more
 changes in the backend. There will be some problems either way.

Exactly.  I'm considerably more worried about breaking out-of-the-way
places in the backend than I am about what order someone's admin tool
presents the columns in.

 Btw, most of these concerns (and more) were already raised when DROP
 COLUMN was done, causing gaps in attnum. There were a lot of doomsday
 prophecies, but in the end it went quite smoothly.

That is a good comparison point.  I'm inclined to think that we should
do it in a way that minimizes backend changes.  The way to do that is
to keep attnum with its current definition (physical position) and add
a new column for the logical position, which only a small number of
places will need to care about.

regards, tom lane



Re: [HACKERS] ALTER COLUMN/logical column position

2003-11-20 Thread Peter Eisentraut
Tom Lane writes:

 Exactly.  I'm considerably more worried about breaking out-of-the-way
 places in the backend than I am about what order someone's admin tool
 presents the columns in.

Clearly, the effort of adding logical column numbers will consist of
making choices between physical and logical numbers in the backend in some
places.  So one option is to replace some uses of attnum by attlognum.
The other option is to replace *all* uses of attnum by attphysnum and then
replace some uses of attphysnum by attnum.  To me, this looks like an
equal risk as far as the backend goes.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] logical column position

2003-11-20 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 At present, attnum basically does three things: identifies an column
 within a relation, indicates which columns are system columns, and
 defines the order of a relation's columns. I'd like to move this last
 functionality into a separate pg_attribute column named attpos (or
 attlogicalpos):

attpos is a horrid choice of name, because no one will be able to
remember which of attnum and attpos is which.  Pick a more distinct
name.  Offhand the best thing I can think of is attlognum or attlogpos.

  - when the table is created, attnum == attpos. System columns
have attpos < 0, as with attnum. At no point will two
columns of the same relation have the same attpos.

What are you going to do with deleted columns?  I'd be inclined to give
them all attlogpos = 0, but that destroys your last comment.

   (a) ISTM this should also apply to COPY TO and COPY FROM if the user
   didn't supply a column list. Is this reasonable?

Yes, also INSERT INTO, also the implicit ordering of output columns of a
JOIN, also the matching of aliases to columns in a FROM-list alias,
probably one or two other places.  SQL exposes column ordering in more
places than just SELECT *.

   If we want to avoid this, one easy (but arguably unclean) way to
   do so would be to make the initial value of attpos == attnum *
   1000, and make attpos an int4 rather than an int2. Then, we can
   do most column reordering operations with only a single
   pg_attribute update -- in the worst-case that enough
   re-orderings are done that we overflow the 999 padding
   positions, we can just fall-back to doing multiple pg_attribute
   updates. Is this worth doing, and/or is there a better way to
   achieve the same effect?

That seems horribly messy.  Just renumber.
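The two alternatives just discussed can be sketched as follows (a hedged Python illustration; the function names are invented, and the gap arithmetic is a simplification of the attnum * 1000 proposal):

```python
def move_renumber(order, name, new_index):
    # "Just renumber": rebuild a dense 1..n sequence; several catalog updates.
    order = [c for c in order if c != name]
    order.insert(new_index, name)
    return {c: i + 1 for i, c in enumerate(order)}

def move_gapped(pos, name, left, right):
    # Gap scheme: initial positions spaced 1000 apart; a single-row update
    # places the column between two neighbours until the gap is exhausted.
    mid = (pos[left] + pos[right]) // 2
    if mid in (pos[left], pos[right]):
        raise ValueError("gap exhausted -- fall back to renumbering")
    return {**pos, name: mid}

print(move_renumber(["a", "b", "c"], "c", 0))          # {'c': 1, 'a': 2, 'b': 3}
print(move_gapped({"a": 1000, "b": 2000, "c": 3000},
                  "c", "a", "b")["c"])                 # 1500
```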

   (c) Do I need to consider inheritance?

Yes.  I think it'd be good if things were constrained so that columns
1..n in a parent table always matched columns 1..n in every child,
which is not true now after adding/dropping columns.  That would make it
easier/cheaper/more reliable to match up which child columns are to be
referenced in an inherited query (see adjust_inherited_attrs).  I think
the effective constraints would have to be about the same as what we now
impose on column names in an inheritance hierarchy.

You have not presented any proposal for exactly what ALTER TABLE
operations would be offered to manipulate the column positions.
My recollection is that some consensus was reached on that point
in the last thread we had on this issue --- have you consulted the
archives?

regards, tom lane



Re: [HACKERS] ALTER COLUMN/logical column position

2003-11-20 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Tom Lane writes:
 Exactly.  I'm considerably more worried about breaking out-of-the-way
 places in the backend than I am about what order someone's admin tool
 presents the columns in.

 Clearly, the effort of adding logical column numbers will consist of
 making choices between physical and logical numbers in the backend in some
 places.  So one option is to replace some uses of attnum by attlognum.
 The other optionis to replace *all* uses of attnum by attphysnum and then
 replace some uses of attphysnum by attnum.  To me, this looks like an
 equal risk as far as the backend goes.

This would be a reasonable assessment if we had our hands on every line
of backend code that exists.  But you are neglecting the probability of
breaking user-written C functions, PL languages outside the main distro,
etc.

If we were going to go about this in a way that does not localize the
changes, I'd be inclined to use attlognum and attphysnum ... that
is, *deliberately* break every use that hasn't been looked at and
updated.  Even that would not guarantee catching all the trouble spots;
for example loop indexes and attnums passed as function parameters might
not have names that would be caught by a simplistic search-and-replace
update.

I'm for localizing the changes.

regards, tom lane



Re: [HACKERS] logical column position

2003-11-20 Thread Alvaro Herrera Munoz
On Thu, Nov 20, 2003 at 10:39:24AM -0500, Tom Lane wrote:

(c) Do I need to consider inheritance?
 
 Yes.  I think it'd be good if things were constrained so that columns
 1..n in a parent table always matched columns 1..n in every child,
 which is not true now after adding/dropping columns.  That would make it
 easier/cheaper/more reliable to match up which child columns are to be
 referenced in an inherited query (see adjust_inherited_attrs).

No way, because of multiple inheritance.  Each child should have an
attparentnum, which would point to the parent's attnum for this to work ...

-- 
Alvaro Herrera ([EMAIL PROTECTED])
To learn without thinking is useless; to think without learning, dangerous (Confucius)



Re: [HACKERS] logical column position

2003-11-20 Thread Tom Lane
Alvaro Herrera Munoz [EMAIL PROTECTED] writes:
 On Thu, Nov 20, 2003 at 10:39:24AM -0500, Tom Lane wrote:
 (c) Do I need to consider inheritance?
 
 Yes.  I think it'd be good if things were constrained so that columns
 1..n in a parent table always matched columns 1..n in every child,
 which is not true now after adding/dropping columns.

 No way, because of multiple inheritance.  Each child should have an
 attparentnum, which would point to the parent's attnum for this to work ...

Hm, good point.  And I think we merge identically-named columns
inherited from different parents, which would mean that attparentnum
wouldn't have a unique value anyway.

Perhaps rearranging a parent's columns shouldn't have *any* direct
effect on a child?  Seems ugly though.
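The merge behaviour can be sketched like this (illustrative Python, not backend code):

```python
# Two parents both define "id"; the child merges identically-named columns,
# so one child column maps back to attnums in *both* parents -- there is no
# single attparentnum for it.
def inherit(*parents):
    child, parent_links = [], {}
    for p_idx, cols in enumerate(parents):
        for attnum, name in enumerate(cols, start=1):
            if name not in child:
                child.append(name)
            parent_links.setdefault(name, []).append((p_idx, attnum))
    return child, parent_links

child, links = inherit(["id", "a"], ["id", "b"])
print(child)        # ['id', 'a', 'b']
print(links["id"])  # [(0, 1), (1, 1)] -- one child column, two parent columns
```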

regards, tom lane



Re: [HACKERS] [BUGS] 7.4: CHAR padding inconsistency

2003-11-20 Thread Bruce Momjian

Well, that certainly is interesting.  Oracle and MS-SQL preserve the
trailing space when concatenating.  Does anyone remember the logic for
trimming space with ||?

---

William ZHANG wrote:
 
 Bruce said:
  How do other databases handle this?
 
 I have tried MS SQL Server 2000 and Oracle 9i for Windows.
 SQL Server doesn't like character_length and ||, so I used len and + instead.
 Oracle doesn't like character_length either; I used length.
 Hope the results help.
 
 create table chartest(col char(10) not null);
 insert into chartest values ('AAA');
 
 PostgreSQL:
 select character_length(col) from chartest;
 10
 SQL Server
 select len(col) from chartest;
 3
 Oracle
 select length(col) from chartest;
 10
 
 PostgreSQL:
 select character_length(col || 'hey') from chartest;
 6
 SQL Server:
 select len(col + 'hey') from chartest;
 13
 Oracle:
 select length(col || 'hey') from chartest;
 13
 
 PostgreSQL:
 select 'aaa   ' || 'bb';
 aaa   bb
 SQL Server:
 select 'aaa   ' + 'bb';
 aaa   bb
 Oracle:
 select 'aaa   ' || 'bb' from dual;
 aaa   bb
 
 PostgreSQL:
 select cast('aa   ' as char(10)) || 'b';
 aab
 SQL Server:
 select cast('aa   ' as char(10)) + 'b';
 aab
 Oracle:
 select cast('aa   ' as char(10)) || 'b' from dual;
 aab
 
 
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 4 Clause license?

2003-11-20 Thread Bruce Momjian
Rod Taylor wrote:
  I think maybe the simplest thing is for me to prepare a patch that rips 
  that code out and replaces it with a (slightly simpler - less umask 
  hacking required, I think) piece of code that I will write.
 
 The FreeBSD folks sorted it out for us.
 
 Everyones names should be in the copyright for the file. The licence
 portion should be the 3 clause version -- no advertising clause.
 
 I think borrowing should be encouraged, and now that we know what
 license / copyright we need to carry over, this can be done without
 worry.

Agreed --- don't get too focussed on it --- they are all pretty much the
same, and if someone complains, we will rip it out and replace it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 4 Clause license?

2003-11-20 Thread Marc G. Fournier
On Thu, 20 Nov 2003, Andrew Dunstan wrote:

 <troll>Of course, now that SCO is claiming ownership of BSD code ...
 </troll>

Interesting thread, that ... the last I read on the FreeBSD lists was
speculation that they would be going after people like Cisco (re: TCP/IP
networking code), since there really is nobody else large enough to bother
with ... it's going to be interesting to see :)


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664




Re: [HACKERS] [BUGS] 7.4: CHAR padding inconsistency

2003-11-20 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Well, that certainly is interesting.  Oracle and MS-SQL preserve the
 trailing space when concatenating.  Does anyone remember the logic for
 trimming space with ||?

"Trimming space with ||" is a completely inaccurate description of
what's happening.

7.4 trims spaces from char(n) data when converting it to text (or
varchar).  Since we don't have a bpchar version of ||, only a text
version, the implicitly invoked conversion is what's making the
difference.
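Tom's point can be illustrated with a toy model (a hedged Python sketch; the function names are invented for illustration, and this is not PostgreSQL's actual implementation):

```python
# Toy model of CHAR(n) semantics around the 7.4 change.
# Illustrative sketch only -- not PostgreSQL internals.

def bpchar_store(value: str, n: int) -> str:
    """Storing into CHAR(n) blank-pads (or truncates) to exactly n characters."""
    return value[:n].ljust(n)

def cast_to_text_74(bpchar_value: str) -> str:
    """In 7.4, casting char(n) to text strips the padding blanks."""
    return bpchar_value.rstrip(" ")

def concat_74(a: str, b: str) -> str:
    """There is only a text version of ||, so char(n) inputs are cast first."""
    return cast_to_text_74(a) + cast_to_text_74(b)

aa = bpchar_store("aa   ", 10)   # stored as 'aa' plus 8 trailing blanks
print(repr(concat_74(aa, "b")))  # 'aab' in 7.4; Oracle/MSSQL keep the padding
```

In this model the trim lives entirely in the cast, not in ||, which is the point of the paragraph above.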

AFAICS the Oracle and SQL Server behaviors are at least as inconsistent
as our own.  If trailing spaces are significant during concatenation,
why aren't they significant to LENGTH()?  I can't see a justification
in the SQL spec for handling one case differently from the other.

Actually the SQL spec is pretty inconsistent itself.  It's clear that
trailing spaces are insignificant in comparisons, if you are using a
PAD SPACE collation which I think is the implication of CHAR(n), but
I don't see anything that says that they are insignificant for other
purposes such as LENGTH() and concatenation.

I'd agree with changing bpcharlen() to not count trailing spaces,
I think.  That would be consistent with ignoring them in other contexts.

regards, tom lane



Re: [HACKERS] tsearch2 patch for 7.4.1

2003-11-20 Thread Hannu Krosing
Tom Lane kirjutas N, 20.11.2003 kell 17:18:
 Oleg Bartunov [EMAIL PROTECTED] writes:
  we have a patch for contrib/tsearch2 we'd like to commit for 7.4.1.
  Is it ok ?
 
  We added support for compound words using ispell dictionaries.
  It's rather important feature for agglutinative languages like
  german, norwegian (tested). This work was sponsored by ABC Startsiden
  (www.startsiden.no).
 
 This strikes me as a feature addition, and therefore not appropriate for
 the 7.4 branch.  You should only commit bug fixes into stable branches,
 not feature additions.

Is this so even for contrib?

I seem to remember that there have been new features popping up in
contrib in stable branches earlier.

But if the general direction is to stabilise contrib as well, what about
committing it as a separate patch file
contrib/tsearch2/patches/compound-words-patch.1, so that those people
(or german/norwegian linux distributors) who consider the lack of support
for compound words a bug can apply it themselves before building?

---
Hannu



Re: [HACKERS] [BUGS] 7.4: CHAR padding inconsistency

2003-11-20 Thread Troels Arvin
On Thu, 20 Nov 2003 12:40:30 -0500, Tom Lane wrote:

 AFAICS the Oracle and SQL Server behaviors are at least as inconsistent
 as our own.

 If trailing spaces are significant during concatenation,
 why aren't they significant to LENGTH()?

Oracle _does_ count the trailing spaces in its LENGTH() function. MSSQL's
rules certainly look strange.

 I'd agree with changing bpcharlen() to not count trailing spaces,
 I think.  That would be consistent with ignoring them in other contexts.

Why not just change the behaviour back to what it used to be like? I see
no justification for the change: It may break old queries in subtle ways
and doesn't make the CHAR-type any more consistent than before.

-- 
Greetings from Troels Arvin, Copenhagen, Denmark





[HACKERS] 7.4 logging bug.

2003-11-20 Thread Kurt Roeckx
I just installed 7.4 on Windows/Cygwin.  I restored a dump but
ran out of disk space during the creation of an index.  In psql I
saw the ERROR: could not extend relation ...

From that point on it seems to have stopped logging most things.

The ERROR and HINT are not in the log file; it stopped showing
the executed commands and their duration.  The only thing it
still shows is the recycled transaction log file messages.

In psql I still see all the messages, like the notice that adding
a primary key will create an index.


Kurt




Re: [HACKERS] 7.4: CHAR padding inconsistency

2003-11-20 Thread Troels Arvin
(I already responded to Bruce, but the response seems to be stuck in some
mailing list, waiting for approval. Trying again:)

On Wed, 19 Nov 2003 14:50:24 -0500, on the BUGS-liste, Bruce Momjian wrote:

 Anyway, what did you want it to output?  AAA   hey?  We could do
 that, but I assume most people wouldn't expect that output?

It certainly depends on their background. Personally, the padding
characteristics of the CHAR type were one of the first things about SQL
that I learned (the hard way). Oracle and DB2 people should be used to
PostgreSQL's old behaviour.

The CHAR type may seem strange to some, but they may then just
use VARCHAR.

 How do other databases handle this?

I've started writing about it here:
http://troels.arvin.dk/db/rdbms/#data_types-char

Some of my test-material is also online:
http://troels.arvin.dk/db/tests/chartest-20031119a/

My summary:

With regard to CHAR-handling, PostgreSQL 7.4 is now in opposition to
 - previous versions of PostgreSQL; bad enough on its own,
   because there doesn't seem to have been a good discussion
   about it first - I can only find a few messages about it [1]
 - DB2
 - Oracle
 - MSSQL (which also behaves in a non-standard way,
   but different from PostgreSQL 7.4)
7.4 is close to how MySQL works, though.

I'm sorry about not testing this before 7.4 went gold, but I believe that
this is a bug which should be corrected before too much confusion
is created.


Reference 1:
An interesting one is this one:
http://article.gmane.org/gmane.comp.db.postgresql.devel.general/10958/match=char+padding


-- 
Greetings from Troels Arvin, Copenhagen, Denmark





Re: [HACKERS] 4 Clause license?

2003-11-20 Thread Joshua D. Drake
Based on the below wouldn't they also have to go after Microsoft?

Marc G. Fournier wrote:

On Thu, 20 Nov 2003, Andrew Dunstan wrote:

 

<troll>Of course, now that SCO is claiming ownership of BSD code ...</troll>
   

Interesting thread that ... last I read on the FreeBSD lists was
speculation that they would be going after ppl like Cisco (re: TCP/IP
Networking Code) since there really is nobody else large enough to bother
with ... its going to be interesting to see :)

Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
 

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org




Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL 7.3.4 and 7.4beta5

2003-11-20 Thread Tom Lane
Mark Wong [EMAIL PROTECTED] writes:
 I've changed all the numerics to integers and reals, where it was appropriate
 to maintain the precision specified in the TPC-C spec.
 ...
 Looks like I see about an 8% improvement in the metric with this instance.  A
 definite decrease in user time in the processor utilization chart, which I
 presume is attributed to the 70% decrease in ticks to SearchCatCache in the
 database as reported by oprofile.  Can anyone explain that one?

That's odd; I don't see why NUMERIC would be incurring extra
SearchCatCache calls.  Does your profile provide info about where the
SearchCatCache calls are coming from?

regards, tom lane



Re: [HACKERS] 4 Clause license?

2003-11-20 Thread Marc G. Fournier
On Thu, 20 Nov 2003, Joshua D. Drake wrote:

 Based on the below wouldn't they also have to go after Microsoft?

Depends ... does MicroSoft use BSD TCP/IP, or did they write their own?  I
know that Linux is not using BSD TCP/IP (or, at least, they didn't in
their first 3 incarnations of the stack) ...

 
 Marc G. Fournier wrote:

 On Thu, 20 Nov 2003, Andrew Dunstan wrote:
 
 
 
 <troll>Of course, now that SCO is claiming ownership of BSD code ...</troll>
 
 
 
 Interesting thread that ... last I read on the FreeBSD lists was
 speculation that they would be going after ppl like Cisco (re: TCP/IP
 Networking Code) since there really is nobody else large enough to bother
 with ... its going to be interesting to see :)
 
 
 Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
 Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
 
 
 
 

 --
 Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
 Postgresql support, programming shared hosting and dedicated hosting.
 +1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com
 Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org





Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



Re: [HACKERS] 4 Clause license?

2003-11-20 Thread Andrew Dunstan
Marc G. Fournier wrote:

On Thu, 20 Nov 2003, Joshua D. Drake wrote:

 

Based on the below wouldn't they also have to go after Microsoft?
   

Depends ... does MicroSoft use BSD TCP/IP, or did they write their own?  I
know that Linux is not using BSD TCP/IP (or, at least, they didn't in
their first 3 incarnations of the stack) ...


M$ have used BSD code in the past, I'm fairly sure, even if they no 
longer do. But they do have a license from SCO.

OK, I know I shouldn't have raised this topic ...

cheers

andrew



Re: [HACKERS] Release cycle length

2003-11-20 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 On Tue, 18 Nov 2003, Peter Eisentraut wrote:
 The time from release 7.3 to release 7.4 was 355 days, an all-time high.
 We really need to shorten that.

 I don't see much of a point for a shorter release cycle as long as we 
 don't get rid of the initdb requirement for releases that don't change 
 the system catalog structure. All we gain from that is spreading out the 
 number of different versions used in production.

Yeah, I think the main issue in all this is that for real production
sites, upgrading Postgres across major releases is *painful*.  We have
to find a solution to that before it makes sense to speed up the
major-release cycle.

By the same token, I'm not sure that there's much of a market for
development releases --- people who find a 7.3-7.4 upgrade painful
aren't going to want to add additional upgrades to incompatible
intermediate states.  If we could fix that, there'd be more interest.

regards, tom lane



Re: [HACKERS] 4 Clause license?

2003-11-20 Thread Larry Rosenman


--On Thursday, November 20, 2003 16:00:44 -0400 Marc G. Fournier 
[EMAIL PROTECTED] wrote:

On Thu, 20 Nov 2003, Joshua D. Drake wrote:

Based on the below wouldn't they also have to go after Microsoft?
Depends ... does MicroSoft use BSD TCP/IP, or did they write their own?  I
know that Linux is not using BSD TCP/IP (or, at least, they didn't in
their first 3 incarnations of the stack) ...
M$ also bought a LARGE license from SCO...

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749




Re: [HACKERS] 4 Clause license?

2003-11-20 Thread Joshua D. Drake
Hello,

 My understanding is that they use the BSD stack (at least as the 
basis) for TCP/IP. Windows that is.

J

Marc G. Fournier wrote:

On Thu, 20 Nov 2003, Joshua D. Drake wrote:

 

Based on the below wouldn't they also have to go after Microsoft?
   

Depends ... does MicroSoft use BSD TCP/IP, or did they write their own?  I
know that Linux is not using BSD TCP/IP (or, at least, they didn't in
their first 3 incarnations of the stack) ...

 

Marc G. Fournier wrote:

   

On Thu, 20 Nov 2003, Andrew Dunstan wrote:



 

<troll>Of course, now that SCO is claiming ownership of BSD code ...</troll>
   

Interesting thread that ... last I read on the FreeBSD lists was
speculation that they would be going after ppl like Cisco (re: TCP/IP
Networking Code) since there really is nobody else large enough to bother
with ... its going to be interesting to see :)

Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
 

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org


   


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
 

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org




Re: [HACKERS] Release cycle length

2003-11-20 Thread Tom Lane
Kevin Brown [EMAIL PROTECTED] writes:
 ... That's why the release methodology used by the Linux kernel development
 team is a reasonable one.

I do not think we have the manpower to manage multiple active
development branches.  The Postgres developer community is a fraction of
the size of the Linux community; if we try to adopt what they do we'll
just drown in work.  It's hard enough to deal with the existing level of
commitment to back-patching one stable release --- I know that we miss
back-patching bug fixes that probably should have been back-patched.
And the stuff that does get back-patched isn't really tested to the
level that it ought to be, which discourages us from applying fixes
to the stable branch if they are too large to be obviously correct.
I don't see manpower emerging from the woodwork to fix those problems.

If we were doing active feature development in more than one branch
I think our process would break down completely.

regards, tom lane



Re: [HACKERS] Release cycle length

2003-11-20 Thread Tom Lane
Larry Rosenman [EMAIL PROTECTED] writes:
 [EMAIL PROTECTED] wrote:
 1. Start platform testing on day 1 of beta.  Last minute fixes for AIX or
 UnixWare are really becoming old jokes.

 The only reason we had last minute stuff for UnixWare this time was the
 timing of PG's release and the UP3 release from SCO.

Yes.  The late fixes for OS X also arose from the fact that Apple
released a new OS X version late in our beta cycle.  I don't think it's
reasonable to complain that there was insufficient port testing done
earlier; the issues didn't come from that.

I do agree with the opinion that our beta cycles are getting too long,
and that it's not clear we are getting any additional reliability out
of the longer time period.

regards, tom lane



Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Shridhar Daithankar
Josh Berkus wrote:

Shridhar,
 However I do not agree with this logic entirely. It pegs the next vacuum
w.r.t current table size which is not always a good thing.


No, I think the logic's fine, it's the numbers which are wrong.   We want to 
vacuum when updates reach between 5% and 15% of total rows.   NOT when 
updates reach 110% of total rows ... that's much too late.
Well, looks like thresholds below 1 should be the norm rather than the exception.

Hmmm ... I also think the threshold level needs to be lowered; I guess the 
purpose was to prevent continuous re-vacuuuming of small tables?  
Unfortunately, in the current implementation, the result is tha small tables 
never get vacuumed at all.

So for defaults, I would peg -V at 0.1 and -v at 100, so our default 
calculation for a table with 10,000 rows is:

100 +  ( 0.1 * 10,000 ) = 1100 rows.
I would say -V 0.2-0.4 could be great as well. The fact to emphasize is that 
thresholds less than 1 should be used.

Furthermore analyze threshold depends upon inserts+updates. I think it
should also depends upon deletes for obvious reasons.
Yes.  Vacuum threshold is counting deletes, I hope?
It does.

My comment about the frequency of vacuums vs. analyze is that currently the 
*default* is to analyze twice as often as you vacuum.  Based on my 
experience as a PG admin on a variety of databases, I believe that the default 
should be to analyze half as often as you vacuum.
OK.

I am all for experimentation. If you have real life data to play with, I
can give you some patches to play around with.
I will have real data very soon.
I will submit a patch that would account for deletes in the analyze threshold. Since you 
want to delay the analyze, I would calculate the analyze count as

n = updates + inserts - deletes

rather than the current n = updates + inserts. I will also update the README with 
examples and a note on analyze frequency.

What do the statistics gather, BTW? Just the number of rows, or something else as well? 
I think I would put that on Hackers separately.

I am still wary of inverting the vacuum/analyze frequency. Do you think it is better to 
set an inverted default rather than documenting it?
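For reference, the threshold arithmetic being discussed above can be sketched as follows (an illustrative Python sketch; the function names are invented and this is not pg_autovacuum.c's actual code):

```python
# Sketch of the pg_autovacuum vacuum-threshold arithmetic under discussion.
# Names and structure are invented for illustration.

def vacuum_threshold(base: int, scaling_factor: float, n_tuples: int) -> float:
    # VacuumThreshold = vacuum_base_value + vacuum_scaling_factor * tuples
    return base + scaling_factor * n_tuples

def should_vacuum(updates: int, deletes: int, threshold: float) -> bool:
    # A vacuum is triggered once updates + deletes reach the threshold.
    return updates + deletes >= threshold

# Josh's proposed defaults (base 100, scaling factor 0.1) on a 10,000-row table:
t = vacuum_threshold(100, 0.1, 10_000)
print(t)                           # 1100.0
print(should_vacuum(500, 700, t))  # True: 1200 >= 1100
```

With a scaling factor below 1, the threshold grows more slowly than the table, which is the behaviour argued for above.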

 Shridhar



Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Matthew T. O'Connor
Shridhar Daithankar wrote:

Josh Berkus wrote:

Shridhar,
 However I do not agree with this logic entirely. It pegs the next 
vacuum

w.r.t current table size which is not always a good thing.

Ok, what do you recommend?  The point of two separate variables allows 
you to specify if you want vacuum based on a fixed number, based on 
table size or something inbetween.

No, I think the logic's fine, it's the numbers which are wrong.   We 
want to vacuum when updates reach between 5% and 15% of total rows.   
NOT when updates reach 110% of total rows ... that's much too late.

For small tables, you don't need to vacuum too often.  In the testing I 
did, a small table (~100 rows) didn't really show significant performance 
degradation until it had close to 1000 updates.  For large tables, 
vacuum is so expensive, that you don't want to do it very often, and  
scanning the whole table when there is only 5% wasted space is not very 
helpful.

Hmmm ... I also think the threshold level needs to be lowered; I 
guess the purpose was to prevent continuous re-vacuuming of small 
tables?  Unfortunately, in the current implementation, the result is 
that small tables never get vacuumed at all.

So for defaults, I would peg -V at 0.1 and -v at 100, so our default 
calculation for a table with 10,000 rows is:

100 +  ( 0.1 * 10,000 ) = 1100 rows.

Yes, I set the defaults a little high, perhaps, so as to err on the 
side of caution.  I didn't want people to say pg_autovacuum kills the 
performance of my server.  A small table will get vacuumed, just not 
until it has reached the threshold.  So a table with 100 rows, will get 
vacuumed after 1200 updates / deletes.  In my testing it showed that 
there was no major performance problems  until you reached several 
thousand updates / deletes.

Furthermore analyze threshold depends upon inserts+updates. I think it
should also depends upon deletes for obvious reasons.
Yes.  Vacuum threshold is counting deletes, I hope?
It does.

My comment about the frequency of vacuums vs. analyze is that 
currently the *default* is to analyze twice as often as you 
vacuum.  Based on my experience as a PG admin on a variety of 
databases, I believe that the default should be to analyze half as 
often as you vacuum.

HUH?  analyze is very very cheap compared to vacuum.  Why not do it more 
often?

I am all for experimentation. If you have real life data to play 
with, I
can give you some patches to play around.
I will have real data very soon .
I will submit a patch that would account deletes in analyze threshold. 
Since you want to delay the analyze, I would calculate analyze count as
deletes are already accounted for in the analyze threshold.

I am still wary of inverting vacuum analyze frequency. You think it is 
better to set inverted default rather than documenting it?
I think inverting the vacuum and analyze frequency is wrong.  

What I think I am hearing is that people would like very much to be able 
to tweak the settings of pg_autovacuum for individual tables / databases 
etc.  So that you could set certain tables to be vacuumed more 
agressivly than others.  I agree this would be a good and welcome 
addition.  I hope have time to work on this at some point, but in the 
near future I won't.

Matthew





Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Shridhar Daithankar
On Thursday 20 November 2003 20:00, Matthew T. O'Connor wrote:
 Shridhar Daithankar wrote:
  I will submit a patch that would account deletes in analyze threshold.
  Since you want to delay the analyze, I would calculate analyze count as

 deletes are already accounted for in the analyze threshold.

Yes, my bad. Deletes are not accounted for when initializing the analyze count, 
but they are used later.

  I am still wary of inverting vacuum analyze frequency. You think it is
  better to set inverted default rather than documenting it?

 I think inverting the vacuum and analyze frequency is wrong.

Me too. ATM, all I can think of is the patch attached. Josh, is it sufficient 
for you? :-)

Matthew, I am confused about one thing. Why would autovacuum count updates 
while checking the analyze threshold? An update does not change the statistics, 
right? (w.r.t. line 1072, pg_autovacuum.c). For updating statistics, only 
inserts + deletes should suffice, shouldn't they?

Other than that, I think autovacuum does everything it can.

Comments?

 Shridhar
*** README.pg_autovacuum.orig	Thu Nov 20 19:58:29 2003
--- README.pg_autovacuum	Thu Nov 20 20:26:39 2003
***
*** 141,150 
  depending on the mixture of table activity (insert, update, or
  delete):
  
! - If the number of (inserts + updates + deletes)  AnalyzeThreshold, then
only an analyze is performed.
  
! - If the number of (deletes + updates)  VacuumThreshold, then a
vacuum analyze is performed.
  
  VacuumThreshold is equal to:
--- 141,150 
  depending on the mixture of table activity (insert, update, or
  delete):
  
! - If the number of (inserts + updates + deletes) = AnalyzeThreshold, then
only an analyze is performed.
  
! - If the number of (deletes + updates) = VacuumThreshold, then a
vacuum analyze is performed.
  
  VacuumThreshold is equal to:
***
*** 158,163 
--- 158,186 
  and running ANALYZE more often should not substantially degrade system
  performance.
  
+ Examples:
+ 
+ The following table shows typical usage of pg_autovacuum settings.
+ These are put here so that a DBA can have a starting point while
+ tuning pg_autovacuum.
+ 
+ Vacuum is triggered by updates and deletes. So in the case of vacuum,
+ the last column indicates the total number of updates and deletes required
+ to trigger a vacuum. In the case of analyze, the operations counted are the total
+ number of inserts, updates, and deletes.
+ 
+ Base		Scaling factor	Records		No. of Operations
+ 1,000		1		10,000		11,000
+ 1,000		2		10,000		21,000
+ 1,000		0.5		10,000		 6,000
+ 1,000		0.1		10,000		 2,000
+ 
+ Although analyze is a cheaper operation compared to vacuum,
+ it might be needed less often. The default is to analyze twice as often as
+ vacuum, but that might be too aggressive for some installations. It is advised that
+ such installations tune their analyze threshold separately, rather than relying upon
+ the default behaviour.
+ 
  Sleeping:
  -
  



Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Shridhar Daithankar
On Thursday 20 November 2003 20:29, Shridhar Daithankar wrote:
 On Thursday 20 November 2003 20:00, Matthew T. O'Connor wrote:
  Shridhar Daithankar wrote:
   I will submit a patch that would account deletes in analyze threshold.
   Since you want to delay the analyze, I would calculate analyze count as
 
  deletes are already accounted for in the analyze threshold.

 Yes. My bad. Deletes are not accounted in initializing analyze count but
 later they are used.

   I am still wary of inverting vacuum analyze frequency. You think it is
   better to set inverted default rather than documenting it?
 
  I think inverting the vacuum and analyze frequency is wrong.

 Me. Too. ATM all I can think of this patch attached. Josh, is it sufficient
 for you?..:-)

Use this one. A warning was added about too-aggressive vacuuming. If it is OK by 
everybody, we can send it to the patches list.

 Shridhar
*** README.pg_autovacuum.orig	Thu Nov 20 19:58:29 2003
--- README.pg_autovacuum	Thu Nov 20 20:35:34 2003
***
*** 141,163 
  depending on the mixture of table activity (insert, update, or
  delete):
  
! - If the number of (inserts + updates + deletes)  AnalyzeThreshold, then
only an analyze is performed.
  
! - If the number of (deletes + updates)  VacuumThreshold, then a
vacuum analyze is performed.
  
  VacuumThreshold is equal to:
! vacuum_base_value + (vacuum_scaling_factor * number of tuples in the table)
! 
  AnalyzeThreshold is equal to:
! analyze_base_value + (analyze_scaling_factor * number of tuples in the table)
! 
  The AnalyzeThreshold defaults to half of the VacuumThreshold since it
  represents a much less expensive operation (approx 5%-10% of vacuum),
  and running ANALYZE more often should not substantially degrade system
  performance.
  
  Sleeping:
  -
  
--- 141,191 
  depending on the mixture of table activity (insert, update, or
  delete):
  
! - If the number of (inserts + updates + deletes) = AnalyzeThreshold, then
only an analyze is performed.
  
! - If the number of (deletes + updates) = VacuumThreshold, then a
vacuum analyze is performed.
  
  VacuumThreshold is equal to:
! vacuum_base_value + (vacuum_scaling_factor * number of tuples in the 
! table)
  AnalyzeThreshold is equal to:
! analyze_base_value + (analyze_scaling_factor * number of tuples in the 
! table)
  The AnalyzeThreshold defaults to half of the VacuumThreshold since it
  represents a much less expensive operation (approx 5%-10% of vacuum),
  and running ANALYZE more often should not substantially degrade system
  performance.
  
+ Examples:
+ 
+ The following table shows typical usage of pg_autovacuum settings.
+ These are put here so that a DBA can have a starting point while
+ tuning pg_autovacuum.
+ 
+ Vacuum is triggered by updates and deletes. So in the case of vacuum,
+ the last column indicates the total number of updates and deletes required
+ to trigger a vacuum. In the case of analyze, the operations counted are the total
+ number of inserts, updates, and deletes.
+ 
+ Base		Scaling factor	Records		No. of Operations
+ 1,000		1		10,000		11,000
+ 1,000		2		10,000		21,000
+ 1,000		0.5		10,000		 6,000
+ 1,000		0.1		10,000		 2,000
+ 
+ Although analyze is a cheaper operation compared to vacuum,
+ it might be needed less often. The default is to analyze twice as often as
+ vacuum, but that might be too aggressive for some installations. It is advised 
+ that such installations tune their analyze threshold separately, rather than 
+ relying upon the default behaviour.
+ 
+ Furthermore, for aggressive vacuum/analyze behaviour, it is recommended that
+ the scaling factor be set to less than 1. However, too aggressive a setting can
+ adversely affect the performance of normal database operations. Do not apply such
+ settings to production databases without prior testing.
+ 
  Sleeping:
  -
  



Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Matthew T. O'Connor
Shridhar Daithankar wrote:

On Thursday 20 November 2003 20:00, Matthew T. O'Connor wrote:
 

Shridhar Daithankar wrote:
   

I am still wary of inverting vacuum analyze frequency. You think it is
better to set inverted default rather than documenting it?
 

I think inverting the vacuum and analyze frequency is wrong.
   

Me. Too. ATM all I can think of this patch attached. Josh, is it sufficient 
for you?..:-)
 

The patch just adds an example to the README; this looks OK to me.

Matthew, I am confused about one thing. Why would autovacuum count updates 
while checking the analyze threshold? An update does not change the statistics, 
right? (w.r.t. line 1072, pg_autovacuum.c). For updating statistics, only 
inserts + deletes should suffice, shouldn't they?
 

An update is the equivalent of an insert and a delete, so it counts 
towards the analyze count as much as an insert.

Other than that, I think autovacuum does everything it can.
 

It could be more customizable.





Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Josh Berkus
Matthew,

 For small tables, you don't need to vacuum too often.  In the testing I
 did, a small table (~100 rows) didn't really show significant performance
 degradation until it had close to 1000 updates. 

This is accounted for by using the threshold value.   That way small tables 
get vacuumed less often. However, the way large tables work is very different 
and I think your strategy shows a lack of testing on large active tables.

 For large tables,
 vacuum is so expensive, that you don't want to do it very often, and
 scanning the whole table when there is only 5% wasted space is not very
 helpful.

5% is probably too low, you're right ... in my experience, performance 
degradation starts to set in at 10-15% updates to, for example, a 1.1 million 
row table, particularly since users tend to request the most recently updated 
rows.   As long as we have the I/O issues that Background Writer and ARC are 
intended to solve, though, I can see being less aggressive on the defaults; 
perhaps 20% or 25%.   If you wait until 110% of a 1.1 million row table is 
updated, though, that vacuum will take an hour or more.

Additionally, you are not thinking of this in terms of an overall database 
maintenance strategy.   Lazy Vacuum needs to stay below the threshold of the 
Free Space Map (max_fsm_pages) to prevent creeping bloat from setting in to 
your databases.   With proper configuration of pg_avd, vacuum_mem and FSM 
values, it should be possible to never run a VACUUM FULL again, and as of 7.4 
never run a REINDEX again either.  

But this means running vacuum frequently enough that your max_fsm_pages 
threshold is never reached.   Which for a large database is going to have to 
be more frequently than 110% updates, because setting 20,000,000 
max_fsm_pages will eat your RAM.

 Yes, I set the defaults a little high, perhaps, so as to err on the
 side of caution.  I didn't want people to say pg_autovacuum kills the
 performance of my server.  A small table will get vacuumed, just not
 until it has reached the threshold.  So a table with 100 rows will get
 vacuumed after 1200 updates / deletes.  

Ok, I can see that for small tables.

 In my testing it showed that
 there was no major performance problems  until you reached several
 thousand updates / deletes.

Sure.  But several thousand updates can be only 2% of a very large table.

 HUH?  analyze is very very cheap compared to vacuum.  Why not do it more
 often?

Because nothing is cheap if it's not needed.   

Analyze is needed only as often as the *aggregate distribution* of data in the 
tables changes.   Depending on the application, this could be frequently, but 
far more often (in my experience running multiple databases for several 
clients) the data distribution of very large tables changes very slowly over 
time.  

One client's database, for example, that I have running VACUUM via cron 
scripts runs on this schedule for the main tables:
VACUUM only: twice per hour
VACUUM ANALYZE: twice per day

On the other hand, I've another client's database where most activity involves 
updates to entire classes of records.   They run ANALYZE at the end of every 
transaction.

So if you're going to have a separate ANALYZE schedule at all, it should be 
slightly less frequent than VACUUM for large tables.   Either that, or drop 
the idea, and simplify pg_avd by running VACUUM ANALYZE all the time instead 
of having 2 separate schedules.

BUT  now I see how you arrived at the logic you did.  If you're testing 
only on small tables, and not vacuuming them until they reach 110% updates, 
then you *would* need to analyze more frequently. This is because of your 
threshold value ... you'd want to analyze the small table as soon as even 30% 
of its rows changed.

So the answer is to dramatically lower the threshold for the small tables.

 What I think I am hearing is that people would like very much to be able
 to tweak the settings of pg_autovacuum for individual tables / databases
 etc. 

Not from me you're not.   Though that would be nice, too.

So, my suggested defaults based on our conversation above:

Vacuum threshold: 1000 records
Vacuum scale factor:  0.2
Analyze threshold:  50 records
Analyze scale factor: 0.3
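
To see what those numbers would mean in practice, here is a hypothetical sketch of the trigger arithmetic (pg_autovacuum fires once activity exceeds base threshold + scale factor * reltuples; the Python below is illustrative only, and the function name is invented):

```python
# Illustration of the suggested defaults: trigger = base + scale * reltuples.

def trigger_point(base, scale, reltuples):
    return round(base + scale * reltuples)

for rows in (100, 1_000_000):
    vacuum_at = trigger_point(1000, 0.2, rows)   # suggested vacuum defaults
    analyze_at = trigger_point(50, 0.3, rows)    # suggested analyze defaults
    print(f"{rows} rows: vacuum after {vacuum_at} changes, analyze after {analyze_at}")
```

With these settings a 100-row table would vacuum after 1020 changes but analyze after only 80, while a million-row table would vacuum after 201,000 changes.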

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Josh Berkus
Shridhar,

 I would say -V 0.2-0.4 could be great as well. Fact to emphasize is that
 thresholds less than 1 should be used.

Yes, but not thresholds, scale factors of less than 1.0.  Thresholds should 
still be in the range of 100 to 1000.

 I will submit a patch that would account for deletes in the analyze threshold.
 Since you want to delay the analyze, I would calculate the analyze count as

 n=updates + inserts *-* deletes

I'm not clear on how this is a benefit.  Deletes affect the statistics, too.

 What does statistics gather BTW? Just number of rows or something else as
 well? I think I would put that on Hackers separately.

Number of tuples, degree of uniqueness, some sample values, and high/low 
values.   Just query the pg_stats view for an example.

 I am still wary of inverting vacuum analyze frequency. You think it is
 better to set inverted default rather than documenting it?

See my post to Matthew.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Tom Lane
Chester Kustarz [EMAIL PROTECTED] writes:
 i have some tables which are insert only. i do not want to vacuum them
 because there are never any dead tuples in them and the vacuum grows the
 indexes.

Those claims cannot both be true.  In any case, plain vacuum cannot grow
the indexes --- only a VACUUM FULL that moves a significant number of
rows could cause index growth.

 vacuum is to reclaim dead tuples. this means it depends on update and
 delete. analyze depends on data values/distribution. this means it depends on
 insert, update, and delete. thus the dependencies are slightly different
 between the 2 operations, and so you can come up with use-cases that
 justify running either more frequently.

Agreed.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] 7.4 logging bug.

2003-11-20 Thread Tom Lane
Kurt Roeckx [EMAIL PROTECTED] writes:
 I just installed a 7.4 on windows/cygwin.  I restored a dump but
 ran out of disk space during the creating of an index.  In psql I
 saw the ERROR: could not extend relation .
 From that point on it seems to have stopped logging most things.

Do you suppose that the stdio support under cygwin somehow permanently
closes the stderr output descriptor after the first failure to write?
That would be unpleasant.  (Are you sure there *is* now space to write?)

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] [BUGS] 7.4: CHAR padding inconsistency

2003-11-20 Thread Tom Lane
Troels Arvin [EMAIL PROTECTED] writes:
 Why not just change the behaviour back to what it used to be like? I see
 no justification for the change:

Well, for one thing, it makes the behavior of comparisons compatible
with the SQL standard.  If we have unpleasant side-effects we can work
on those, but I don't have a lot of patience for saying revert it
without offering a better solution.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Background writer committed

2003-11-20 Thread Jan Wieck
Shridhar Daithankar wrote:

Jan Wieck wrote:

I committed the first part of the background writer process. We had a 
consensus on attempting to avoid write() calls from regular backends, 
but did not come to any conclusions about what to do to force the kernel to 
actually do some IO.

Consequently, this patch is a separate process launched by postmaster, 
that periodically write()'s out some dirty buffers in LRU order. This 
causes the buffers returned for replacement (when a backend needs to 
read in a page) to always be clean. The process does no sync(), fsync(), 
or any other such calls thus far. Nothing has changed in the checkpoint logic 
either.
Can we have some idea where to tweak sync routines for comparing results?

I mean I would like to run pgbench with same config all along and compare the 
performance difference between sync, fsync and fdatasync etc.
pgbench is actually a very bad example to test any cache strategy. 
Either 98% of your lookups result in cache hits, so basically your 
entire database is cached, or it doesn't fit and every cache strategy 
becomes useless. It doesn't have parts that fit and other parts that 
don't. I think pgbench doesn't use non-uniform random access as real 
world applications do (you have bestsellers and other items, you have 
frequent customers and once-a-year visitors). So it's very hard to get 
the system into a state where you have something like a 50% cache hit rate.
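
Jan's point about non-uniform access can be illustrated with a toy simulation (nothing to do with pgbench's actual code; the 90/10 split and the sizes are made-up numbers):

```python
import random

# Uniform access over a table larger than the cache yields a hit rate of
# roughly cache_size / table_size; a skewed "bestseller" pattern, where
# 90% of lookups go to the hottest 10% of rows, gets a high hit rate
# from the same small cache (assuming the hot rows stay cached).

random.seed(0)
TABLE_ROWS, CACHED_ROWS = 100_000, 10_000

def hit_fraction(skewed, n=50_000):
    hits = 0
    for _ in range(n):
        if skewed and random.random() < 0.9:
            key = random.randrange(CACHED_ROWS)   # lookup in the hot 10%
        else:
            key = random.randrange(TABLE_ROWS)    # uniform lookup
        hits += key < CACHED_ROWS                 # hit if the row is cached
    return hits / n

print(f"uniform: {hit_fraction(False):.0%}  skewed: {hit_fraction(True):.0%}")
```

Uniform access lands near 10% here; the skewed pattern near 91%. pgbench essentially only gives you the uniform case, which is why intermediate hit rates are so hard to reach with it.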

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] PANIC: rename from /data/pg_xlog/0000002200000009

2003-11-20 Thread Tom Lane
Yurgis Baykshtis [EMAIL PROTECTED] writes:
 However, we still experience random PG failures on a weekly basis.

 PANIC:  rename from /data/pg_xlog/00220009 to
 /data/pg_xlog/00220012 (initialization of log file 34, segment 18)
 failed: No such file or directory

That's curious.  Is the kernel portion of the error always No such file
or directory, or does it vary?  What files do you actually see in
/data/pg_xlog/ after such a failure?  What filesystem are you using,
and what is the platform exactly?

 IpcMemoryCreate: shmget(key=5432001, size=203415552, 03600) failed: Not
 enough core

This means that you've exceeded the kernel SHMMAX setting.  We recently
realized that a restart under 7.3 will require SHMMAX to be at least
twice the requested shared memory size (203415552 bytes here), because
the stats processes remain connected to the old shared memory segment.
This is fixed in 7.4 and will be fixed in 7.3.5 whenever it comes out,
but in the meantime your best bet is to increase SHMMAX.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] logical column position

2003-11-20 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 On Mon, 2003-11-17 at 20:24, Christopher Kings-Lynne wrote:
 BTW, one main consideration is that all the postgres admin apps will now 
 need to support ORDER BY attlognum for 7.5+.

 Yeah... how about maintaining attnum for the logical attribute number
 and create an attphysnum or something for the physical position instead?
 This is more intrusive into the source, but you don't need to teach new
 tricks to external entities.
 [ and similar remarks from other people elsewhere in the thread ]

It's completely fallacious to imagine that we could make this change be
transparent to external applications.  To take two examples:

1. How many places do you think know that pg_attribute.attnum links to
pg_attrdef.adnum?  pg_dump, psql, and the JDBC driver all appear to
know that, in a quick search of the CVS tree; I haven't even bothered to
look at pgadmin and the other apps that are likely to have such
dependencies.

2. How about linking pg_attribute.attnum to entries in pg_index.indkey?
Lots of apps know about that too.

Unless you are going to change the meanings of pg_index.indkey and
pg_attrdef.adnum, you can't simply redefine attnum as a logical column
position.  And if you do make such a change you will break code
elsewhere.

If we add a *new* column attlogpos, without changing the semantics
of attnum, then I think we have a fighting chance of making this work
without an undue amount of effort.  I see no prospect that we can
change the meaning of attnum without breaking things far and wide.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] logical column position

2003-11-20 Thread Neil Conway
Tom Lane [EMAIL PROTECTED] writes:
 attpos is a horrid choice of name, because no one will be able to
 remember which of attnum and attpos is which.  Pick a more
 distinct name.  Offhand the best thing I can think of is attlognum
 or attlogpos.

Actually, I deliberately chose attpos rather than attlognum (which is
what some people had been calling this feature earlier). My reasoning
was that the logical number is really a nonsensical idea: we just
invented it on the spot. In contrast, a position is a fairly natural
thing for an attribute to have -- it's a notion with some counterpart
in the real world. To me, at least, it seems intuitive that an
attnum would identify a column whereas an attpos would specify the
column's position.

I'm happy to change the name if there's a consensus that attpos isn't
a good choice -- what does everyone think?

 What are you going to do with deleted columns?  I'd be inclined to
 give them all attlogpos = 0, but that destroys your last comment.

I hadn't planned to do anything in particular for deleted columns:
since they are never displayed to the user, does it matter what their
attpos is?

In any event, the property that no two columns in a table have the
same logical number isn't important anyway.

 You have not presented any proposal for exactly what ALTER TABLE
 operations would be offered to manipulate the column positions.

I'd like to get the backend storage side of things implemented
first. I'll take a look at the archives before I do any UI work --
thanks for the suggestion.

-Neil


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Release cycle length

2003-11-20 Thread Christopher Kings-Lynne
Yeah, I think the main issue in all this is that for real production
sites, upgrading Postgres across major releases is *painful*.  We have
to find a solution to that before it makes sense to speed up the
major-release cycle.
Well, I think one of the simplest fixes is to do a topological sort of objects 
in pg_dump (between object classes that need it), AND add regression 
testing for pg_dump :)

Chris



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


[HACKERS] Handy user/group hack

2003-11-20 Thread David Fetter
Kind people,

Here's something I came up with for finding whether a PostgreSQL 7.4
user is in a group.

Cheers,
D

CREATE OR REPLACE FUNCTION in_group (text, text) RETURNS BOOLEAN AS '
DECLARE
the_user  ALIAS FOR $1;
the_group ALIAS FOR $2;
dummy text; -- SELECT INTO dummy because PERFORM always returns true.
-- Is this a bug?
BEGIN
SELECT INTO dummy u.usename
FROM
  pg_user u
, pg_group g
WHERE
u.usename = the_user
AND g.groname = the_group
AND u.usesysid = ANY (g.grolist);

IF FOUND
THEN
RETURN true;
ELSE
RETURN false;
END IF;
END;
' LANGUAGE 'plpgsql' STRICT IMMUTABLE;
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100cell: +1 415 235 3778

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] logical column position

2003-11-20 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Actually, I deliberately chose attpos rather than attlognum (which is
 what some people had been calling this feature earlier). My reasoning
 was that the logical number is really a nonsensical idea: we just
 invented it on the spot.

True ...

 In contrast, a position is a fairly natural
 thing for an attribute to have -- it's a notion with some counterpart
 in the real world.

But position could at least as logically be considered to mean the
physical position in the tuple.  I still say that these names are ripe
for confusion.

I don't have a better choice of name offhand, but if we spend 1% of the
time already spent arguing about these issues on finding a better name,
I'm sure we can think of one ;-)

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] logical column position

2003-11-20 Thread Rod Taylor

 I don't have a better choice of name offhand, but if we spend 1% of the
 time already spent arguing about these issues on finding a better name,
 I'm sure we can think of one ;-)

virtual (attvirtnum)
external (attextnum)

atttisoywnum - attribute this is the one you want number


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] PANIC: rename from /data/pg_xlog/0000002200000009

2003-11-20 Thread Tom Lane
Yurgis Baykshtis [EMAIL PROTECTED] writes:
 The most interesting thing is that rename failure is always followed by
 IpcMemoryCreate and vice-versa, IpcMemoryCreate always comes after the
 rename error.

That is not interesting ... it's exactly what I'd expect for the panic
recovery sequence (given that SHMMAX is preventing creation of a second
shared-memory segment).

 What filesystem are you using, and what is the platform exactly?

 DBExperts 7.3.4 on Win2000 (so it's a cygwin-based system)

Perhaps you need to get a real operating system :-(.  No such failure
mode has been reported on any Unix variant, AFAIR.

It's hard to be certain what's happening from the after-the-fact
evidence you've offered.  I'd like to see what is in pg_xlog immediately
after the crash, *before* Postgres is restarted.  I get the feeling that
what we will see is the destination filename already present and the
source not, which would suggest that two backends tried to do the rename
concurrently.  AFAICS that must mean that the operating system's lock
support is broken, because we do not try to rename WAL segments except
while holding the CheckpointLock, not to mention the ControlFileLock.

This is not necessarily Windows' fault, it could be a cygwin or cygipc
bug ... are you up to date on those?

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[HACKERS] code question: rewriteDefine.c

2003-11-20 Thread Neil Conway
I noticed the following code in src/backend/rewrite/rewriteDefine.c,
circa line 390:

/*
 * Are we converting a relation to a view?
 *
 * If so, check that the relation is empty because the storage
 * for the relation is going to be deleted.  Also insist that
 * the rel not have any triggers, indexes, or child tables.
 */
if (event_relation->rd_rel->relkind != RELKIND_VIEW)
{

Under what circumstances do we convert a relation to a view? Is this
functionality exposed to the user?

Furthermore, it seems broken: it checks the pg_class.relhassubclass
attribute for this relation to see if it has child tables, but this
is wrong, as relhassubclass only indicates that the relation MAY have
a subclass, not that it definitely does[1]. It also doesn't drop the
relation's TOAST table, if any, as the code itself notes.

-Neil

[1] This is because relhassubclass is not updated when a table's child
table is dropped, due to concurrency concerns; see has_subclass() in
plancat.c


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Sponsoring enterprise features

2003-11-20 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 Personally, I think the best way is simply to make a post on -hackers
 with a description of what you want to accomplish with a call for
 estimates and proposals. ...
 I say a description of what you want to accomplish because certain
 features are not as useful on PostgreSQL as they are other databases
 (data partitioning being one of them, due to the ability to use partial
 indexes) so you may not achieve what you are expecting.

Right.  You can in any case get a great deal of free advice by starting
a pghackers discussion ;-)

It should be noted that because Oracle does it that way is a
guaranteed nonstarter as a rationale for any Postgres feature proposal.
There are enough differences between Postgres and Oracle that you will
need to do significant investigation before assuming that an Oracle-
based feature design is appropriate for Postgres.  Aside from technical
differences, we have fundamentally different priorities --- one of which
is simplicity of administration.  You'll get no buyin on proposals that
tend to create Oracle-like difficulties of installation and tuning.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Matthew T. O'Connor
Tom Lane wrote:

Chester Kustarz [EMAIL PROTECTED] writes:
 

vacuum is to reclaim dead tuples. this means it depends on update and
delete. analyze depends on data values/distribution. this means it depends on
insert, update, and delete. thus the dependencies are slightly different
between the 2 operations, and so you can come up with use-cases that
justify running either more frequently.
   

Agreed.
 

And that is why pg_autovacuum looks at insert, update and delete when 
deciding to do an analyze, but only looks at update and delete when 
deciding to do a vacuum.  In addition, this is why pg_autovacuum was 
given knobs so that the vacuum and analyze thresholds can be set 
independently.

Matthew

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Matthew T. O'Connor
Josh Berkus wrote:

Matthew,
 

For small tables, you don't need to vacuum too often.  In the testing I
did, a small table of ~100 rows didn't really show significant performance
degradation until it had close to 1000 updates. 
   

This is accounted for by using the threshold value.   That way small tables 
get vacuumed less often. However, the way large tables work is very different 
and I think your strategy shows a lack of testing on large active tables.
 

Probably more true than I would like to think...

For large tables,
vacuum is so expensive, that you don't want to do it very often, and
scanning the whole table when there is only 5% wasted space is not very
helpful.
   

5% is probably too low, you're right ... in my experience, performance 
degradation starts to set in at 10-15% updates to, for example, a 1.1 million 
row table, particularly since users tend to request the most recently updated 
rows.   As long as we have the I/O issues that Background Writer and ARC are 
intended to solve, though, I can see being less aggressive on the defaults; 
perhaps 20% or 25%.   If you wait until 110% of a 1.1 million row table is 
updated, though, that vacuum will take an hour or more.
 

True, but I think it would be one hour once, rather than 30 minutes 4 times.

Additionally, you are not thinking of this in terms of an overall database 
maintenance strategy.   Lazy Vacuum needs to stay below the threshold of the 
Free Space Map (max_fsm_pages) to prevent creeping bloat from setting in to 
your databases.   With proper configuration of pg_avd, vacuum_mem and FSM 
values, it should be possible to never run a VACUUM FULL again, and as of 7.4 
never run a REINDEX again either.  
 

This is one of the things I had hoped to add to pg_autovacuum, but never 
got to.  In addition to just the information from the stats collector on 
inserts updates and deletes, pg_autovacuum should also look at the FSM, 
and make decisions based on it.  Anyone looking for a project?

But this means running vacuum frequently enough that your max_fsm_pages 
threshold is never reached.   Which for a large database is going to have to 
be more frequently than 110% updates, because setting 20,000,000 
max_fsm_pages will eat your RAM.
 

Again, I think the only way to do this efficiently is to look at the 
FSM.  Otherwise the only way to make sure you keep the FSM populated is 
to run vacuum more than needed.

Yes, I set the defaults a little high, perhaps, so as to err on the
side of caution.  I didn't want people to say pg_autovacuum kills the
performance of my server.  A small table will get vacuumed, just not
until it has reached the threshold.  So a table with 100 rows will get
vacuumed after 1200 updates / deletes.  
   

Ok, I can see that for small tables.
 

In my testing it showed that
there was no major performance problems  until you reached several
thousand updates / deletes.
   

Sure.  But several thousand updates can be only 2% of a very large table.
 

But I can't imagine that 2% makes any difference on a large table.  In 
fact I would think that 10-15% would hardly be noticeable, beyond that 
I'm not sure.

HUH?  analyze is very very cheap compared to vacuum.  Why not do it more
often?
   

Because nothing is cheap if it's not needed.   

Analyze is needed only as often as the *aggregate distribution* of data in the 
tables changes.   Depending on the application, this could be frequently, but 
far more often (in my experience running multiple databases for several 
clients) the data distribution of very large tables changes very slowly over 
time.  
 

Valid points, and again I think this points to the fact that 
pg_autovacuum needs to be more configurable.  Being able to set 
different thresholds for different tables will help considerably.  In 
fact, you may find that some tables should have a vac threshold much 
larger than the analyze thresold, while other tables might want the 
opposite.

One client's database, for example, that I have running VACUUM via cron 
scripts runs on this schedule for the main tables:
VACUUM only: twice per hour
VACUUM ANALYZE: twice per day
 

I would be surprised if you can notice the difference between a vacuum 
analyze and a vacuum, especially on large tables.

On the other hand, I've another client's database where most activity involves 
updates to entire classes of records.   They run ANALYZE at the end of every 
transaction.

So if you're going to have a seperate ANALYZE schedule at all, it should be 
slightly less frequent than VACUUM for large tables.   Either that, or drop 
the idea, and simplify pg_avd by running VACUUM ANALYZE all the time instead 
of having 2 seperate schedules.
 

I think you need two separate schedules.  There are lots of times where 
a vacuum doesn't help, and an analyze is all that is needed, and an 
analyze is MUCH cheaper than a vacuum.

BUT  now I see how you arrived at the logic you did.  If you're testing 
only on small tables, and not vacuuming them until they reach 110% updates, 
then you *would* need to analyze more frequently.

Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Josh Berkus
Matthew,

  110% of a 1.1 million row table is updated, though, that vaccuum will
  take an hour or more.

 True, but I think it would be one hour once, rather than 30 minutes 4
 times.

Well, generally it would be about 6-8 times at 2-4 minutes each.

 This is one of the things I had hoped to add to pg_autovacuum, but never
 got to.  In addition to just the information from the stats collector on
 inserts updates and deletes, pg_autovacuum should also look at the FSM,
 and make decisions based on it.  Anyone looking for a project?

Hmmm ... I think that's the wrong approach.  Once your database is populated, 
it's very easy to determine how to set the FSM for a given pg_avd level.   If 
you're vacuuming after 20% updates, for example, just set fsm_pages to 20% of 
the total database pages, plus growth and safety margins.

I'd be really reluctant to base pg_avd frequency on the fsm settings instead.  
What if the user loads 8GB of data but leaves fsm_pages at the default of 
10,000?  You can't do much with that; you'd have to vacuum if even 1% of the 
data changed.
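
The rule of thumb above is simple arithmetic; a hedged sketch (8 kB default block size assumed; the function name and the safety margin are invented for illustration):

```python
# max_fsm_pages sizing per the 20%-churn rule of thumb: cover the pages
# that can accumulate dead space between vacuums, plus a safety margin.

BLOCK_SIZE = 8192  # PostgreSQL's default page size in bytes

def fsm_pages_needed(db_bytes, churn_fraction=0.20, safety_margin=0.25):
    total_pages = db_bytes // BLOCK_SIZE
    return int(total_pages * churn_fraction * (1 + safety_margin))

# An 8 GB database vacuumed at 20% churn, with a 25% margin:
print(fsm_pages_needed(8 * 1024**3))  # -> 262144
```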

The other problem is that calculating data pages from a count of 
updates+deletes would require pg_avd to keep more statistics and do more math 
for every table.  Do we want to do this?

 But I can't imagine that 2% makes any difference on a large table.  In
 fact I would think that 10-15% would hardly be noticeable, beyond that
 I'm not sure.

I've seen performance lag at 10% of records, especially in tables where both 
update and select activity focus on one subset of the table (calendar tables, 
for example).

 Valid points, and again I think this points to the fact that
 pg_autovacuum needs to be more configurable.  Being able to set
 different thresholds for different tables will help considerably.  In
 fact, you may find that some tables should have a vac threshold much
 larger than the analyze threshold, while other tables might want the
 opposite.

Sure.  Though I think we can make the present configuration work with a little 
adjustment of the numbers.   I'll have a chance to test on production 
databases soon.

 I would be surprized if you can notice the difference between a vacuum
 analyze and a vacuum, especially on large tables.

It's substantial for tables with high statistics settings.   A 1,000,000 row 
table with 5 columns set to statistics=250 can take 3 minutes to analyze on a 
medium-grade server.

 I think you need two separate schedules.  There are lots of times where
 a vacuum doesn't help, and an analyze is all that is needed

Agreed.  And I've just talked to a client who may want to use pg_avd's ANALYZE 
scheduling but not use vacuum at all.   BTW, I think we should have a setting 
for this; for example, if -V is -1, don't vacuum.

 I'm open to discussion on changing the defaults.  Perhaps it would
 be better to use some non-linear (perhaps logarithmic) scaling factor,
 so that you wound up with something roughly like this:

 #tuples   activity% for vacuum
 1k        100%
 10k        70%
 100k       45%
 1M         20%
 10M        10%
 100M        8%

That would be cool, too.  Though a count of data pages would be a better 
scale than a count of rows, and it's equally obtainable from pg_class.
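
A scale like the quoted table could be generated by log-interpolating a handful of anchor points; purely a sketch of the idea, not proposed code:

```python
import math

# Log-interpolate the vacuum activity fraction from table size, using the
# anchor points quoted above (1k rows -> 100% ... 100M rows -> 8%).

ANCHORS = [(1e3, 1.00), (1e4, 0.70), (1e5, 0.45),
           (1e6, 0.20), (1e7, 0.10), (1e8, 0.08)]

def vacuum_scale_factor(ntuples):
    if ntuples <= ANCHORS[0][0]:
        return ANCHORS[0][1]
    if ntuples >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]
    for (lo_n, lo_f), (hi_n, hi_f) in zip(ANCHORS, ANCHORS[1:]):
        if lo_n <= ntuples <= hi_n:
            t = (math.log10(ntuples) - math.log10(lo_n)) \
                / (math.log10(hi_n) - math.log10(lo_n))
            return lo_f + t * (hi_f - lo_f)

print(vacuum_scale_factor(500))        # small tables: vacuum at 100% activity
print(vacuum_scale_factor(3_000_000))  # between the 20% and 10% anchors
```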

 Thanks for the lucid feedback / discussion.  autovacuum is a feature
 that, despite its simple implementation, has generated a lot of
 feedback from users, and I would really like to see it become something
 closer to what it should be.

Well, I hope to help now.  Until very recently, I've not had a chance to 
seriously look at pg_avd and test it in production.   Now that I do, I'm 
interested in improving it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org