Re: [HACKERS] RPM building fun
On Thu, Nov 20, 2003 at 11:45:06AM +0530, Shridhar Daithankar wrote:
> Joshua D. Drake wrote:
> > Is there some way to remove this piece of sh^H^Hlegacy from the
> > configure script? Does anybody actually use info?
> All of GNU. Additionally it is a very good resource when you use
> Konqueror to browse it as html..

Roight. How to tell it to get the right prefix, then?

Cheers,
D
--
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100  cell: +1 415 235 3778

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] RPM building fun
On Thursday 20 November 2003 13:09, David Fetter wrote:
> On Thu, Nov 20, 2003 at 11:45:06AM +0530, Shridhar Daithankar wrote:
> > Joshua D. Drake wrote:
> > > Is there some way to remove this piece of sh^H^Hlegacy from the
> > > configure script? Does anybody actually use info?
> > All of GNU. Additionally it is a very good resource when you use
> > Konqueror to browse it as html..
> Roight. How to tell it to get the right prefix, then?

I didn't get you.. I type info:/autoconf and it will show me the autoconf
page. You mean the extension of actual info files? No idea. Never seen them
in the wild...:-)

Shridhar
Re: [HACKERS] ALTER COLUMN/logical column position
Andreas Pflug kirjutas N, 20.11.2003 kell 01:38:
> Second, column type changes needing a nontrivial cast function should be
> implemented in a way that preserves attnum. This could be done like this:
> - decompile dependent objects, and memorize them for later recreation
> - ADD tmpCol, UPDATE tmpCol=col::newtype, DROP old column, cascading to
>   dependent objects, RENAME tmpCol (known stuff)
> - restore old attnum, which is a simple UPDATE to pg_attribute at this
>   stage

... and suddenly your table is broken, as you can't retrieve the tmpCol
when the attnum points to the dropped old column, which has data in the
format for the old type. The whole point of separating attnum and attpos
is that attnum is used internally to retrieve the data, and you can't
change it by just UPDATEing pg_attribute.

---
Hannu
Re: [HACKERS] ALTER COLUMN/logical column position
Hannu Krosing wrote:
> attnum is used internally to retrieve the data

Oops... So if an additional column number is invented, it should not be a
logical column number, but a physical storage number for internal data
retrieval. This way, the user interface doesn't change, and all those
SELECT ... FROM pg_attribute ORDER BY attnum continue delivering the
expected result.

Regards,
Andreas
Re: [HACKERS] PG7.4 ordering operator
tgl wrote:
> strk [EMAIL PROTECTED] writes:
> > Testing postgis support in PG7.4 (2003-11-11) I've encountered this
> > problem:
> > ERROR: could not identify an ordering operator for type geometry
> > Previous PG versions do not show this problem. Any hint on what might
> > be missing?
> A default btree operator class for type geometry. PG 7.4 no longer uses
> assumptions about operator names to determine sorting/grouping behavior.
> If you have some operators that provide a scalar sort ordering on your
> datatype, then make a btree opclass to show that. See
> http://www.postgresql.org/docs/7.4/static/xindex.html#XINDEX-OPCLASS-DEPENDENCIES
> regards, tom lane

Thanks for the answer. I have one more question: I've provided a default
btree operator class, but I'm often going out of memory when using DISTINCT
or UNION clauses. How can I reduce memory usage in these cases? Since the
passed arguments are TOASTED, but I use only a small initial portion of
them to make the computation, can I avoid DETOASTING them and still reach
that initial part? The information I need is stored at offset 40 from the
detoasted data and is 6 doubles long. I cannot find TOAST documentation.

thanks.
--strk;
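For reference, the kind of default btree operator class Tom describes looks
roughly like this in 7.4 syntax. This is a hedged sketch only: the operator
names and the support function geom_cmp are hypothetical placeholders (not
PostGIS's actual names), and the comparison operators are assumed to have
been created beforehand.

```sql
-- Assumes <, <=, =, >=, > operators on geometry and a three-way
-- comparison function geom_cmp(geometry, geometry) already exist.
CREATE OPERATOR CLASS geometry_ops
    DEFAULT FOR TYPE geometry USING btree AS
        OPERATOR 1 < ,
        OPERATOR 2 <= ,
        OPERATOR 3 = ,
        OPERATOR 4 >= ,
        OPERATOR 5 > ,
        FUNCTION 1 geom_cmp(geometry, geometry);
```

With a default opclass in place, 7.4's sort-based DISTINCT, UNION, GROUP BY
and ORDER BY can all find an ordering operator for the type.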
[HACKERS] tsearch2 patch for 7.4.1
Hi there,

we have a patch for contrib/tsearch2 we'd like to commit for 7.4.1. Is it
ok? We added support for compound words using ispell dictionaries. It's a
rather important feature for agglutinative languages like German and
Norwegian (tested). This work was sponsored by ABC Startsiden
(www.startsiden.no).

Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Re: [HACKERS] 4 Clause license?
This whole thing is starting to make my head hurt. There has been more
effort spent over this license issue than I would have spent if I hadn't
taken the shortcut of using the FreeBSD code.

I think maybe the simplest thing is for me to prepare a patch that rips
that code out and replaces it with a (slightly simpler - less umask hacking
required, I think) piece of code that I will write. I won't make the
mistake of borrowing code again like this. (I naively thought using FreeBSD
code would be safe, and was careful not to use any GPL code.)

cheers

andrew

Terry Lambert wrote:
> Erik Trulsson wrote:
> > On Mon, Nov 17, 2003 at 02:48:08PM -0500, Rod Taylor wrote:
> > > The PostgreSQL group has recently had a patch submitted with a
> > > snippet of code from FreeBSD's src/bin/mkdir/mkdir.c.
> > > http://www.freebsd.org/cgi/cvsweb.cgi/src/bin/mkdir/mkdir.c?annotate=1.27
> > > Is this intentionally under the 4 clause license, or does the
> > > copyright from the website (2 clause) apply to everything that is
> > > non-contrib? http://www.freebsd.org/copyright/freebsd-license.html
> > That copyright notice on the website should apply to everything that
> > is not under some other license. Different parts of the system are
> > under different licenses and copyrights depending on who wrote them.
> > The mkdir.c *was* under the 4 clause license. However, all material
> > that was part of the original BSDs, and thus was copyrighted by The
> > Regents of the University of California, has had its license changed
> > such that clause 3 (the advertising clause) no longer applies.
>
> People seem to frequently misunderstand what a license is, and more
> specifically, what the conversion from a 4 clause to a 3 clause license
> meant, in the case of the UCB License. This change does not apply to
> derivative works, only to the original code itself. So if you went back
> and grabbed the mkdir.c code off the BSD 4.4-Lite2 CDROM, and used that,
> fine.
>
> If you grabbed the mkdir.c off the FreeBSD sources, and even one line
> was modified by someone, then it's a derivative work, and, unless you
> can also get written permission from the contributor, it stays under the
> license from which it was derived. The announcement by the University
> only permits the change, it does not mandate the change, for this very
> reason: otherwise third party redistributed code would have suddenly
> become legally questionable.
>
> By the same token, if you dual-license some code under the GPL and
> another license, and someone gets the GPL'ed version, and makes changes,
> unless they specifically permit it, the code contributed back is only
> licensed under the GPL. This is why SGI licensing the XFS code under the
> GPL was a stupid move: a contributor contributing code back results in
> an improved code base that can only be used under the terms of the GPL,
> and not in SGI's commercial product offerings. I believe that SGI did
> not actually expect any significant or worthwhile bug fixes or
> enhancements to come from the GPL'ed code using community.
>
> In terms of getting written approval for the license change from other
> contributors, this is basically the role that the Regents of the
> University of California and the UCB CSRG were fulfilling: a legal
> entity to whom such representations could be made by contributors, and
> who could then legally forward those representations to another. FreeBSD
> has no such legal entity, at present. The closest you could come is
> perhaps the FreeBSD Foundation. Had there been a FreeBSD Foundation from
> day one, to whom rights could have been assigned by contributors
> (turning it into "The FreeBSD Foundation and its Contributors"), then
> the license would be capable of being modified after the fact. Without
> that, however, you must track down all of the individual contributors to
> get the license changed.
>
> My recommendation is to use the code off the 4.4 BSD-Lite2 CDROM, if you
> can; live with the 4 clause license if the code contains changes you
> need, if you can; or contact the contributors, if it is a small enough
> job. If none of those things will work for you, then start with the 4.4
> BSD-Lite2 CDROM code, convert to the 3 clause license, as permitted by
> the university, and then hack out whatever modifications you need on top
> of that for yourself.
>
> -- Terry
Re: [HACKERS] ALTER COLUMN/logical column position
Which is what started the whole discussion.

Dave

On Thu, 2003-11-20 at 04:40, Andreas Pflug wrote:
> Hannu Krosing wrote:
> > attnum is used internally to retrieve the data
> Oops... So if an additional column number is invented, it should not be
> a logical column number, but a physical storage number for internal data
> retrieval. This way, the user interface doesn't change, and all those
> SELECT ... FROM pg_attribute ORDER BY attnum continue delivering the
> expected result.
> Regards, Andreas
Re: [HACKERS] ALTER COLUMN/logical column position
Andreas Pflug kirjutas N, 20.11.2003 kell 11:40:
> Hannu Krosing wrote:
> > attnum is used internally to retrieve the data
> Oops... So if an additional column number is invented, it should not be
> a logical column number, but a physical storage number for internal data
> retrieval.

You are just shifting the interface problems to a place needing way more
changes in the backend. There will be some problems either way. Also,
tools needing this knowledge should start using the information schema as
much as they can, making internal reshufflings less of a problem.

> This way, the user interface doesn't change, and all those
> SELECT ... FROM pg_attribute ORDER BY attnum continue delivering the
> expected result.

Depending on what you expect ;) If you expect the above to give you all
active columns ordered as they are stored, then it does not give you what
you expect.

Btw, most of these concerns (and more) were already iterated when DROP
COLUMN was done, causing gaps in attnum. There were a lot of doomsday
prophecies, but in the end it went quite smoothly. The tools needing
internal knowledge about storage (meaning any tool doing select .. from
pg_...) have always needed some upgrades for new versions.

IMHO, the only behaviour visible to the common user we should worry about
is SELECT *, and a special column for solving this is _the_ easiest way to
do it.

- Hannu
Re: [HACKERS] 4 Clause license?
> I think maybe the simplest thing is for me to prepare a patch that rips
> that code out and replaces it with a (slightly simpler - less umask
> hacking required, I think) piece of code that I will write.

The FreeBSD folks sorted it out for us. Everyone's names should be in the
copyright for the file. The licence portion should be the 3 clause version
-- no advertising clause.

I think borrowing should be encouraged, and now that we know what license /
copyright we need to carry over, this can be done without worry.
Re: [HACKERS] logical column position
On Wed, 19 Nov 2003 19:07:23 +0100, Andreas Pflug [EMAIL PROTECTED] wrote:
> is there any DB system out there that allows to reshuffle the column
> ordering?

Firebird:

ALTER TABLE tname ALTER COLUMN cname POSITION 7;

Servus
Manfred
Re: [HACKERS] question about fixes in v7.4...
On Wed, 2003-11-19 at 22:50, Bruce Momjian wrote:
> Don Sceifers wrote:
> > My company is fairly new at PostgreSQL, but we have hit a problem
> > where we modify a table using ALTER, and our stored procedures stop
> > working. We have a grasp as to why this happens, but I was wondering
> > if this v7.4 upgrade fixes this issue?
> This is a known issue. There is no way for us to know what stored
> tables/columns you are using in your function. You might find that
> reconnecting to the database after the ALTER will fix the problem. We do
> cache function code, so reconnecting will erase that cache.

Would it be worth putting in a partial implementation? I'm thinking
specifically about functions that return table types. Could they be hooked
up with dependency information and made to complain?

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] question about fixes in v7.4...
> Don Sceifers wrote:
> > My company is fairly new at PostgreSQL, but we have hit a problem
> > where we modify a table using ALTER, and our stored procedures stop
> > working. We have a grasp as to why this happens, but I was wondering
> > if this v7.4 upgrade fixes this issue?

Not sure if this applies with Postgres, but did you also recompile the
stored procedures? If I remember correctly, Sybase and Informix (?)
required stored procedures to be recompiled after DDL to tables they used,
and Oracle often invalidates any functions, views, packages and procedures
if they have any relation to the modified table.

- Original Message -
From: Robert Treat [EMAIL PROTECTED]
To: Bruce Momjian [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, November 20, 2003 7:52 AM
Subject: Re: [HACKERS] question about fixes in v7.4...
Re: [HACKERS] PG7.4 ordering operator
strk [EMAIL PROTECTED] writes:
> I've provided a default btree operator class but I'm often going out of
> memory when using DISTINCT or UNION clauses. How can I reduce memory
> usage in these cases? Since the passed arguments are TOASTED, but I use
> only a small initial portion of them to make the computation, can I
> avoid DETOASTING them and still reach that initial part?

No, I don't think so, but see PG_FREE_IF_COPY. Operators used in btree
indexes are expected not to leak memory.

regards, tom lane
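As an illustration of Tom's pointer, a btree support function for a varlena
type typically detoasts its arguments and then frees the copies before
returning, so that sort-driven DISTINCT/UNION don't accumulate detoasted
copies. This is only a sketch: the function name geom_cmp and the memcmp
comparison are made up for illustration, not PostGIS code.

```c
/* Sketch of a btree comparison function that does not leak the
 * detoasted copies of its arguments. */
#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(geom_cmp);

Datum
geom_cmp(PG_FUNCTION_ARGS)
{
    /* Detoasts if necessary; may allocate a fresh copy. */
    struct varlena *a = PG_GETARG_VARLENA_P(0);
    struct varlena *b = PG_GETARG_VARLENA_P(1);
    int32           result;

    /* Hypothetical comparison on the leading bytes of the data. */
    result = memcmp(VARDATA(a), VARDATA(b),
                    Min(VARSIZE(a), VARSIZE(b)) - VARHDRSZ);

    /* Free the copies, if any were made, so sorts don't leak memory. */
    PG_FREE_IF_COPY(a, 0);
    PG_FREE_IF_COPY(b, 1);

    PG_RETURN_INT32(result);
}
```

PG_FREE_IF_COPY is a no-op when the argument did not need detoasting, so it
is safe to call unconditionally.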
Re: [HACKERS] Build farm
Peter Eisentraut wrote:
> Andrew Dunstan writes:
> > Essentially what I have is something like this pseudocode:
> >
> > cvs update
> Be sure to check past branches as well.
> > check if there really was an update and if not exit OK.
> > configure; get config.log
> Ideally, you'd try all possible option combinations for configure. Or at
> least enable everything.

I have had in mind from the start doing multiple configurations and
multiple branches. Right now I'm working only with everything/head, but
will make provision for multiple sets of both. How many branches back do
you think we should go? Right now I'd be inclined only to do REL7_4_STABLE
and HEAD as a default. Maybe we could set the default to be gettable from
the web server, so that as new releases come along, build farm members
using the default wouldn't need to make any changes. However, everything
would also be settable locally on each build farm member in an options
file.

> > make 2>&1 | make-filter > makelog
> > make check 2>&1 | check-filter > checklog
> You could also try out make distcheck. It tries out the complete build,
> installation, uninstallation, regression test, and distribution
> building.

OK. (TBD)

> > send config status, make status, check status, logfiles

OK.

> > make distclean
> When I played around with this, I always copied the CVS tree to a new
> directory and deleted that one at the end. That way, bugs in the clean
> procedure (known to happen) don't trip up the whole process.

OK. We've also seen odd problems with cvs update, I seem to recall, but
I'd rather avoid having to fetch the entire tree for each run, to keep
bandwidth use down. (I believe cvs update should be fairly reliable if
there are no local changes, which would be true in this instance.)

> > The send piece will probably be a perl script using LWP and talking to
> > a CGI script.
> That will be the difficult part to organize, if it's supposed to be
> distributed and autonomous.

Sending the results won't be a huge problem - storing and displaying them
nicely will be a bit more fun :-) Upload of results would be over
authenticated SSL to prevent spurious results being fed to us - all you
would need to join the build farm would be a username/password from the
buildfarm admin.

Thanks for your input

cheers

andrew
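The make-and-filter step in the pseudocode only captures compiler warnings
and errors if stderr is merged into stdout before the pipe; the usual idiom
for that is `2>&1`. A minimal runnable illustration (fake_make and makelog
are made-up names standing in for the real build step and log file):

```shell
# Toy build step that writes to both stdout and stderr.
fake_make() {
    echo "building target"            # goes to stdout
    echo "warning: deprecated" >&2    # goes to stderr
}

# Without the 2>&1, the warning would bypass the pipe and miss the log.
fake_make 2>&1 | tee makelog > /dev/null

# Both lines are now captured in the log.
cat makelog
```

Note that the order matters: `2>&1 |` merges stderr into the pipe, whereas
`| cmd 2>&1` applied to the right-hand command would not.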
Re: [HACKERS] 4 Clause license?
Rod Taylor wrote:
> > I think maybe the simplest thing is for me to prepare a patch that
> > rips that code out and replaces it with a (slightly simpler - less
> > umask hacking required, I think) piece of code that I will write.
> The FreeBSD folks sorted it out for us. Everyone's names should be in
> the copyright for the file. The licence portion should be the 3 clause
> version -- no advertising clause.

Whose names? It's not easily discoverable from browsing the CVS tree. I
would probably spend as much time, if not more, finding out as in
rewriting the 40 or so lines of code required.

> I think borrowing should be encouraged, and now that we know what
> license / copyright we need to carry over, this can be done without
> worry.

I'm a fan of borrowing, but not if it causes headaches. I'll hold off for
a bit in case I've misunderstood something.

<troll>Of course, now that SCO is claiming ownership of BSD code ...</troll>

cheers

andrew
Re: [HACKERS] tsearch2 patch for 7.4.1
Oleg Bartunov [EMAIL PROTECTED] writes:
> we have a patch for contrib/tsearch2 we'd like to commit for 7.4.1. Is
> it ok? We added support for compound words using ispell dictionaries.
> It's rather important feature for agglutinative languages like german,
> norwegian (tested). This work was sponsored by ABC Startsiden
> (www.startsiden.no).

This strikes me as a feature addition, and therefore not appropriate for
the 7.4 branch. You should only commit bug fixes into stable branches, not
feature additions.

regards, tom lane
Re: [HACKERS] ALTER COLUMN/logical column position
Hannu Krosing [EMAIL PROTECTED] writes:
> You are just shifting the interface problems to a place needing way more
> changes in the backend. There will be some problems either way.

Exactly. I'm considerably more worried about breaking out-of-the-way
places in the backend than I am about what order someone's admin tool
presents the columns in.

> Btw, most of these concerns (and more) were already iterated when DROP
> COLUMN was done, causing gaps in attnum. There were a lot of doomsday
> prophecies, but in the end it went quite smoothly.

That is a good comparison point. I'm inclined to think that we should do
it in a way that minimizes backend changes. The way to do that is to keep
attnum with its current definition (physical position) and add a new
column for the logical position, which only a small number of places will
need to care about.

regards, tom lane
Re: [HACKERS] ALTER COLUMN/logical column position
Tom Lane writes:
> Exactly. I'm considerably more worried about breaking out-of-the-way
> places in the backend than I am about what order someone's admin tool
> presents the columns in.

Clearly, the effort of adding logical column numbers will consist of
making choices between physical and logical numbers in the backend in some
places. So one option is to replace some uses of attnum by attlognum. The
other option is to replace *all* uses of attnum by attphysnum and then
replace some uses of attphysnum by attnum. To me, this looks like an equal
risk as far as the backend goes.

--
Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] logical column position
Neil Conway [EMAIL PROTECTED] writes:
> At present, attnum basically does three things: identifies a column
> within a relation, indicates which columns are system columns, and
> defines the order of a relation's columns. I'd like to move this last
> functionality into a separate pg_attribute column named attpos (or
> attlogicalpos):

attpos is a horrid choice of name, because no one will be able to remember
which of attnum and attpos is which. Pick a more distinct name. Offhand
the best thing I can think of is attlognum or attlogpos.

> - when the table is created, attnum == attpos. System columns have
>   attpos 0, as with attnum. At no point will two columns of the same
>   relation have the same attpos.

What are you going to do with deleted columns? I'd be inclined to give
them all attlogpos = 0, but that destroys your last comment.

> (a) ISTM this should also apply to COPY TO and COPY FROM if the user
>     didn't supply a column list. Is this reasonable?

Yes; also INSERT INTO, also the implicit ordering of output columns of a
JOIN, also the matching of aliases to columns in a FROM-list alias,
probably one or two other places. SQL exposes column ordering in more
places than just SELECT *.

> If we want to avoid this, one easy (but arguably unclean) way to do so
> would be to make the initial value of attpos == attnum * 1000, and make
> attpos an int4 rather than an int2. Then, we can do most column
> reordering operations with only a single pg_attribute update -- in the
> worst case that enough re-orderings are done that we overflow the 999
> padding positions, we can just fall back to doing multiple pg_attribute
> updates. Is this worth doing, and/or is there a better way to achieve
> the same effect?

That seems horribly messy. Just renumber.

> (c) Do I need to consider inheritance?

Yes. I think it'd be good if things were constrained so that columns 1..n
in a parent table always matched columns 1..n in every child, which is not
true now after adding/dropping columns. That would make it easier/cheaper/
more reliable to match up which child columns are to be referenced in an
inherited query (see adjust_inherited_attrs). I think the effective
constraints would have to be about the same as what we now impose on
column names in an inheritance hierarchy.

You have not presented any proposal for exactly what ALTER TABLE
operations would be offered to manipulate the column positions. My
recollection is that some consensus was reached on that point in the last
thread we had on this issue --- have you consulted the archives?

regards, tom lane
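Tom's "just renumber" suggestion amounts to reassigning contiguous logical
positions whenever a column is moved. A toy sketch of that bookkeeping
(the attlognum naming follows the thread's proposal; this is illustrative
only, not backend code):

```python
def move_column(columns, name, new_pos):
    """Move a column to a new 1-based logical position and renumber.

    `columns` is a list of (name, attlognum) pairs. Every position is
    rewritten afterwards -- the multiple-update cost that the
    attnum * 1000 gap scheme tried to avoid, and that Tom considers
    acceptable.
    """
    # Current logical order, smallest attlognum first.
    ordered = [c for c, _ in sorted(columns, key=lambda x: x[1])]
    # Relocate the named column, then hand out contiguous positions.
    ordered.insert(new_pos - 1, ordered.pop(ordered.index(name)))
    return [(c, i + 1) for i, c in enumerate(ordered)]

cols = [("id", 1), ("name", 2), ("email", 3), ("age", 4)]
print(move_column(cols, "age", 2))
# → [('id', 1), ('age', 2), ('name', 3), ('email', 4)]
```

In the real catalog this would be a handful of pg_attribute updates inside
one transaction, but the renumbering logic is the same.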
Re: [HACKERS] ALTER COLUMN/logical column position
Peter Eisentraut [EMAIL PROTECTED] writes:
> Clearly, the effort of adding logical column numbers will consist of
> making choices between physical and logical numbers in the backend in
> some places. So one option is to replace some uses of attnum by
> attlognum. The other option is to replace *all* uses of attnum by
> attphysnum and then replace some uses of attphysnum by attnum. To me,
> this looks like an equal risk as far as the backend goes.

This would be a reasonable assessment if we had our hands on every line of
backend code that exists. But you are neglecting the probability of
breaking user-written C functions, PL languages outside the main distro,
etc.

If we were going to go about this in a way that does not localize the
changes, I'd be inclined to use attlognum and attphysnum ... that is,
*deliberately* break every use that hasn't been looked at and updated.
Even that would not guarantee catching all the trouble spots; for example,
loop indexes and attnums passed as function parameters might not have
names that would be caught by a simplistic search-and-replace update.

I'm for localizing the changes.

regards, tom lane
Re: [HACKERS] logical column position
On Thu, Nov 20, 2003 at 10:39:24AM -0500, Tom Lane wrote:
> > (c) Do I need to consider inheritance?
> Yes. I think it'd be good if things were constrained so that columns
> 1..n in a parent table always matched columns 1..n in every child, which
> is not true now after adding/dropping columns. That would make it
> easier/cheaper/more reliable to match up which child columns are to be
> referenced in an inherited query (see adjust_inherited_attrs).

No way, because of multiple inheritance. Each child should have an
attparentnum, which would point to the parent's attnum for this to work ...

--
Alvaro Herrera ([EMAIL PROTECTED])
"To learn without thinking is useless; to think without learning,
dangerous" (Confucius)
Re: [HACKERS] logical column position
Alvaro Herrera Munoz [EMAIL PROTECTED] writes:
> On Thu, Nov 20, 2003 at 10:39:24AM -0500, Tom Lane wrote:
> > Yes. I think it'd be good if things were constrained so that columns
> > 1..n in a parent table always matched columns 1..n in every child,
> > which is not true now after adding/dropping columns.
> No way, because of multiple inheritance. Each child should have an
> attparentnum, which would point to the parent's attnum for this to
> work ...

Hm, good point. And I think we merge identically-named columns inherited
from different parents, which would mean that attparentnum wouldn't have a
unique value anyway. Perhaps rearranging a parent's columns shouldn't have
*any* direct effect on a child? Seems ugly though.

regards, tom lane
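The merge Tom mentions -- identically-named columns from different parents
collapsing into one child column -- is what makes a single attparentnum
per child column insufficient. A toy sketch of the effect (illustrative
only, not the backend's actual attribute-merging code):

```python
def merge_inherited_columns(parents):
    """Merge column-name lists from multiple parents, first-seen order.

    Returns the merged child column list plus, per child column, the
    list of (parent index, position-in-parent) pairs it came from. A
    column inherited from two parents has two sources, so one
    attparentnum value could not identify it uniquely.
    """
    merged = []
    sources = {}  # child column name -> [(parent index, position), ...]
    for p_idx, cols in enumerate(parents):
        for pos, name in enumerate(cols, start=1):
            if name not in sources:
                merged.append(name)
                sources[name] = []
            sources[name].append((p_idx, pos))
    return merged, sources

merged, sources = merge_inherited_columns([["id", "name"], ["id", "email"]])
# "id" exists in both parents but appears once in the merged child.
```

Here sources["id"] records two parent origins, which is exactly the
non-uniqueness Tom points out.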
Re: [HACKERS] [BUGS] 7.4: CHAR padding inconsistency
Well, that certainly is interesting. Oracle and MS-SQL preserve the
trailing space when concatenating. Does anyone remember the logic for
trimming space with ||?

---

William ZHANG wrote:
> Bruce said:
> > How do other databases handle this?
>
> I have tried on MS SQL Server 2000 and Oracle 9i for Windows. SQL Server
> doesn't like character_length and ||, so use len and + instead. Oracle
> doesn't like character_length either; use length. Hope the result may
> help.
>
> create table chartest(col char(10) not null);
> insert into chartest values ('AAA');
>
> PostgreSQL: select character_length(col) from chartest;           -- 10
> SQL Server: select len(col) from chartest;                        -- 3
> Oracle:     select length(col) from chartest;                     -- 10
>
> PostgreSQL: select character_length(col || 'hey') from chartest;  -- 6
> SQL Server: select len(col + 'hey') from chartest;                -- 13
> Oracle:     select length(col || 'hey') from chartest;            -- 13
>
> PostgreSQL: select 'aaa ' || 'bb';                 -- 'aaa bb'
> SQL Server: select 'aaa ' + 'bb';                  -- 'aaa bb'
> Oracle:     select 'aaa ' || 'bb' from dual;       -- 'aaa bb'
>
> PostgreSQL: select cast('aa ' as char(10)) || 'b';            -- 'aab'
> SQL Server: select cast('aa ' as char(10)) + 'b';             -- 'aa        b'
> Oracle:     select cast('aa ' as char(10)) || 'b' from dual;  -- 'aa        b'

-- Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] 4 Clause license?
Rod Taylor wrote:
> > I think maybe the simplest thing is for me to prepare a patch that
> > rips that code out and replaces it with a (slightly simpler - less
> > umask hacking required, I think) piece of code that I will write.
> The FreeBSD folks sorted it out for us. Everyone's names should be in
> the copyright for the file. The licence portion should be the 3 clause
> version -- no advertising clause.
> I think borrowing should be encouraged, and now that we know what
> license / copyright we need to carry over, this can be done without
> worry.

Agreed --- don't get too focussed on it --- they are all pretty much the
same, and if someone complains, we will rip it out and replace it.

-- Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
Re: [HACKERS] 4 Clause license?
On Thu, 20 Nov 2003, Andrew Dunstan wrote:
> <troll>Of course, now that SCO is claiming ownership of BSD code ...</troll>

Interesting thread, that ... last I read on the FreeBSD lists was
speculation that they would be going after ppl like Cisco (re: TCP/IP
networking code) since there really is nobody else large enough to bother
with ... it's going to be interesting to see :)

Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy   ICQ: 7615664
Re: [HACKERS] [BUGS] 7.4: CHAR padding inconsistency
Bruce Momjian [EMAIL PROTECTED] writes:
> Well, that certainly is interesting. Oracle and MS-SQL preserve the
> trailing space when concatenating. Does anyone remember the logic for
> trimming space with ||?

"Trimming space with ||" is a completely inaccurate description of what's
happening. 7.4 trims spaces from char(n) data when converting it to text
(or varchar). Since we don't have a bpchar version of ||, only a text
version, the implicitly invoked conversion is what's making the
difference.

AFAICS the Oracle and SQL Server behaviors are at least as inconsistent as
our own. If trailing spaces are significant during concatenation, why
aren't they significant to LENGTH()? I can't see a justification in the
SQL spec for handling one case differently from the other.

Actually the SQL spec is pretty inconsistent itself. It's clear that
trailing spaces are insignificant in comparisons, if you are using a PAD
SPACE collation (which I think is the implication of CHAR(n)), but I don't
see anything that says that they are insignificant for other purposes such
as LENGTH() and concatenation.

I'd agree with changing bpcharlen() to not count trailing spaces, I think.
That would be consistent with ignoring them in other contexts.

regards, tom lane
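Tom's point -- that the trimming happens in the implicit char-to-text cast,
not in || itself -- can be made visible by forcing the cast explicitly.
Results shown are for the 7.4 behavior described above; treat this as an
illustrative sketch:

```sql
-- The cast to text strips the padding before || ever runs.
SELECT cast('aa ' AS char(10)) || 'b';                 -- 'aab'
SELECT cast(cast('aa ' AS char(10)) AS text);          -- 'aa'
-- With plain text operands there is no padding to strip.
SELECT 'aa        ' || 'b';                            -- 'aa        b'
```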
Re: [HACKERS] tsearch2 patch for 7.4.1
Tom Lane kirjutas N, 20.11.2003 kell 17:18: Oleg Bartunov [EMAIL PROTECTED] writes: we have a patch for contrib/tsearch2 we'd like to commit for 7.4.1. Is it ok ? We added support for compound words using ispell dictionaries. It's a rather important feature for agglutinative languages like german, norwegian (tested). This work was sponsored by ABC Startsiden (www.startsiden.no). This strikes me as a feature addition, and therefore not appropriate for the 7.4 branch. You should only commit bug fixes into stable branches, not feature additions. Is this so even for contrib ? I seem to remember that there have been new features popping up in contrib in stable branches earlier. But if the general direction is to stabilise contrib as well, what about committing it as a separate patch file contrib/tsearch2/patches/compond-words-patch.1 , so that those people (or german/norwegian linux distributors) who consider lack of support for compound words a bug can apply it themselves before building ? --- Hannu ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] [BUGS] 7.4: CHAR padding inconsistency
On Thu, 20 Nov 2003 12:40:30 -0500, Tom Lane wrote: AFAICS the Oracle and SQL Server behaviors are at least as inconsistent as our own. If trailing spaces are significant during concatenation, why aren't they significant to LENGTH()? Oracle _does_ count the trailing spaces in its LENGTH() function. MSSQL's rules certainly look strange. I'd agree with changing bpcharlen() to not count trailing spaces, I think. That would be consistent with ignoring them in other contexts. Why not just change the behaviour back to what it used to be like? I see no justification for the change: It may break old queries in subtle ways and doesn't make the CHAR type any more consistent than before. -- Greetings from Troels Arvin, Copenhagen, Denmark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] 7.4 logging bug.
I just installed a 7.4 on windows/cygwin. I restored a dump but ran out of disk space during the creation of an index. In psql I saw the ERROR: could not extend relation . From that point on it seems to have stopped logging most things. The ERROR and HINT are not in the log file, and it stopped showing the executed commands and duration. The only thing it still shows is the recycled transaction log file messages. In psql I still see all the messages, like the notice that adding a primary key will create an index. Kurt ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] 7.4: CHAR padding inconsistency
(I already responded to Bruce, but the response seems to be stuck in some mailing list, waiting for approval. Trying again:) On Wed, 19 Nov 2003 14:50:24 -0500, on the BUGS list, Bruce Momjian wrote: Anyway, what did you want it to output? AAA hey? We could do that, but I assume most people wouldn't expect that output? It certainly depends on their background. Personally, the padding characteristics of the CHAR type were one of the first things about SQL that I learned (the hard way). Oracle and DB2 people should be used to PostgreSQL's old behaviour. The CHAR type may seem strange to some, but they may then just use VARCHAR. How do other databases handle this? I've started writing about it here: http://troels.arvin.dk/db/rdbms/#data_types-char Some of my test-material is also online: http://troels.arvin.dk/db/tests/chartest-20031119a/ My summary: With regard to CHAR-handling, PostgreSQL 7.4 is now in opposition to - previous versions of PostgreSQL; bad enough on its own, because there doesn't seem to have been a good discussion about it first - I can only find a few messages about it [1] - DB2 - Oracle - MSSQL (which also behaves in a non-standard way, but different from PostgreSQL 7.4) 7.4 is close to how MySQL works, though. I'm sorry about not testing this before 7.4 went gold, but I believe that this is a bug which should be corrected before too much confusion is created. Reference 1: An interesting one is this one: http://article.gmane.org/gmane.comp.db.postgresql.devel.general/10958/match=char+padding -- Greetings from Troels Arvin, Copenhagen, Denmark ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 4 Clause license?
Based on the below wouldn't they also have to go after Microsoft? Marc G. Fournier wrote: On Thu, 20 Nov 2003, Andrew Dunstan wrote: trollOf course, now that SCO is claiming ownership of BSD code . /troll Interesting thread that ... last I read on the FreeBSD lists was speculation that they would be going after ppl like Cisco (re: TCP/IP Networking Code) since there really is nobody else large enough to bother with ... its going to be interesting to see :) Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] OSDL DBT-2 w/ PostgreSQL 7.3.4 and 7.4beta5
Mark Wong [EMAIL PROTECTED] writes: I've changed all the numerics to integers and reals, where it was appropriate to maintain the precision specified in the TPC-C spec. ... Looks like I see about an 8% improvement in the metric with this instance. A definite decrease in user time in the processor utilization chart, which I presume is attributed to the 70% decrease in ticks to SearchCatCache in the database as reported by oprofile. Can anyone explain that one? That's odd; I don't see why NUMERIC would be incurring extra SearchCatCache calls. Does your profile provide info about where the SearchCatCache calls are coming from? regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] 4 Clause license?
On Thu, 20 Nov 2003, Joshua D. Drake wrote: Based on the below wouldn't they also have to go after Microsoft? Depends ... does MicroSoft use BSD TCP/IP, or did they write their own? I know that Linux is not using BSD TCP/IP (or, at least, they didn't in their first 3 incarnations of the stack) ... Marc G. Fournier wrote: On Thu, 20 Nov 2003, Andrew Dunstan wrote: trollOf course, now that SCO is claiming ownership of BSD code . /troll Interesting thread that ... last I read on the FreeBSD lists was speculation that they would be going after ppl like Cisco (re: TCP/IP Networking Code) since there really is nobody else large enough to bother with ... its going to be interesting to see :) Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] 4 Clause license?
Marc G. Fournier wrote: On Thu, 20 Nov 2003, Joshua D. Drake wrote: Based on the below wouldn't they also have to go after Microsoft? Depends ... does MicroSoft use BSD TCP/IP, or did they write their own? I know that Linux is not using BSD TCP/IP (or, at least, they didn't in their first 3 incarnations of the stack) ... M$ have used BSD code in the past, I'm fairly sure, even if they no longer do. But they do have a license from SCO. OK, I know I shouldn't have raised this topic ... cheers andrew ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Release cycle length
Jan Wieck [EMAIL PROTECTED] writes: On Tue, 18 Nov 2003, Peter Eisentraut wrote: The time from release 7.3 to release 7.4 was 355 days, an all-time high. We really need to shorten that. I don't see much of a point for a shorter release cycle as long as we don't get rid of the initdb requirement for releases that don't change the system catalog structure. All we gain from that is spreading out the number of different versions used in production. Yeah, I think the main issue in all this is that for real production sites, upgrading Postgres across major releases is *painful*. We have to find a solution to that before it makes sense to speed up the major-release cycle. By the same token, I'm not sure that there's much of a market for development releases --- people who find a 7.3-7.4 upgrade painful aren't going to want to add additional upgrades to incompatible intermediate states. If we could fix that, there'd be more interest. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 4 Clause license?
--On Thursday, November 20, 2003 16:00:44 -0400 Marc G. Fournier [EMAIL PROTECTED] wrote: On Thu, 20 Nov 2003, Joshua D. Drake wrote: Based on the below wouldn't they also have to go after Microsoft? Depends ... does MicroSoft use BSD TCP/IP, or did they write their own? I know that Linux is not using BSD TCP/IP (or, at least, they didn't in their first 3 incarnations of the stack) ... M$ also bought a LARGE license from SCO... -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED] US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749 pgp0.pgp Description: PGP signature
Re: [HACKERS] 4 Clause license?
Hello, My understanding is that they use the BSD stack (at least as the basis) for TCP/IP. Windows that is. J Marc G. Fournier wrote: On Thu, 20 Nov 2003, Joshua D. Drake wrote: Based on the below wouldn't they also have to go after Microsoft? Depends ... does MicroSoft use BSD TCP/IP, or did they write their own? I know that Linux is not using BSD TCP/IP (or, at least, they didn't in their first 3 incarnations of the stack) ... Marc G. Fournier wrote: On Thu, 20 Nov 2003, Andrew Dunstan wrote: trollOf course, now that SCO is claiming ownership of BSD code . /troll Interesting thread that ... last I read on the FreeBSD lists was speculation that they would be going after ppl like Cisco (re: TCP/IP Networking Code) since there really is nobody else large enough to bother with ... its going to be interesting to see :) Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Release cycle length
Kevin Brown [EMAIL PROTECTED] writes: ... That's why the release methodology used by the Linux kernel development team is a reasonable one. I do not think we have the manpower to manage multiple active development branches. The Postgres developer community is a fraction of the size of the Linux community; if we try to adopt what they do we'll just drown in work. It's hard enough to deal with the existing level of commitment to back-patching one stable release --- I know that we miss back-patching bug fixes that probably should have been back-patched. And the stuff that does get back-patched isn't really tested to the level that it ought to be, which discourages us from applying fixes to the stable branch if they are too large to be obviously correct. I don't see manpower emerging from the woodwork to fix those problems. If we were doing active feature development in more than one branch I think our process would break down completely. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Release cycle length
Larry Rosenman [EMAIL PROTECTED] writes: [EMAIL PROTECTED] wrote: 1. Start platform testing on day 1 of beta. Last minute fixes for AIX or UnixWare are really becoming old jokes. The only reason we had last minute stuff for UnixWare this time was the timing of PG's release and the UP3 release from SCO. Yes. The late fixes for OS X also arose from the fact that Apple released a new OS X version late in our beta cycle. I don't think it's reasonable to complain that there was insufficient port testing done earlier; the issues didn't come from that. I do agree with the opinion that our beta cycles are getting too long, and that it's not clear we are getting any additional reliability out of the longer time period. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Josh Berkus wrote: Shridhar, However I do not agree with this logic entirely. It pegs the next vacuum w.r.t current table size which is not always a good thing. No, I think the logic's fine, it's the numbers which are wrong. We want to vacuum when updates reach between 5% and 15% of total rows. NOT when updates reach 110% of total rows ... that's much too late. Well, looks like thresholds below 1 should be the norm rather than the exception. Hmmm ... I also think the threshold level needs to be lowered; I guess the purpose was to prevent continuous re-vacuuming of small tables? Unfortunately, in the current implementation, the result is that small tables never get vacuumed at all. So for defaults, I would peg -V at 0.1 and -v at 100, so our default calculation for a table with 10,000 rows is: 100 + ( 0.1 * 10,000 ) = 1100 rows. I would say -V 0.2-0.4 could be great as well. Fact to emphasize is that thresholds less than 1 should be used. Furthermore analyze threshold depends upon inserts+updates. I think it should also depend upon deletes for obvious reasons. Yes. Vacuum threshold is counting deletes, I hope? It does. My comment about the frequency of vacuums vs. analyze is that currently the *default* is to analyze twice as often as you vacuum. Based on my experience as a PG admin on a variety of databases, I believe that the default should be to analyze half as often as you vacuum. OK. I am all for experimentation. If you have real life data to play with, I can give you some patches to play around with. I will have real data very soon. I will submit a patch that would account deletes in analyze threshold. Since you want to delay the analyze, I would calculate analyze count as n = updates + inserts *-* deletes Rather than current n = updates + inserts. Also update readme about examples and analyze frequency. What does statistics gather BTW? Just number of rows or something else as well? I think I would put that on Hackers separately. 
I am still wary of inverting vacuum analyze frequency. You think it is better to set inverted default rather than documenting it? Shridhar ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Shridhar Daithankar wrote: Josh Berkus wrote: Shridhar, However I do not agree with this logic entirely. It pegs the next vacuum w.r.t current table size which is not always a good thing. Ok, what do you recommend? The point of two separate variables allows you to specify if you want vacuum based on a fixed number, based on table size or something in between. No, I think the logic's fine, it's the numbers which are wrong. We want to vacuum when updates reach between 5% and 15% of total rows. NOT when updates reach 110% of total rows ... that's much too late. For small tables, you don't need to vacuum too often. In the testing I did, a small table (~100 rows) didn't really show significant performance degradation until it had close to 1000 updates. For large tables, vacuum is so expensive, that you don't want to do it very often, and scanning the whole table when there is only 5% wasted space is not very helpful. Hmmm ... I also think the threshold level needs to be lowered; I guess the purpose was to prevent continuous re-vacuuming of small tables? Unfortunately, in the current implementation, the result is that small tables never get vacuumed at all. So for defaults, I would peg -V at 0.1 and -v at 100, so our default calculation for a table with 10,000 rows is: 100 + ( 0.1 * 10,000 ) = 1100 rows. Yes, I set the defaults a little high perhaps so as to err on the side of caution. I didn't want people to say pg_autovacuum kills the performance of my server. A small table will get vacuumed, just not until it has reached the threshold. So a table with 100 rows will get vacuumed after 1200 updates / deletes. In my testing it showed that there were no major performance problems until you reached several thousand updates / deletes. Furthermore analyze threshold depends upon inserts+updates. I think it should also depend upon deletes for obvious reasons. Yes. Vacuum threshold is counting deletes, I hope? It does. My comment about the frequency of vacuums vs. 
analyze is that currently the *default* is to analyze twice as often as you vacuum. Based on my experience as a PG admin on a variety of databases, I believe that the default should be to analyze half as often as you vacuum. HUH? analyze is very very cheap compared to vacuum. Why not do it more often? I am all for experimentation. If you have real life data to play with, I can give you some patches to play around. I will have real data very soon. I will submit a patch that would account deletes in analyze threshold. Since you want to delay the analyze, I would calculate analyze count as deletes are already accounted for in the analyze threshold. I am still wary of inverting vacuum analyze frequency. You think it is better to set inverted default rather than documenting it? I think inverting the vacuum and analyze frequency is wrong. What I think I am hearing is that people would like very much to be able to tweak the settings of pg_autovacuum for individual tables / databases etc. So that you could set certain tables to be vacuumed more aggressively than others. I agree this would be a good and welcome addition. I hope to have time to work on this at some point, but in the near future I won't. Matthew ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
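The threshold arithmetic behind the "100 rows, vacuumed after 1200 updates / deletes" figure above can be sketched directly. The base value of 1000 and scaling factor of 2 below are inferred from that figure, not read from pg_autovacuum's source, so treat them as an assumption:

```python
# Sketch of the pg_autovacuum threshold formula discussed in this thread:
#   VacuumThreshold = vacuum_base_value + vacuum_scaling_factor * reltuples
# base=1000 and scale=2 are inferred from the quoted example (a 100-row
# table is vacuumed after 1200 updates/deletes); they are assumptions here.

def vacuum_threshold(reltuples, base=1000, scale=2.0):
    return base + scale * reltuples

print(vacuum_threshold(100))     # 1200.0 -- the 100-row example above
print(vacuum_threshold(10000))   # 21000.0 -- "110% of total rows" territory
```

This also shows why Josh objects: with a scaling factor above 1, a large table must accumulate more dead tuples than it has rows before the first vacuum fires.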
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
On Thursday 20 November 2003 20:00, Matthew T. O'Connor wrote: Shridhar Daithankar wrote: I will submit a patch that would account deletes in analyze threshold. Since you want to delay the analyze, I would calculate analyze count as deletes are already accounted for in the analyze threshold. Yes. My bad. Deletes are not accounted in initializing analyze count but later they are used. I am still wary of inverting vacuum analyze frequency. You think it is better to set inverted default rather than documenting it? I think inverting the vacuum and analyze frequency is wrong. Me. Too. ATM all I can think of is this patch, attached. Josh, is it sufficient for you?..:-) Matthew, I am confused about one thing. Why would autovacuum count updates while checking for analyze threshold? Analyze does not change statistics, right? ( w.r.t line 1072, pg_autovacuum.c). For updating statistics, only inserts+deletes should suffice, isn't it? Other than that, I think autovacuum does everything it can. Comments? Shridhar *** README.pg_autovacuum.orig Thu Nov 20 19:58:29 2003 --- README.pg_autovacuum Thu Nov 20 20:26:39 2003 *** *** 141,150 depending on the mixture of table activity (insert, update, or delete): ! - If the number of (inserts + updates + deletes) > AnalyzeThreshold, then only an analyze is performed. ! - If the number of (deletes + updates) > VacuumThreshold, then a vacuum analyze is performed. VacuumThreshold is equal to: --- 141,150 depending on the mixture of table activity (insert, update, or delete): ! - If the number of (inserts + updates + deletes) >= AnalyzeThreshold, then only an analyze is performed. ! - If the number of (deletes + updates) >= VacuumThreshold, then a vacuum analyze is performed. VacuumThreshold is equal to: *** *** 158,163 --- 158,186 and running ANALYZE more often should not substantially degrade system performance. + Examples: + + The following table shows typical usage of pg_autovacuum settings. 
+ These are put here so that a DBA can have some starting point while + tuning pg_autovacuum. + + Vacuum is triggered by updates and deletes. So in case of vacuum, + the last column indicates the total of updates and deletes required + to trigger vacuum. In case of analyze, the operations would count the total + number of inserts, updates and deletes. + + Threshold Scaling factor Records No. of Operations + 1,000 1 10,000 11,000 + 1,000 2 10,000 21,000 + 1,000 0.5 10,000 6,000 + 1,000 0.1 10,000 2,000 + + Although analyze is a cheaper operation compared to vacuum, + it might be needed less often. The default is to analyze twice as much as + vacuum but that might be too aggressive for some installations. It is advised that + such installations tune their analyze threshold separately, rather than relying upon + the default behaviour. + Sleeping: - ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
On Thursday 20 November 2003 20:29, Shridhar Daithankar wrote: On Thursday 20 November 2003 20:00, Matthew T. O'Connor wrote: Shridhar Daithankar wrote: I will submit a patch that would account deletes in analyze threshold. Since you want to delay the analyze, I would calculate analyze count as deletes are already accounted for in the analyze threshold. Yes. My bad. Deletes are not accounted in initializing analyze count but later they are used. I am still wary of inverting vacuum analyze frequency. You think it is better to set inverted default rather than documenting it? I think inverting the vacuum and analyze frequency is wrong. Me. Too. ATM all I can think of is this patch, attached. Josh, is it sufficient for you?..:-) use this one. A warning added for too aggressive vacuuming. If it is OK by everybody, we can send it to the patches list. Shridhar *** README.pg_autovacuum.orig Thu Nov 20 19:58:29 2003 --- README.pg_autovacuum Thu Nov 20 20:35:34 2003 *** *** 141,163 depending on the mixture of table activity (insert, update, or delete): ! - If the number of (inserts + updates + deletes) > AnalyzeThreshold, then only an analyze is performed. ! - If the number of (deletes + updates) > VacuumThreshold, then a vacuum analyze is performed. VacuumThreshold is equal to: ! vacuum_base_value + (vacuum_scaling_factor * number of tuples in the table) ! AnalyzeThreshold is equal to: ! analyze_base_value + (analyze_scaling_factor * number of tuples in the table) ! The AnalyzeThreshold defaults to half of the VacuumThreshold since it represents a much less expensive operation (approx 5%-10% of vacuum), and running ANALYZE more often should not substantially degrade system performance. Sleeping: - --- 141,191 depending on the mixture of table activity (insert, update, or delete): ! - If the number of (inserts + updates + deletes) >= AnalyzeThreshold, then only an analyze is performed. ! - If the number of (deletes + updates) >= VacuumThreshold, then a vacuum analyze is performed. 
VacuumThreshold is equal to: ! vacuum_base_value + (vacuum_scaling_factor * number of tuples in the ! table) AnalyzeThreshold is equal to: ! analyze_base_value + (analyze_scaling_factor * number of tuples in the ! table) The AnalyzeThreshold defaults to half of the VacuumThreshold since it represents a much less expensive operation (approx 5%-10% of vacuum), and running ANALYZE more often should not substantially degrade system performance. + Examples: + + The following table shows typical usage of pg_autovacuum settings. + These are put here so that a DBA can have some starting point while + tuning pg_autovacuum. + + Vacuum is triggered by updates and deletes. So in case of vacuum, + the last column indicates the total of updates and deletes required + to trigger vacuum. In case of analyze, the operations would count the total + number of inserts, updates and deletes. + + Base Scaling factor Records No. of Operations + 1,000 1 10,000 11,000 + 1,000 2 10,000 21,000 + 1,000 0.5 10,000 6,000 + 1,000 0.1 10,000 2,000 + + Although analyze is a cheaper operation compared to vacuum, + it might be needed less often. The default is to analyze twice as much as + vacuum but that might be too aggressive for some installations. It is advised + that such installations tune their analyze threshold separately, rather than + relying upon the default behaviour. + + Furthermore, for aggressive vacuum/analyze behaviour, it is recommended that + the scaling factor is set to less than 1. However too aggressive operation can affect + performance of normal database operations adversely. Do not apply such settings + to production databases without prior testing. + Sleeping: - ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
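The example table in the README patch follows the base + (scaling factor * tuples) formula given just above it. A quick Python check of the listed values (the function name is mine, the arithmetic is the patch's):

```python
# Verify the README patch's example table: the number of operations needed
# to trigger a vacuum is base + scaling_factor * records.

def operations_to_trigger(base, scale, records):
    return base + scale * records

rows = [
    (1000, 1,   10000),   # -> 11,000
    (1000, 2,   10000),   # -> 21,000
    (1000, 0.5, 10000),   # ->  6,000
    (1000, 0.1, 10000),   # ->  2,000
]
for base, scale, records in rows:
    print(base, scale, records, round(operations_to_trigger(base, scale, records)))
```

All four rows of the patch's table come out as stated, which is a useful sanity check when choosing a scaling factor below 1.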
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Shridhar Daithankar wrote: On Thursday 20 November 2003 20:00, Matthew T. O'Connor wrote: Shridhar Daithankar wrote: I am still wary of inverting vacuum analyze frequency. You think it is better to set inverted default rather than documenting it? I think inverting the vacuum and analyze frequency is wrong. Me. Too. ATM all I can think of is this patch, attached. Josh, is it sufficient for you?..:-) The patch just adds an example to the README; this looks ok to me. Matthew, I am confused about one thing. Why would autovacuum count updates while checking for analyze threshold? Analyze does not change statistics, right? ( w.r.t line 1072, pg_autovacuum.c). For updating statistics, only inserts+deletes should suffice, isn't it? An update is the equivalent of an insert and a delete, so it counts towards the analyze count as much as an insert. Other than that, I think autovacuum does everything it can. It could be more customizable. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Matthew, For small tables, you don't need to vacuum too often. In the testing I did, a small table (~100 rows) didn't really show significant performance degradation until it had close to 1000 updates. This is accounted for by using the threshold value. That way small tables get vacuumed less often. However, the way large tables work is very different and I think your strategy shows a lack of testing on large active tables. For large tables, vacuum is so expensive, that you don't want to do it very often, and scanning the whole table when there is only 5% wasted space is not very helpful. 5% is probably too low, you're right ... in my experience, performance degradation starts to set in at 10-15% updates to, for example, a 1.1 million row table, particularly since users tend to request the most recently updated rows. As long as we have the I/O issues that Background Writer and ARC are intended to solve, though, I can see being less aggressive on the defaults; perhaps 20% or 25%. If you wait until 110% of a 1.1 million row table is updated, though, that vacuum will take an hour or more. Additionally, you are not thinking of this in terms of an overall database maintenance strategy. Lazy Vacuum needs to stay below the threshold of the Free Space Map (max_fsm_pages) to prevent creeping bloat from setting in to your databases. With proper configuration of pg_avd, vacuum_mem and FSM values, it should be possible to never run a VACUUM FULL again, and as of 7.4 never run a REINDEX again either. But this means running vacuum frequently enough that your max_fsm_pages threshold is never reached. Which for a large database is going to have to be more frequently than 110% updates, because setting 20,000,000 max_fsm_pages will eat your RAM. Yes, I set the defaults a little high perhaps so as to err on the side of caution. I didn't want people to say pg_autovacuum kills the performance of my server. A small table will get vacuumed, just not until it has reached the threshold. 
So a table with 100 rows will get vacuumed after 1200 updates / deletes. Ok, I can see that for small tables. In my testing it showed that there were no major performance problems until you reached several thousand updates / deletes. Sure. But several thousand updates can be only 2% of a very large table. HUH? analyze is very very cheap compared to vacuum. Why not do it more often? Because nothing is cheap if it's not needed. Analyze is needed only as often as the *aggregate distribution* of data in the tables changes. Depending on the application, this could be frequently, but far more often (in my experience running multiple databases for several clients) the data distribution of very large tables changes very slowly over time. One client's database, for example, that I have running VACUUM on cron scripts runs on this schedule for the main tables: VACUUM only: twice per hour VACUUM ANALYZE: twice per day On the other hand, I've another client's database where most activity involves updates to entire classes of records. They run ANALYZE at the end of every transaction. So if you're going to have a separate ANALYZE schedule at all, it should be slightly less frequent than VACUUM for large tables. Either that, or drop the idea, and simplify pg_avd by running VACUUM ANALYZE all the time instead of having 2 separate schedules. BUT now I see how you arrived at the logic you did. If you're testing only on small tables, and not vacuuming them until they reach 110% updates, then you *would* need to analyze more frequently. This is because of your threshold value ... you'd want to analyze the small table as soon as even 30% of its rows changed. So the answer is to dramatically lower the threshold for the small tables. What I think I am hearing is that people would like very much to be able to tweak the settings of pg_autovacuum for individual tables / databases etc. Not from me you're not. Though that would be nice, too. 
So, my suggested defaults based on our conversation above: Vacuum threshold: 1000 records Vacuum scale factor: 0.2 Analyze threshold: 50 records Analyze scale factor: 0.3 -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
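The suggested defaults above plug into the same base + (scale * tuples) formula discussed earlier in the thread. A worked sketch (not pg_autovacuum code) for a 10,000-row table:

```python
# Apply Josh's suggested defaults to the threshold formula from this thread.
# For a 10,000-row table:
#   vacuum:  1000 + 0.2 * 10000 -> about 3000 updates+deletes (30% of rows)
#   analyze:   50 + 0.3 * 10000 -> about 3050 ins+upd+del

def threshold(base, scale, reltuples):
    return base + scale * reltuples

vac = threshold(1000, 0.2, 10000)
ana = threshold(50, 0.3, 10000)
print(round(vac), round(ana))
```

With these numbers, vacuum fires well before the 110%-of-rows point the thread criticizes, while the small base values keep tiny tables from being vacuumed continuously.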
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Shridhar, I would say -V 0.2-0.4 could be great as well. The fact to emphasize is that thresholds less than 1 should be used. Yes, but not thresholds, scale factors of less than 1.0. Thresholds should still be in the range of 100 to 1000. I will submit a patch that would account for deletes in the analyze threshold. Since you want to delay the analyze, I would calculate the analyze count as n = updates + inserts *-* deletes I'm not clear on how this is a benefit. Deletes affect the statistics, too. What do the statistics gather, BTW? Just the number of rows or something else as well? I think I would put that on Hackers separately. Number of tuples, degree of uniqueness, some sample values, and high/low values. Just query your pg_statistics view for an example. I am still wary of inverting the vacuum analyze frequency. You think it is better to set an inverted default rather than documenting it? See my post to Matthew. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Chester Kustarz [EMAIL PROTECTED] writes: i have some tables which are insert only. i do not want to vacuum them because there are never any dead tuples in them and the vacuum grows the indexes. Those claims cannot both be true. In any case, plain vacuum cannot grow the indexes --- only a VACUUM FULL that moves a significant number of rows could cause index growth. vacuum is to reclaim dead tuples. this means it depends on update and delete. analyze depends on data values/distribution. this means it depends on insert, update, and delete. thus the dependencies are slightly different between the 2 operations, and so you can come up with use-cases that justify running either more frequently. Agreed. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] 7.4 logging bug.
Kurt Roeckx [EMAIL PROTECTED] writes: I just installed a 7.4 on windows/cygwin. I restored a dump but ran out of disk space during the creation of an index. In psql I saw the ERROR: could not extend relation . From that point on it seems to have stopped logging most things. Do you suppose that the stdio support under cygwin somehow permanently closes the stderr output descriptor after the first failure to write? That would be unpleasant. (Are you sure there *is* now space to write?) regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] [BUGS] 7.4: CHAR padding inconsistency
Troels Arvin [EMAIL PROTECTED] writes: Why not just change the behaviour back to what it used to be like? I see no justification for the change: Well, for one thing, it makes the behavior of comparisons compatible with the SQL standard. If we have unpleasant side-effects we can work on those, but I don't have a lot of patience for saying revert it without offering a better solution. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
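The standard comparison behavior Tom refers to can be sketched quickly. This is a minimal illustration of the SQL-standard PAD SPACE semantics for CHAR (the shorter value is blank-padded to the longer value's length before comparing, so trailing blanks are insignificant); it is not PostgreSQL's actual implementation.

```python
# PAD SPACE comparison sketch: pad the shorter CHAR value with blanks,
# then compare. Trailing spaces therefore never affect equality or ordering.

def char_compare(a: str, b: str) -> int:
    """Return -1/0/1 comparing a and b under PAD SPACE semantics."""
    width = max(len(a), len(b))
    a, b = a.ljust(width), b.ljust(width)
    return (a > b) - (a < b)

print(char_compare('abc  ', 'abc'))  # 0 -- equal despite trailing blanks
print(char_compare('ab', 'abc'))     # -1
```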
Re: [HACKERS] Background writer committed
Shridhar Daithankar wrote: Jan Wieck wrote: I committed the first part of the background writer process. We had a consensus on attempting to avoid write() calls from regular backends, but did not come to any conclusions what to do to force the kernel to actually do some IO. Consequently, this patch is a separate process launched by postmaster, that periodically write()'s out some dirty buffers in LRU order. This causes the buffers returned for replacement (when a backend needs to read in a page) to be clean always. The process does no sync(), fsync() or any other calls thus far. Nothing has changed in the checkpoint logic either. Can we have some idea where to tweak sync routines for comparing results? I mean I would like to run pgbench with same config all along and compare the performance difference between sync, fsync and fdatasync etc. pgbench is actually a very bad example to test any cache strategy. Either 98% of your lookups result in cache hits, so basically your entire database is cached, or it doesn't fit and every cache strategy becomes useless. It doesn't have parts that fit and other parts that don't. I think pgbench doesn't use non-uniform random access as real world applications do (you have bestsellers and other items, you have frequent customers and once-a-year visitors). So it's very hard to get the system into a state where you have like 50% cache hitrate. Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] # ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
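Jan's point about access patterns can be demonstrated with a toy LRU cache simulation: under uniform access a fixed-size cache is either big enough (near-100% hits) or yields a hit rate roughly equal to the cache/table size ratio, while a skewed "bestseller" pattern gives the intermediate hit rates real workloads show. All sizes and the 80/20 skew here are illustrative assumptions, not pgbench internals.

```python
# Toy LRU buffer-cache simulation: uniform vs. skewed page access.
import random
from collections import OrderedDict

def hit_rate(pages, cache_size, accesses, skewed):
    random.seed(42)  # deterministic for repeatability
    cache, hits = OrderedDict(), 0
    for _ in range(accesses):
        if skewed:
            # crude 80/20 skew: most accesses go to the first 20% of pages
            page = (random.randrange(pages // 5) if random.random() < 0.8
                    else random.randrange(pages))
        else:
            page = random.randrange(pages)
        if page in cache:
            hits += 1
            cache.move_to_end(page)       # mark most-recently-used
        else:
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the LRU page
    return hits / accesses

# Cache holds 10% of the table: uniform access hits ~10%, skewed far more.
print(round(hit_rate(10_000, 1_000, 50_000, skewed=False), 2))
print(round(hit_rate(10_000, 1_000, 50_000, skewed=True), 2))
```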
Re: [HACKERS] PANIC: rename from /data/pg_xlog/0000002200000009
Yurgis Baykshtis [EMAIL PROTECTED] writes: However, we still experience random PG failures on a weekly basis. PANIC: rename from /data/pg_xlog/00220009 to /data/pg_xlog/00220012 (initialization of log file 34, segment 18) failed: No such file or directory That's curious. Is the kernel portion of the error always No such file or directory, or does it vary? What files do you actually see in /data/pg_xlog/ after such a failure? What filesystem are you using, and what is the platform exactly? IpcMemoryCreate: shmget(key=5432001, size=203415552, 03600) failed: Not enough core This means that you've exceeded the kernel SHMMAX setting. We recently realized that a restart under 7.3 will require SHMMAX to be at least twice the requested shared memory size (203415552 bytes here), because the stats processes remain connected to the old shared memory segment. This is fixed in 7.4 and will be fixed in 7.3.5 whenever it comes out, but in the meantime your best bet is to increase SHMMAX. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
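The SHMMAX arithmetic Tom describes is easy to check directly: the restart needs headroom for two copies of the requested segment. The sysctl mechanics in the comment are a Linux-specific assumption; adjust for your platform.

```python
# Minimum SHMMAX for a 7.3 restart, per the "stats processes stay attached
# to the old segment" behavior described above.
requested = 203415552          # segment size from the shmget() error
min_shmmax = requested * 2
print(min_shmmax)              # 406831104
# On Linux, this would then be applied with something like (assumption):
#   sysctl -w kernel.shmmax=406831104
```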
Re: [HACKERS] logical column position
Rod Taylor [EMAIL PROTECTED] writes: On Mon, 2003-11-17 at 20:24, Christopher Kings-Lynne wrote: BTW, one main consideration is that all the postgres admin apps will now need to support ORDER BY attlognum for 7.5+. Yeah... how about maintaining attnum for the logical attribute number and create an attphysnum or something for the physical position instead? This is more intrusive into the source, but you don't need to teach new tricks to external entities. [ and similar remarks from other people elsewhere in the thread ] It's completely fallacious to imagine that we could make this change be transparent to external applications. To take two examples: 1. How many places do you think know that pg_attribute.attnum links to pg_attrdef.adnum? pg_dump, psql, and the JDBC driver all appear to know that, in a quick search of the CVS tree; I haven't even bothered to look at pgadmin and the other apps that are likely to have such dependencies. 2. How about linking pg_attribute.attnum to entries in pg_index.indkey? Lots of apps know about that too. Unless you are going to change the meanings of pg_index.indkey and pg_attrdef.adnum, you can't simply redefine attnum as a logical column position. And if you do make such a change you will break code elsewhere. If we add a *new* column attlogpos, without changing the semantics of attnum, then I think we have a fighting chance of making this work without an undue amount of effort. I see no prospect that we can change the meaning of attnum without breaking things far and wide. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] logical column position
Tom Lane [EMAIL PROTECTED] writes: attpos is a horrid choice of name, because no one will be able to remember which of attnum and attpos is which. Pick a more distinct name. Offhand the best thing I can think of is attlognum or attlogpos. Actually, I deliberately chose attpos rather than attlognum (which is what some people had been calling this feature earlier). My reasoning was that the logical number is really a nonsensical idea: we just invented it on the spot. In contrast, a position is a fairly natural thing for an attribute to have -- it's a notion with some counterpart in the real world. To me, at least, it seems intuitive that an attnum would identify a column whereas an attpos would specify the column's position. I'm happy to change the name if there's a consensus that attpos isn't a good choice -- what does everyone think? What are you going to do with deleted columns? I'd be inclined to give them all attlogpos = 0, but that destroys your last comment. I hadn't planned to do anything in particular for deleted columns: since they are never displayed to the user, does it matter what their attpos is? In any event, the property that no two columns in a table have the same logical number isn't important anyway. You have not presented any proposal for exactly what ALTER TABLE operations would be offered to manipulate the column positions. I'd like to get the backend storage side of things implemented first. I'll take a look at the archives before I do any UI work -- thanks for the suggestion. -Neil ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Release cycle length
Yeah, I think the main issue in all this is that for real production sites, upgrading Postgres across major releases is *painful*. We have to find a solution to that before it makes sense to speed up the major-release cycle. Well, I think one of the simplest is to do a topological sort of objects in pg_dump (between object classes that need it), AND regression testing for pg_dump :) Chris ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
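The dependency-ordered dump Chris suggests is a classic topological sort; a minimal sketch using Kahn's algorithm follows. The object names and their dependencies are made up for illustration and have nothing to do with pg_dump's actual internal representation.

```python
# Kahn's algorithm: emit each dump object only after everything it depends on.
from collections import defaultdict, deque

def topo_sort(objects, depends_on):
    """Return objects ordered so that dependencies come first."""
    indegree = {o: 0 for o in objects}
    dependents = defaultdict(list)
    for obj, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(obj)
            indegree[obj] += 1
    queue = deque(o for o in objects if indegree[o] == 0)
    order = []
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for nxt in dependents[obj]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(objects):
        raise ValueError("circular dependency among dump objects")
    return order

# A view depends on a table and a function; an index depends on the table.
objs = ['table_a', 'view_v', 'index_i', 'function_f']
deps = {'view_v': ['table_a', 'function_f'], 'index_i': ['table_a']}
print(topo_sort(objs, deps))  # ['table_a', 'function_f', 'index_i', 'view_v']
```

The cycle check matters: circular dependencies (e.g. mutually recursive views) are exactly the cases where a naive sort-by-OID dump breaks on restore.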
[HACKERS] Handy user/group hack
Kind people, Here's something I came up with for finding whether a PostgreSQL 7.4 user is in a group. Cheers, D

CREATE OR REPLACE FUNCTION in_group (text, text)
RETURNS BOOLEAN AS '
DECLARE
    the_user  ALIAS FOR $1;
    the_group ALIAS FOR $2;
    dummy     text;
    -- SELECT INTO dummy because PERFORM always returns true.
    -- Is this a bug?
BEGIN
    SELECT INTO dummy u.usename
    FROM pg_user u, pg_group g
    WHERE u.usename = the_user
      AND g.groname = the_group
      AND u.usesysid = ANY (g.grolist);
    IF FOUND THEN
        RETURN true;
    ELSE
        RETURN false;
    END IF;
END;
' LANGUAGE 'plpgsql' STRICT IMMUTABLE;

-- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 510 893 6100 cell: +1 415 235 3778 ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] logical column position
Neil Conway [EMAIL PROTECTED] writes: Actually, I deliberately chose attpos rather than attlognum (which is what some people had been calling this feature earlier). My reasoning was that the logical number is really a nonsensical idea: we just invented it on the spot. True ... In contrast, a position is a fairly natural thing for an attribute to have -- it's a notion with some counterpart in the real world. But position could at least as logically be considered to mean the physical position in the tuple. I still say that these names are ripe for confusion. I don't have a better choice of name offhand, but if we spend 1% of the time already spent arguing about these issues on finding a better name, I'm sure we can think of one ;-) regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] logical column position
I don't have a better choice of name offhand, but if we spend 1% of the time already spent arguing about these issues on finding a better name, I'm sure we can think of one ;-) virtual (attvirtnum) external (attextnum) atttisoywnum - attribute this is the one you want number ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] PANIC: rename from /data/pg_xlog/0000002200000009
Yurgis Baykshtis [EMAIL PROTECTED] writes: The most interesting thing is that rename failure is always followed by IpcMemoryCreate and vice-versa, IpcMemoryCreate always comes after the rename error. That is not interesting ... it's exactly what I'd expect for the panic recovery sequence (given that SHMMAX is preventing creation of a second shared-memory segment). What filesystem are you using, and what is the platform exactly? DBExperts 7.3.4 on Win2000 (so it's a cygwin-based system) Perhaps you need to get a real operating system :-(. No such failure mode has been reported on any Unix variant, AFAIR. It's hard to be certain what's happening from the after-the-fact evidence you've offered. I'd like to see what is in pg_xlog immediately after the crash, *before* Postgres is restarted. I get the feeling that what we will see is the destination filename already present and the source not, which would suggest that two backends tried to do the rename concurrently. AFAICS that must mean that the operating system's lock support is broken, because we do not try to rename WAL segments except while holding the CheckpointLock, not to mention the ControlFileLock. This is not necessarily Windows' fault, it could be a cygwin or cygipc bug ... are you up to date on those? regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] code question: rewriteDefine.c
I noticed the following code in src/backend/rewrite/rewriteDefine.c, circa line 390:

    /*
     * Are we converting a relation to a view?
     *
     * If so, check that the relation is empty because the storage
     * for the relation is going to be deleted. Also insist that
     * the rel not have any triggers, indexes, or child tables.
     */
    if (event_relation->rd_rel->relkind != RELKIND_VIEW)
    {

Under what circumstances do we convert a relation to a view? Is this functionality exposed to the user? Furthermore, it seems broken: it checks the pg_class.relhassubclass attribute for this relation to see if it has child tables, but this is wrong, as relhassubclass only indicates that the relation MAY have a subclass, not that it definitely does[1]. It also doesn't drop the relation's TOAST table, if any, as the code itself notes. -Neil [1] This is because relhassubclass is not updated when a table's child table is dropped, due to concurrency concerns; see has_subclass() in plancat.c ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Sponsoring enterprise features
Rod Taylor [EMAIL PROTECTED] writes: Personally, I think the best way is simply to make a post on -hackers with a description of what you want to accomplish with a call for estimates and proposals. ... I say a description of what you want to accomplish because certain features are not as useful on PostgreSQL as they are on other databases (data partitioning being one of them, due to the ability to use partial indexes) so you may not achieve what you are expecting. Right. You can in any case get a great deal of free advice by starting a pghackers discussion ;-) It should be noted that "because Oracle does it that way" is a guaranteed nonstarter as a rationale for any Postgres feature proposal. There are enough differences between Postgres and Oracle that you will need to do significant investigation before assuming that an Oracle-based feature design is appropriate for Postgres. Aside from technical differences, we have fundamentally different priorities --- one of which is simplicity of administration. You'll get no buyin on proposals that tend to create Oracle-like difficulties of installation and tuning. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Tom Lane wrote: Chester Kustarz [EMAIL PROTECTED] writes: vacuum is to reclaim dead tuples. this means it depends on update and delete. analyze depends on data values/distribution. this means it depends on insert, update, and delete. thus the dependencies are slightly different between the 2 operations, an so you can come up with use-cases that justify running either more frequently. Agreed. And that is why pg_autovacuum looks at insert, update and delete when deciding to do an analyze, but only looks at update and delete when deciding to do a vacuum. In addition, this is why pg_autovacuum was given knobs so that the vacuum and analyze thresholds can be set independently. Matthew ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
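The decision rule Matthew describes can be written down directly: the vacuum side counts only updates and deletes (inserts create no dead tuples), while the analyze side also counts inserts (new rows shift the statistics). The function names and threshold value below are placeholders, not pg_autovacuum's code.

```python
# Which stats-collector counters feed which decision, per Matthew's post.

def needs_vacuum(inserts, updates, deletes, threshold):
    return updates + deletes >= threshold   # inserts create no dead tuples

def needs_analyze(inserts, updates, deletes, threshold):
    return inserts + updates + deletes >= threshold  # inserts shift the stats

# Chester's insert-only table: never vacuumed, but analyzed once enough
# new rows have arrived to make the old statistics stale.
print(needs_vacuum(inserts=5000, updates=0, deletes=0, threshold=1000))   # False
print(needs_analyze(inserts=5000, updates=0, deletes=0, threshold=1000))  # True
```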
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Josh Berkus wrote: Matthew, For small tables, you don't need to vacuum too often. In the testing I did, a small table of ~100 rows didn't really show significant performance degradation until it had close to 1000 updates. This is accounted for by using the threshold value. That way small tables get vacuumed less often. However, the way large tables work is very different and I think your strategy shows a lack of testing on large active tables. Probably more true than I would like to think... For large tables, vacuum is so expensive, that you don't want to do it very often, and scanning the whole table when there is only 5% wasted space is not very helpful. 5% is probably too low, you're right ... in my experience, performance degradation starts to set in at 10-15% updates to, for example, a 1.1 million row table, particularly since users tend to request the most recently updated rows. As long as we have the I/O issues that Background Writer and ARC are intended to solve, though, I can see being less aggressive on the defaults; perhaps 20% or 25%. If you wait until 110% of a 1.1 million row table is updated, though, that vacuum will take an hour or more. True, but I think it would be one hour once, rather than 30 minutes 4 times. Additionally, you are not thinking of this in terms of an overall database maintenance strategy. Lazy Vacuum needs to stay below the threshold of the Free Space Map (max_fsm_pages) to prevent creeping bloat from setting in to your databases. With proper configuration of pg_avd, vacuum_mem and FSM values, it should be possible to never run a VACUUM FULL again, and as of 7.4 never run a REINDEX again either. This is one of the things I had hoped to add to pg_autovacuum, but never got to. In addition to just the information from the stats collector on inserts, updates and deletes, pg_autovacuum should also look at the FSM, and make decisions based on it. Anyone looking for a project? 
But this means running vacuum frequently enough that your max_fsm_pages threshold is never reached. Which for a large database is going to have to be more frequently than 110% updates, because setting 20,000,000 max_fsm_pages will eat your RAM. Again, I think the only way to do this efficiently is to look at the FSM. Otherwise the only way to make sure you keep the FSM populated is to run vacuum more than needed. Yes, I set the defaults a little high perhaps so as to err on the side of caution. I didn't want people to say pg_autovacuum kills the performance of my server. A small table will get vacuumed, just not until it has reached the threshold. So a table with 100 rows will get vacuumed after 1200 updates / deletes. Ok, I can see that for small tables. In my testing it showed that there were no major performance problems until you reached several thousand updates / deletes. Sure. But several thousand updates can be only 2% of a very large table. But I can't imagine that 2% makes any difference on a large table. In fact I would think that 10-15% would hardly be noticeable, beyond that I'm not sure. HUH? analyze is very very cheap compared to vacuum. Why not do it more often? Because nothing is cheap if it's not needed. Analyze is needed only as often as the *aggregate distribution* of data in the tables changes. Depending on the application, this could be frequently, but far more often (in my experience running multiple databases for several clients) the data distribution of very large tables changes very slowly over time. Valid points, and again I think this points to the fact that pg_autovacuum needs to be more configurable. Being able to set different thresholds for different tables will help considerably. In fact, you may find that some tables should have a vac threshold much larger than the analyze threshold, while other tables might want the opposite. 
One client's database, for example, that I have running VACUUM on cron scripts runs on this schedule for the main tables: VACUUM only: twice per hour VACUUM ANALYZE: twice per day I would be surprised if you can notice the difference between a vacuum analyze and a vacuum, especially on large tables. On the other hand, I've another client's database where most activity involves updates to entire classes of records. They run ANALYZE at the end of every transaction. So if you're going to have a separate ANALYZE schedule at all, it should be slightly less frequent than VACUUM for large tables. Either that, or drop the idea, and simplify pg_avd by running VACUUM ANALYZE all the time instead of having 2 separate schedules. I think you need two separate schedules. There are lots of times where a vacuum doesn't help, and an analyze is all that is needed, and an analyze is MUCH cheaper than a vacuum. BUT now I see how you arrived at the logic you did. If you're testing only on small tables, and not
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
Matthew, 110% of a 1.1 million row table is updated, though, that vacuum will take an hour or more. True, but I think it would be one hour once, rather than 30 minutes 4 times. Well, generally it would be about 6-8 times at 2-4 minutes each. This is one of the things I had hoped to add to pg_autovacuum, but never got to. In addition to just the information from the stats collector on inserts, updates and deletes, pg_autovacuum should also look at the FSM, and make decisions based on it. Anyone looking for a project? Hmmm ... I think that's the wrong approach. Once your database is populated, it's very easy to determine how to set the FSM for a given pg_avd level. If you're vacuuming after 20% updates, for example, just set fsm_pages to 20% of the total database pages plus growth safety margins. I'd be really reluctant to base pg_avd frequency on the fsm settings instead. What if the user loads 8GB of data but leaves fsm_pages at the default of 10,000? You can't do much with that; you'd have to vacuum if even 1% of the data changed. The other problem is that calculating data pages from a count of updates+deletes would require pg_avd to keep more statistics and do more math for every table. Do we want to do this? But I can't imagine that 2% makes any difference on a large table. In fact I would think that 10-15% would hardly be noticeable, beyond that I'm not sure. I've seen performance lag at 10% of records, especially in tables where both update and select activity focus on one subset of the table (calendar tables, for example). Valid points, and again I think this points to the fact that pg_autovacuum needs to be more configurable. Being able to set different thresholds for different tables will help considerably. In fact, you may find that some tables should have a vac threshold much larger than the analyze threshold, while other tables might want the opposite. Sure. Though I think we can make the present configuration work with a little adjustment of the numbers. 
I'll have a chance to test on production databases soon. I would be surprised if you can notice the difference between a vacuum analyze and a vacuum, especially on large tables. It's substantial for tables with high statistics settings. A 1,000,000 row table with 5 columns set to statistics=250 can take 3 minutes to analyze on a medium-grade server. I think you need two separate schedules. There are lots of times where a vacuum doesn't help, and an analyze is all that is needed Agreed. And I've just talked to a client who may want to use pg_avd's ANALYZE scheduling but not use vacuum at all. BTW, I think we should have a setting for this; for example, if -V is -1, don't vacuum. I'm open to discussion on changing the defaults. Perhaps it would be better to use some non-linear (perhaps logarithmic) scaling factor, so that you wound up with something roughly like this:
#tuples    activity% for vacuum
1k         100%
10k        70%
100k       45%
1M         20%
10M        10%
100M       8%
That would be cool, too. Though a count of data pages would be a better scale than a count of rows, and equally obtainable from pg_class. Thanks for the lucid feedback / discussion. autovacuum is a feature that, despite its simple implementation, has generated a lot of feedback from users, and I would really like to see it become something closer to what it should be. Well, I hope to help now. Until very recently, I've not had a chance to seriously look at pg_avd and test it in production. Now that I do, I'm interested in improving it. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
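Josh's fsm_pages rule of thumb reduces to simple arithmetic: if vacuum fires after roughly 20% of the database changes, max_fsm_pages should cover roughly 20% of the database's pages plus a growth margin. The 50% margin and the helper name below are illustrative assumptions; the 8kB block size is PostgreSQL's default.

```python
# Rough max_fsm_pages sizing from database size and vacuum scale factor.
BLOCK_SIZE = 8192  # default PostgreSQL page size in bytes

def suggested_max_fsm_pages(db_bytes, vacuum_scale=0.20, margin=1.5):
    """fsm pages ~= (pages that can go dead between vacuums) * safety margin."""
    total_pages = db_bytes // BLOCK_SIZE
    return int(total_pages * vacuum_scale * margin)

# The 8GB-database example from the thread: ~1M pages, so roughly 315k
# FSM slots rather than the old default of 10,000.
print(suggested_max_fsm_pages(8 * 1024**3))
```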