Re: [HACKERS] Fixing insecure security definer functions
On 2/15/07, Tom Lane [EMAIL PROTECTED] wrote:
> Merlin Moncure [EMAIL PROTECTED] writes:
>> yikes! If you guys go through with forcing functions to attach to
>> objects when they are created, it will break almost every project I've
>> ever worked on :(. The schema/function combo fits into all kinds of de
>> facto partitioning strategies and organization methods.
>
> If you read a bit further, I did suggest providing an option to retain
> the current behavior. I don't think it should be the default though.

Yeah, I saw that, but the issue is really deeper: functions that create functions, etc. Changing the default behavior affects how functions work in a really fundamental way... all pl/pgsql code that can float over schemas would have to be checked. In the worst case, this could mean converting large libraries to dynamic sql or creating thousands of additional functions... ugh.

Maybe there could be a GUC setting (function default function schema path = current path / path null)? It would seem only appropriate to have security definer raise a warning/error for path null though. Then we could debate about how that should be set by default, but nobody really loses that argument.

merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS
"Jim C. Nasby" [EMAIL PROTECTED] writes:
> On Thu, Feb 08, 2007 at 01:54:13PM -0500, Tom Lane wrote:
>> I would be satisfied if the returned command tag were something else,
>> maybe NO OPERATION.
>
> TABLE blah DID NOT EXIST might be less confusing...

You're confusing a command tag with a notice. In the first place, we shouldn't assume that applications are ready to deal with indefinitely long command tags (the backend itself doesn't think they can be longer than 64 bytes); in the second place, they should be constant strings for the most part so that simple strcmp()s suffice to see what happened. Command tags are meant for programs to deal with, more than humans.

regards, tom lane
[HACKERS] buildfarm failure in XML code
UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MAX
UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MIN

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=warthog&dt=2007-02-16%2009:06:01

-- Alvaro Herrera                  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
[HACKERS] Mail getting through?
I sent two emails last night, one to -patches and one to -hackers. Neither seems to have come through. Has anyone else seen them?

On -patches I posted an updated patch that was functionally a noop but changed the macro API to SET_VARSIZE(). It also replaced the VARATT_SIZE and VARATT_DATA macros with VARSIZE and VARDATA, changed inet to call detoast, and changed arrays, inet, geometric data types and a handful of others to use SET_VARSIZE instead of accessing a struct member directly.

On -hackers I reposted Tom's email where he proposed two sets of bitpatterns with tradeoffs, and also included an additional one that was the second of his with the 2-byte cases removed. In it I said that removing the 2-byte cases had no advantages, but actually since then I've thought of one: it makes the toaster code simpler, since it can just set a bit in the four-byte header just as it does now. It doesn't have to worry about converting to a 2-byte header.

-- Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] HOT WIP Patch - version 1
On Thu, 2007-02-15 at 10:49, Heikki Linnakangas wrote:
> > We already log tuple removals by normal vacuums.
>
> We can't use that wal entry as it is: if a dead tuple is in the middle
> of an update chain, it needs to be unlinked from the chain. But I don't
> see any particular problem with that, it just needs to be wal logged
> like every other data changing operation.
>
> Do we actually ever want to remove dead tuples from the middle of the
> chain? If a tuple in the middle of the chain is dead, surely every tuple
> before it in the chain is dead as well, and we want to remove them as
> well. I'm thinking, removing tuples from the middle of the chain can be
> problematic, because we'd need to fiddle with the xmin/xmax of the other
> tuples to make them match. Or change the tuple-following logic to not do
> the xmin=xmax check, but it's a nice robustness feature.

What kind of robustness does it provide? In other words, what failure scenario does this guard against? I can't see the case where the xmin=xmax check can not succeed, at least not for same page tuples.

-- Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Re: [HACKERS] [COMMITTERS] pgsql: Functions for mapping table data and table schemas to XML (a.k.a.
On Friday, 16 February 2007 08:46, Peter Eisentraut wrote:
> Log Message:
> Functions for mapping table data and table schemas to XML (a.k.a. XML export)

Breaks various platforms. I'm on it.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] Mail getting through?
Gregory Stark wrote:
> I sent two emails last night, one to -patches and one to -hackers.
> Neither seems to have come through. Has anyone else seen them?

Not yet, but I got bounces from the server saying "451 - Server configuration problem" or something to that effect. Since my emails were eventually delivered, I'd expect yours to be too.

-- Alvaro Herrera                  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] pg_restore fails with a custom backup file
On Fri, Feb 16, 2007 at 02:09:41PM +0900, Yoshiyuki Asaba wrote:
> > > Does not compile on my MinGW - errors in the system headers
> > > (unistd.h, io.h) due to changing the argument format for chsize().
> > > The change of off_t propagated into parts of the system headers,
> > > thus chaos was ensured. I still think we need to use a pgoff_t.
> > > Will look at combining these two approaches.
> >
> > Here's a patch that tries this. *needs more testing*. But built with
> > this patch, I can dump and restore a table at the end of a 10gb
> > database without errors.
>
> I tried the attached patch. But I got the following error.
>
> pg_backup_archiver.o(.text+0x1fa4): In function `allocAH':
> C:/msys/1.0/home/y-asaba/postgresql-8.2.3-patch/src/bin/pg_dump/pg_backup_archiver.c:1580: undefined reference to `fseeko64'
> ...
> make[3]: *** [pg_dump] Error 1
>
> $ uname -sr
> MINGW32_NT-5.1 1.0.10(0.46/3/2)
>
> Is MINGW version too old?

I think so. It seems this was added in version 1.24 of stdio.h in mingw (http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/mingw/include/stdio.h?cvsroot=src). Could you try upgrading mingw and see if that helps? Or possibly installing a different version side-by-side (if they even allow that)?

//Magnus
Re: [HACKERS] HOT for PostgreSQL 8.3
On Wed, 2007-02-14 at 10:41, Tom Lane wrote:
> Hannu Krosing [EMAIL PROTECTED] writes:
>> OTOH, for same page HOT tuples, we have the command and trx ids stored
>> twice: first as cmax,xmax of the old tuple and as cmin,xmin of the
>> updated tuple. One of these could probably be used for in-page HOT
>> tuple pointer.
>
> This proposal seems awfully fragile, because the existing
> tuple-chain-following logic *depends for correctness* on comparing each
> tuple's xmin to prior xmax.

What kinds of correctness guarantees does this give for same-page tuples?

The comparing of each tuple's xmin to prior xmax should stay for inter-page ctid links. Mostly you can think of the same-page HOT chain as one extended tuple when looking at it from outside of that page.

> I don't think you can just wave your hands and say we don't need that
> cross-check. Furthermore it seems to me you haven't fixed the problem,
> which is that you can't remove the chain member that is being pointed at
> by off-page links (either index entries or a previous generation of the
> same tuple).

You can't remove any tuples before they are invisible for all transactions (i.e. dead). And being dead implies that all previous versions are dead as well. So if I can remove a tuple, I can also remove all its previous versions as well. Or are you trying to say that VACUUM follows ctid links of dead tuples for some purpose?

The problem I am trying to fix is reusing in-page space without need to touch indexes.

> As described, you've made that problem worse because you're trying to
> say we don't know which of the chain entries is pointed at.

There should be a flag, say HOT_CHAIN_ENTRY, for the tuple the index(es) point at. And this should be the preferred CTID for inserting new versions once the old one is dead.
-- Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Re: [HACKERS] buildfarm failure in XML code
On Friday, 16 February 2007 14:59, Alvaro Herrera wrote:
> UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MAX
> UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MIN
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=warthog&dt=2007-02-16%2009:06:01

It needs a rebuild after the fix.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
[HACKERS] Intermittent buildfarm failures due to timestamptz test
Multiple systems are occasionally failing the timestamptz test at the check stage. From what I can tell, this is due to the newly added ISO week checks referring to TIMESTAMP_TBL, used by the timestamp test, instead of TIMESTAMPTZ_TBL. Both checks run in the same segment, so the test fails if the timing between the timestamp and timestamptz checks doesn't match.

http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=jackal&dt=2007-02-16%2012:21:01

-- Seneca [EMAIL PROTECTED]
Re: [HACKERS] buildfarm failure in XML code
Peter Eisentraut [EMAIL PROTECTED] writes:
> On Friday, 16 February 2007 14:59, Alvaro Herrera wrote:
>> UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MAX
>> UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MIN
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=warthog&dt=2007-02-16%2009:06:01
>
> It needs a rebuild after the fix.

This fix doesn't fix anything. We have always in the past managed to avoid assuming that int64 actually is available; I don't intend to give the xml code a free pass to break that, especially for such an utterly marginal purpose as this code has. I'm also wondering why xml.c is the only place anywhere in the code that uses stdint.h.

regards, tom lane
[HACKERS] Mail getting through? Short varlena headers
I sent two emails last night, one to -patches and one to -hackers. Neither seems to have come through. Has anyone else seen them? I sent this email once already too and it hasn't come through. I'm resending it from a different account now.

On -patches I posted an updated patch that was functionally a noop but changed the macro API to SET_VARSIZE(). It also replaced the VARATT_SIZE and VARATT_DATA macros with VARSIZE and VARDATA, changed inet to call detoast, and changed arrays, inet, geometric data types and a handful of others to use SET_VARSIZE instead of accessing a struct member directly.

On -hackers I reposted Tom's email where he proposed two sets of bitpatterns with tradeoffs, and also included an additional one that was the second of his with the 2-byte cases removed. In it I said that removing the 2-byte cases had no advantages, but actually since then I've thought of one: it makes the toaster code simpler, since it can just set a bit in the four-byte header just as it does now. It doesn't have to worry about converting to a 2-byte header. So I'm thinking of doing it for now at least.

I still think paying 2 bytes on virtually every datum is silly: even if mathematically it's only a 2% space savings, that's still a 2% performance penalty on sequential scans, and it gains us nothing except a few lines of code saved in tuptoaster.c.

Comments?

-- greg
Re: [HACKERS] buildfarm failure in XML code
On Friday, 16 February 2007 16:09, Tom Lane wrote:
> This fix doesn't fix anything.

Sure, it fixes the compilation failures.

> We have always in the past managed to avoid assuming that int64
> actually is available; I don't intend to give the xml code a free pass
> to break that,

I don't intend that either, but a proper fix will be more elaborate. I will work on that.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] HOT for PostgreSQL 8.3
On 2/16/07, Hannu Krosing [EMAIL PROTECTED] wrote:
> On Wed, 2007-02-14 at 10:41, Tom Lane wrote:
>> Hannu Krosing [EMAIL PROTECTED] writes:
>>> OTOH, for same page HOT tuples, we have the command and trx ids
>>> stored twice: first as cmax,xmax of the old tuple and as cmin,xmin of
>>> the updated tuple. One of these could probably be used for in-page
>>> HOT tuple pointer.
>>
>> This proposal seems awfully fragile, because the existing
>> tuple-chain-following logic *depends for correctness* on comparing
>> each tuple's xmin to prior xmax.
>
> What kinds of correctness guarantees does this give for same-page
> tuples?

I agree with Tom that the xmin/xmax check does help to guarantee correctness. I myself have used it often during HOT development to find/fix bugs. But ISTM that we don't need it, at least for the in-page tuple chain, if we are careful. So if removing this buys us something important, I am all for it.

> The comparing of each tuple's xmin to prior xmax should stay for
> inter-page ctid links.

Agree.

> Mostly you can think of the same-page HOT chain as one extended tuple
> when looking at it from outside of that page.
>
>> I don't think you can just wave your hands and say we don't need that
>> cross-check. Furthermore it seems to me you haven't fixed the problem,
>> which is that you can't remove the chain member that is being pointed
>> at by off-page links (either index entries or a previous generation of
>> the same tuple).
>
> You can't remove any tuples before they are invisible for all
> transactions (i.e. dead). And being dead implies that all previous
> versions are dead as well. So if I can remove a tuple, I can also
> remove all its previous versions as well. Or are you trying to say that
> VACUUM follows ctid links of dead tuples for some purpose ?

The only exception to this would be the case of aborted updates. In that case a tuple is dead, but the one pointing to it is still live. But I don't see any reason somebody would want to follow a chain past a live tuple.
Not sure about the VACUUM FULL code path though. That's the only place other than EvalPlanQual where we follow the ctid chain.

> The problem I am trying to fix is reusing in-page space without need to
> touch indexes.

Can we do some kind of indirection from the root line pointer? Haven't completely thought it through yet, but the basic idea is to release the actual space consumed by the root tuple once it becomes dead, but store the offnum of the new root in the line pointer of the original root tuple. We may need to flag the line pointer for that, but if I am not wrong, LP_DELETE is not used for heap tuples. We would waste 4 bytes of line pointer until the tuple is COLD updated and the entire chain and the associated index entry are removed.

Thanks, Pavan

-- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] HOT for PostgreSQL 8.3
> As described, you've made that problem worse because you're trying to
> say we don't know which of the chain entries is pointed at.
>
> There should be a flag, say HOT_CHAIN_ENTRY for the tuple the index(es)
> point at.

it's called HEAP_UPDATE_ROOT

> And this should be the preferred CTID for inserting new versions once
> the old one is dead.

This is not possible, see my reply to Bruce (maybe unless the whole hot chain is dead). (Because that would need a back pointer, so readers arriving at the root find the visible tuple.)

Andreas
Re: [HACKERS] HOT for PostgreSQL 8.3
On 2/16/07, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote:
>> There should be a flag, say HOT_CHAIN_ENTRY for the tuple the
>> index(es) point at.
>
> it's called HEAP_UPDATE_ROOT

Just to avoid any confusion with the patch I sent out this week: we are setting HEAP_UPDATE_ROOT on all tuples which are HOT-updated. We set HEAP_ONLY_TUPLE for all tuples which do not have an index reference. So maybe the combination (HEAP_UPDATE_ROOT & ~HEAP_ONLY_TUPLE) can be used to identify the index-referenced tuple in a HOT-update chain.

Thanks, Pavan

-- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Intermittent buildfarm failures due to timestamptz test
Seneca Cunningham wrote:
> Multiple systems are occasionally failing the timestamptz test at the
> check stage. From what I can tell, this is due to the newly added ISO
> week checks referring to TIMESTAMP_TBL used by the timestamp test
> instead of TIMESTAMPTZ_TBL.
>
> http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=jackal&dt=2007-02-16%2012:21:01

Alvaro has committed a fix for this.

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] HOT for PostgreSQL 8.3
> Just to avoid any confusion with the patch I sent out this week, we are
> setting HEAP_UPDATE_ROOT on all tuples which are HOT-updated. We set
> HEAP_ONLY_TUPLE for all tuples which do not have an index reference. So
> maybe the combination (HEAP_UPDATE_ROOT & ~HEAP_ONLY_TUPLE) can be used
> to identify the index-referenced tuple in a HOT-update chain.

Oh sorry, thanks for the clarification. Imho HEAP_UPDATE_ROOT should be renamed for this meaning then (or what does ROOT mean here?). Maybe HEAP_UPDATE_CHAIN?

Andreas
Re: [HACKERS] Mail getting through? Short varlena headers
Gregory Stark [EMAIL PROTECTED] writes:
> In it I said that removing the 2-byte cases had no advantages but
> actually since then I've thought of one. It makes the toaster code
> simpler since it can just set a bit in the four-byte header just as it
> does now. It doesn't have to worry about converting to a 2-byte header.

Run that by me again? A toast pointer datum ought to have a 1-byte header, since it's only twenty-something bytes long.

regards, tom lane
Re: [HACKERS] buildfarm failure in XML code
Peter Eisentraut [EMAIL PROTECTED] writes:
> On Friday, 16 February 2007 16:09, Tom Lane wrote:
>> This fix doesn't fix anything.
>
> Sure, it fixes the compilation failures.

Not here:

gcc -O1 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -fno-strict-aliasing -g -I../../../../src/include -D_XOPEN_SOURCE_EXTENDED -c -o xml.o xml.c
xml.c: In function `map_sql_type_to_xmlschema_type':
xml.c:2192: `INT64_MAX' undeclared (first use in this function)
xml.c:2192: (Each undeclared identifier is reported only once
xml.c:2192: for each function it appears in.)
xml.c:2192: `INT64_MIN' undeclared (first use in this function)
make[4]: *** [xml.o] Error 1

And this is in a build WITHOUT xml enabled --- why is this code being compiled at all? Kindly fix.

regards, tom lane
Re: [HACKERS] Mail getting through? Short varlena headers
Tom Lane [EMAIL PROTECTED] writes:
> Run that by me again? A toast pointer datum ought to have a 1-byte
> header, since it's only twenty-something bytes long.

I was referring to compressed inline data. By not having inline compressed data use 2-byte headers, it eliminates having to check a lot of corner cases and reduces the changes in tuptoaster.c, since it means pg_lzcompress can return a normal 4-byte header and nobody has to convert it to a 2-byte header. So I am doing that for now. I suspect we'll never get around to reintroducing 2-byte headers, but we could if we wanted to. It would be a small change everywhere else but an annoying bunch of fiddly changes in tuptoaster.c.

-- Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] HOT for PostgreSQL 8.3
On 2/16/07, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote:
> Oh sorry. Thanks for the clarification. Imho HEAP_UPDATE_ROOT should be
> renamed for this meaning then (or what does ROOT mean here ?). Maybe
> HEAP_UPDATE_CHAIN ?

Yes, you are right. There is some disconnect between what Simon had originally posted and the patch I sent out. Hopefully as we discuss HOT more here, everything will converge.

Thanks, Pavan

-- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] anyelement2 pseudotype
Tom Lane wrote:
> So it seems neither can_coerce_type() nor find_coercion_pathway() are
> really particularly well thought out in terms of what they test or
> don't test. I'm not very sure what a good refactoring would look like,
> but I am sure that I don't want all their call sites having to
> individually account for ANYfoo types. Any thoughts?

Yeah, I remember thinking at the time that some of it was a bit backwards, but it's been almost 6 months since I did the original enum patch, so I'll need to refresh my memory. I'll have a look over the weekend and see if I can come up with something that'll work for these various cases. To begin with I'll need to do a survey of the call sites to see what they really need, since perhaps it isn't what the coerce functions are currently offering. :)

I completely agree that anything requiring call sites to understand specifics about ANY* types is a bad idea. The most that we would want would be a generic IsGeneric(typoid) macro, but it would be nice to hide that inside a coerce function as well. We'll see.

Cheers
Tom
Re: [HACKERS] buildfarm failure in XML code
Tom Lane wrote:
> Not here:

Fixed.

> And this is in a build WITHOUT xml enabled --- why is this code being
> compiled at all?

The enablement pertains to libxml, which this code doesn't use.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] buildfarm failure in XML code
Peter Eisentraut [EMAIL PROTECTED] writes:
>> Not here:
> Fixed.

Looks good, thanks.

regards, tom lane
Re: [HACKERS] anyelement2 pseudotype
Tom Dunstan [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> So it seems neither can_coerce_type() nor find_coercion_pathway() are
>> really particularly well thought out in terms of what they test or
>> don't test. I'm not very sure what a good refactoring would look like,
>> but I am sure that I don't want all their call sites having to
>> individually account for ANYfoo types. Any thoughts?
>
> To begin with I'll need to do a survey of the call sites to see what
> they really need, since perhaps it isn't what the coerce functions are
> currently offering. :)

I realized that I can probably fix ATAddForeignKeyConstraint to do the right thing by having it pass the two actual column types to can_coerce_type, thus allowing check_generic_type_consistency to kick in and detect the problem. I haven't got round to trying that (up to my rear in planner bugs ATM :-() but I think the immediate problem can be dealt with without refactoring. Still, if you have any ideas for making this code cleaner, I'm all ears.

regards, tom lane
[HACKERS] Confusing message on startup after a crash while recovering
Hi,

When postgres crashes during recovery, and is then restarted, it says:

  database system was interrupted while in recovery at ...
  This probably means that some data is corrupted and you will have to
  use the last backup for recovery.

When I first read that message, I assumed that there are cases where postgres can't recover from a crash that happened during recovery. I guessed that some operations done during wal restore are not idempotent, and lead to corrupt data if performed twice. Only after actually reading the source code of xlog.c, and seeing that a similar (but better worded) warning is output after a crash during archive log replay, did I realize that this warning probably just means that corrupt data could be the _cause_ of the crash during recovery, not _caused by_ a crash during recovery.

I'd suggest that the text is changed to something along the lines of:

  database system was interrupted while in recovery at ...
  If this has occurred more than once some data may be corrupted and you
  may need to restore from the last backup.

This would also match the message for "interrupted while doing archive log replay" more closely.

greetings, Florian Pflug
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
My suggestion would be to focus on a period data type first and foremost, as that's something that could be readily used by a lot of folks. Of particular note, it's difficult to query tables that have start_time and end_time fields to define a period; it's easy to screw up the boundary conditions, and it's also hard to make those queries perform well without going to extra lengths (such as defining a 'bogus' GiST index on something like box(point(start,start),point(end,end))). And it's not possible to do that in a way that avoids floating point numbers and their errors.

On Sat, Feb 10, 2007 at 12:20:28AM -0700, Warren Turkal wrote:
> Temporal Extensions for PostgreSQL
> by: Warren Turkal
>
> I would like to see a comprehensive solution to time-varying (or
> temporal) tables in PostgreSQL. I specifically want to see support for
> valid-time, transaction-time, and bitemporal (valid-time and
> transaction-time) tables. I will be deferring the descriptions of much
> of the functionality to Dr. Richard T. Snodgrass's _Developing
> Time-Oriented Database Applications in SQL_ at [1]. The mangled pages
> 30-31 are at [2].
>
> a) Functionality
>
> Dr. Richard T. Snodgrass has worked on defining the semantics of
> temporal databases very completely in several writings. He was also
> involved in an unsuccessful effort to standardize temporal extensions
> to SQL. I believe his book does a good job of presenting the semantics
> of temporal databases and describing extensions to SQL that make the
> data much more natural to work with.
>
> b) How current solutions fall flat
>
> Current solutions fall flat due to the extreme complexity of
> implementing valid-time and transaction-time semantics on tables by
> adding columns to track all of the data. Please see chapter 11 of [1]
> for a more complete description of this complexity. Chapter 12 of [1]
> goes on to lay out new syntax for SQL that will make dealing with data
> of this nature much more natural.
> c) Examples
>
> -- create normal table
> CREATE TABLE products (
>   id SERIAL PRIMARY KEY,
>   description TEXT
> );
>
> -- Add valid-time support to the table with granularity of timestamp.
> ALTER TABLE products ADD VALIDTIME PERIOD(TIMESTAMP WITH TIME ZONE);
>
> -- Insert row valid from 2006-01-01 to just before 2007-01-01
> VALIDTIME PERIOD '[2006-01-01 - 2007-01-01)'
> INSERT INTO products ( description ) VALUES ( 'red ball' );
>
> -- Insert row valid from 2007-01-01 to just before 2008-01-01
> -- Should be smart enough to realize that id=777 does not conflict in
> -- this time of validity.
> VALIDTIME PERIOD '[2007-01-01 - 2008-01-01)'
> INSERT INTO products ( id, description ) VALUES ( 777, 'blue ball' );
>
> -- Select history of products with id=777
> VALIDTIME SELECT * FROM products WHERE id=777;
>  id  | description |       valid_period
> -----+-------------+---------------------------
>  777 | red ball    | [2006-01-01 - 2007-01-01)
>  777 | blue ball   | [2007-01-01 - 2008-01-01)
>
> -- Select current products with id=777
> -- The date when the query was run was 2007-02-10.
> SELECT * FROM products WHERE id=777;
>  id  | description
> -----+-------------
>  777 | blue ball
>
> There are many more details in chapter 12 of [1].
>
> d) New stuff (dependencies, indices, syntax, libraries)
>
> One of the base-level additions is the PERIOD datatype. I think that
> implementing temporal support is reliant on developing such a type. The
> description of this datatype is laid out in chapter 4 of [1]. The SQL
> syntax is present in chapter 12 of [1]. I see this as the first piece
> that needs to be implemented in order to take steps toward a DBMS that
> supports full temporal capabilities. I think that PERIOD can largely
> reuse the date/time functionality for parsing of literals and for
> comparisons. The RTREE seems to nicely incorporate the needed indexing
> of the PERIOD type. The syntax of the parser will have to be extended
> to handle the PERIOD literals and constructor. I don't believe any
> additional libraries will be required.
> There are also extensions to the syntax of table creation, table
> altering, querying, inserting, and updating on temporal tables. These
> are all discussed in some detail in chapter 12 of [1]. I don't think
> that any of these changes will require new libraries. The semantics of
> temporal tables and querying them could have a dramatic effect on how
> things like primary keys and unique constraints work. I would like to
> get some comments about this from the community.
>
> e) See Also
>
> Additional resources can be found at Dr. Richard T. Snodgrass's website
> at [3], including the SQL valid-time table support spec at [4] and the
> SQL transaction-time table support spec at [5].
>
> Thoughts? Questions? Comments?
>
> [1] http://www.cs.arizona.edu/~rts/tdbbook.pdf
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
Jim C. Nasby wrote: My suggestion would be to focus on a period data type first and foremost, as that's something that could be readily used by a lot of folks. Of particular note, it's difficult to query tables that have start_time and end_time fields to define a period; it's easy to screw up the boundary conditions, and it's also hard to make those queries perform well without going to extra lengths (such as defining a 'bogus' GiST index on something like box(point(start,start),point(end,end))). And it's not possible to do that in a way that avoids floating points and their errors. FWIW there's already a type called tinterval that stores (start,end). I don't think it's very much documented; maybe it can be extended or used as base for a new, more complete and robust type, indexable in a more natural way, etc etc. -- Alvaro Herrera   http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
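The boundary-condition pitfall mentioned above is easy to demonstrate. A minimal hypothetical sketch, using plain integers for clarity: a naive closed-interval test counts two adjacent periods as overlapping, while the half-open test a period type would use does not:

```python
def overlaps_closed(s1, e1, s2, e2):
    # Naive closed-interval test: a shared endpoint counts as overlap.
    return s1 <= e2 and s2 <= e1

def overlaps_half_open(s1, e1, s2, e2):
    # Half-open [s, e) test: adjacent periods sharing an endpoint are disjoint.
    return s1 < e2 and s2 < e1
```

Hand-writing the closed variant against start_time/end_time columns is exactly the kind of off-by-one-boundary bug a dedicated type would rule out.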
[HACKERS] autovacuum next steps
After staring at my previous notes for autovac scheduling, it has become clear that the basics of it are not really going to work as specified. So here is a more realistic plan: First, we introduce an autovacuum_max_workers parameter, to limit the total number of workers that can be running at any time. Use this number to create extra PGPROC entries, etc., similar to the way we handle the prepared xacts stuff. The default should be low, say 3 or 4. The launcher sends a worker into a database just like it does currently. This worker determines what tables need vacuuming per the pg_autovacuum settings and pgstat data. If it's more than one table, it puts the number of tables in shared memory and sends a signal to the launcher. The launcher then starts min(autovacuum_max_workers - currently running workers, tables to vacuum - 1) more workers to process that database. Maybe we could have a per-database max-workers parameter in pg_database to use as a limit here as well. Each worker, including the initial one, starts vacuuming tables according to pgstat data. They recheck the pgstat data after finishing each table, so that a table vacuumed by another worker is not processed twice (maybe problematic: a table with a high update rate may be vacuumed more than once. Maybe this is a feature, not a bug). Once autovacuum_naptime has passed, if the workers have not finished yet, the launcher wants to vacuum another database. At this point, the launcher wants some of the workers processing the first database to exit early as soon as they finish one table, so that they can help vacuum the other database. It can do this by setting a flag in shmem that the workers check when finished with a table; if the flag is set, they exit instead of continuing with another table. The launcher then starts a worker in the second database. The launcher does this until the number of workers is even among both databases.
This can be done until there is one worker per database; so at most autovacuum_max_workers databases can be under automatic vacuuming at any time, one worker each. When there are autovacuum_max_workers databases under vacuum, the launcher doesn't have anything else to do until some worker exits on its own. When there is a single worker processing a database, it does not recheck pgstat data after each table. This is to prevent a high-update-rate table from starving the vacuuming of other databases. How does this sound? -- Alvaro Herrera   http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
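The launcher's arithmetic in the plan above is simple enough to sketch. This hypothetical Python fragment mirrors the min(autovacuum_max_workers - currently running workers, tables to vacuum - 1) rule; the function name is illustrative only:

```python
def extra_workers(max_workers, running, tables_to_vacuum):
    # Workers the launcher starts beyond the initial one, per the rule above;
    # clamped at zero so a saturated system starts none.
    return max(0, min(max_workers - running, tables_to_vacuum - 1))
```

With the proposed default of 3 workers, one worker already running, and 5 tables due in its database, the launcher would start two more workers.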
Re: [HACKERS] Confusing message on startup after a crash while recovering
Florian G. Pflug [EMAIL PROTECTED] writes: I'd suggest that the text is changed to something along the lines of: database system was interrupted while in recovery at ... If this has occurred more than once, some data may be corrupted and you may need to restore from the last backup. It seems the real problem is that it's not specifying *which* data is probably corrupted. Maybe: HINT: If recovery fails repeatedly, it probably means that the recovery log data is corrupted; you may have to restore from your last full backup. Also, do we want to suggest use of pg_resetxlog in the message? regards, tom lane
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
On Fri, Feb 16, 2007 at 05:39:24PM -0300, Alvaro Herrera wrote: FWIW there's already a type called tinterval that stores (start,end). I don't think it's very much documented; maybe it can be extended or used as base for a new, more complete and robust type, indexable in a more natural way, etc etc. The book I cited has a very complete description of the period data type, including details on what extensions to SQL are needed. I am very interested in starting a robust implementation of the period datatype. I think the datetime infrastructure will already do most of the needed parsing and packing of the hard parts of the period datatype (namely the date and time formats). I will investigate tinterval to see if it meets the needs of the PERIOD datatypes. I agree with focusing on the PERIOD datatype. I think that is a major part of the foundation for temporal extensions and would have to be implemented first. Therefore, I present the following plan for getting there.

1) Focus first on PERIOD(DATE) to keep things as simple as possible.
2) Implement a first cut of the period datatype that only handles storing two dates. (Maybe tinterval will get us here for free?)
3) Add information to the datatype for open or closed intervals on the beginning and ending sides of the period.

I could probably have this done in time for the freeze with some mentoring. I could probably even start implementation of some indices and operator functions for the type. This functionality is what I expect to have a shot at making an appearance in 8.3. It will be minimally functional at this point. The next project will be altering the parser to be able to construct and operate on PERIOD types with the syntax extensions to SQL in Dr. Snodgrass's book. Once all of the syntax is implemented for PERIOD(DATE), the next project will be to extend support to PERIOD(TIMESTAMP WITH TIME ZONE). Again, I think the datetime infrastructure will be very useful here.
wt
Re: [HACKERS] autovacuum next steps
Alvaro Herrera wrote: After staring at my previous notes for autovac scheduling, it has become clear that the basics of it are not really going to work as specified. So here is a more realistic plan: [Snip Detailed Description] How does this sound? On first blush, I'm not sure I like this, as it doesn't directly attack the table starvation problem, and I think it could be a net loss of speed. VACUUM is I/O bound; as such, just sending multiple vacuum commands at a DB isn't going to make things faster: you are now going to have multiple processes reading from multiple tables at the same time. I think in general this is a bad thing (unless we someday account for I/O made available from multiple tablespaces). In general, the only time it's a good idea to have multiple vacuums running at the same time is when a big table is starving a small hot table and causing bloat. I think we can extend the current autovacuum stats to add one more column that specifies is hot or something to that effect. Then when the AV launcher sends a worker to a DB, it will first look for tables marked as hot and work on them. While working on hot tables, the launcher need not send any additional workers to this database. If the launcher notices that a worker is working on regular tables, it can send another worker which will look for hot tables to work on; if the worker doesn't find any hot tables that need work, then it exits, leaving the original worker to continue plodding along. Thoughts?
Re: [HACKERS] autovacuum next steps
In an ideal world, I think you want precisely one vacuum process running per tablespace, on the assumption that each tablespace represents a distinct physical device. The cases where we currently find ourselves wanting more are where small tables are due for vacuuming more frequently than the time it takes for a large table to receive a single full pass. If we could have autovacuum interrupt a vacuum in mid-sweep, perform a cycle of vacuums on smaller tables, then resume, that problem would go away. That sounds too difficult, though, but perhaps we could do something nearly as good. One option that I've heard before is to have vacuum, after a single iteration (ie, after it fills maintenance_work_mem and does the index cleanup and the second heap pass), remember where it was and pick up from that point next time. If instead autovacuum could tell vacuum exactly how long to run for (or calculate how many pages that represents based on cost_delay), then it could calculate when it will next need to schedule another table in the same tablespace and try to arrange for the vacuum of the large table to be done by then. Once there are no smaller, more frequently vacuumed tables due to be scheduled, it would start vacuum for the large table again, and it would resume from where the first one left off. This only works if the large tables really don't need to be vacuumed so often that autovacuum can't keep up. Our current situation is that there is a size at which this happens. But arranging to have only one vacuum process per tablespace will only make that less likely to happen, rather than more. I think the changes to vacuum itself are pretty small to get it to remember where it left off last time and start from mid-table. I'm not sure how easy it would be to get autovacuum to juggle all these variables, though. Of course, users may not create separate tablespaces for physical devices, or they may set cost_delay so high that you really do need more vacuum processes, etc.
So you probably still need a num_vacuum_daemons, but the recommended setting would be the same as the number of physical devices, and autovacuum could try to divide the daemons equally between tablespaces, which would amount to the same thing. -- Gregory Stark   EnterpriseDB   http://www.enterprisedb.com
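The remember-where-it-was idea above can be sketched as a bounded sweep that returns a resume point. This is a hypothetical illustration in Python, not the actual vacuum code; the table representation is invented:

```python
def vacuum_pass(table, start_page, page_budget):
    # Process at most page_budget pages starting at start_page, then yield.
    # Returns the page to resume from next time, or None when the sweep is done.
    end = min(start_page + page_budget, table["npages"])
    for page in range(start_page, end):
        table["vacuumed"].add(page)  # stand-in for the real per-page work
    return end if end < table["npages"] else None
```

With something like this, autovacuum could interleave partial passes over a large table with full passes over the small hot tables in the same tablespace.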
Re: [HACKERS] autovacuum next steps
Matthew T. O'Connor wrote: Alvaro Herrera wrote: After staring at my previous notes for autovac scheduling, it has become clear that the basics of it are not really going to work as specified. So here is a more realistic plan: [Snip Detailed Description] How does this sound? On first blush, I'm not sure I like this, as it doesn't directly attack the table starvation problem, and I think it could be a net loss of speed. VACUUM is I/O bound; as such, just sending multiple vacuum commands at a DB isn't going to make things faster: you are now going to have multiple processes reading from multiple tables at the same time. I think in general this is a bad thing (unless we someday account for I/O made available from multiple tablespaces). Yeah, I understand that. However, I think that can be remedied by using a reasonable autovacuum_vacuum_cost_delay setting, so that each worker uses less than the total I/O available. The main point of the proposal is to allow multiple workers on a DB while also allowing multiple databases to be processed in parallel. I think we can extend the current autovacuum stats to add one more column that specifies is hot or something to that effect. Then when the AV launcher sends a worker to a DB, it will first look for tables marked as hot and work on them. While working on hot tables, the launcher need not send any additional workers to this database. If the launcher notices that a worker is working on regular tables, it can send another worker which will look for hot tables to work on; if the worker doesn't find any hot tables that need work, then it exits, leaving the original worker to continue plodding along. How would you define what's a hot table? -- Alvaro Herrera   http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] autovacuum next steps
Alvaro Herrera wrote: Matthew T. O'Connor wrote: On first blush, I'm not sure I like this, as it doesn't directly attack the table starvation problem, and I think it could be a net loss of speed. VACUUM is I/O bound; as such, just sending multiple vacuum commands at a DB isn't going to make things faster: you are now going to have multiple processes reading from multiple tables at the same time. I think in general this is a bad thing (unless we someday account for I/O made available from multiple tablespaces). Yeah, I understand that. However, I think that can be remedied by using a reasonable autovacuum_vacuum_cost_delay setting, so that each worker uses less than the total I/O available. The main point of the proposal is to allow multiple workers on a DB while also allowing multiple databases to be processed in parallel. So you are telling people to choose an autovacuum_delay so high that they need to run multiple autovacuums at once to keep up? I'm probably being too dramatic, but it seems inconsistent. I think we can extend the current autovacuum stats to add one more column that specifies is hot or something to that effect. Then when the AV launcher sends a worker to a DB, it will first look for tables marked as hot and work on them. While working on hot tables, the launcher need not send any additional workers to this database. If the launcher notices that a worker is working on regular tables, it can send another worker which will look for hot tables to work on; if the worker doesn't find any hot tables that need work, then it exits, leaving the original worker to continue plodding along. How would you define what's a hot table? I wasn't clear; I would have the admin specify it, and we can store it as an additional column in the pg_autovacuum_settings table. Or perhaps if the table is below some size threshold and autovacuum sees that it needs to be vacuumed every time it checks it, 10 times in a row or something like that.
Re: [HACKERS] autovacuum next steps
[EMAIL PROTECTED] (Alvaro Herrera) writes: When there is a single worker processing a database, it does not recheck pgstat data after each table. This is to prevent a high-update-rate table from starving the vacuuming of other databases. This case is important; I don't think that having multiple workers fully alleviates the problem condition. Pointedly, you need to have a way of picking up tables often enough to avoid the XID rollover problem. That may simply require that on some periodic basis, a query is run to queue up tables that are getting close to having an XID problem. -- (reverse (concatenate 'string ofni.secnanifxunil @ enworbbc)) http://linuxfinances.info/info/finances.html Rules of the Evil Overlord #189. I will never tell the hero Yes I was the one who did it, but you'll never be able to prove it to that incompetent old fool. Chances are, that incompetent old fool is standing behind the curtain. http://www.eviloverlord.com/
Re: [HACKERS] autovacuum next steps
Alvaro Herrera [EMAIL PROTECTED] writes: Each worker, including the initial one, starts vacuuming tables according to pgstat data. They recheck the pgstat data after finishing each table, so that a table vacuumed by another worker is not processed twice (maybe problematic: a table with a high update rate may be vacuumed more than once. Maybe this is a feature, not a bug). How are you going to make that work without race conditions? ISTM practically guaranteed that all the workers will try to vacuum the same table. Once autovacuum_naptime has passed, if the workers have not finished yet, the launcher wants to vacuum another database. This seems a rather strange design, as it will encourage concentrations of workers in a single database. Wouldn't it be better to spread them out among multiple databases by default? regards, tom lane
[HACKERS] n-gram search function
Hi, Is anybody working on implementing n-gram search functionality for text type data? tsearch2 is great for long text, but it's not appropriate for short (10-100 bytes) text data. What I want to achieve is a fast partial match search using indexes, i.e. foo ~ 'bar' or foo LIKE '%bar%' type matching. -- Tatsuo Ishii SRA OSS, Inc. Japan
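The idea behind answering foo LIKE '%bar%' from an n-gram index can be sketched in a few lines. This hypothetical Python fragment uses bare 3-grams with no padding (a simplification; a real implementation such as pg_trgm has its own extraction rules) and shows only the index-side pre-filter; candidate matches must still be rechecked against the actual string:

```python
def trigrams(s):
    # All contiguous 3-character substrings; empty for strings shorter than 3.
    return {s[i:i + 3] for i in range(len(s) - 2)}

def may_contain(text, pattern):
    # Index pre-filter: text can contain pattern only if it has every trigram
    # of the pattern.  False positives are possible, so recheck with LIKE.
    return trigrams(pattern) <= trigrams(text)
```

An inverted index from trigram to row makes this pre-filter an index lookup instead of a sequential scan, which is exactly the shortcut a sequential LIKE '%bar%' lacks.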
[HACKERS] 8.3 patches hold queue empty
I have completed processing of the 8.3 patches hold queue. There are some emails where I am waiting on a reply from the authors, but I will now just handle them as part of the normal patch process. I will now return to processing patches as they come in, and deal with the patches that are now waiting. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [pgsql-patches] [PATCHES] [HACKERS] [Fwd: Index Advisor]
I need someone to review this patch to make sure the API used is logical. You can do that by reading the README file in the patch. To me, the procedure seems overly complicated, and too restrictive. The patch is in the patches queue. --- Gurjeet Singh wrote: On 1/20/07, Bruce Momjian [EMAIL PROTECTED] wrote: I can't read a 7z file on my end. Please email me the file and I will put it at a URL. --- Gurjeet Singh wrote: Please find attached the patches ported to HEAD as of now. The patch to the contrib modules is the same as before; the version number has been kept but branch designator has been changed. 1) pg_post_planner_plugin-HEAD_20070116-v2.patch.gz 2) pg_index_adviser-HEAD_20070116-v26.7z I am attaching the .gz versions of both the patches, and CC'ing to -patches also. If it doesn't turn up on -patches even this time, then please do the needful. Thanks and best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | yahoo }.com [ Attachment, skipping... ] [ Attachment, skipping... ] -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] integer datetimes
OK, mention removed. We can always re-add it if we find we need to warn people away from integer timestamps again. --- Magnus Hagander wrote: On Wed, Feb 14, 2007 at 12:38:12PM -0500, Andrew Dunstan wrote: Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Our docs for the integer datetime option say: Note also that the integer datetimes code is newer than the floating-point code, and we still find bugs in it from time to time. Is the last sentence about bugs really true anymore? At least the buildfarm seems to have a lot *more* machines with it enabled than without. Buildfarm proves only that the regression tests don't expose any bugs, not that there aren't any. (I'm thinking about making it the default for the vc++ build, which is why I came across that.) FWIW, there are several Linux distros that build their RPMs that way, so it's not like people aren't using it. But it seems like we find bugs in the datetime/interval stuff all the time, as people trip over different weird edge cases. I think it's disappointing, to say the least, that we treat this code as a sort of second-class citizen. BTW, the buildfarm has a majority of machines using it by design - it's in the default set of options in the distributed config file. If we think there are bugs we haven't found, then we need to engage in some sort of analytical effort to isolate them. I don't see any reason in principle why this code should be any more buggy than the float-based datetimes, and I see plenty of reason in principle why we should make sure it's right. That was exactly what I thought, which is why I was kinda surprised to see that note in the configure stuff. If we go with that, then we can say that *any* new feature is less tested, no?
;-) //Magnus

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +

Index: doc/src/sgml/installation.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v
retrieving revision 1.282
diff -c -c -r1.282 installation.sgml
*** doc/src/sgml/installation.sgml	3 Feb 2007 23:01:06 -	1.282
--- doc/src/sgml/installation.sgml	17 Feb 2007 01:24:57 -
***
*** 965,973 
        the full range (see
        ![%standalone-include[the documentation about datetime datatypes]]
        ![%standalone-ignore[xref linkend=datatype-datetime]]
!       for more information). Note also that the integer datetimes code is
!       newer than the floating-point code, and we still find bugs in it from
!       time to time.
       /para
      /listitem
     /varlistentry
--- 965,971 
        the full range (see
        ![%standalone-include[the documentation about datetime datatypes]]
        ![%standalone-ignore[xref linkend=datatype-datetime]]
!       for more information).
       /para
      /listitem
     /varlistentry
Re: [HACKERS] NULL and plpgsql rows
Added to TODO under PL/pgSQL: o Allow row and record variables to be set to NULL constants, and allow NULL tests on such variables. Because a row is not scalar, do not allow assignment from NULL-valued scalars. --- Jim C. Nasby wrote: On Tue, Feb 13, 2007 at 05:55:11PM -0500, Bruce Momjian wrote: Is there a TODO here? --- Jim Nasby wrote: On Oct 2, 2006, at 6:28 PM, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: However, the test right above that means that we'll fail if the user tries something like row_variable := NULL;: The patch you seem to have in mind would allow row_variable := int_variable; to succeed if the int_variable chanced to contain NULL, which is surely not very desirable. Well, that's Tom's objection, though I'm not sure if by 'int_variable' he means 'internal' or 'integer'. Personally, I think it would be useful to just allow setting a row or record variable to NULL as I showed it above; ie: no variables involved. This is something you might want to do to invalidate a row/record variable after taking some action (perhaps deleting a row). You'd also think that you should be able to detect if a record variable is null, as you can with a row. So, I suggest: * Allow row and record variables in plpgsql to be set to NULL It's not clear if it's a wise idea to allow this assignment from a variable. It may be better to only allow explicitly setting them, ie: row_variable := NULL; * Allow testing a record variable to see if it's NULL Currently works for row variables, but not record variables -- Jim Nasby   [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] autovacuum next steps
Alvaro Herrera wrote: Once autovacuum_naptime... autovacuum_max_workers... How does this sound? The knobs exposed on autovacuum feel kinda tangential to what I think I'd really want to control. IMHO vacuum_mbytes_per_second would be quite a bit more intuitive than cost_delay, naptime, etc. ISTM I can relatively easily estimate and/or spec out how much extra I/O bandwidth I have per device for vacuum; and I would pretty much want vacuum to be constantly running on whichever table needs it the most, so long as it can stay under that bandwidth limit. Could vacuum have a tunable that says X MBytes/second (perhaps per device) and have it measure how much I/O it's actually doing and try to stay under that limit? For more fine-grained control, a cron job could go around setting different MBytes/second limits during peak times vs idle times. If people are concerned about CPU-intensive vacuums instead of I/O-intensive ones (does anyone experience that?), another tunable vacuum_percent_of_cpu would be more straightforward than delay_cost, cost_page_hit, etc. But I'd be a bit surprised if CPU-intensive vacuums are common.
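The vacuum_mbytes_per_second idea suggested above amounts to sleeping whenever the work done runs ahead of a byte budget. A hypothetical Python sketch of that pacing loop (function and parameter names are illustrative, not an actual PostgreSQL setting):

```python
import time

def paced_scan(read_page, npages, page_bytes=8192, mbytes_per_second=4.0):
    # After each page, sleep until wall-clock time catches up with
    # bytes_done / budget, keeping sustained I/O under the configured rate.
    budget = mbytes_per_second * 1024 * 1024  # bytes per second
    start = time.monotonic()
    bytes_done = 0
    for page in range(npages):
        read_page(page)
        bytes_done += page_bytes
        target = bytes_done / budget
        lag = target - (time.monotonic() - start)
        if lag > 0:
            time.sleep(lag)
    return bytes_done
```

The cron-job idea then reduces to calling this with a different mbytes_per_second for peak versus idle hours, rather than juggling cost_delay, cost_page_hit, and friends.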
Re: [HACKERS] add to ToDo, please
Pavel Stehule wrote: Hello please add to ToDo: Holdable cursor support in SPI Added: * Allow holdable cursors in SPI -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
[HACKERS] Invalid to_date patterns (was: [PATCHES] [GENERAL] ISO week dates)
On 2/17/07, Alvaro Herrera [EMAIL PROTECTED] wrote: Bruce Momjian wrote: Maybe now would be an appropriate time to discuss the open questions in the submitting email: Brendan Jurd wrote: I'd also like to raise the topic of how conversion from text to ISO week dates should be handled, where the user has specified a bogus mixture of fields. Existing code basically ignores these issues; for example, if a user were to call to_date('1998-01-01 2454050', 'YYYY-MM-DD J') the function returns 2006-01-01, a result of setting the year field from YYYY, then overwriting year, month and day with the values from the Julian date in J, then setting the month and day normally from MM and DD. 2006-01-01 is not a valid representation of either of the values the user specified. Now you might say ask a silly question, get a silly answer; the user shouldn't send nonsense arguments to to_date and expect a sensible result. But perhaps the right way to respond to a broken timestamp definition is to throw an error, rather than behave as though everything has gone to plan and return something which is not correct. The same situation can arise if the user mixes ISO and Gregorian data; how should Postgres deal with something like to_date('2006-250', 'IYYY-DDD')? The current behaviour in my patch is actually to assume that the user meant to say 'IYYY-IDDD', since the 250th Gregorian day of the ISO year 2006 is total gibberish. But perhaps it should be throwing an error message. My thinking is that erroneous patterns should throw an error, and not try to second-guess the user. (IIRC this was being discussed in some other thread not long ago). It seems to me there are basically two different responses to the problem of invalid patterns. One is to reject all patterns which potentially under- or over-constrain the date value, and the other is to only reject those patterns which, when applied to the given date string, actually cause a conflict.
For example, on the surface the pattern 'YYYY-MM-DD J' would appear to be invalid, because it specifies the date using both the Gregorian and Julian conventions. You could argue that the whole idea of using a pattern like this is bogus, and reject the pattern as soon as it is parsed. On the other hand, if a user called to_date('2007-02-17 2454149', 'YYYY-MM-DD J') and you attempted to resolve the pattern, you would find that the Julian date and the Gregorian date agree perfectly with each other, and there is no reason to reject the conversion. My gut reaction at first was to go with the former approach. It's programmatically simpler, and it's easier to explain in documentation/error messages. But then it occurred to me that one of the use cases for to_date is slurping date information out of textual reports which may contain redundant date information. If a user wanted to parse something like 2007-02-17 Q1, he would probably try 'YYYY-MM-DD QQ', even though this pattern is logically over-constraining. Would it be fair to throw an error in such a case? Please let me know what you think. BJ
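The reject-only-actual-conflicts approach described above can be sketched by resolving each redundant field group to a candidate date and comparing. A hypothetical Python sketch (the Julian-day conversion is anchored at JDN 2451545 = 2000-01-01; function names are illustrative, not to_date internals):

```python
from datetime import date, timedelta

def from_julian(jdn):
    # Julian day number to Gregorian date, anchored at JDN 2451545 = 2000-01-01.
    return date(2000, 1, 1) + timedelta(days=jdn - 2451545)

def resolve(ymd=None, julian=None):
    # Build one candidate date per field group supplied, then insist that
    # redundant groups agree instead of silently merging their fields.
    candidates = []
    if ymd is not None:
        candidates.append(date(*ymd))
    if julian is not None:
        candidates.append(from_julian(julian))
    if not candidates:
        raise ValueError("date is underconstrained")
    if any(c != candidates[0] for c in candidates[1:]):
        raise ValueError("conflicting date fields")
    return candidates[0]
```

Under this scheme the over-constrained but consistent '2007-02-17 2454149' input succeeds, while the inconsistent '1998-01-01 2454050' input raises instead of yielding 2006-01-01.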
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
On Fri, 16 Feb 2007, Alvaro Herrera wrote: Jim C. Nasby wrote: My suggestion would be to focus on a period data type first and foremost, as that's something that could be readily used by a lot of folks. Of particular note, it's difficult to query tables that have start_time and end_time fields to define a period; it's easy to screw up the boundary conditions, and it's also hard to make those queries perform well without going to extra lengths (such as defining a 'bogus' GiST index on something like box(point(start,start),point(end,end))). And it's not possible to do that in a way that avoids floating points and their errors. FWIW there's already a type called tinterval that stores (start,end). I don't think it's very much documented; maybe it can be extended or used as base for a new, more complete and robust type, indexable in a more natural way, etc etc. RI-Tree (Relational Interval Tree) http://www.dbs.informatik.uni-muenchen.de/Forschung/CAD/presentations/RI-Tree.pdf looks promising for that purpose. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] n-gram search function
3-gram search is implemented as contrib/pg_trgm. It currently uses a GiST index, but may be enhanced with GiN. Oleg On Sat, 17 Feb 2007, Tatsuo Ishii wrote: Hi, Is anybody working on implementing n-gram search functionality for text type data? tsearch2 is great for long text, but it's not appropriate for short (10-100 bytes) text data. What I want to achieve is a fast partial match search using indexes, i.e. foo ~ 'bar' or foo LIKE '%bar%' type matching. -- Tatsuo Ishii SRA OSS, Inc. Japan Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83