Re: [HACKERS] Hint Bits and Write I/O
On Tue, 2008-05-27 at 19:32 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > My proposal is to have this as a two-stage process. When we set the
> > hint on a tuple in a clean buffer we mark it BM_DIRTY_HINTONLY, if
> > not already dirty. If we set a hint on a buffer that is
> > BM_DIRTY_HINTONLY then we mark it BM_DIRTY.
>
> I wonder if it is worth actually counting the number of newly set hint
> bits, rather than just having a counter that saturates at two. We
> could steal a byte from usage_count without making the buffer headers
> bigger.

That's the right place to start. We can instrument the backend like that and then get some data about what actually happens. The other stuff is probably me just explaining it badly, so let's leave it for now. You're right, it was too complex for a first cut.

> > If the bgwriter has time, it will write out BM_DIRTY_HINTONLY
> > buffers, though on a consistently busy server this should not occur.
>
> What do you mean by "if it has time"? How would it know that?
>
> > This won't change the behaviour of first-read-after-copy. To improve
> > that behaviour, I suggest that we only move from BM_DIRTY_HINTONLY to
> > BM_DIRTY when we are setting the hint for a new xid. If we are just
> > setting the same xid over-and-over again then we should avoid setting
> > the page dirty. So when data has been loaded via COPY, we will just
> > check the status of the xid once, then scan the whole page using the
> > single-item transaction cache.
>
> This doesn't make any sense to me. What is a "new xid"? And what is
> "setting the same xid over and over"? If a page is full of occurrences
> of the same xid, that doesn't really mean that it's less useful to
> correctly hint each occurrence. The whole proposal seems a bit overly
> complicated. What we talked about at PGCon was simply not setting the
> dirtybit when setting a hint bit.
>
> There's a certain amount of self-optimization there: if a page
> continually receives hint bit updates, that also means it is getting
> pinned and hence its usage_count stays high, thus it will tend to stay
> in shared buffers until something happens to make it really dirty.
> (Although that argument might not hold water for a bulk seqscan:
> you'll have hinted all the tuples and then very possibly throw the
> page away immediately. So counting the hints and eventually deciding
> we did enough to justify dirtying the page might be worth doing.)

Yes, we probably need to do something different for bulk seqscans.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
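Tom's suggestion of stealing a byte from usage_count to count newly set hint bits, and only dirtying the page once enough hints have accumulated, can be sketched as a toy model. The names (BufferSim, HINT_DIRTY_THRESHOLD) and the threshold value are hypothetical, invented for illustration; this is not PostgreSQL's actual buffer-manager API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: how many newly set hint bits justify a write. */
#define HINT_DIRTY_THRESHOLD 8

/* Simplified stand-in for a buffer header with a stolen counter byte. */
typedef struct BufferSim
{
    bool    dirty;          /* plays the role of BM_DIRTY */
    uint8_t hint_count;     /* newly set hint bits, saturating at 255 */
} BufferSim;

/*
 * Record one newly set hint bit on the buffer.
 * Returns true if this particular call caused the page to become dirty.
 */
bool
buffer_set_hint(BufferSim *buf)
{
    if (buf->hint_count < UINT8_MAX)
        buf->hint_count++;

    /*
     * Only mark the buffer dirty once it has accumulated enough hint-bit
     * work to justify the extra write I/O.
     */
    if (!buf->dirty && buf->hint_count >= HINT_DIRTY_THRESHOLD)
    {
        buf->dirty = true;
        return true;
    }
    return false;
}
```

The point of the sketch is the policy, not the mechanism: a page hinted only a few times is never dirtied for hints alone, which is the "simply not setting the dirtybit" behaviour discussed at PGCon, while a heavily hinted page eventually is, addressing the bulk-seqscan case Tom raises.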
Re: [HACKERS] Remove redundant extra_desc info for enum GUC variables?
Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
> > One point of interest is that for client_min_messages and
> > log_min_messages, the ordering of the values has significance, and
> > it's different for the two cases. Is there any actual reason why
> > they're supposed to be treated differently?
>
> Yeah: LOG level sorts differently in the two cases; it's fairly high
> priority for server log output and much lower for client output.

Ok, easy fix if we break them apart. Should we continue to accept values that we're not going to care about, or should I change that at the same time? (for example, client_min_messages doesn't use INFO, but we do accept that in <= 8.3 anyway)

//Magnus
[HACKERS] Add dblink function to check if a named connection exists
I have locked down access to all dblink_* functions, so that only certain privileged users have access to them, and instead provide a set of SRF functions defined as security definer functions, where I connect to the remote server, fetch some data, disconnect from the remote server, and return the data. One obvious disadvantage of this approach is that I need to connect and disconnect in every function. A possible solution to this would be having a function, e.g. dblink_exists('connection_name'), that returns true/false depending on whether the connection already exists. This way, I could just check if a named connection exists, establish a connection if not, and wait until the end of the session to disconnect all established connections. I've attached a patch with a suggested implementation of such a function.

-- 
Tommy Gildseth

Index: dblink.c
===
RCS file: /projects/cvsroot/pgsql/contrib/dblink/dblink.c,v
retrieving revision 1.73
diff -c -c -r1.73 dblink.c
*** dblink.c	4 Apr 2008 17:02:56 -	1.73
--- dblink.c	28 May 2008 08:06:23 -
***************
*** 192,208 ****
  			freeconn = true; \
  		} \
  	} while (0)
- 
  #define DBLINK_GET_NAMED_CONN \
  	do { \
! 		char *conname = text_to_cstring(PG_GETARG_TEXT_PP(0)); \
  		rconn = getConnectionByName(conname); \
  		if (rconn) \
  			conn = rconn->conn; \
- 		else \
- 			DBLINK_CONN_NOT_AVAIL; \
  	} while (0)
  
  #define DBLINK_INIT \
  	do { \
  		if (!pconn) \
--- 192,214 ----
  			freeconn = true; \
  		} \
  	} while (0)
  
  #define DBLINK_GET_NAMED_CONN \
  	do { \
! 		char *conname = NULL; \
! 		DBLINK_GET_NAMED_CONN_IF_EXISTS; \
! 		if (!rconn) \
! 			DBLINK_CONN_NOT_AVAIL; \
! 	} while (0)
! 
! #define DBLINK_GET_NAMED_CONN_IF_EXISTS \
! 	do { \
! 		conname = text_to_cstring(PG_GETARG_TEXT_PP(0)); \
  		rconn = getConnectionByName(conname); \
  		if (rconn) \
  			conn = rconn->conn; \
  	} while (0)
+ 
  #define DBLINK_INIT \
  	do { \
  		if (!pconn) \
***************
*** 1056,1061 ****
--- 1062,1090 ----
  	PG_RETURN_INT32(PQisBusy(conn));
  }
  
+ 
+ /*
+  * Checks if a given named remote connection exists
+  *
+  * Returns true if the connection exists, false otherwise
+  * Params:
+  *	text connection_name - name of the connection to check
+  */
+ PG_FUNCTION_INFO_V1(dblink_exists);
+ Datum
+ dblink_exists(PG_FUNCTION_ARGS)
+ {
+ 	PGconn	   *conn = NULL;
+ 	remoteConn *rconn = NULL;
+ 	char	   *conname = NULL;
+ 
+ 	DBLINK_INIT;
+ 	DBLINK_GET_NAMED_CONN_IF_EXISTS;
+ 
+ 	PG_RETURN_BOOL(conn != NULL);
+ }
+ 
  /*
   * Cancels a running request on a connection
   */
Index: dblink.h
===
RCS file: /projects/cvsroot/pgsql/contrib/dblink/dblink.h,v
retrieving revision 1.20
diff -c -c -r1.20 dblink.h
*** dblink.h	4 Apr 2008 16:57:21 -	1.20
--- dblink.h	28 May 2008 08:06:23 -
***************
*** 49,54 ****
--- 49,55 ----
  extern Datum dblink_get_result(PG_FUNCTION_ARGS);
  extern Datum dblink_get_connections(PG_FUNCTION_ARGS);
  extern Datum dblink_is_busy(PG_FUNCTION_ARGS);
+ extern Datum dblink_exists(PG_FUNCTION_ARGS);
  extern Datum dblink_cancel_query(PG_FUNCTION_ARGS);
  extern Datum dblink_error_message(PG_FUNCTION_ARGS);
  extern Datum dblink_exec(PG_FUNCTION_ARGS);
Index: dblink.sql.in
===
RCS file: /projects/cvsroot/pgsql/contrib/dblink/dblink.sql.in,v
retrieving revision 1.17
diff -c -c -r1.17 dblink.sql.in
*** dblink.sql.in	5 Apr 2008 02:44:42 -	1.17
--- dblink.sql.in	28 May 2008 08:06:23 -
***************
*** 178,183 ****
--- 178,188 ----
  AS 'MODULE_PATHNAME', 'dblink_is_busy'
  LANGUAGE C STRICT;
  
+ CREATE OR REPLACE FUNCTION dblink_exists(text)
+ RETURNS boolean
+ AS 'MODULE_PATHNAME', 'dblink_exists'
+ LANGUAGE C STRICT;
+ 
  CREATE OR REPLACE FUNCTION dblink_get_result(text)
  RETURNS SETOF record
  AS 'MODULE_PATHNAME', 'dblink_get_result'
Index: expected/dblink.out
===
RCS file: /projects/cvsroot/pgsql/contrib/dblink/expected/dblink.out,v
retrieving revision 1.23
diff -c -c -r1.23 dblink.out
*** expected/dblink.out	6 Apr 2008 16:54:48 -	1.23
--- expected/dblink.out	28 May 2008 08:06:23 -
***************
*** 731,736 ****
--- 731,748 ----
   0
  (1 row)
  
+ SELECT dblink_exists('dtest1');
+  dblink_exists
+ ---------------
+  t
+ (1 row)
+ 
+ SELECT dblink_exists('doesnotexist');
+  dblink_exists
+ ---------------
+  f
+ (1 row)
+ 
  SELECT dblink_disconnect('dtest1');
  dblink_disconnect
  -------------------
Index: sql/dblink.sql
===
RCS file: /projects/cvsroot/pgsql/contrib/dblink/sql/dblink.sql,v
retrieving revision 1.20
diff -c -c -r1.20 dblink.sql
*** sql/dblink.sql	6 Apr 2008 16:54:48 -	1.20
--- sql/dblink.sql	28 May 2008 08:06:23
Re: [HACKERS] Hiding undocumented enum values?
Alex Hunsaker wrote:
> On Tue, May 27, 2008 at 12:05 PM, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> > Alex Hunsaker wrote:
> > > On Tue, May 27, 2008 at 10:20 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> > > > I am wondering if it's a good idea to hide the redundant entries
> > > > to reduce clutter in the pg_settings display. (We could do this by
> > > > adding a "hidden" boolean to struct config_enum_entry.) Thoughts?
> > > >
> > > > 			regards, tom lane
> > >
> > > +1
> > >
> > > Maybe something like the attached patch?
> >
> > Oops, missed that there was a patch posted already. Looks like the way
> > to do it (except I'd move the comment :-P) if that's the way we go.
>
> OK, the updated patch is on pg_patches under "guc config_enum_entry add
> hidden field".

Thanks, I've reviewed and applied.

> - moved the comment into config_enum_get_options()

I moved it again, to the header :-)

> - fixed a possible buffer underrun if every option was hidden

That fix didn't take into account the possibility of having different prefixes. Since it is a pretty stupid thing to have a GUC enum with *only* hidden entries, I just made it do nothing in this case and updated the comment. The buffer underrun check is still there.

> I looked into just making it a string so we could use parse_bool...
> because backslash_quote seems to be the exception not the rule. But I
> decided having a hidden flag seems more useful anyway...

It used to be a string. We don't want that, because then we can't tell the client which possible values are available. That's the whole reason for the creation of the enum type gucs...

> Well its good i did not go that route then :)

Yup :)

//Magnus
Re: [HACKERS] Hint Bits and Write I/O
Tom Lane <[EMAIL PROTECTED]> writes:
> (Although that argument might not hold water for a bulk seqscan:
> you'll have hinted all the tuples and then very possibly throw the
> page away immediately.

That seems like precisely the case where we don't want to dirty the buffer.

> So counting the hints and eventually deciding we did enough to justify
> dirtying the page might be worth doing.)

What if we counted how many hint bits were *not* set? I feel like the goal should be to dirty the buffer precisely once, when all the bits can be set. The problem case is when we dirty the page but still have some hint bits to be set on a subsequent iteration.

Of course that doesn't deal with the case where tuples are being touched continuously. Perhaps the idea should be to treat the page as dirty every n hint bit settings, where n is the number of tuples on the page, or the highest number of unset hint bits seen on the page, or something like that.

-- 
 Gregory Stark
 EnterpriseDB          http://www.enterprisedb.com
 Ask me about EnterpriseDB's On-Demand Production Tuning
[HACKERS] Avoiding second heap scan in VACUUM
Tom brought this up during the PGCon developer meet. After thinking a bit about it, I think it's actually possible to avoid the second heap scan, especially now that we have HOT. If we can remove the second pass, not only would that speed up vacuum, but it would also reduce lots of redundant read and write IO.

Currently the second heap scan is required to remove the dead tuples from the heap. We can not do this in the first scan because we haven't yet removed the index pointers pointing to them. HOT now prunes and defrags the pages in the first phase itself, and what is left behind is just a bunch of DEAD line pointers. The line pointers are marked UNUSED in the second heap scan. Since we don't repair any line pointer bloat, no additional free space is created in the second pass. So frankly there is not much left to be done in the second phase. Of course, we also update the FSM information at the end of the second pass.

If we want to remove the second pass, what we need is a mechanism to reclaim the DEAD line pointers. But to do this correctly, we must ensure that the DEAD line pointers are reclaimed only and only after the index entries pointing to them are removed. Tom's idea was to store the vacuum-xid in the tuple header and check that xid to see if the vacuum successfully removed the index pointers or not. Heikki had some brilliant idea to store the xid in the line pointer itself. These ideas are good, but would require xid wraparound handling.

I am thinking of a solution on the following lines to handle DEAD line pointers. Other ideas are welcome too.

1. Before VACUUM starts, it updates the pg_class row of the target table, noting that VACUUM_IN_PROGRESS for the target table.

2. It then waits for all the existing transactions to finish, to make sure that everyone can see the change in the pg_class row.

3. It then scans the heap, and prunes and defrags the pages. The normal pruning would reclaim all the dead tuples and mark their line pointers as DEAD. Since VACUUM is going to remove the index pointers pointing to these DEAD line pointers, it now marks these DEAD line pointers with an additional flag, say DEAD_RECLAIMED.

4. At the end of the first scan, VACUUM updates the FSM information for heap pages.

5. It then proceeds with the index scan and removes index pointers pointing to the DEAD line pointers collected in the heap scan.

6. Finally, it again updates the pg_class row and clears the VACUUM_IN_PROGRESS flag.

Any other backend, when it invokes page pruning, would check if the VACUUM is in progress by looking at the VACUUM_IN_PROGRESS flag. Note that if the previous vacuum had failed or the database crashed before vacuum completed, the VACUUM_IN_PROGRESS flag would remain set until the next vacuum successfully completes on the table and resets the flag (VACUUM_NOT_IN_PROGRESS state).

Since vacuum waits for the existing transactions to finish before marking any DEAD line pointers DEAD_RECLAIMED, for a backend which sees VACUUM_NOT_IN_PROGRESS, any DEAD_RECLAIMED line pointer it finds must be left over from the previous successfully completed vacuum. Since that vacuum must have removed the index pointers pointing to it, the backend can now safely reclaim the line pointer itself. The backend can potentially do this any time it sees a DEAD_RECLAIMED line pointer, but we may restrict this to pruning activity to keep things simple. This operation need not be WAL logged if we appropriately handle DEAD_RECLAIMED line pointers during redo recovery (in case they are reused for some other insert/update activity).

I think this scheme guarantees that a backend would always see VACUUM_IN_PROGRESS if vacuum is currently in progress on the table or the last vacuum has failed. There might be situations when a backend sees VACUUM_IN_PROGRESS when in fact there is no vacuum in progress and the last vacuum finished successfully, but that won't have any correctness implication; it would only delay reclaiming DEAD_RECLAIMED line pointers.

Comments?

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
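The protocol in steps 1-6 can be sketched as a small state machine. This is a toy model under stated assumptions: the enum values, struct, and function names below are invented for illustration; in the real proposal the flag lives in pg_class and the states live in heap line pointers.

```c
#include <stdbool.h>

/* Line pointer states, modelled on the proposal above. */
typedef enum LpState
{
    LP_NORMAL,
    LP_DEAD,            /* dead, but index pointers may still exist */
    LP_DEAD_RECLAIMED,  /* dead, this vacuum will remove its index pointers */
    LP_UNUSED
} LpState;

/* Stand-in for the per-table flag kept in pg_class. */
typedef struct TableSim
{
    bool vacuum_in_progress;    /* the VACUUM_IN_PROGRESS flag */
} TableSim;

/* Steps 1-3: set the flag, then the single heap pass promotes DEAD
 * line pointers to DEAD_RECLAIMED. */
LpState
vacuum_first_pass(TableSim *tab, LpState lp)
{
    tab->vacuum_in_progress = true;
    return (lp == LP_DEAD) ? LP_DEAD_RECLAIMED : lp;
}

/* Step 6: after the index scan removed the pointers, clear the flag. */
void
vacuum_finish(TableSim *tab)
{
    tab->vacuum_in_progress = false;
}

/*
 * Opportunistic pruning by any backend: a DEAD_RECLAIMED line pointer
 * may only be reclaimed when no vacuum is in progress, i.e. when it is
 * known to be left over from a previously *completed* vacuum.
 */
LpState
prune_line_pointer(const TableSim *tab, LpState lp)
{
    if (lp == LP_DEAD_RECLAIMED && !tab->vacuum_in_progress)
        return LP_UNUSED;
    return lp;
}
```

The sketch makes the safety argument concrete: because the flag is only cleared after the index scan commits, a backend that observes the flag clear can never reclaim a line pointer whose index entries still exist.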
Re: [HACKERS] Hint Bits and Write I/O
On Wed, 2008-05-28 at 06:08 -0400, Gregory Stark wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
> > (Although that argument might not hold water for a bulk seqscan:
> > you'll have hinted all the tuples and then very possibly throw the
> > page away immediately.
>
> That seems like precisely the case where we don't want to dirty the
> buffer. (1)
>
> > So counting the hints and eventually deciding we did enough to
> > justify dirtying the page might be worth doing.)
>
> What if we counted how many hint bits were *not* set? I feel like the
> goal should be to dirty the buffer precisely once when all the bits
> can be set. (2)

Agreed. I think the difficulty is that (1) and (2) are contradictory goals, and since those conditions frequently occur together, they cause conflict.

When we fully scan a buffer this will result in 1 or more actual clog lookups, L. L is often less than the number of tuples on the page because of the single-item xid cache. If L = 1 then there is a high probability that when we do a seq scan the clog blocks will be cached also, so although we do 1 clog lookup per table block we would seldom do clog I/O during a SeqScan.

So what I tried to say in a previous post was that if L > 1 then we should dirty the buffer, because the single-item cache becomes less effective and we may need to access other clog blocks, which may result in clog I/O.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
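The L-counting heuristic Simon describes can be sketched as follows: walk the xids of the tuples on a page through a one-entry cache and count how many distinct clog lookups would be needed. The function names here are hypothetical, invented for illustration; they are not PostgreSQL internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Count L, the number of clog lookups needed to hint every tuple on a
 * page, assuming a single-item xid cache: each change of xid relative
 * to the cached one costs a lookup.
 */
static int
count_clog_lookups(const TransactionId *xids, size_t ntuples)
{
    TransactionId cached = 0;   /* 0 plays InvalidTransactionId: empty cache */
    int lookups = 0;

    for (size_t i = 0; i < ntuples; i++)
    {
        if (xids[i] != cached)
        {
            lookups++;          /* cache miss -> one clog lookup */
            cached = xids[i];
        }
    }
    return lookups;
}

/* Simon's policy: dirty the page for hint bits only when L > 1. */
static bool
should_dirty_for_hints(const TransactionId *xids, size_t ntuples)
{
    return count_clog_lookups(xids, ntuples) > 1;
}
```

Under this sketch a freshly COPY-loaded page (all tuples sharing one xid) has L = 1 and is never dirtied for hints alone, while a page touched by many transactions has L > 1 and is dirtied, since hinting it saves real clog I/O later.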
[HACKERS] pg_regress: referencing shared objects from tests
Hi,

we would like to be able to use and ship pg_regress and the PostgreSQL test suite independently of the PostgreSQL build environment, for testing and maybe even as a separate package to be built and shipped with the OS for others to test their setup. Does this sound like a sane and OK thing to do?

I have a problem with one of the tests (create_function_1.source):

-----8<-----
20 CREATE FUNCTION int44out(city_budget)
21    RETURNS cstring
22    AS '@abs_builddir@/[EMAIL PROTECTED]@'
23    LANGUAGE C STRICT;
24
25 CREATE FUNCTION check_primary_key ()
26    RETURNS trigger
27    AS '@abs_builddir@/../../../contrib/spi/[EMAIL PROTECTED]@'
28    LANGUAGE C;
...
35 CREATE FUNCTION autoinc ()
36    RETURNS trigger
37    AS '@abs_builddir@/../../../contrib/spi/[EMAIL PROTECTED]@'
38    LANGUAGE C;
-----8<-----

(The ../../../contrib/spi path does not exist outside of the build environment, so to be able to run the test you need to have source code, compilers, ...)

I could work around this problem by copying the needed shared objects to @abs_builddir@ as part of make or make check, I could add a "--look-for-shared-objects-here" parameter to pg_regress, and you probably have other suggestions. Is this something we want to fix, and what would be the right way to do it? (I have no problem providing a patch.)

-Jørgen

-- 
Jørgen Austvik, Software Engineering - QA
Sun Microsystems Database Group
http://blogs.sun.com/austvik/
http://www.austvik.net/
Re: [HACKERS] [JDBC] How embarrassing: optimization of a one-shot query doesn't work
On 23-May-08, at 9:20 AM, Tom Lane wrote:
> Dave Cramer <[EMAIL PROTECTED]> writes:
> > Any word on 8.3.2 ?
>
> Obviously, nothing is happening during PGCon ;-). There was some
> discussion a week or so back about scheduling a set of releases in
> early June, but it's not formally decided.

Now that PGCon is over, has there been any more discussion?

Dave
[HACKERS] pg_regress: dbname in PostgreSQL test suite
Hi.

pg_regress has a --dbname option (which actually takes a list of database names):

  --dbname=DB               use database DB (default "regression")

... but the PostgreSQL regression test suite does not really support this:

[EMAIL PROTECTED]:regress] ggrep -R regression sql/* | grep -v regression_ | grep -v :--
sql/prepare.sql:EXECUTE q2('regression');
sql/privileges.sql:\c regression
sql/temp.sql:\c regression

I suggest we replace @dbname@ with the first element in the dblist linked list in convert_sourcefiles_in(). What do you think? (I can provide a patch if you think it is an acceptable solution.)

-J

-- 
Jørgen Austvik, Software Engineering - QA
Sun Microsystems Database Group
http://blogs.sun.com/austvik/
http://www.austvik.net/
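The suggested substitution, done wherever convert_sourcefiles_in() already replaces tokens like @abs_builddir@, could look roughly like this. This is a simplified sketch: substitute_dbname is a hypothetical helper, not pg_regress's actual code, and it handles only one occurrence per line.

```c
#include <stdio.h>
#include <string.h>

/*
 * Replace the first occurrence of "@dbname@" in a test-source line with
 * the first database name from pg_regress's dblist.  Illustrative only.
 */
void
substitute_dbname(const char *line, const char *first_db,
                  char *out, size_t outsize)
{
    const char *token = "@dbname@";
    const char *hit = strstr(line, token);

    if (hit == NULL)
    {
        /* no token: copy the line through unchanged */
        snprintf(out, outsize, "%s", line);
        return;
    }
    /* copy prefix, then the database name, then the rest of the line */
    snprintf(out, outsize, "%.*s%s%s",
             (int) (hit - line), line, first_db, hit + strlen(token));
}
```

With this in place, a test source line such as `\c @dbname@` would become `\c regression` by default, or whatever name was passed via --dbname.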
Re: [HACKERS] pg_regress: referencing shared objects from tests
Jorgen Austvik - Sun Norway <[EMAIL PROTECTED]> writes:
> we would like to be able to use and ship pg_regress and the PostgreSQL
> test suite independently of the PostgreSQL build environment, for
> testing and maybe even as a separate package to be built and shipped
> with the OS for others to test their setup. Does this sound like a
> sane and OK thing to do?

The RPM packages have done this since approximately forever. You might want to look at the patches used there.

			regards, tom lane
Re: [HACKERS] pg_regress: dbname in PostgreSQL test suite
Jorgen Austvik - Sun Norway <[EMAIL PROTECTED]> writes:
> pg_regress has a --dbname option (which actually takes a list of
> database names):
>
>   --dbname=DB               use database DB (default "regression")
>
> ... but the PostgreSQL regression test suite does not really support
> this:

That option is intended for running other sets of regression tests (eg, the contrib ones are customarily run in contrib_regression). I see zero value in trying to make the standard tests run under some other database name.

			regards, tom lane
Re: [HACKERS] pg_regress: dbname in PostgreSQL test suite
Jorgen Austvik - Sun Norway wrote:
> Hi.
>
> pg_regress has a --dbname option (which actually takes a list of
> database names):
>
>   --dbname=DB               use database DB (default "regression")
>
> ... but the PostgreSQL regression test suite does not really support
> this:
>
> [EMAIL PROTECTED]:regress] ggrep -R regression sql/* | grep -v regression_ | grep -v :--
> sql/prepare.sql:EXECUTE q2('regression');
> sql/privileges.sql:\c regression
> sql/temp.sql:\c regression
>
> I suggest we replace @dbname@ with the first element in the dblist
> linked list in convert_sourcefiles_in(). What do you think? (I can
> provide a patch if you think it is an acceptable solution.)

We have more than one set of regression tests. This feature is used by the PL regression tests and the contrib regression tests to run using a different database name. I'm not quite sure why it's a list.

cheers
andrew
[HACKERS] Upcoming back-branch update releases
Yup, we're overdue for that, so:

After some discussion among core and the packagers list, we have tentatively set June 9 as the release date for minor updates of all supported PG release branches (back to 7.4). As has been the recent practice, code freeze will occur the preceding Thursday, June 5. If you've got any bug fixes you've been working on, now is a good time to get them finished up and sent in...

			regards, tom lane
Re: [HACKERS] [JDBC] How embarrassing: optimization of a one-shot query doesn't work
Dave Cramer <[EMAIL PROTECTED]> writes:
> On 23-May-08, at 9:20 AM, Tom Lane wrote:
> > There was some discussion a week or so back about scheduling a set of
> > releases in early June, but it's not formally decided.
>
> Now that PGCon is over, has there been any more discussion?

Yeah, I just posted an announcement about it on -hackers.

			regards, tom lane
Re: [HACKERS] Add dblink function to check if a named connection exists
Tommy Gildseth <[EMAIL PROTECTED]> writes:
> One obvious disadvantage of this approach is that I need to connect
> and disconnect in every function. A possible solution to this would be
> having a function, e.g. dblink_exists('connection_name'), that returns
> true/false depending on whether the connection already exists.

Can't you do this already?

	SELECT 'myconn' = ANY (dblink_get_connections());

A dedicated function might be a tad faster, but it probably isn't going to matter compared to the overhead of sending a remote query.

			regards, tom lane
Re: [HACKERS] Remove redundant extra_desc info for enum GUC variables?
Magnus Hagander <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
> > Yeah: LOG level sorts differently in the two cases; it's fairly high
> > priority for server log output and much lower for client output.
>
> Ok, easy fix if we break them apart. Should we continue to accept
> values that we're not going to care about, or should I change that at
> the same time? (for example, client_min_messages doesn't use INFO, but
> we do accept that in <= 8.3 anyway)

I'd be inclined to keep the actual behavior the same as it was. We didn't document INFO for this variable, perhaps, but it's accepted and has a well-defined behavior.

			regards, tom lane
Re: [HACKERS] Avoiding second heap scan in VACUUM
Pavan Deolasee <[EMAIL PROTECTED]> writes:
> 1. Before VACUUM starts, it updates the pg_class row of the target
> table, noting that VACUUM_IN_PROGRESS for the target table.

If I understand correctly, nobody would be able to re-use any line pointers while a vacuum is in progress? I find that a bit scary, since for large tables you may actually always be running a vacuum. Perhaps the DSM will fix that, but for heavily updated tables I think you might still be pretty much continuously running vacuum.

On the other hand, it would just result in line pointer bloat. And I think VACUUM could still safely remove old dead line pointers if it noted that the table had a clean vacuum status when it started.

> 2. It then waits for all the existing transactions to finish to make
> sure that everyone can see the change in the pg_class row,

I'm a bit scared of how many "wait for all transactions to finish" steps we're accumulating. It seemed safe enough when we had only one, but I'm not sure what the consequences of this action are when there are several of them. Are we perhaps creating deadlocks?

-- 
 Gregory Stark
 EnterpriseDB          http://www.enterprisedb.com
 Ask me about EnterpriseDB's PostGIS support!
Re: [HACKERS] BUG #4204: COPY to table with FK has memory leak
[moving to -hackers]

Tom Lane <[EMAIL PROTECTED]> writes:
> Tomasz Rybak <[EMAIL PROTECTED]> writes:
> > I tried to use COPY to import 27M rows to table:
> > CREATE TABLE sputnik.ccc24 (
> >     station CHARACTER(4) NOT NULL REFERENCES sputnik.station24 (id),
> >     moment INTEGER NOT NULL,
> >     flags INTEGER NOT NULL
> > ) INHERITS (sputnik.sputnik);
> > COPY sputnik.ccc24(id, moment, station, strength, sequence, flags)
> > FROM '/tmp/24c3' WITH DELIMITER AS ' ';
>
> This is expected to take lots of memory because each
> row-requiring-check generates an entry in the pending trigger event
> list. Even if you had not exhausted memory, the actual execution of
> the retail checks would have taken an unreasonable amount of time. The
> recommended way to do this sort of thing is to add the REFERENCES
> constraint *after* you load all the data; that'll be a lot faster in
> most cases because the checks are done in bulk using a JOIN rather
> than one-at-a-time.

Hm, it occurs to me that we could still do a join against the pending event trigger list... I wonder how feasible it would be to store the pending trigger event list in a temporary table instead of in RAM.

-- 
 Gregory Stark
 EnterpriseDB          http://www.enterprisedb.com
 Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [HACKERS] Packages in oracle Style
2008/5/27 Zdenek Kotala <[EMAIL PROTECTED]>:
> Coutinho napsal(a):
> > this is listed on TODO:
> > http://www.postgresql.org/docs/faqs.TODO.html
> > Add features of Oracle-style packages (Pavel)
>
> I see. Sorry I overlooked it. I think Pavel Stehule will help you. He
> has an idea how to do it.
>
> Zdenek

My last idea was only global variables for plpgsql. It needs a hack of plpgsql :(. But it can be simple work.

Pavel
Re: [HACKERS] BUG #4204: COPY to table with FK has memory leak
Gregory Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
> > This is expected to take lots of memory because each
> > row-requiring-check generates an entry in the pending trigger event
> > list.
>
> Hm, it occurs to me that we could still do a join against the pending
> event trigger list... I wonder how feasible it would be to store the
> pending trigger event list in a temporary table instead of in RAM.

We could make that list spill to disk, but the problem remains that verifying the rows one at a time will take forever. The idea that's been kicked around occasionally is that once you get past N pending events, throw them all away and instead queue a single operation to do a bulk verify (just like initial establishment of the FK constraint). I'm not sure how to do the queue management for this, though.

			regards, tom lane
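The "past N pending events, collapse to one bulk verify" idea Tom describes can be sketched as follows. The struct, names, and threshold are invented for illustration; this models only the counting policy, not the actual trigger-queue machinery.

```c
#include <stdbool.h>

/* Hypothetical threshold: beyond this, retail checks are abandoned. */
#define MAX_PENDING_EVENTS 4

typedef struct FkEventQueue
{
    int  npending;        /* individual per-row FK checks queued */
    bool bulk_verify;     /* collapsed into one bulk JOIN validation */
} FkEventQueue;

/*
 * Queue one per-row FK check.  Once more than MAX_PENDING_EVENTS have
 * accumulated, discard them all and remember that a single bulk verify
 * (as done when the constraint is first created) must run at the end.
 */
void
queue_fk_check(FkEventQueue *q)
{
    if (q->bulk_verify)
        return;                 /* already collapsed; nothing to queue */

    if (++q->npending > MAX_PENDING_EVENTS)
    {
        /* too many retail checks: throw them away, do one bulk JOIN */
        q->npending = 0;
        q->bulk_verify = true;
    }
}
```

The open question Tom notes, the queue management, is exactly what this sketch glosses over: replacing already-queued events with a single bulk operation at commit time is the hard part in the real trigger machinery.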
Re: [HACKERS] Avoiding second heap scan in VACUUM
On Wed, 2008-05-28 at 16:56 +0530, Pavan Deolasee wrote:
> 2. It then waits for all the existing transactions to finish to make
> sure that everyone can see the change in the pg_class row

I'm not happy that the VACUUM waits. It might wait a very long time, and cause worse overall performance than the impact of the second scan.

Happily, I think we already have a solution to this overall problem elsewhere in the code. When we VACUUM away all the index entries on a page we don't yet remove it. We only add it to the FSM on the second pass of that page on the *next* VACUUM.

So the idea is to have one pass per VACUUM, but make that one pass do the first pass of *this* VACUUM and the second pass of the *last* VACUUM. We mark the xid of the VACUUM in pg_class as you suggest, but we do it after VACUUM has completed the pass. In the single pass we mark DEAD line pointers as RECENTLY_DEAD. If the last VACUUM xid is old enough we mark RECENTLY_DEAD line pointers as UNUSED as well, during this first pass. If the last xid is not old enough we do a second pass to remove them.

That has the effect that large tables that are infrequently VACUUMed will need only a single scan. Smaller tables that require almost continual VACUUMing will probably do two scans, but who cares?

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Re: [HACKERS] BUG #4204: COPY to table with FK has memory leak
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: This is expected to take lots of memory because each row-requiring-check generates an entry in the pending trigger event list. Hm, it occurs to me that we could still do a join against the pending event trigger list... I wonder how feasible it would be to store the pending trigger event list in a temporary table instead of in ram. We could make that list spill to disk, but the problem remains that verifying the rows one at a time will take forever. Well I was thinking if we did a join between a temporary table and the fk target then it wouldn't have to be a one-by-one operation. It could be a merge join if the planner thought that was better. How to get accurate stats into the planner at that point would be a missing detail though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Avoiding second heap scan in VACUUM
Simon Riggs [EMAIL PROTECTED] writes: So the idea is to have one pass per VACUUM, but make that one pass do the first pass of *this* VACUUM and the second pass of the *last* VACUUM. I think that's exactly the same as the original suggestion of having HOT pruning do the second pass of the last vacuum. The trick is to know whether the last vacuum committed or not. If it didn't commit then it's not safe to remove those line pointers yet. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Avoiding second heap scan in VACUUM
On Wed, 2008-05-28 at 16:55 -0400, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: So the idea is to have one pass per VACUUM, but make that one pass do the first pass of *this* VACUUM and the second pass of the *last* VACUUM. I think that's exactly the same as the original suggestion of having HOT pruning do the second pass of the last vacuum. The trick is to know whether the last vacuum committed or not. If it didn't commit then it's not safe to remove those line pointers yet. Perhaps, though I'm not suggesting storing extra xids on-block. I think if we have to wait for a VACUUM to run before marking the line pointers then we may as well wait for two. Having something wait for a VACUUM and then remove it via HOT afterwards gives you the worst of both worlds: a long wait for a VACUUM, then more overhead and extra code during HOT pruning. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #4204: COPY to table with FK has memory leak
On Wed, 2008-05-28 at 16:28 -0400, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: This is expected to take lots of memory because each row-requiring-check generates an entry in the pending trigger event list. Hm, it occurs to me that we could still do a join against the pending event trigger list... I wonder how feasible it would be to store the pending trigger event list in a temporary table instead of in ram. We could make that list spill to disk, but the problem remains that verifying the rows one at a time will take forever. The idea that's been kicked around occasionally is that once you get past N pending events, throw them all away and instead queue a single operation to do a bulk verify (just like initial establishment of the FK constraint). I'm not sure how to do the queue management for this though. Neither of those approaches is really suitable. Just spilling to disk is O(N) in the number of rows loaded; the second one is O(N) at least in the number of rows (loaded + existing). The second one doesn't help either: if the table was empty you'd have added the FK afterwards, so we must assume there are already rows in there, and in most cases the rows already loaded will exceed those being added by the bulk operation. AFAICS we must aggregate the trigger checks. We would need a special property of triggers that allowed them to be aggregated when two similar checks arrived. We can then use hash aggregation to accumulate them. We might conceivably need to spill to disk also, since the aggregation may not always be effective. But in most cases the tables against which FK checks are made are significantly smaller than the tables being loaded. Once we have hash aggregated them, that is then the first part of a hash join to the target table. We certainly need a TODO item for "improve RI checks during bulk operations". 
-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
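Simon's hash-aggregate-then-hash-join idea can be illustrated with a small in-memory sketch. This is a toy model, not PostgreSQL code; the function and variable names are invented for illustration:

```python
# Sketch of the suggestion above: instead of queueing one trigger event
# per loaded row, hash-aggregate the FK key values, then probe the
# referenced table's keys once per *distinct* value (in effect the
# probe half of a hash join against the referenced table).
from collections import Counter

def check_fk_bulk(loaded_fk_values, referenced_pk_values):
    agg = Counter(loaded_fk_values)       # hash aggregation step
    pk_set = set(referenced_pk_values)    # build side of the hash join
    # one probe per distinct key, not one per loaded row; NULLs never
    # violate an FK constraint, so skip them
    missing = [k for k in agg if k is not None and k not in pk_set]
    return missing, len(agg)

loaded = [1, 2, 2, 2, 3, 1, 1]            # 7 rows, 3 distinct keys
missing, distinct_checks = check_fk_bulk(loaded, referenced_pk_values=[1, 2])
assert distinct_checks == 3               # 7 row checks collapsed to 3
assert missing == [3]                     # key 3 violates the constraint
```

This is where Simon's point about table sizes matters: the aggregation pays off precisely because the referenced table (and the set of distinct keys) is usually much smaller than the table being loaded.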
Re: [HACKERS] BUG #4204: COPY to table with FK has memory leak
Simon Riggs [EMAIL PROTECTED] writes: AFAICS we must aggregate the trigger checks. We would need a special property of triggers that allowed them to be aggregated when two similar checks arrived. We can then use hash aggregation to accumulate them. We might conceivably need to spill to disk also, since the aggregation may not always be effective. But in most cases the tables against which FK checks are made are significantly smaller than the tables being loaded. Once we have hash aggregated them, that is then the first part of a hash join to the target table. Well we can't aggregate them as they're created because later modifications could delete or update the original records. The SQL spec requires that FK checks be effective at the end of the command. I admit off the top of my head I can't actually come up with any situations which would be covered by the spec. All the instances I can think of involve either Postgres's UPDATE FROM or plpgsql functions or some other postgres specific functionality. But I do seem to recall there were some situations where it mattered. But we could aggregate them when it comes time to actually check them. Or we could hash the FK keys and scan the event list. Or we could sort the two and merge join them. We certainly need a TODO item for improve RI checks during bulk operations. I have a feeling it's already there. Hm. There's a whole section on RI triggers but the closest I see is this; neither of the links appear to refer to bulk operations: Optimize referential integrity checks http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php http://archives.postgresql.org/pgsql-hackers/2007-04/msg00744.php -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #4204: COPY to table with FK has memory leak
Gregory Stark [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: We certainly need a TODO item for improve RI checks during bulk operations. I have a feeling it's already there. Hm. There's a whole section on RI triggers but the closest I see is this, neither of the links appear to refer to bulk operations: Optimize referential integrity checks http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php http://archives.postgresql.org/pgsql-hackers/2007-04/msg00744.php No, both of those are talking about the same thing, ie, (1) making the are-the-keys-unchanged optimization work when NULLs are present, and (2) not testing for this case twice. There's an entry in the Triggers section * Add deferred trigger queue file Right now all deferred trigger information is stored in backend memory. This could exhaust memory for very large trigger queues. This item involves dumping large queues into files. but as already noted, this is a pretty myopic answer (at least for RI triggers). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PERFORM] Memory question on win32 systems
Hello, in my particular case I need to configure Postgres to handle only a few concurrent connections, but I need it to be blazingly fast, so I need it to cache everything possible. I've changed the config file and multiplied all memory-related values by 10, but Postgres still uses less than 50 MB of my RAM. I have 4 GB of RAM; how do I force Postgres to use a larger part of that memory in order to cache more indexes, queries and so on? Thanks!
Re: [HACKERS] Hint Bits and Write I/O
Simon Riggs wrote: Hmm, I think the question is: How many hint bits need to be set before we mark the buffer dirty? (N) Should it be 1, as it is now? Should it be never? Never is a long time. As N increases, clog accesses increase. So it would seem there is likely to be an optimal value for N. After further thought, I begin to think that the number of times we set a dirty hint-bit shouldn't influence the decision of whether to dirty the page too much. Instead, we should look at the *age* of the last xid which modified the tuple. The idea is that the clog pages showing the status of young xids are far more likely to be cached than the pages for older xids. This makes a lost hint-bit update much cheaper for young xids than for old ones, because we probably won't waste any IO if we have to set the hint bit again later after the buffer was evicted from shared_buffers before being written out. Additionally, I think we should put some randomness into the decision, to spread out the IO caused by hint-bit updates after a batch load. All in all, I envision a formula like chance_of_dirtying = min(1, alpha * floor((next_xid - last_modifying_xid) / clog_page_size) / clog_buffers). This means that a hint-bit update never triggers dirtying if the last modifying xid belongs to the same clog page as the next unused xid - which sounds good, since that clog page gets touched on every commit and abort, and therefore is cached nearly for sure. For xids on older pages, the chance of dirtying grows (more aggressively for larger alpha values). For alpha = 1, a hint-bit update dirties a buffer for sure only if the xid is older than clog_page_size*clog_buffers. regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
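Florian's formula is easy to transcribe directly. The function below mirrors his description; the concrete clog_page_size and clog_buffers values used in the demo are just for illustration, not the real build-time constants:

```python
# Direct transcription of the proposed formula:
#   chance_of_dirtying =
#     min(1, alpha * floor((next_xid - last_modifying_xid)/clog_page_size)
#              / clog_buffers)
def chance_of_dirtying(next_xid, last_modifying_xid, clog_page_size,
                       clog_buffers, alpha=1.0):
    age_in_pages = (next_xid - last_modifying_xid) // clog_page_size
    return min(1.0, alpha * age_in_pages / clog_buffers)

CLOG_PAGE_SIZE, CLOG_BUFFERS = 32768, 8   # illustrative values only

# same clog page as the next xid: never dirty on a hint-bit-only update
assert chance_of_dirtying(1000, 500, CLOG_PAGE_SIZE, CLOG_BUFFERS) == 0.0

# with alpha = 1, dirtying becomes certain once the xid is older than
# clog_page_size * clog_buffers, exactly as described
old_next_xid = 1000 + CLOG_PAGE_SIZE * CLOG_BUFFERS
assert chance_of_dirtying(old_next_xid, 999, CLOG_PAGE_SIZE,
                          CLOG_BUFFERS) == 1.0
```

In between those two extremes the chance grows linearly with the xid's age measured in clog pages, scaled by alpha, which matches the "more aggressively for larger alpha values" behaviour.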
[HACKERS] intercepting WAL writes
Hello, I'm new to the core PostgreSQL code, so pardon the question if the answer is really obvious, and I'm just missing it, but I've got a relatively large web application that uses PostgreSQL as a back-end database, and we're heavily using memcached to cache frequently accessed data. I'm looking at modifying PostgreSQL (in some way) to push changes directly to our memcache servers, in hopes of moving towards a system where only writes are actually sent to the databases, and reads are exclusively sent to the memcache servers. I'm guessing that I could intercept the WAL writes, and use this information to push out to my memcache servers, similar to a replication model. Can somebody point to the most logical place in the code to intercept the WAL writes? (just a rough direction would be enough)- or if this doesn't make sense at all, another suggestion on where to get the data? (I'm trying to avoid doing it using triggers). Thanks, Mike
Re: [HACKERS] intercepting WAL writes
On Wed, May 28, 2008 at 7:11 PM, Mike [EMAIL PROTECTED] wrote: Can somebody point to the most logical place in the code to intercept the WAL writes? (just a rough direction would be enough) XLogInsert or if this doesn't make sense at all, another suggestion on where to get the data? (I'm trying to avoid doing it using triggers). Without triggers, you don't have many options. With triggers, you could use pg_memcache. If you take it from the WAL, you'll have to do a bit of decoding to make it usable in the context you're looking for, which is quite a bit of work. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] Memory question on win32 systems
On Wed, May 28, 2008 at 7:05 PM, Sabbiolina [EMAIL PROTECTED] wrote: Hello, in my particular case I need to configure Postgres to handle only a few concurrent connections, but I need it to be blazingly fast, so I need it to cache everything possible. I've changed the config file and multiplied all memory-related values by 10, still Postgres uses only less than 50 Mb of my RAM. How are you measuring this? I have 4 Gigs of RAM, how do I force Postgres to use a higher part of such memory in order to cache more indexes, queries and so on? Post the settings values you're using and people will be better able to help you. -Doug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] intercepting WAL writes
On Wed, 2008-05-28 at 19:11 -0400, Mike wrote: Can somebody point to the most logical place in the code to intercept the WAL writes? (just a rough direction would be enough)- or if this doesn’t make sense at all, another suggestion on where to get the data? (I’m trying to avoid doing it using triggers). Why are you avoiding triggers? One solution might be to use Slony to just create the log, and then read the log of events into memcached rather than another PostgreSQL instance. http://slony.info/documentation/logshipping.html Those logs might be easier to process than the WAL. Also, why do you need to intercept the WAL writes, and not just read from a WAL archive? Does this need to be synchronous? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Catching exceptions from COPY
Is it feasible to add the ability to catch exceptions from COPY? Darren -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] intercepting WAL writes
On Wed, May 28, 2008 at 8:30 PM, Mike [EMAIL PROTECTED] wrote: When you say a bit of decoding, is that because the data written to the logs is after the query parser/planner? Or because it's written in several chunks? Or? Because that's the actual recovery record. There is no SQL text, just the WAL record type (XLOG_HEAP_INSERT, XLOG_HEAP_UPDATE, XLOG_XACT_COMMIT, ...) and the data as it relates to that operation. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catching exceptions from COPY
On Wed, 28 May 2008, Darren Reed wrote: Is it feasible to add the ability to catch exceptions from COPY? Depends on what you consider feasible. There's a start to a plan for that on the TODO list: http://www.postgresql.org/docs/faqs.TODO.html but it's not trivial to implement. It's also possible to do this right now using pgloader: http://pgfoundry.org/projects/pgloader/ That requires some setup and there's overhead to passing through that loading layer. A third possibility is to write a short script specifically aimed at your copy need that breaks your input files into smaller chunks and loads them, kicking back the ones that don't load, or breaking them into even smaller chunks until you've found the problem line or lines. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
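Greg's third option (split the input, bisect failing chunks) is straightforward to script. Below is a minimal sketch; `fake_copy` stands in for a real COPY call and is purely hypothetical:

```python
# Sketch of the chunk-and-bisect approach: try to load a batch of
# lines; on failure, split the batch in half and retry each half,
# until the unloadable lines are isolated one by one.
def load_with_bisect(lines, load, rejected):
    try:
        load(lines)
    except ValueError:
        if len(lines) == 1:
            rejected.extend(lines)        # found an unloadable line
        else:
            mid = len(lines) // 2
            load_with_bisect(lines[:mid], load, rejected)
            load_with_bisect(lines[mid:], load, rejected)

# demo with a fake loader that rejects lines containing "bad"
loaded = []
def fake_copy(lines):
    if any("bad" in line for line in lines):
        raise ValueError("copy failed")
    loaded.extend(lines)

rejected = []
load_with_bisect(["a", "bad1", "c", "bad2"], fake_copy, rejected)
assert sorted(loaded) == ["a", "c"]
assert rejected == ["bad1", "bad2"]
```

In a real script each chunk would need to be its own COPY in its own transaction, since a COPY error rolls back everything that command inserted; that is also why the overhead grows with the number of bad lines.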
Re: [HACKERS] intercepting WAL writes
On Wed, May 28, 2008 at 8:30 PM, Mike [EMAIL PROTECTED] wrote: When you say a bit of decoding, is that because the data written to the logs is after the query parser/planner? Or because it's written in several chunks? Or? Because that's the actual recovery record. There is no SQL text, just the WAL record type (XLOG_HEAP_INSERT, XLOG_HEAP_UPDATE, XLOG_XACT_COMMIT, ...) and the data as it relates to that operation. Oh- right- that makes sense. I installed and started looking at the source code for xlogviewer and xlogdump; seems like a reasonable place to start. Thanks for your help, Mike -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [JDBC] Re: [HACKERS] How embarrassing: optimization of a one-shot query doesn't work
Are there any head fixes proposed for it? I am seeing some scaling problems with EAStress which uses JDBC with 8.3.0, and this one could be the reason why I am seeing some problems. I will be happy to try it out and report on it. The setup is ready right now, so if someone can point me to a patch I can try it out and hopefully see if it fixes my problem. -Jignesh Dave Cramer wrote: It's pretty easy to test. prepare the query and run explain analyze on the prepared statement. Dave On 10-Apr-08, at 5:47 AM, Thomas Burdairon wrote: Is there any patch available for this one? I'm encountering troubles with some JDBC queries and I'd like to test it before asking some help on the JDBC list. Thanks. Tom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [JDBC] Re: [HACKERS] How embarrassing: optimization of a one-shot query doesn't work
Tom Lane wrote: Jignesh K. Shah [EMAIL PROTECTED] writes: Are there any head fixes proposed for it? It's been fixed in CVS for a month. We just haven't pushed a release yet. Let me try it out and see what I find out in my EAStress workload. Regards, Jignesh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [JDBC] Re: [HACKERS] How embarrassing: optimization of a one-shot query doesn't work
Jignesh K. Shah [EMAIL PROTECTED] writes: Are there any head fixes proposed for it? It's been fixed in CVS for a month. We just haven't pushed a release yet. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] intercepting WAL writes
Note that you will also encounter full-page writes: whole-page images that are emitted for the first modification of a data page after a checkpoint, and during hot backup (when the full_page_writes GUC is enabled). 2008/5/29 Mike [EMAIL PROTECTED]: On Wed, May 28, 2008 at 8:30 PM, Mike [EMAIL PROTECTED] wrote: When you say a bit of decoding, is that because the data written to the logs is after the query parser/planner? Or because it's written in several chunks? Or? Because that's the actual recovery record. There is no SQL text, just the WAL record type (XLOG_HEAP_INSERT, XLOG_HEAP_UPDATE, XLOG_XACT_COMMIT, ...) and the data as it relates to that operation. Oh- right- that makes sense. I installed and started looking at the source code for xlogviewer and xlogdump; seems like a reasonable place to start. Thanks for your help, Mike -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- -- Koichi Suzuki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] WITH RECURSIVE patch V0.1
Tom, I think this patch is plenty complicated enough without adding useless restrictive options. +1 for no additonal GUC options. --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Estimating recursive query cost
All, I'm really uncomfortable with just having recursive queries return a cost of 1000 or some similar approach. That's always been a problem for SRFs and it looks to be a bigger problem for WR. However, it doesn't seem like the computer science establishment has made a lot of headway in this regard either. Most approaches for which I found abstracts would cost more CPU to calculate than the query itself was likely to take to execute. Several of the stupider-looking ones are, for no apparent reason, patented. However, since we know for certain that the recursive query is going to be executed, and we don't have multiple choices of execution paths for it, it seems like our primary concern for estimation purposes is what portion of the table will be returned by the query, i.e. should we use a table scan or an index scan, if an appropriate index is available? Or will that not be calculated at the recursive query level? --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
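One cheap family of estimators of the kind Josh is asking about treats the recursion as a geometric series: guess an average fanout per recursion level and a maximum depth, and sum the levels. To be clear, this is an illustration of the idea, not anything proposed in the thread; the fanout and depth inputs would themselves have to come from planner statistics or guesses:

```python
# Toy row-count estimator for a recursive query, assuming the planner
# can supply (or guess) an average fanout per level and a depth cap.
def estimate_recursive_rows(seed_rows, fanout, max_depth):
    total, level_rows = 0.0, float(seed_rows)
    for _ in range(max_depth):
        total += level_rows
        level_rows *= fanout
        if level_rows < 1:      # recursion has effectively died out
            break
    return total

# 10 seed rows, each yielding ~2 children, capped at 3 levels:
# 10 + 20 + 40 = 70
assert estimate_recursive_rows(10, 2.0, 3) == 70.0
# fanout below 1 terminates early, which is the common tree-walk case
assert estimate_recursive_rows(10, 0.0, 5) == 10.0
```

Even a rough total like this would answer the question in the email: whether the estimated fraction of the table justifies an index scan over a seqscan for the recursive term.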
Re: [HACKERS] Avoiding second heap scan in VACUUM
On Thu, May 29, 2008 at 2:02 AM, Simon Riggs [EMAIL PROTECTED] wrote: I'm not happy that the VACUUM waits. It might wait a very long time and cause worse overall performance than the impact of the second scan. Let's not get too paranoid about the wait. It's a minor detail in the whole theory. I would suggest that the benefit of avoiding the second scan would be huge. Remember, it's not just a scan; it also dirties those blocks again, forcing them to be written to disk. Also, if you really have a situation where vacuum needs to wait for very long, then you are already in trouble. The long running transactions would prevent vacuuming many tuples. I think we can easily tweak the wait so that it doesn't wait indefinitely. If the wait times out, vacuum can still proceed, but it can mark the DEAD line pointers as DEAD_RECLAIMED. It would then have a choice of making a second pass and reclaiming the DEAD line pointers (like it does today). So the idea is to have one pass per VACUUM, but make that one pass do the first pass of *this* VACUUM and the second pass of the *last* VACUUM. We mark the xid of the VACUUM in pg_class as you suggest, but we do it after VACUUM has completed the pass. The trick is to correctly know if the last vacuum removed the index pointers or not. There could be several ways to do that. But you need to explain in detail how it would work in cases of vacuum failures and database crashes. In single pass we mark DEAD line pointers as RECENTLY_DEAD. If the last VACUUM xid is old enough we mark RECENTLY_DEAD as UNUSED, as well, during this first pass. If last xid is not old enough we do second pass to remove them. Let's not call them RECENTLY_DEAD :-) DEAD is already stricter than that. We need something even stronger. That's why I used DEAD_RECLAIMED, to note that the line pointer is DEAD and the index pointer may have been removed as well. That has the effect that large tables that are infrequently VACUUMed will need only a single scan. 
Smaller tables that require almost continual VACUUMing will probably do two scans, but who cares? Yeah, I think we need to target the large table case. The second pass is obviously much more costly for large tables. I think the timed-wait answers your concern. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #4204: COPY to table with FK has memory leak
On Wed, 2008-05-28 at 18:17 -0400, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: AFAICS we must aggregate the trigger checks. We would need a special property of triggers that allowed them to be aggregated when two similar checks arrived. We can then use hash aggregation to accumulate them. We might conceivably need to spill to disk also, since the aggregation may not always be effective. But in most cases the tables against which FK checks are made are significantly smaller than the tables being loaded. Once we have hash aggregated them, that is then the first part of a hash join to the target table. Well we can't aggregate them as they're created because later modifications could delete or update the original records. The SQL spec requires that FK checks be effective at the end of the command. Well, that's what we need to do. We just need to find a way... Currently, we store trigger entries by htid. I guess we need to aggregate them on the actual values looked up. The SQL spec also says that the contents of the FK check table should be taken as at the start of the command, so we should be safe to aggregate the values prior to the check. As already suggested in work on Read Only Tables, we could optimise them away to being constraint checks. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Avoiding second heap scan in VACUUM
On Thu, 2008-05-29 at 09:57 +0530, Pavan Deolasee wrote: On Thu, May 29, 2008 at 2:02 AM, Simon Riggs [EMAIL PROTECTED] wrote: I'm not happy that the VACUUM waits. It might wait a very long time and cause worse overall performance than the impact of the second scan. Lets not get too paranoid about the wait. It's a minor detail in the whole theory. I would suggest that the benefit of avoiding second scan would be huge. Remember, its just not a scan, it also dirties those blocks again, forcing them write to disk. Also, if you really have a situation where vacuum needs to wait for very long, then you are already in trouble. The long running transactions would prevent vacuuming many tuples. I think we can easily tweak the wait so that it doesn't wait indefinitely. If the wait times out, vacuum can still proceed, but it can mark the DEAD line pointers as DEAD_RECLAIMED. It would then have a choice of making a second pass and reclaiming the DEAD line pointers (like it does today). Which is exactly what I suggested. Don't wait, just check. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers