Re: [HACKERS] invalid magic number in log segment
On 12/11/2013 09:44 AM, Erikjan Rijkers wrote:
> I don't know whether the below constitutes a bug, but: daily (sometimes even more often) I recompile 9.4devel (after git pull) to run a large dev database (100 GB or so). To avoid a frequent initdb and many-hour restore of data, I do this only when the following two #defines are unchanged: CATALOG_VERSION_NO in src/include/catalog/catversion.h, and PG_CONTROL_VERSION in src/include/catalog/pg_control.h. The goal is to always run the latest db without having to reload the ~100 GB dev db unexpectedly at inconvenient times. Generally, this works OK. However, in the last few weeks I sometimes get, after such recompiling, 'invalid magic number' errors from which I don't know how to recover (it means, apparently, that an initdb is needed and I then have to reload the database).
>
> 2013-12-11 00:15:25.627 CET 25304 LOG:  received smart shutdown request
> 2013-12-11 00:15:25.631 CET 25306 LOG:  shutting down
> 2013-12-11 00:15:25.904 CET 25306 LOG:  database system is shut down
> 2013-12-11 08:11:59.858 CET 25490 LOG:  database system was shut down at 2013-12-11 00:15:25 CET
> 2013-12-11 08:11:59.901 CET 25490 LOG:  invalid magic number D078 in log segment 000100630034, offset 0
> 2013-12-11 08:11:59.901 CET 25490 LOG:  invalid primary checkpoint record
> 2013-12-11 08:11:59.901 CET 25490 LOG:  invalid magic number D078 in log segment 000100630034, offset 0
> 2013-12-11 08:11:59.901 CET 25490 LOG:  invalid secondary checkpoint record
> 2013-12-11 08:11:59.901 CET 25490 PANIC:  could not locate a valid checkpoint record
> 2013-12-11 08:12:00.326 CET 25492 FATAL:  the database system is starting up
> 2013-12-11 08:12:01.328 CET 25493 FATAL:  the database system is starting up
> 2013-12-11 08:12:01.682 CET 25489 LOG:  startup process (PID 25490) was terminated by signal 6: Aborted
> 2013-12-11 08:12:01.682 CET 25489 LOG:  aborting startup due to startup process failure
>
> My question is two-fold:
>
> 1. (general:) is this 'invalid magic number' unexpected, and should it always be reported?

The magic number it's complaining about is a constant stored in the WAL segments that acts as a version number. Think of it as the CATALOG_VERSION_NO, but for the WAL format. It's bumped whenever the WAL format changes.

> 2. (for my setup specifically:) is there any way that I can recognize, beforehand, at the code-base level, such an impending 'invalid magic number' state?

Keep an eye on the XLOG_PAGE_MAGIC constant in xlog_internal.h.

> Can the db be recovered from easily? (Although this dev database is expendable, it takes many hours to rebuild; I'd like to avoid that if possible.)

Yes, you can use pg_resetxlog to recover. BTW, you could also use pg_upgrade. That should work even when catversion has been bumped.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
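Erikjan's "safe to rebuild?" check can be extended to track the WAL magic alongside the catalog and control versions. A minimal sketch, in Python for brevity, assuming a source checkout with the header paths mentioned above; the `.last_versions` state-file name is made up for illustration:

```python
import os
import re

def read_define(header_path, name):
    """Return the value of a #define from a C header, or None if absent."""
    pattern = re.compile(r'#define\s+' + re.escape(name) + r'\s+(\S+)')
    with open(header_path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                return m.group(1)
    return None

def rebuild_is_safe(srcdir):
    """True if none of the on-disk-format version constants changed
    since the values recorded on the previous run."""
    checks = [
        ("src/include/catalog/catversion.h", "CATALOG_VERSION_NO"),
        ("src/include/catalog/pg_control.h", "PG_CONTROL_VERSION"),
        ("src/include/access/xlog_internal.h", "XLOG_PAGE_MAGIC"),
    ]
    current = {n: read_define(os.path.join(srcdir, p), n) for p, n in checks}
    state_file = os.path.join(srcdir, ".last_versions")
    old = {}
    if os.path.exists(state_file):
        with open(state_file) as f:
            old = dict(line.split("=", 1) for line in f.read().splitlines())
    with open(state_file, "w") as f:
        for n, v in current.items():
            f.write("%s=%s\n" % (n, v))
    return bool(old) and all(old.get(n) == v for n, v in current.items())
```

If any of the three constants moved since the last recorded build, an initdb (or pg_resetxlog / pg_upgrade, per Heikki's suggestion) is called for before starting the freshly built server.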
Re: [HACKERS] ANALYZE sampling is too good
On 11/12/13 19:34, Simon Riggs wrote:
> Realistically, I never heard of an Oracle DBA doing advanced statistical mathematics before setting the sample size on ANALYZE. You use the default and bump it up if the sample is insufficient for the data.

I'm not sure that Oracle's stats and optimizer design is an example to be envied; pretty much all Oracle DBAs I've encountered will apply hints to all queries to get the plan they want...

Regards

Mark
Re: [HACKERS] Time-Delayed Standbys
On 11 December 2013 06:36, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:
> I think this feature will be used in a lot of scenarios in which PITR is currently used. We have to judge which is better: keeping the potential for other uses, or protecting users from mistakes. And we had better wait for the author's comment...

I'd say just document that it wouldn't make sense to use it for PITR. There may be some use case we can't see yet, so specifically prohibiting a use case that is not dangerous seems too much at this point. I will no doubt be reminded of these words in the future...

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] logical changeset generation v6.8
On 2013-12-10 19:11:03 -0500, Robert Haas wrote:
> Committed #1 (again).

Thanks!

> Regarding this:
>
> +	/* XXX: we could also do this unconditionally, the space is used anyway */
> +	if (copy_oid)
> +		HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp));
>
> I would like to put in a big +1 for doing that unconditionally. I didn't make that change before committing, but I think it'd be a very good idea.

Ok. I wasn't sure if it wouldn't be weird to include the oid in the tuple logged for a replica identity that doesn't cover the oid. But the downside is pretty small... Will send a patch.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Question about sorting internals
Hi,

Before I go any further: this is only a thought experiment. I do not plan to use such queries in real-life applications. I was just presented with a question that I can't answer in any logical way.

There are two simple queries:

#v+
with rok2005 (miesiac, wynik) as (VALUES (1,1), (2,2), (4,4), (5,NULL), (6,6)),
     rok2004 (miesiac, wynik) as (VALUES (1,3), (3,3), (4,5), (6,6))
SELECT distinct on (miesiac) *
FROM (
    SELECT miesiac, 2005 as rok, wynik FROM rok2005
    union all
    SELECT miesiac, 2004 as rok, wynik FROM rok2004
) as polaczone
ORDER BY miesiac, wynik desc;
#v-

#v+
with rok2005 (miesiac, wynik) as (VALUES (1,1), (2,2), (4,4), (5,NULL), (6,6)),
     rok2004 (miesiac, wynik) as (VALUES (1,3), (3,3), (4,5), (6,6))
SELECT distinct on (miesiac) *
FROM (
    SELECT miesiac, 2004 as rok, wynik FROM rok2004
    union all
    SELECT miesiac, 2005 as rok, wynik FROM rok2005
) as polaczone
ORDER BY miesiac, wynik desc;
#v-

They differ only in the order of the queries in the union all part. The thing is that they return the same result. Why isn't one of them returning 2005 for the 6th miesiac? I know I'm not sorting on rok, which means I'm getting undefined behavior. Fine. But what exactly is happening such that, regardless of the order of rows in the subquery, I get the same, always lower, rok in the output?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Re: [HACKERS] Question about sorting internals
Hi depesz,

You might want to see their EXPLAIN VERBOSE outputs. One of the relations (the 2004 one) has fewer rows, and it might be getting picked as the first relation in the union, so its rows end up before the second one's. The explain output would make that clearer. Also, try having the same number of rows in both relations.

On Wed, Dec 11, 2013 at 3:26 PM, hubert depesz lubaczewski dep...@depesz.com wrote:
> [queries and question quoted in full upthread]

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Completing PL support for Event Triggers
Peter Eisentraut pete...@gmx.net writes:
> I think you are mistaken. My patch includes all changes between your v1 and v2 patch.

I mistakenly remembered that we had removed all the is_event_trigger business from the plperl patch too, when that's not the case. Sorry about the confusion.

My vote is for “ready for commit” then.

Regards,

--
Dimitri Fontaine                     http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 10 December 2013 19:55 Alvaro Herrera wrote:
> Haribabu kommi escribió:
>> To detect whether the provided data and xlog directories are the same, I reused the existing make_absolute_path() code as follows.
>> 1. Moved the make_absolute_path() function from miscinit.c to path.c and changed all the error-reporting functions. It also now returns NULL in case of any error.
>> 2. Added a new file called fe_path.c which contains make_absolute_path() only for frontend code.
>
> Whatever you do, please don't add #include lines to postgres_fe.h. Add them to whatever .c files need to include the new header, instead. (This results in a longer patch, yes, but that consideration shouldn't drive anything. There is a desire to include as few headers as possible in each source file, and adding more include lines to postgres_fe.h means the new header will be included by every single frontend file, even those not in core.)
>
> See a nearby patch by Bruce Momjian to deal with getpwnam() and getpwuid() failures; perhaps the idea of returning an error string should be designed similarly in both these patches. Also consider using the psprintf stuff, which works on both backend and frontend, avoiding malloc etc., so that code can be shared by both frontend and backend, eliminating the duplication.

The make_absolute_path() function move to port is now done in a way similar to Bruce Momjian's approach. psprintf is used to store any error string produced in the function. But psprintf is not used for storing the absolute path itself, because that causes problems freeing the allocated memory in SelectConfigFiles: the same memory is allocated in a different code branch from guc_malloc.

After adding the make_absolute_path() function with the psprintf stuff to path.c, there is a linking problem in the compilation of ecpg. I was not able to find the cause, so I added another file, abspath.c, in port which contains these two functions.

Updated patches are attached. Please provide your suggestions.

Regards,
Hari Babu

Attachments: same_dir_error_v2.patch, make_abs_path_v2.patch
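For readers following along, the directory-identity check this patch is after boils down to comparing the absolute, symlink-resolved forms of the two paths. A rough sketch of just that comparison, in Python rather than the patch's C:

```python
import os

def same_dir(data_dir, xlog_dir):
    """True if the two user-supplied paths name the same directory,
    after making them absolute and resolving symlinks."""
    return (os.path.realpath(os.path.abspath(data_dir))
            == os.path.realpath(os.path.abspath(xlog_dir)))
```

With a check like this, pg_basebackup can refuse an xlog directory that resolves to the data directory even when the two were spelled differently (relative vs. absolute, trailing slash, `..` components, or a symlink).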
Re: [HACKERS] Question about sorting internals
On Wed, Dec 11, 2013 at 03:34:38PM +0530, Ashutosh Bapat wrote:
> You might want to see their EXPLAIN VERBOSE outputs. [...]

Explains:

                                                 QUERY PLAN
 Unique  (cost=0.44..0.48 rows=9 width=12) (actual time=0.030..0.035 rows=6 loops=1)
   Output: rok2004.miesiac, (2004), rok2004.wynik
   CTE rok2004
     ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=8) (actual time=0.001..0.003 rows=5 loops=1)
           Output: "*VALUES*".column1, "*VALUES*".column2
   CTE rok2005
     ->  Values Scan on "*VALUES*_1"  (cost=0.00..0.05 rows=4 width=8) (actual time=0.000..0.001 rows=4 loops=1)
           Output: "*VALUES*_1".column1, "*VALUES*_1".column2
   ->  Sort  (cost=0.32..0.35 rows=9 width=12) (actual time=0.029..0.031 rows=9 loops=1)
         Output: rok2004.miesiac, (2004), rok2004.wynik
         Sort Key: rok2004.miesiac, rok2004.wynik
         Sort Method: quicksort  Memory: 25kB
         ->  Append  (cost=0.00..0.18 rows=9 width=12) (actual time=0.007..0.018 rows=9 loops=1)
               ->  CTE Scan on rok2004  (cost=0.00..0.10 rows=5 width=12) (actual time=0.006..0.011 rows=5 loops=1)
                     Output: rok2004.miesiac, 2004, rok2004.wynik
               ->  CTE Scan on rok2005  (cost=0.00..0.08 rows=4 width=12) (actual time=0.002..0.004 rows=4 loops=1)
                     Output: rok2005.miesiac, 2005, rok2005.wynik
 Total runtime: 0.077 ms
(18 rows)

                                                 QUERY PLAN
 Unique  (cost=0.44..0.48 rows=9 width=12) (actual time=0.024..0.027 rows=6 loops=1)
   Output: rok2005.miesiac, (2005), rok2005.wynik
   CTE rok2004
     ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=8) (actual time=0.001..0.003 rows=5 loops=1)
           Output: "*VALUES*".column1, "*VALUES*".column2
   CTE rok2005
     ->  Values Scan on "*VALUES*_1"  (cost=0.00..0.05 rows=4 width=8) (actual time=0.001..0.003 rows=4 loops=1)
           Output: "*VALUES*_1".column1, "*VALUES*_1".column2
   ->  Sort  (cost=0.32..0.35 rows=9 width=12) (actual time=0.023..0.024 rows=9 loops=1)
         Output: rok2005.miesiac, (2005), rok2005.wynik
         Sort Key: rok2005.miesiac, rok2005.wynik
         Sort Method: quicksort  Memory: 25kB
         ->  Append  (cost=0.00..0.18 rows=9 width=12) (actual time=0.004..0.015 rows=9 loops=1)
               ->  CTE Scan on rok2005  (cost=0.00..0.08 rows=4 width=12) (actual time=0.003..0.006 rows=4 loops=1)
                     Output: rok2005.miesiac, 2005, rok2005.wynik
               ->  CTE Scan on rok2004  (cost=0.00..0.10 rows=5 width=12) (actual time=0.001..0.006 rows=5 loops=1)
                     Output: rok2004.miesiac, 2004, rok2004.wynik
 Total runtime: 0.053 ms
(18 rows)

So, it looks like the rowcount is the one thing that's different. I'm not entirely sure what logic would make the rowcount matter, though. After some more talk on #postgresql, it looks like I will have to spend some time with a debugger to see what's happening there.

Best regards,

depesz
Re: [HACKERS] same-address mappings vs. relative pointers
On Dec 5, 2013, at 15:44, Andres Freund and...@2ndquadrant.com wrote:
> There might be some ugly compiler-dependent magic we could do, depending on how we decide to declare offsets. Like (very, very roughly):
>
> #define relptr(type, struct_name, varname) \
>     union struct_name##_##varname { \
>         type relptr_type; \
>         Offset relptr_off; \
>     }
>
> And then, for accessing, have:
>
> #define relptr_access(seg, off) \
>     ((typeof(off.relptr_type) *) (((char *) seg->base_address) + off.relptr_off))
>
> But boy, that's ugly.

Well, ugliness we can live with, especially if it's less ugly than the alternatives. But I'm afraid it's also unportable: typeof() is a GCC extension, not part of ANSI C, no?

best regards,
Florian Pflug
Re: [HACKERS] same-address mappings vs. relative pointers
On 2013-12-11 11:42:25 +0100, Florian Pflug wrote:
> On Dec 5, 2013, at 15:44, Andres Freund and...@2ndquadrant.com wrote:
>> There might be some ugly compiler-dependent magic we could do [...] But boy, that's ugly.
>
> Well, ugliness we can live with, especially if it's less ugly than the alternatives. But I'm afraid it's also unportable: typeof() is a GCC extension, not part of ANSI C, no?

Yes (although there's C11 stuff to do equivalent things, afair). I was thinking of only doing it for compilers we support that dark magic for, and falling back to returning a void * for the others. We'll probably miss a cast or two required on !gcc that way, but it's still likely to be less error-prone.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] ANALYZE sampling is too good
On Wed, Dec 11, 2013 at 12:58 AM, Simon Riggs si...@2ndquadrant.com wrote:
> Yes, it is not a perfect statistical sample. All sampling is subject to an error that is data dependent.

Well, there's random variation due to the limitations of dealing with a sample. And then there are systemic biases due to incorrect algorithms. You wouldn't be happy if the sample discarded every row with NULLs, or every row older than some date, etc. These things would not be corrected by larger samples. That's the kind of error we're talking about here.

But the more I think about things, the less convinced I am that there is a systemic bias introduced by reading the entire block. I had assumed larger rows would be selected against, but that's not really true: they're just selected against relative to the number of bytes they occupy, which is the correct frequency to sample. Even blocks that are mostly empty don't really bias things.

Picture a table that consists of 100 blocks with 100 rows each (value A) and another 100 blocks with only 1 row each (value B). The rows with value B have a 50% chance of being in any given block, which is grossly inflated; however, each selected block with value A will produce 100 rows. So for every 10 blocks of each kind you sample, you'll get 100x10 A rows and 1x10 B rows, which is the correct proportion.

I'm not actually sure there is any systemic bias here. The larger number of rows per block generates less precise results, but from my thought experiments they seem to still be accurate?
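Greg's thought experiment above is easy to check with a quick simulation. Nothing below is PostgreSQL code; it just models the 100-full-blocks-of-A / 100-single-row-B table described in the mail:

```python
import random

def make_table():
    # 100 full blocks holding value A, 100 nearly-empty blocks holding one B row
    return [["A"] * 100 for _ in range(100)] + [["B"] for _ in range(100)]

def block_sample(table, nblocks, rng):
    """Take every row from nblocks randomly chosen blocks."""
    rows = []
    for block in rng.sample(table, nblocks):
        rows.extend(block)
    return rows

def avg_ratio(trials=2000, nblocks=10, seed=42):
    """Average A:B row ratio over many repeated block samples."""
    rng = random.Random(seed)
    total_a = total_b = 0
    for _ in range(trials):
        rows = block_sample(make_table(), nblocks, rng)
        total_a += rows.count("A")
        total_b += rows.count("B")
    return total_a / total_b

# the table holds 10000 A rows and 100 B rows, so an unbiased
# estimate of the A:B ratio should come out near 100
print(avg_ratio())
```

Averaged over many repetitions, the sampled ratio indeed hovers around the true 100:1, which supports the claim that whole-block sampling does not by itself bias the value proportions; it only increases variance.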
Re: [HACKERS] Why the buildfarm is all pink
On 2013-12-10 19:55:12 -0500, Tom Lane wrote:
> I was surprised to see that my back-patches of the recent SubLink unpleasantness were failing on many of the buildfarm members, but only in the 9.1 and 9.0 branches. The difficulty appears to be that the EXPLAIN output for the new test query changes depending on whether or not tenk1 has been analyzed yet. In 9.2 and up, it reliably has been, because create_index runs first and that script does this:
>
> create_index.sql:901: vacuum analyze tenk1;  -- ensure we get consistent plans here
>
> so depending on timing, one of those might have gotten the job done, or maybe autovacuum would show up in time to save the day. We need a more consistent strategy for this :-(

Agreed, although I have no clue what it should look like.

As a further datapoint, I'll add that installcheck already regularly fails in HEAD if you have a HS standby connected via SR with hot_standby_feedback=on on the standby. Some plans just change from index(-only) scans to sequential scans, presumably because the lower xmin horizon changed the stats. Since there's nothing running on the standby in those cases, there has to be a pretty damn tiny window here somewhere.

I've wondered whether we could fix that by a) more explicit vacuums/analyzes, or b) a function waiting for a quiescent state. Arguably we could just define that as being unsupported, but given there's no testing of recovery but that at all, that doesn't seem like a good idea. There's also fun in running with vacuum_defer_cleanup_age > 0, but I don't think there's much chance of supporting that.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Extension Templates S03E11
Hi,

Stephen Frost sfr...@snowman.net writes:
> * Jeff Davis (pg...@j-davis.com) wrote:
>> What is stopping Extension Templates, as proposed, from being this special extension creation mode? What would be a better design?
>
> The extra catalog tables which store SQL scripts in text columns are one of my main objections to the as-proposed Extension Templates. I view those scripts as a poor man's definition of database objects which are defined properly in the catalog already.

I have a very hard time understanding this objection. PL functions are just SQL scripts stored as-is in the catalogs. That applies the same way to any PL language, with scripts stored as-is in the catalogs in different languages. Even views are stored in a textual way in the catalogs, albeit in a specific pre-processed format; it's still a text blob that could pass for a script in a backend-specific language, parsed by the rewriter.

So while I hear your objection to the script-in-catalog idea, Stephen, I think we should move forward. We don't have the luxury of only applying patches where no compromise has to be made, where everyone is fully happy with the solution we find as a community.

> The other big issue is that there isn't an easy way to see how we could open up the ability to create extensions to non-superusers with this approach.

The main proposal here is to only allow the owner of a template to install it as an extension. For superusers, we can implement the needed SET ROLE command automatically in the CREATE EXTENSION command. Is there another security issue that this “same role” approach is not solving? I don't think so.

> It seems like the porting issue is just a matter of finding someone to write a tool to reliably translate packages from PGXN into a form suitable to be sent using SQL commands; which we would need anyway for this special mode.

I already mentioned that's on my roadmap, part of the vision I'm trying to implement here. My goal is to deliver the full solution for 9.4, and this Extension Templates facility is the missing in-core bit of it.

Regards,

--
Dimitri Fontaine                     http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] same-address mappings vs. relative pointers
On Dec 11, 2013, at 11:47, Andres Freund and...@2ndquadrant.com wrote:
> On 2013-12-11 11:42:25 +0100, Florian Pflug wrote:
>> [macro discussion quoted upthread]
>> But I'm afraid it's also unportable: typeof() is a GCC extension, not part of ANSI C, no?
>
> Yes (although there's C11 stuff to do equivalent things, afair). I was thinking of only doing it for compilers we support that dark magic for, and falling back to returning a void * for the others. We'll probably miss a cast or two required on !gcc that way, but it's still likely to be less error-prone.

Would it? For this to catch type mismatches, you'd both need to develop on a typeof-supporting compiler *and* not cast the result of relptr_access(). But you can't really do that, because the code will then fail on compilers which don't support typeof()...

What we could do, I guess, is to pass the type to relptr_access() and to relptr(), and let the compiler verify that they are the same. Something like:

#define relptr(type) union { \
    type relptr_type; \
    Offset relptr_off; \
}

#define relptr_access(type, seg, rptr) \
    ((type) ( \
        (rptr.relptr_type - (type) 0), \
        ((char *) seg->base_address) + rptr.relptr_off \
    ))

And, yes, ouch ;-)

best regards,
Florian Pflug
Re: [HACKERS] same-address mappings vs. relative pointers
On 2013-12-11 12:37:56 +0100, Florian Pflug wrote:
> Would it? For this to catch type mismatches, you'd both need to develop on a typeof-supporting compiler *and* not cast the result of relptr_access(). But you can't really do that, because the code will then fail on compilers which don't support typeof()...

Yea, right.

> What we could do, I guess, is to pass the type to relptr_access() and to relptr(), and let the compiler verify that they are the same.

Tom and I actually added a macro that's helpful for that recently: AssertVariableIsOfType(). With that we should be able to get something reasonable, failing at compile time, with a useful error message even ;)

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
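Stepping back from the macro gymnastics: the underlying idea is that a relative pointer stores an offset from the segment's base rather than a raw address, so the same value is meaningful in every process regardless of where the segment got mapped. A toy illustration of just that property, with Python bytearrays standing in for two different mappings of one shared segment (not the actual dsm code, of course):

```python
import struct

def relptr_store(mapping, off, value):
    # store a 32-bit value at an offset from the segment base
    struct.pack_into("<I", mapping, off, value)

def relptr_load(mapping, off):
    # "dereference": this process's mapping base + stored offset
    return struct.unpack_from("<I", mapping, off)[0]

# process A's mapping of the segment
mapping_a = bytearray(1024)
relptr_store(mapping_a, 128, 0xD078)

# process B sees the same bytes at a different base address;
# offset 128 still resolves to the same datum
mapping_b = bytearray(mapping_a)
print(hex(relptr_load(mapping_b, 128)))
```

The typing problem the thread wrestles with has no analogue here: C offers no checked way to say "an offset that resolves to a Foo *" without either typeof() or passing the type in explicitly, which is exactly what the relptr()/relptr_access() macros try to paper over.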
Re: [HACKERS] ANALYZE sampling is too good
On Wed, Dec 11, 2013 at 11:01 AM, Greg Stark st...@mit.edu wrote:
> I'm not actually sure there is any systemic bias here. The larger number of rows per block generates less precise results, but from my thought experiments they seem to still be accurate?

So I've done some empirical tests for a table generated by:

create table sizeskew as (
    select i, j, repeat('i', i)
    from generate_series(1,1000) as i,
         generate_series(1,1000) as j
);

I find that using the whole block doesn't cause any problem with the avg_width field for the repeat column. That does reinforce my belief that we might not need any particularly black magic here.

It does however cause a systemic error in the histogram bounds. It seems the median is systematically overestimated by more and more, the larger the number of rows per block used:

 1: 524
 4: 549
 8: 571
12: 596
16: 602
20: 618 (total sample slightly smaller than normal)
30: 703 (substantially smaller sample)

So there is something clearly wonky in the histogram stats that's affected by the distribution of the sample. The only thing I can think of is maybe the most common elements are being selected preferentially from the early part of the sample, which is removing a substantial part of the lower end of the range. But even removing 100 from the beginning shouldn't be enough to push the median above 550.

--
greg
Re: [HACKERS] ANALYZE sampling is too good
On Wed, Dec 11, 2013 at 12:08 PM, Greg Stark st...@mit.edu wrote:
> The only thing I can think of is maybe the most common elements are being selected preferentially from the early part of the sample, which is removing a substantial part of the lower end of the range. But even removing 100 from the beginning shouldn't be enough to push the median above 550.

Just to follow up here: I think what's going on is that not only are the most_common_vals being preferentially taken from the beginning of the sample, but also their frequency is being massively overestimated. All values have a frequency of about .001, but the head of the MCV list has a frequency as high as .10 in some of my tests.

--
greg
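That overestimate falls straight out of the clustering in the sizeskew table: rows with equal i are stored contiguously, so a sample that keeps whole blocks sees each value it sees roughly rows-per-block times. A rough simulation (assuming, for illustration, exactly 100 rows per block and the 1000 x 1000 layout from upthread):

```python
import random

NVALUES = 1000         # distinct values of i
ROWS_PER_VALUE = 1000  # copies of each value, stored contiguously
ROWS_PER_BLOCK = 100   # assumed fixed block capacity

rows = [i for i in range(NVALUES) for _ in range(ROWS_PER_VALUE)]
blocks = [rows[k:k + ROWS_PER_BLOCK] for k in range(0, len(rows), ROWS_PER_BLOCK)]

rng = random.Random(1)
sample = []
for block in rng.sample(blocks, 10):  # whole-block sample of 10 blocks
    sample.extend(block)

true_freq = ROWS_PER_VALUE / len(rows)   # every value really occurs at 0.001
top = max(set(sample), key=sample.count)
top_freq = sample.count(top) / len(sample)
print(true_freq, top_freq)
```

Because each sampled block contributes ~100 copies of a single value, any value present in the sample appears with an apparent frequency near 0.1 against a true frequency of 0.001, much like the .10 vs .001 numbers observed above. The raw proportions in the sample are fine in expectation; it's a frequency estimator that treats the sample as independent rows that gets fooled.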
Re: [HACKERS] [PATCH] Add transforms feature
On Tue, Dec 10, 2013 at 10:35 PM, Peter Eisentraut pete...@gmx.net wrote:
> On Fri, 2013-12-06 at 11:28 +0100, Dimitri Fontaine wrote:
>> Here is an idea. Add a GUC that basically says something like use_transforms = on|off. You can then attach that to individual functions, which is the right granularity, because only the function knows whether its code expects transforms or not. But you can use the full power of GUC to configure it any way you want.
>
> Here is an updated patch that implements this, makes some of the documentation improvements that you suggested, and rebases everything.

I'm still kinda unimpressed by this. Behavior-changing GUC, uggh.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] should we add a XLogRecPtr/LSN SQL type?
Hi,

There's already a couple of SQL functions dealing with XLogRecPtrs, and the logical replication work will add a couple more. Currently each of those functions taking or returning an LSN does sprintf/scanf to print/parse the strings, which is both awkward and potentially noticeable performance-wise.

It seems relatively simple to add a proper type, with implicit casts from text, instead?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
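For reference, the text format in play is the familiar "hi/lo" pair of hex numbers, the high and low 32 bits of the WAL position, and the sprintf/scanf round-trip Andres mentions amounts to the following sketch (Python purely to show the arithmetic; the backend functions do this in C):

```python
def lsn_in(text):
    # parse 'hi/lo' the way sscanf("%X/%X") would, into a 64-bit integer
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_out(lsn):
    # format a 64-bit LSN back into the 'hi/lo' text form
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)

# with a proper SQL type these become plain integer
# comparisons and subtractions instead of string parsing
print(lsn_out(lsn_in("16/B374D848")))  # -> 16/B374D848
```

Once the value is a 64-bit integer, ordering, equality, and byte-distance between two positions are trivial, which is exactly what functions like pg_xlog_location_diff need.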
Re: [HACKERS] Extra functionality to createuser
On Tue, Dec 10, 2013 at 9:55 AM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Tue, Dec 10, 2013 at 12:20 AM, Robert Haas robertmh...@gmail.com wrote:
>> On Sat, Dec 7, 2013 at 11:39 PM, Amit Kapila amit.kapil...@gmail.com wrote:
>>> On Fri, Dec 6, 2013 at 10:31 AM, Peter Eisentraut pete...@gmx.net wrote:
>>>> How about only one role name per -g option, but allowing the -g option to be repeated?
>>> I think that might simplify the problem and patch, but do you think it is okay to have an inconsistency in option usage between the CREATE USER statement and this utility?
>> Yes. In general, command-line utilities use a very different syntax for options-passing than SQL commands. Trying to make them consistent feels unnecessary or perhaps even counterproductive. And the proposed syntax is certainly a convention common to many other command-line utilities, so I think it's fine.
> Okay, the new syntax suggested by Peter has simplified the problem. Please find the updated patch and docs for multiple -g options.

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] ANALYZE sampling is too good
On 12/11/2013 02:08 PM, Greg Stark wrote:
> So I've done some empirical tests for a table generated by: create table sizeskew as (select i,j,repeat('i',i) from generate_series(1,1000) as i, generate_series(1,1000) as j);
>
> I find that using the whole block doesn't cause any problem with the avg_width field for the repeat column. That does reinforce my belief that we might not need any particularly black magic here.

How large a sample did you use? Remember that the point of doing block-level sampling instead of the current approach would be to allow using a significantly smaller sample (in # of blocks) and still achieve the same sampling error. If the sample is large enough, it will mask any systemic bias caused by block-sampling, but the point is to reduce the number of sampled blocks.

The practical question here is this: what happens to the quality of the statistics if you read only 1/2 the number of blocks you normally would, but included all the rows in the blocks you read in the sample? How about 1/10? Or to put it another way: could we achieve more accurate statistics by including all rows from the sampled blocks, while reading the same number of blocks?

In particular, I wonder if it would help with estimating ndistinct. It generally helps to have a larger sample for ndistinct estimation, so it might be beneficial.

- Heikki
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On Wed, Dec 11, 2013 at 7:41 AM, Andres Freund and...@2ndquadrant.com wrote: There's already a couple of SQL functions dealing with XLogRecPtrs, and the logical replication work will add a couple more. Currently each of those functions taking/returning an LSN does sprintf/scanf to print/parse the strings. Which is both awkward and potentially noticeable performance-wise. It seems relatively simple to add a proper type, with implicit casts from text, instead? I'm pretty sure that this was discussed last year, and I voted for it -- except for the implicit casts part, perhaps -- but more people voted against it, so it died. I still think that was a mistake, but I just work here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?
On 11-12-2013 09:41, Andres Freund wrote: There's already a couple of SQL functions dealing with XLogRecPtrs, and the logical replication work will add a couple more. Currently each of those functions taking/returning an LSN does sprintf/scanf to print/parse the strings. Which is both awkward and potentially noticeable performance-wise. While discussing the pg_xlog_location_diff function, Robert posted an lsn datatype [1]. At that time we wouldn't go that far (a new datatype) to cover only one function. If your proposal is just validation, I think generic validation functions are the way to follow. However, if you are thinking of adding operators, the lsn datatype should be implemented. It seems relatively simple to add a proper type, with implicit casts from text, instead? Do you want to change the function signatures too? [1] http://www.postgresql.org/message-id/ca+tgmozrmnn0evesd-kxb9e-mvdmwoti6guujuvqp_8q2c5...@mail.gmail.com -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
On Tue, Dec 10, 2013 at 11:40 PM, Fujii Masao masao.fu...@gmail.com wrote: While I was investigating PQhost() for that approach, I found several problems of PQhost(). (1) PQhost() can return the Unix-domain socket directory path even on a platform that doesn't support Unix-domain sockets. (2) On a platform that doesn't support Unix-domain sockets, when neither host nor hostaddr is specified, the default host 'localhost' is used to connect to the server and PQhost() must return that, but it doesn't. I think changing PQhost() so that it returns DefaultHost rather than conn->pgunixsocket when we don't HAVE_UNIX_SOCKETS is a back-patchable bug fix, and I'd say go for it. (3) PQhost() cannot return the hostaddr. However, I'm much less sure whether this is something that we want to do at all. It seems like this might be a redefinition of what the function does, and I'm not sure whether the new definition is what everyone will want. On the other hand, I'm also not sure that it isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
On Wed, Dec 11, 2013 at 10:26 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 10, 2013 at 11:40 PM, Fujii Masao masao.fu...@gmail.com wrote: While I was investigating PQhost() for that approach, I found several problems of PQhost(). (1) PQhost() can return the Unix-domain socket directory path even on a platform that doesn't support Unix-domain sockets. (2) On a platform that doesn't support Unix-domain sockets, when neither host nor hostaddr is specified, the default host 'localhost' is used to connect to the server and PQhost() must return that, but it doesn't. I think changing PQhost() so that it returns DefaultHost rather than conn->pgunixsocket when we don't HAVE_UNIX_SOCKETS is a back-patchable bug fix, and I'd say go for it. Agreed. (3) PQhost() cannot return the hostaddr. However, I'm much less sure whether this is something that we want to do at all. It seems like this might be a redefinition of what the function does, and I'm not sure whether the new definition is what everyone will want. On the other hand, I'm also not sure that it isn't. If we don't change PQhost() in that way, we cannot fix the following problem of wrong output from \conninfo, for example. $ psql -d hostaddr=127.0.0.1 =# \conninfo You are connected to database postgres as user postgres via socket in /tmp at port 5432. Regards. -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum_work_mem
On Sun, Oct 20, 2013 at 7:21 AM, Magnus Hagander mag...@hagander.net wrote: It seemed neater to me to create a new flag, so that in principle any vacuum() code path can request autovacuum_work_mem, rather than having vacuumlazy.c code call IsAutoVacuumWorkerProcess() for the same purpose. To date, that's only been done within vacuumlazy.c for things like logging. Hmm. I'm not entirely sure I agree that that makes it neater :) We could also look at autovacuum_vacuum_cost_limit etc above, but those just override what the non-autovac parameters do. But since the parameter is called maintenance_work_mem in that case, I think that would make it harder to read. But I'd suggest just a: int vac_work_mem = (IsAutoVacuumWorkerProcess() && autovacuum_work_mem != -1) ? autovacuum_work_mem : maintenance_work_mem; and not sending around a boolean flag through a bunch of places when it really means just the same thing. +1 for that change. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
On Wed, Dec 11, 2013 at 8:45 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Dec 11, 2013 at 10:26 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 10, 2013 at 11:40 PM, Fujii Masao masao.fu...@gmail.com wrote: While I was investigating PQhost() for that approach, I found several problems of PQhost(). (1) PQhost() can return the Unix-domain socket directory path even on a platform that doesn't support Unix-domain sockets. (2) On a platform that doesn't support Unix-domain sockets, when neither host nor hostaddr is specified, the default host 'localhost' is used to connect to the server and PQhost() must return that, but it doesn't. I think changing PQhost() so that it returns DefaultHost rather than conn->pgunixsocket when we don't HAVE_UNIX_SOCKETS is a back-patchable bug fix, and I'd say go for it. Agreed. (3) PQhost() cannot return the hostaddr. However, I'm much less sure whether this is something that we want to do at all. It seems like this might be a redefinition of what the function does, and I'm not sure whether the new definition is what everyone will want. On the other hand, I'm also not sure that it isn't. If we don't change PQhost() in that way, we cannot fix the following problem of wrong output from \conninfo, for example. $ psql -d hostaddr=127.0.0.1 =# \conninfo You are connected to database postgres as user postgres via socket in /tmp at port 5432. Yeah, that's true. But the whole point of having both host and hostaddr seems to be that you can lie about where you're connecting. If you set host=some.pretty.domain.name hostaddr=1.2.3.4, the point is to say that you're connecting to the first while, under the covers, actually connecting to the second. Now, I am unclear what value this has, but someone at some point evidently thought it was a good idea, so we need to be careful about changing it.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add transforms feature
On 12/11/2013 01:40 PM, Robert Haas wrote: On Tue, Dec 10, 2013 at 10:35 PM, Peter Eisentraut pete...@gmx.net wrote: On Fri, 2013-12-06 at 11:28 +0100, Dimitri Fontaine wrote: Here is an idea. Add a GUC that basically says something like use_transforms = on|off. You can then attach that to individual functions, which is the right granularity, because only the function knows whether its code expects transforms or not. But you can use the full power of GUC to configure it any way you want. Here is an updated patch that implements this, makes some of the documentation improvements that you suggested, and rebases everything. I'm still kinda unimpressed by this. Behavior-changing GUC, uggh. It should work ok if we could somehow check that the GUC is set on the function and fall back to the session GUC in case it is not. Not sure if this is possible though. The need for this arises from calling other functions from a new function. At the moment, if a new function defined as CREATE FUNCTION f_uses_xforms() AS $$ ... $$ SET use_transforms=on; calls a legacy function which will break if transforms are used, then the _old_ function declaration needs to be modified to add (use_transforms=off). It is much easier than debugging/rewriting the function, but this is something I'd like us to be able to avoid. PS. maybe we could resurrect the WITH (attribute, ...) syntax available in CREATE FUNCTION for passing function-specific flags? Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about sorting internals
On Wed, Dec 11, 2013 at 4:56 AM, hubert depesz lubaczewski dep...@depesz.com wrote: before I'll go any further - this is only thought-experiment. I do not plan to use such queries in real-life applications. I was just presented with a question that I can't answer in any logical way. There are two simple queries: #v+ with rok2005 (miesiac,wynik) as (VALUES (1,1),(2,2) ,(4,4),(5,NULL),(6,6)) ,rok2004 (miesiac,wynik) as (VALUES (1,3) ,(3,3),(4,5) ,(6,6)) SELECT distinct on (miesiac) * FROM ( SELECT miesiac, 2005 as rok, wynik FROM rok2005 union all SELECT miesiac, 2004 as rok, wynik FROM rok2004 ) as polaczone ORDER BY miesiac, wynik desc; #v- #v+ with rok2005 (miesiac,wynik) as (VALUES (1,1),(2,2) ,(4,4),(5,NULL),(6,6)) ,rok2004 (miesiac,wynik) as (VALUES (1,3) ,(3,3),(4,5) ,(6,6)) SELECT distinct on (miesiac) * FROM ( SELECT miesiac, 2004 as rok, wynik FROM rok2004 union all SELECT miesiac, 2005 as rok, wynik FROM rok2005 ) as polaczone ORDER BY miesiac, wynik desc; #v- They differ only in order of queries in union all part. The thing is that they return the same result. Why isn't one of them returning 2005 for 6th miesiac? The query planner sees that in order for the output ordering to match the ORDER BY clause, it's got to sort by miesiac, wynik desc. The DISTINCT ON clause can be implemented very cheaply after that - every time the value of miesiac changes, it emits only the first of the rows with the new value. So it's a good plan. However, because the sort happens before the unique step, the results you get are dependent on what order the sort happens to emit the rows. Our sort algorithms are not stable, so there's no particular guarantee about the order in which rows will pop out, beyond the fact that they must obey the sort key. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why the buildfarm is all pink
Tom Lane t...@sss.pgh.pa.us wrote: I haven't touched matview.sql here; that seems like a distinct issue. I'll fix that. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On Dec10, 2013, at 15:32 , Claudio Freire klaussfre...@gmail.com wrote: On Tue, Dec 10, 2013 at 11:02 AM, Greg Stark st...@mit.edu wrote: On 10 Dec 2013 08:28, Albe Laurenz laurenz.a...@wien.gv.at wrote: Doesn't all that assume a normally distributed random variable? I don't think so because of the law of large numbers. If you have a large population and sample it, the sample behaves like a normal distribution even if the distribution of the population isn't one. No, the law of large numbers says that if you take the AVERAGE of many samples of a random variable, the random variable that is the AVERAGE behaves like a normal. Actually, that's the central limit theorem, and it doesn't hold for all random variables, only for those with finite expected value and variance. The law of large numbers, in contrast, only tells you that the AVERAGE of n samples of a random variable will converge to the random variable's expected value as n goes to infinity (there are different versions of the law which guarantee different kinds of convergence, weak or strong). best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
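Florian's distinction between the two theorems can be illustrated with a quick simulation (an illustrative sketch, not from the thread; the exponential distribution is chosen arbitrarily as a clearly non-normal variable that nonetheless has finite expected value and variance, so both theorems apply):

```python
import random
import statistics

random.seed(1)

# Exponential variable with rate 1: clearly non-normal, but with finite
# expected value (1.0) and variance (1.0).
def sample_mean(n):
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Law of large numbers: one average of many samples converges to the
# expected value.
big_avg = sample_mean(200_000)

# Central limit theorem: the distribution of many such averages is
# approximately normal around the expected value, even though the
# underlying variable is skewed and non-normal.
means = [sample_mean(200) for _ in range(2_000)]

print(round(big_avg, 2), round(sum(means) / len(means), 2))
```

Both printed values land very close to 1.0: the first shows convergence of a single large average (LLN), while a histogram of `means` would show the roughly bell-shaped distribution the CLT promises.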
Re: [HACKERS] In-Memory Columnar Store
On Mon, Dec 9, 2013 at 1:40 PM, knizhnik knizh...@garret.ru wrote: Hello! I want to announce my implementation of an In-Memory Columnar Store extension for PostgreSQL: Documentation: http://www.garret.ru/imcs/user_guide.html Sources: http://www.garret.ru/imcs-1.01.tar.gz Any feedback, bug reports and suggestions are welcome. The vertical representation of data is stored in PostgreSQL shared memory. This is why it is important to be able to utilize all available physical memory. Now servers with a terabyte or more of RAM are not something exotic, especially in the financial world. But there is a limitation in Linux with standard 4kb pages on the maximal size of a mapped memory segment: 256GB. It is possible to overcome this limitation either by creating multiple segments - but that requires too many changes in the PostgreSQL memory manager - or by just setting the MAP_HUGETLB flag (assuming that huge pages were allocated in the system). I found several messages related to the MAP_HUGETLB flag; the most recent one was from the 21st of November: http://www.postgresql.org/message-id/20131125032920.ga23...@toroid.org I wonder what is the current status of this patch? I looked over your extension. I think it's a pretty amazing example of the postgres extension and type systems -- up there with postgis. Very well done. How long did this take you to write? The MAP_HUGETLB patch was marked 'returned with feedback': https://commitfest.postgresql.org/action/patch_view?id=1308. It seems likely to be revived, perhaps in time for 9.4. Honestly, I think your efforts here provide more argument for adding hugetlb support. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
On 2013-12-11 08:56:43 -0500, Robert Haas wrote: $ psql -d hostaddr=127.0.0.1 =# \conninfo You are connected to database postgres as user postgres via socket in /tmp at port 5432. Yeah, that's true. But the whole point of having both host and hostaddr seems to be that you can lie about where you're connecting. If you set host=some.pretty.domain.name hostaddr=1.2.3.4, the point is to say that you're connecting to the first while, under the covers, actually connecting to the second. Now, I am unclear what value this has, but someone at some point evidently thought it was a good idea, so we need to be careful about changing it. One use case is accessing a particular host when using DNS round robin to standbys in combination with SSL. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum_work_mem
On 25 November 2013 21:51, Peter Geoghegan p...@heroku.com wrote: On Sun, Nov 24, 2013 at 9:06 AM, Simon Riggs si...@2ndquadrant.com wrote: VACUUM uses 6 bytes per dead tuple. And autovacuum regularly removes dead tuples, limiting their numbers. In what circumstances will the memory usage from multiple concurrent VACUUMs become a problem? In those circumstances, reducing autovacuum_work_mem will cause more passes through indexes, dirtying more pages and elongating the problem workload. Yes, of course, but if we presume that the memory for autovacuum workers to do everything in one pass simply isn't there, it's still better to do multiple passes. That isn't clear to me. It seems better to wait until we have the memory. My feeling is this parameter is a fairly blunt approach to the problems of memory pressure on autovacuum and other maintenance tasks. I am worried that it will not effectively solve the problem. I don't wish to block the patch; I wish to get to an effective solution to the problem. A better approach to handling memory pressure would be to globally coordinate workers so that we don't oversubscribe memory, allocating memory from a global pool. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add transforms feature
On Wed, Dec 11, 2013 at 9:19 AM, Hannu Krosing ha...@2ndquadrant.com wrote: On 12/11/2013 01:40 PM, Robert Haas wrote: On Tue, Dec 10, 2013 at 10:35 PM, Peter Eisentraut pete...@gmx.net wrote: On Fri, 2013-12-06 at 11:28 +0100, Dimitri Fontaine wrote: Here is an idea. Add a GUC that basically says something like use_transforms = on|off. You can then attach that to individual functions, which is the right granularity, because only the function knows whether its code expects transforms or not. But you can use the full power of GUC to configure it any way you want. Here is an updated patch that implements this, makes some of the documentation improvements that you suggested, and rebases everything. I'm still kinda unimpressed by this. Behavior-changing GUC, uggh. It should work ok if we could somehow check that the GUC is set on the function and fall back to the session GUC in case it is not. Not sure if this is possible though. The need for this arises from calling other functions from a new function. At the moment, if a new function defined as CREATE FUNCTION f_uses_xforms() AS $$ ... $$ SET use_transforms=on; calls a legacy function which will break if transforms are used, then the _old_ function declaration needs to be modified to add (use_transforms=off). Yeah, exactly. It is much easier than debugging/rewriting the function, but this is something I'd like us to be able to avoid. PS. maybe we could resurrect the WITH (attribute, ...) syntax available in CREATE FUNCTION for passing function-specific flags? It's a thought. Or you could put some annotation in the function body, as we do in PL/pgsql with the #option syntax. Of course, making everyone decorate their new functions with references to the transforms they want to use isn't wonderful either, but it might be good at least to have the option.
You could allow the use of all installed transforms by default, but let people say WITH (transforms='') if they don't want to use them or WITH (transforms='comma, separated, list') if they want to require certain ones. Unfortunately, that'll probably mean that virtually all portable code for procedural languages has to include some form of this incantation, just as nearly any PL/pgsql function has to include SET search_path = '' if it wants not to be trivially subvertible. It's annoying to grow more such decoration, but the alternative seems to be hoping that nobody wants to write portable code that uses non-core types, and that doesn't seem better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum_work_mem
On Wed, Dec 11, 2013 at 9:43 AM, Simon Riggs si...@2ndquadrant.com wrote: On 25 November 2013 21:51, Peter Geoghegan p...@heroku.com wrote: On Sun, Nov 24, 2013 at 9:06 AM, Simon Riggs si...@2ndquadrant.com wrote: VACUUM uses 6 bytes per dead tuple. And autovacuum regularly removes dead tuples, limiting their numbers. In what circumstances will the memory usage from multiple concurrent VACUUMs become a problem? In those circumstances, reducing autovacuum_work_mem will cause more passes through indexes, dirtying more pages and elongating the problem workload. Yes, of course, but if we presume that the memory for autovacuum workers to do everything in one pass simply isn't there, it's still better to do multiple passes. That isn't clear to me. It seems better to wait until we have the memory. My feeling is this parameter is a fairly blunt approach to the problems of memory pressure on autovacuum and other maintenance tasks. I am worried that it will not effectively solve the problem. I don't wish to block the patch; I wish to get to an effective solution to the problem. A better approach to handling memory pressure would be to globally coordinate workers so that we don't oversubscribe memory, allocating memory from a global pool. This is doubtless true, but that project is at least two if not three orders of magnitude more complex than what's being proposed here, and I don't think we should make the perfect the enemy of the good. Right now, maintenance_work_mem controls the amount of memory that we're willing to use for either a vacuum operation or an index build. Those things don't have much to do with each other, so it's not hard for me to imagine that someone might want to configure different memory usage for one than the other. This patch would allow that, and I think that's good.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
On Wed, Dec 11, 2013 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-11 08:56:43 -0500, Robert Haas wrote: $ psql -d hostaddr=127.0.0.1 =# \conninfo You are connected to database postgres as user postgres via socket in /tmp at port 5432. Yeah, that's true. But the whole point of having both host and hostaddr seems to be that you can lie about where you're connecting. If you set host=some.pretty.domain.name hostaddr=1.2.3.4, the point is to say that you're connecting to the first while, under the covers, actually connecting to the second. Now, I am unclear what value this has, but someone at some point evidently thought it was a good idea, so we need to be careful about changing it. One use case is accessing a particular host when using DNS round robin to standbys in combination with SSL. Ah, interesting point. And it's not inconceivable that some application out there could be using PQhost() to retrieve the host from an existing connection object and reusing that value for a new connection, in which case redefining it to sometimes return hostaddr would break things. So I think we shouldn't do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why the buildfarm is all pink
Kevin Grittner kgri...@ymail.com wrote: Tom Lane t...@sss.pgh.pa.us wrote: I haven't touched matview.sql here; that seems like a distinct issue. I'll fix that. Done. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
From: Amit Kapila amit.kapil...@gmail.com I think it is better to keep it like what I suggested above, because in that case it will assign the default name even if postgres -C fails for some reason. 2. What will happen if the user doesn't change the name in event_source or keeps the same name, won't it hit the same problem again? So shouldn't it try to generate a different name by appending the version string to it? I re-considered that. As you suggested, I think I'll do as follows. Would this be OK? [pg_ctl.c] evtHandle = RegisterEventSource(NULL, *event_source ? event_source : "PostgreSQL " PG_MAJORVERSION); [guc.c] {"event_source", PGC_POSTMASTER, LOGGING_WHERE, ... "PostgreSQL " PG_MAJORVERSION, NULL, NULL, NULL [elog.c] Writing the default value in this file was redundant, because event_source cannot be NULL. So change evtHandle = RegisterEventSource(NULL, event_source ? event_source : "PostgreSQL"); to evtHandle = RegisterEventSource(NULL, event_source); Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why the buildfarm is all pink
Andres Freund and...@2ndquadrant.com writes: On 2013-12-10 19:55:12 -0500, Tom Lane wrote: We need a more consistent strategy for this :-( Agreed, although I have no clue what it should look like. As a further datapoint I'll add that installcheck already regularly fails in HEAD if you have a HS standby connected via SR and hot_standby_feedback=on on the standby. Some plans just change from index(only) scans to sequential scans, presumably because the lower xmin horizon changed the stats. Since there's nothing running on the standby in those cases, there has to be a pretty damn tiny window here somewhere. The case in create_index does a vacuum analyze tenk1 and expects to get an index-only scan in the very next SQL command. So any delay in considering the table all-visible could break that test. I'm not sure if that's what you're talking about though. We could easily create some more delay for that case, for instance by moving the vacuum step to copy.sql as I was idly speculating about upthread. Do you remember offhand where the failures are? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
On Mon, Dec 09, 2013 at 11:40:41PM +0400, knizhnik wrote: Hello! I want to announce my implementation of an In-Memory Columnar Store extension for PostgreSQL: Documentation: http://www.garret.ru/imcs/user_guide.html Sources: http://www.garret.ru/imcs-1.01.tar.gz Any feedback, bug reports and suggestions are welcome. The vertical representation of data is stored in PostgreSQL shared memory. This is why it is important to be able to utilize all available physical memory. Hi, This is very neat! The question I have, which applies to the matview support as well, is: how can we transparently substitute usage of the in-memory columnar store/matview in a SQL query? Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On 11 December 2013 12:08, Greg Stark st...@mit.edu wrote: So there is something clearly wonky in the histogram stats that's affected by the distribution of the sample. ...in the case where the avg width changes in a consistent manner across the table. Well spotted. ISTM we can have a specific cross check for bias in the sample of that nature. We just calculate the avg width per block and then check for correlation of the avg width against block number. If we find bias we can calculate how many extra blocks to sample and from where. There may be other biases also, so we can check for them and respond accordingly. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
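Simon's cross-check above — compute the average tuple width per block and correlate it against block number — might look roughly like the following sketch. This is hypothetical illustration, not ANALYZE code: the table data, the 100-rows-per-block layout, and any threshold for declaring bias are invented.

```python
import random

random.seed(0)

ROWS_PER_BLOCK = 100

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def width_vs_blockno(widths):
    """Correlate per-block average tuple width against block number."""
    per_block = [sum(widths[i:i + ROWS_PER_BLOCK]) / ROWS_PER_BLOCK
                 for i in range(0, len(widths), ROWS_PER_BLOCK)]
    return pearson(list(range(len(per_block))), per_block)

# Biased layout: width grows with position, as in the sizeskew example.
biased = [i / 100 + random.random() for i in range(50_000)]
# Unbiased layout: the same widths, shuffled across the table.
shuffled = biased[:]
random.shuffle(shuffled)

print(round(width_vs_blockno(biased), 3), round(width_vs_blockno(shuffled), 3))
```

A correlation near 1 (the sizeskew-like layout) flags the consistent cross-table drift Greg observed, while the shuffled layout stays near 0; a flagged table could then trigger sampling of extra blocks as Simon suggests.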
Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?
From: Kevin Grittner kgri...@ymail.com It seems to be a fairly common term of art for a problem which requires a restart or reconnection. FATAL is used when the problem is severe enough that the process or connection must end. It seems to me to be what should consistently be used when a client connection or its process must be terminated for a reason other than a client-side request to terminate. What do you think of #5 and #6 when matching the above criteria? 5. FATAL: terminating walreceiver process due to administrator command 6. FATAL: terminating background worker "%s" due to administrator command These are output when the DBA shuts down the database server and there's no client connection. That is, these don't meet the criteria. I believe these should be suppressed, or use LOG instead of FATAL. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about sorting internals
hubert depesz lubaczewski dep...@depesz.com writes: There are two simple queries: ... They differ only in order of queries in union all part. The thing is that they return the same result. Why isn't one of them returning 2005 for 6th miesiac? With such a small amount of data, you're getting an in-memory quicksort, and a well-known property of quicksort is that it isn't stable --- that is, there are no guarantees about the order in which it will return items that have equal keys. In this case it's evidently making different partitioning choices, as a consequence of the different arrival order of the rows, that just by chance end up with the 6/2004/6 row being returned before the 6/2005/6 row in both cases. You could trace through the logic and see exactly how that's happening, but I doubt it'd be a very edifying exercise. If you want to get well-defined results with DISTINCT ON, you should make the ORDER BY sort by a candidate key. Anything less opens you to uncertainty about which rows the DISTINCT will select. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
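Tom's point can be modeled in a few lines (an illustrative sketch, not PostgreSQL code; note that Python's sort is stable while PostgreSQL's in-memory quicksort is not, but either way the winner among equal-keyed rows is not specified by the query itself — only the candidate-key tiebreaker makes it deterministic; the NULL row from the original query is omitted to keep the sketch simple):

```python
from itertools import groupby

# The rows from the example as (miesiac, rok, wynik), minus the NULL row.
rok2005 = [(1, 2005, 1), (2, 2005, 2), (4, 2005, 4), (6, 2005, 6)]
rok2004 = [(1, 2004, 3), (3, 2004, 3), (4, 2004, 5), (6, 2004, 6)]

def distinct_on_miesiac(rows, key):
    """Sort, then keep the first row per miesiac -- models DISTINCT ON."""
    rows = sorted(rows, key=key)
    return [next(grp) for _, grp in groupby(rows, key=lambda r: r[0])]

# ORDER BY miesiac, wynik DESC: rows (6, 2005, 6) and (6, 2004, 6)
# compare equal, so which one survives depends on how ties break.
underspecified = lambda r: (r[0], -r[2])
a = distinct_on_miesiac(rok2005 + rok2004, underspecified)
b = distinct_on_miesiac(rok2004 + rok2005, underspecified)

# ORDER BY miesiac, wynik DESC, rok DESC: adding a candidate key makes
# the result independent of the input order.
deterministic = lambda r: (r[0], -r[2], -r[1])
c = distinct_on_miesiac(rok2005 + rok2004, deterministic)
d = distinct_on_miesiac(rok2004 + rok2005, deterministic)

print(a == b, c == d)
```

With the underspecified ordering, swapping the branches of the UNION ALL changes which year wins for miesiac 6; with the tiebreaker, both input orders pick (6, 2005, 6), matching Tom's advice to sort by a candidate key.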
Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?
On 2013-12-12 00:31:25 +0900, MauMau wrote: What do you think of #5 and #6 when matching the above criteria? 5. FATAL: terminating walreceiver process due to administrator command 6. FATAL: terminating background worker "%s" due to administrator command Those are important if they happen outside a shutdown. So, if you really want to remove them from there, you'd need to change the signalling to handle the cases differently. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum_work_mem
On 11 December 2013 14:50, Robert Haas robertmh...@gmail.com wrote: On Wed, Dec 11, 2013 at 9:43 AM, Simon Riggs si...@2ndquadrant.com wrote: On 25 November 2013 21:51, Peter Geoghegan p...@heroku.com wrote: On Sun, Nov 24, 2013 at 9:06 AM, Simon Riggs si...@2ndquadrant.com wrote: VACUUM uses 6 bytes per dead tuple. And autovacuum regularly removes dead tuples, limiting their numbers. In what circumstances will the memory usage from multiple concurrent VACUUMs become a problem? In those circumstances, reducing autovacuum_work_mem will cause more passes through indexes, dirtying more pages and elongating the problem workload. Yes, of course, but if we presume that the memory for autovacuum workers to do everything in one pass simply isn't there, it's still better to do multiple passes. That isn't clear to me. It seems better to wait until we have the memory. My feeling is this parameter is a fairly blunt approach to the problems of memory pressure on autovacuum and other maintenance tasks. I am worried that it will not effectively solve the problem. I don't wish to block the patch; I wish to get to an effective solution to the problem. A better approach to handling memory pressure would be to globally coordinate workers so that we don't oversubscribe memory, allocating memory from a global pool. This is doubtless true, but that project is at least two if not three orders of magnitude more complex than what's being proposed here, and I don't think we should make the perfect the enemy of the good. It looks fairly easy to estimate the memory needed for an auto vacuum, since we know the scale factor and the tuple estimate. We can then use the memory estimate to alter the scheduling of work. And/or we can use actual memory usage and block auto vac workers if they need more memory than is currently available because of other workers. We would still benefit from a new parameter in the above sketch, but it would achieve something useful in practice.
That's about 2-3 days work and I know Peter can hack it. So the situation is not perfection-sought-blocking-good, this is more like fairly poor solution being driven through when a better solution is available within the time and skills available. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
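The estimate Simon sketches can be illustrated roughly (plain Python; the parameter values are hypothetical, the 6-bytes-per-dead-tuple figure is from upthread):

```python
# Rough sketch of the per-worker memory estimate proposed above.
# Autovacuum triggers at roughly threshold + scale_factor * reltuples
# dead tuples, which bounds the dead-tuple TID array VACUUM needs to
# finish index cleanup in a single pass (6 bytes per dead tuple, per
# this thread).  Numbers below are hypothetical.
BYTES_PER_DEAD_TUPLE = 6

def autovacuum_mem_estimate(reltuples, scale_factor=0.2, threshold=50):
    dead_tuples = threshold + scale_factor * reltuples
    return int(dead_tuples * BYTES_PER_DEAD_TUPLE)

# A 100M-row table with default-ish settings: ~20M dead tuples at
# trigger time, so roughly 120 MB for a one-pass vacuum.
est = autovacuum_mem_estimate(100_000_000)
assert est == (50 + 20_000_000) * 6
```

A scheduler could compare such estimates against a global budget before launching a worker, which is the "alter the scheduling of work" idea in the message above.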
Re: [HACKERS] Extension Templates S03E11
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Stephen Frost sfr...@snowman.net writes: The extra catalog tables which store SQL scripts in text columns is one of my main objections to the as-proposed Extension Templates. I view those scripts as a poor man's definition of database objects which are defined properly in the catalog already. I have a very hard time to understand this objection. Why? I think it has considerable force. PL/SQL functions are just a SQL script stored as-is in the catalogs. Those are the exception not the rule. Even views are stored in a textual way in the catalogs, albeit in a specific pre-processed format, This is utter nonsense. That representation has nothing to do with the original text of the CREATE VIEW command, and the fact that we store it as an ASCII string rather than some binary blob has more to do with debuggability than anything else. The important point is that we can (and sometimes do) transform the view to something else based on semantic understanding of what's in it. And we also have the ability to figure out what the view depends on, something that is mighty hard to get out of a text blob. (The fact that we don't have that for SQL functions is a serious minus of our approach to functions.) Stephen is concerned that a pure textual representation lacks any deep semantic understanding of what's in the extension, and I think that's indeed something to be concerned about. It's perhaps not a 100% show stopper, but it's something to be avoided unless the benefits of storing pure text are overwhelming. Which you don't seem to have convinced people of. So while I hear your objection to the script in catalog idea Stephen, I think we should move forward. We don't have the luxury of only applying patches where no compromise has to be made, where everyone is fully happy with the solution we find as a community. You've got that backwards. 
We do have the luxury of rejecting new features until people are generally satisfied that the basic design is right. There's no overlord decreeing that this must be in 9.4. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?
MauMau maumau...@gmail.com wrote: From: Kevin Grittner kgri...@ymail.com FATAL is used when the problem is severe enough that the process or connection must end. It seems to me to be what should consistently be used when a client connection or its process must be terminated for a reason other than a client-side request to terminate. What do you think of #5 and #6 when matching the above criteria? 5. FATAL: terminating walreceiver process due to administrator command 6. FATAL: terminating background worker \%s\ due to administrator command Those are client connections and their backends terminated for a reason other than the client side of the connection requesting it. If we don't classify those as FATAL then the definition of FATAL becomes much more fuzzy. What would you define it to mean? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why the buildfarm is all pink
On 2013-12-11 10:07:19 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-12-10 19:55:12 -0500, Tom Lane wrote: We need a more consistent strategy for this :-( Agreed, although I have no clue how it should look like. As a further datapoint I'll add that installcheck already regularly fails in HEAD if you have a HS standby connected via SR and hot_standby_feedback=on on the standby. Some plans just change from index(only) scans to sequential scans, presumably because of the lower xmin horizon changed the stats. Since there's nothing running on the standby in those cases, there has to be a pretty damn tiny window here somewhere. The case in create_index does a vacuum analyze tenk1 and expects to get an index-only scan in the very next SQL command. So any delay in considering the table all-visible could break that test. I'm not sure if that's what you're talking about though. We could easily create some more delay for that case, for instance by moving the vacuum step to copy.sql as I was idly speculating about upthread. Do you remember offhand where the failures are? No, but they are easy enough to reproduce. Out of 10 runs, I've attached the one with the most failures and checked that it seems to contain all the failures from other runs. 
All of them probably could be fixed by moving things around, but I am not sure how maintainable that approach is :/ Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

--- /home/andres/src/postgresql/src/test/regress/expected/create_index.out	2013-12-08 19:57:01.646559353 +0100
+++ /home/andres/build/postgres/dev-assert/vpath/src/test/regress/results/create_index.out	2013-12-11 16:54:00.043641015 +0100
@@ -2720,11 +2720,15 @@
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
-                     QUERY PLAN
------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                       QUERY PLAN
+---------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
==
--- /home/andres/src/postgresql/src/test/regress/expected/subselect.out	2013-12-11 16:42:18.791039738 +0100
+++ /home/andres/build/postgres/dev-assert/vpath/src/test/regress/results/subselect.out	2013-12-11 16:54:06.823376218 +0100
@@ -727,7 +727,7 @@
    ->  Seq Scan on public.tenk1 b
          Output: b.unique1, b.unique2, b.two, b.four, b.ten, b.twenty, b.hundred, b.thousand, b.twothousand, b.fivethous, b.tenthous, b.odd, b.even, b.stringu1, b.stringu2, b.string4
          SubPlan 1
-           ->  Index Only Scan using tenk1_unique1 on public.tenk1 a
+           ->  Seq Scan on public.tenk1 a
                  Output: a.unique1
 (10 rows)
==
--- /home/andres/src/postgresql/src/test/regress/expected/join.out	2013-12-08 19:57:01.649892559 +0100
+++ /home/andres/build/postgres/dev-assert/vpath/src/test/regress/results/join.out	2013-12-11 16:54:08.439979745 +0100
@@ -3354,16 +3354,16 @@
 explain (costs off)
 select count(*) from tenk1 a,
   tenk1 b join lateral (values(a.unique1)) ss(x) on b.unique2 = ss.x;
-                       QUERY PLAN
----------------------------------------------------------
+                  QUERY PLAN
+-----------------------------------------------
 Aggregate
   ->  Hash Join
         Hash Cond: (*VALUES*.column1 = b.unique2)
         ->  Nested Loop
-              ->  Index Only Scan using tenk1_unique1 on tenk1 a
+              ->  Seq Scan on tenk1 a
               ->  Values Scan on *VALUES*
         ->  Hash
-              ->  Index Only Scan using tenk1_unique2 on tenk1 b
+              ->  Seq Scan on tenk1 b
 (8 rows)

 select count(*) from tenk1 a,
==
--- /home/andres/src/postgresql/src/test/regress/expected/rowtypes.out	2013-12-08 19:14:10.614689754 +0100
+++ /home/andres/build/postgres/dev-assert/vpath/src/test/regress/results/rowtypes.out	2013-12-11 16:54:31.072429133 +0100
@@ -240,11 +240,15 @@
 select thousand, tenthous from tenk1
 where (thousand, tenthous) = (997, 5000)
 order by thousand, tenthous;
-QUERY PLAN
-
Re: [HACKERS] ANALYZE sampling is too good
Greg Stark st...@mit.edu writes: So I've done some empirical tests for a table generated by: create table sizeskew as (select i,j,repeat('i',i) from generate_series(1,1000) as i, generate_series(1,1000) as j); I find that using the whole block doesn't cause any problem with the avg_width field for the repeat column. That does reinforce my belief that we might not need any particularly black magic here. It does however cause a systemic error in the histogram bounds. It seems the median is systematically overestimated, by more and more as the number of rows used per block grows: Hm. You can only take N rows from a block if there actually are at least N rows in the block. So the sampling rule I suppose you are using is select up to N rows from each sampled block --- and that is going to favor the contents of blocks containing narrower-than-average rows. Now in this case, it looks like that ought to favor rows with *smaller* i values, but you say the median goes up not down. So I'm not sure what's going on. I thought at first that TOAST compression might be part of the explanation, but TOAST shouldn't kick in on rows with raw representation narrower than 2KB. Did you do a run with no upper limit on the number of rows per block? Because I'm not sure that tests with a limit in place are a good guide to what happens without it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add transforms feature
Robert Haas robertmh...@gmail.com writes: On Tue, Dec 10, 2013 at 10:35 PM, Peter Eisentraut pete...@gmx.net wrote: Here is an updated patch that implements this, makes some of the documentation improvements that you suggested, and rebases everything. I'm still kinda unimpressed by this. Behavior-changing GUC, uggh. We should have learned by now that those are usually a bad idea. In this case, we've got changes in the behavior of function calling, which seems like not only a nightmare for debugging but a fertile source of security issues. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
Hello! The implementation of IMCS itself took me about two months (including testing and writing documentation). But a huge part of the code was previously written by me for other projects, so I have reused it. Most of the time I spent on integrating this code with PostgreSQL (I was not so familiar with it before). Certainly the columnar store implementations for Oracle (Oracle Database In-Memory Option), DB2 (BLU Acceleration), ... are more convenient for users: they can execute normal SQL queries and do not require users to learn new functions and approaches. But that requires a complete redesign of the query engine (or providing an alternative implementation). I was not able to do that. This is why I try to provide the advantages of vertical data representation (vector operations, parallel execution, data skipping) as well as the advantages of fast access to in-memory data as a standard PostgreSQL extension. There are obviously some limitations, and queries look more complicated than in standard SQL... But on the other side it is possible to write queries which can hardly be expressed in standard SQL. For example, calculating split-adjusted prices can not be done in SQL without using stored procedures. To make usage of IMCS functions as simple as possible I defined a large number of operators for the most popular operations. For example Volume-Weighted-Average-Price can be calculated just as: select Volume//Close as VWAP from Quote_get(); It is even shorter than the analogous SQL statement: select sum(Close*Volume)/sum(Volume) as VWAP from Quote; Concerning integration with PostgreSQL, there were several problems. Some of them seem to have no easy solution, but others are IMHO imperfections in PostgreSQL which I hope will be fixed sometime: 1. Calls in PL/pgSQL are very slow - about 1-2 microseconds on my computer. Just defining a per-row insert trigger with an empty procedure doubles the insertion time for 6 million records - from 7 to 15 seconds.
If the trigger procedure is not empty, then the time increases proportionally to the number of calls performed. In my case, inserting the data and propagating it to the columnar store using a trigger takes about 80 seconds. But if I first load the data into the PostgreSQL table without triggers and then insert it into the columnar store using a load function (implemented in C), the time is 7+9=16 seconds. Certainly I realize that plpgsql is an interpreted language. But, for example, the also-interpreted Python is able to do 100 times more calls per second. Unfortunately the profiler doesn't show any single bottleneck - it looks like the long call time is caused by the large overhead of initializing and resetting the memory context and copying argument data. 2. Inefficient implementation of expanding composite type columns using the (foo()).* clause. In this case the function foo() will be invoked as many times as there are fields in the returned composite type. Even in case of placing the call in the FROM list (thanks to lateral joins in 9.3), PostgreSQL still sometimes performs redundant calls which can be avoided using a hack with adding an OFFSET 1 clause. 3. The 256Gb limit on the size of the used shared memory segment on Linux. Concerning the last problem - I have included in the IMCS distribution a much simpler patch which just sets the MAP_HUGETLB flag when a) it is defined in the system headers and b) the requested memory size is larger than 256Gb. In this case right now PostgreSQL will just fail to start. But certainly it is more correct to control this flag through a configuration parameter, because large pages can minimize MMU overhead and so increase speed even if the size of the used memory is less than 256Gb (this is why Oracle widely uses them). (A function call takes about 2 microseconds, i.e. if I write a trigger with an empty procedure, insertion of 6 million objects takes 15 seconds - while without the trigger the insertion takes only 7 seconds.) On 12/11/2013 06:33 PM, Merlin Moncure wrote: On Mon, Dec 9, 2013 at 1:40 PM, knizhnik knizh...@garret.ru wrote: Hello!
I want to announce my implementation of an In-Memory Columnar Store extension for PostgreSQL: Documentation: http://www.garret.ru/imcs/user_guide.html Sources: http://www.garret.ru/imcs-1.01.tar.gz Any feedback, bug reports and suggestions are welcome. The vertical representation of data is stored in PostgreSQL shared memory. This is why it is important to be able to utilize all available physical memory. Now servers with a Tb or more of RAM are not something exotic, especially in the financial world. But there is a limitation in Linux with standard 4kb pages on the maximal size of a mapped memory segment: 256Gb. It is possible to overcome this limitation either by creating multiple segments - but that requires too many changes in the PostgreSQL memory manager - or by just setting the MAP_HUGETLB flag (assuming that huge pages were allocated in the system). I found several messages related to the MAP_HUGETLB flag; the most recent one was from 21 November:
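The VWAP example above boils down to a single weighted-average pass over two column vectors. A minimal sketch of that shape of computation, in plain Python with hypothetical sample data (IMCS itself evaluates this over its in-memory column vectors):

```python
# Volume-weighted average price over parallel column vectors: the
# computation behind "select Volume//Close as VWAP from Quote_get()",
# i.e. sum(Close*Volume)/sum(Volume).  Sample data is hypothetical.
close  = [10.0, 11.0, 12.0]
volume = [100.0, 200.0, 100.0]

vwap = sum(c * v for c, v in zip(close, volume)) / sum(volume)
assert vwap == (10.0 * 100 + 11.0 * 200 + 12.0 * 100) / 400
```

The columnar layout matters because both sums stream sequentially through dense arrays instead of picking two fields out of each heap tuple.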
Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?
From: Tom Lane t...@sss.pgh.pa.us Jim Nasby j...@nasby.net writes: On 12/9/13 5:56 PM, Tom Lane wrote: How so? FATAL means an error that terminates your session, which is exactly what these are. Except in these cases the user never actually got a working session; their request was denied. To be clear, from the client standpoint it's certainly fatal, but not from the server's point of view. This is fully expected behavior as far as the server is concerned. (Obviously it might be an error that caused the shutdown/recovery, but that's something different.) Right, but as already pointed out in this thread, these messages are worded from the client's point of view. "The client never got a working connection" seems to me to be an empty distinction. If you got SIGTERM'd before you could issue your first query, should that not be FATAL because you'd not gotten any work done? More generally, we also say FATAL for all sorts of entirely routine connection failures, like wrong password or mistyped user name. People don't seem to have a problem with those. Even if some do complain, the costs of changing that behavior after fifteen-years-and-counting would certainly exceed any benefit. I agree that #1-#3 are of course reasonable when there's any client the user runs. The problem is that #1 (The database system is starting up) is output in the server log by pg_ctl. In that case, there's no client the user is responsible for. Why does a new DBA have to be worried about that FATAL message? He didn't do anything wrong. I thought adding options='-c log_min_messages=PANIC' to the connection string for PQping() in pg_ctl.c would suppress the message, but it didn't. The reason is that connection options take effect in PostgresMain(), which is after checking the FATAL condition in ProcessStartupPacket(). Do you think there is any good solution?
Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
Robert Haas robertmh...@gmail.com writes: On Wed, Dec 11, 2013 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: One use case is accessing a particular host when using DNS round robin to standbys in combination with SSL. Ah, interesting point. And it's not inconceivable that some application out there could be using PQhost() to retrieve the host from an existing connection object and reusing that value for a new connection, in which case redefining it to sometimes return hostaddr would break things. So I think we shouldn't do that. I think the only reasonable way to fix this is to improve the logic in psql, not turn PQhost() into a mess with no understandable definition. If that means we need to add a separate PQhostaddr() query function, so be it. We won't be able to fix the complained-of bug in back branches, but I'd rather live with that (it's basically just cosmetic anyway) than risk introducing perhaps-not-so-cosmetic bugs into other existing applications. In general, I think the definition of these query functions ought to be what was the value of this parameter when the connection was made. As such, I'm not even sure that the pgunixsocket behavior that's in PQhost now is a good idea, much less that we should extend that hack to cover DefaultHost. There is room also for a function defined as give me a textual description of what I'm connected to, which is not meant to reflect any single connection parameter but rather the total behavior. Right now I think PQhost is on the borderline of doing this instead of just reporting the host parameter, but I think rather than pushing it across that border we'd be better off to invent a function explicitly charged with doing that. That would give us room to do something actually meaningful with host+hostaddr cases, for instance. 
I think really what ought to be printed in such cases is something like host-name (address IP-address); leaving out the former would be unhelpful but leaving out the latter is outright misleading. On the other hand, I'm not sure how much of a translatability problem it'd be to wedge such a description into a larger message. Might be better to just put the logic into psql. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
On 2013-12-10 19:11:03 -0500, Robert Haas wrote: Committed #1 (again). Regarding this:

+	/* XXX: we could also do this unconditionally, the space is used anyway */
+	if (copy_oid)
+		HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp));

I would like to put in a big +1 for doing that unconditionally. I didn't make that change before committing, but I think it'd be a very good idea. Patch attached. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

From 47c93d8e7afbcaef268de66246571cdc2f134c97 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 11 Dec 2013 17:20:27 +0100
Subject: [PATCH] Dep't of second thoughts: Always include oids in WAL logged replica identities.

Since replica identities are logged using the normal format for heap tuples, the space for oids in WITH OIDS tables was already used, so there's little point in only including the oid if it is included in the configured IDENTITY. Per comment from Robert Haas.
---
 src/backend/access/heap/heapam.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 249fffe..e647453 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6638,7 +6638,6 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
 	TupleDesc	idx_desc;
 	char		replident = relation->rd_rel->relreplident;
 	HeapTuple	key_tuple = NULL;
-	bool		copy_oid = false;
 	bool		nulls[MaxHeapAttributeNumber];
 	Datum		values[MaxHeapAttributeNumber];
 	int			natt;
@@ -6698,7 +6697,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
 		int			attno = idx_rel->rd_index->indkey.values[natt];

 		if (attno == ObjectIdAttributeNumber)
-			copy_oid = true;
+			/* copied below */
+			;
 		else if (attno < 0)
 			elog(ERROR, "system column in index");
 		else
@@ -6709,8 +6709,12 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
 	*copy = true;
 	RelationClose(idx_rel);

-	/* XXX: we could also do this unconditionally, the space is used anyway */
-	if (copy_oid)
+	/*
+	 * Always copy oids if the table has them, even if not included in the
+	 * index. The space in the logged tuple is used anyway, so there's little
+	 * point in not including the information.
+	 */
+	if (relation->rd_rel->relhasoids)
 		HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp));

 	/*
--
1.8.5.rc2.dirty

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
Hi, It depends on what you mean by transparently substitute. If you want to be able to execute standard SQL queries using the columnar store, then it seems to be impossible without rewriting the executor. I provided another approach based on calling standard functions which perform manipulations not with scalar types but with timeseries. For example, instead of the standard SQL select sum(ClosePrice) from Quote; I will have to write: select cs_sum(ClosePrice) from Quote_get(); It looks similar but not quite the same. And for more complex queries the difference is larger. For example the query select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office); can be written as select agg_val,cs_cut(group_by,'c22c30c10') from (select (cs_project_agg(ss1.*)).* from (select (s1).sum/(s2).sum,(s1).groups from DbItem_get() q, cs_hash_sum(q.score*q.volenquired, q.trader||q.desk||q.office) s1, cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2) ss1) ss2; Looks too complex, doesn't it? But the first two lines are responsible for performing the reverse mapping: from the vertical data representation to normal horizontal tuples. The good thing is that this query is executed more than 1000 times faster (with default PostgreSQL configuration parameters, except shared_buffers which was set large enough to fit all data in memory). On 12/11/2013 07:14 PM, k...@rice.edu wrote: On Mon, Dec 09, 2013 at 11:40:41PM +0400, knizhnik wrote: Hello! I want to announce my implementation of an In-Memory Columnar Store extension for PostgreSQL: Documentation: http://www.garret.ru/imcs/user_guide.html Sources: http://www.garret.ru/imcs-1.01.tar.gz Any feedback, bug reports and suggestions are welcome. The vertical representation of data is stored in PostgreSQL shared memory. This is why it is important to be able to utilize all available physical memory. Hi, This is very neat!
The question I have, which applies to the matview support as well, is How can we transparently substitute usage of the in-memory columnar store/matview in a SQL query?. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
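The grouped weighted-average query above reduces to two hash-keyed sums over the group key, which can be sketched in plain Python (hypothetical rows; the two dicts play the role of the cs_hash_sum calls in the IMCS rewrite):

```python
from collections import defaultdict

# sum(score*volenquired)/sum(volenquired) grouped by
# (trader, desk, office), computed as two hash-aggregated sums --
# the shape of the cs_hash_sum rewrite shown above.
rows = [
    # (trader, desk, office, score, volenquired) -- hypothetical data
    ("t1", "d1", "o1", 2.0, 100.0),
    ("t1", "d1", "o1", 4.0, 300.0),
    ("t2", "d1", "o1", 1.0, 50.0),
]

s1 = defaultdict(float)   # sum(score * volenquired) per group
s2 = defaultdict(float)   # sum(volenquired) per group
for trader, desk, office, score, vol in rows:
    key = (trader, desk, office)
    s1[key] += score * vol
    s2[key] += vol

# Final projection: one weighted average per group.
result = {k: s1[k] / s2[k] for k in s1}
assert result[("t1", "d1", "o1")] == (2.0 * 100 + 4.0 * 300) / 400
assert result[("t2", "d1", "o1")] == 1.0
```

A columnar engine runs each of the two sums as a tight vectorized scan; the final divide-and-regroup step is the "reverse mapping" back to horizontal tuples that the outer two lines of the SQL above perform.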
Re: [HACKERS] Why the buildfarm is all pink
Kevin Grittner kgri...@ymail.com writes: Kevin Grittner kgri...@ymail.com wrote: Tom Lane t...@sss.pgh.pa.us wrote: I haven't touched matview.sql here; that seems like a distinct issue. I'll fix that. Done. Thanks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?
MauMau maumau...@gmail.com writes: I agree that #1-#3 are of course reasonable when there's any client the user runs. The problem is that #1 (The database system is starting up) is output in the server log by pg_ctl. In that case, there's no client the user is responsible for. Why does a new DBA have to be worried about that FATAL message? He didn't do anything wrong. FATAL doesn't mean the DBA did something wrong. It means we terminated a client session. The fundamental problem IMO is that you want to complicate the definition of what these things mean as a substitute for DBAs learning something about Postgres. That seems like a fool's errand from here. They're going to have to learn what FATAL means sooner or later, and making it more complicated just raises the height of that barrier. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?
From: Andres Freund and...@2ndquadrant.com On 2013-12-12 00:31:25 +0900, MauMau wrote: 5. FATAL: terminating walreceiver process due to administrator command 6. FATAL: terminating background worker \%s\ due to administrator command Those are important if they happen outside a shutdown. So, if you really want to remove them from there, you'd need to change the signalling to handle the cases differently. How are they important? If someone mistakenly sends SIGTERM to walreceiver and background workers, they are automatically launched by postmaster or startup process later like other background processes. But other background processes such as walsender, bgwriter, etc. don't emit FATAL messages. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why the buildfarm is all pink
Andres Freund and...@2ndquadrant.com writes: On 2013-12-11 10:07:19 -0500, Tom Lane wrote: Do you remember offhand where the failures are? No, but they are easy enough to reproduce. Out of 10 runs, I've attached the one with the most failures and checked that it seems to contain all the failures from other runs. All of them probably could be fixed by moving things around, but I am not sure how maintainable that approach is :/ Thanks for doing the legwork. These all seem to be cases where the planner decided against doing an index-only scan on tenk1, which is presumably because its relallvisible fraction is too low. But these are later in the test series than the vacuum analyze tenk1 that's currently present in create_index, and most of them are even later than the database-wide VACUUM in sanity_check. So those vacuums are failing to mark the table as all-visible, even though it's not changed since the COPY test. This seems odd. Do you know why your slave server is holding back the xmin horizon so much? After looking at this, I conclude that moving the vacuums earlier would probably make things worse not better, because the critical interval seems to be from the COPY TO tenk1 command to the vacuum command. So the idea of putting vacuums into the COPY test is a bad one, and I'll proceed with the patch I posted yesterday for moving the ANALYZE steps around. I think fixing what you're seeing is going to be a different issue. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
I wrote: Hm. You can only take N rows from a block if there actually are at least N rows in the block. So the sampling rule I suppose you are using is select up to N rows from each sampled block --- and that is going to favor the contents of blocks containing narrower-than-average rows. Oh, no, wait: that's backwards. (I plead insufficient caffeine.) Actually, this sampling rule discriminates *against* blocks with narrower rows. You previously argued, correctly I think, that sampling all rows on each page introduces no new bias because row width cancels out across all sampled pages. However, if you just include up to N rows from each page, then rows on pages with more than N rows have a lower probability of being selected, but there's no such bias against wider rows. This explains why you saw smaller values of i being undersampled. Had you run the test series all the way up to the max number of tuples per block, which is probably a couple hundred in this test, I think you'd have seen the bias go away again. But the takeaway point is that we have to sample all tuples per page, not just a limited number of them, if we want to change it like this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
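Tom's point can be checked analytically (plain Python, hypothetical block layout): under "take up to N rows from each sampled block", a row's chance of inclusion, given its block was sampled, is min(1, N/r) where r is the number of rows on its block, so rows on crowded (narrow-row) pages are undersampled; taking every row on the block removes the bias.

```python
# Inclusion probability for a row under "take up to N rows from each
# sampled block": if the row's block holds r rows, only N of them are
# kept, so P(row kept | block sampled) = min(1, N / r).
# Block layouts below are hypothetical.
N = 10
sparse_block_rows = 5      # wide rows: few rows per block
crowded_block_rows = 100   # narrow rows: many rows per block

p_sparse = min(1.0, N / sparse_block_rows)     # every row on the block kept
p_crowded = min(1.0, N / crowded_block_rows)   # 9 out of 10 rows skipped
assert p_sparse == 1.0 and p_crowded == 0.1

# "All rows per block" gives every row the same conditional probability,
# eliminating the per-row bias described above.
assert min(1.0, crowded_block_rows / crowded_block_rows) == 1.0
```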
Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?
From: Kevin Grittner kgri...@ymail.com 5. FATAL: terminating walreceiver process due to administrator command 6. FATAL: terminating background worker "%s" due to administrator command Those are client connections and their backends terminated for a reason other than the client side of the connection requesting it. If we don't classify those as FATAL then the definition of FATAL becomes much more fuzzy. What would you define it to mean? I'm sorry to cause you trouble, but my understanding is that those are not client connections. They are just background server processes; walreceiver is started by the startup process, and background workers are started by extension modules. Am I misunderstanding something? According to Table 18-1 in the manual: http://www.postgresql.org/docs/current/static/runtime-config-logging.html the definition of FATAL is: "Reports an error that caused the current session to abort." This does not apply to the above messages, because there is no error. The DBA just shut down the database server, and the background processes terminated successfully. If some message output is desired, LOG's definition seems the nearest: "Reports information of interest to administrators, e.g., checkpoint activity." So, I thought ereport(LOG, ...); proc_exit(0) is more appropriate than ereport(FATAL, ...). Is this so strange? Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?
On 12/11/2013 08:48 AM, Tom Lane wrote: The fundamental problem IMO is that you want to complicate the definition of what these things mean as a substitute for DBAs learning something about Postgres. That seems like a fool's errand from here. They're going to have to learn what FATAL means sooner or later, and making it more complicated just raises the height of that barrier. I don't think it works to change the NOTICE/ERROR/FATAL tags; for one thing, I can hear the screaming about people's log scripts from here. However, it would really be useful to have an extra tag (in addition to the ERROR or FATAL) for "If you're seeing this message, something has gone seriously wrong on the server." Just stuff like corruption messages, backend crashes, etc. Otherwise we're requiring users to come up with an alphabet soup of regexes to filter out the noise error messages from the really, really important ones. Speaking as someone who does trainings for new DBAs, the part where I cover "what to look for in the logs" requires over an hour and still doesn't cover everything. And doesn't internationalize. That's nasty. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stats for network traffic WIP
On 12/10/13, 5:08 PM, Tom Lane wrote: Having said that, I can't get very excited about this feature anyway, so I'm fine with rejecting the patch. I'm not sure that enough people care to justify any added overhead at all. The long and the short of it is that network traffic generally is what it is, for any given query workload, and so it's not clear what's the point of counting it. Also, if we add this, the next guy is going to want to add CPU statistics, memory statistics, etc. Is there a reason why you can't get this directly from the OS? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Problem with displaying wide tables in psql
Hi. I've improved the patch. It works in expanded mode when either format option is set to wrapped (\pset format wrapped), or we have no pager, or pager doesn't chop long lines (so you can still use the trick). Target output width is taken from either columns option (\pset columns 70), or environment variable $COLUMNS, or terminal size. And it's also compatible with any border style (\pset border 0|1|2). Here are some examples: postgres=# \x 1 postgres=# \pset format wrapped postgres=# \pset border 0 postgres=# select * from wide_table; * Record 1 value afadsafasd fasdf asdfasd fsad fas df sadf sad f sadf sadf sa df sadfsadfa sd fsad fsa df sadf asd fa sfd sadfsadf asdf sad f sadf sad fadsf * Record 2 value afadsafasd fasdf asdfasd postgres=# \pset border 1 postgres=# \pset columns 70 postgres=# select * from wide_table; -[ RECORD 1 ]- value | afadsafasd fasdf asdfasd fsad fas df sadf sad f sadf sadf sa | df sadfsadfasd fsad fsa df sadf asd fa sfd sadfsadf asdf sad f | sadf sad fadsf -[ RECORD 2 ]- value | afadsafasd fasdf asdfasd postgres=# \pset border 2 postgres=# \pset columns 60 postgres=# select * from wide_table; +-[ RECORD 1 ]-+ | value | afadsafasd fasdf asdfasd fsad fas df sadf sad f | | | sadf sadf sa df sadfsadfasd fsad fsa df sadf as | | | d fa sfd sadfsadf asdf sad f sadf sad fadsf | +-[ RECORD 2 ]-+ | value | afadsafasd fasdf asdfasd | +---+--+ Regards, Sergey 2013/12/10 Jeff Janes jeff.ja...@gmail.com On Mon, Dec 2, 2013 at 10:45 PM, Sergey Muraviov sergey.k.murav...@gmail.com wrote: Hi. Psql definitely have a problem with displaying wide tables. Even in expanded mode, they look horrible. So I tried to solve this problem. I get compiler warnings: print.c: In function 'print_aligned_vertical': print.c:1238: warning: ISO C90 forbids mixed declarations and code print.c: In function 'print_aligned_vertical': print.c:1238: warning: ISO C90 forbids mixed declarations and code But I really like this and am already benefiting from it. 
No point in having the string of hyphens between every record wrap to be 30 lines long just because one field somewhere down the list does so. And configuring the pager isn't much of a solution because the pager doesn't know that the hyphens are semantically different than the other stuff getting thrown at it. Cheers, Jeff -- Best regards, Sergey Muraviov

From be9f01777599dc5e84c417e5cae56459677a88d4 Mon Sep 17 00:00:00 2001
From: Sergey Muraviov sergey.k.murav...@gmail.com
Date: Wed, 11 Dec 2013 20:17:26 +0400
Subject: [PATCH] wrapped tables in expanded mode

---
 src/bin/psql/print.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 118 insertions(+), 5 deletions(-)

diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index 736225c..4c37f7d 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -124,6 +124,7 @@ const printTextFormat pg_utf8format =
 /* Local functions */
 static int	strlen_max_width(unsigned char *str, int *target_width, int encoding);
+static bool IsWrappingNeeded(const printTableContent *cont, bool is_pager);
 static void IsPagerNeeded(const printTableContent *cont, const int extra_lines, bool expanded,
 			  FILE **fout, bool *is_pager);
@@ -1234,6 +1235,45 @@ print_aligned_vertical(const printTableContent *cont, FILE *fout)
 			fprintf(fout, "%s\n", cont->title);
 	}
 
+	if (IsWrappingNeeded(cont, is_pager))
+	{
+		int			output_columns = 0;
+
+		/*
+		 * Choose target output width: \pset columns, or $COLUMNS, or ioctl
+		 */
+		if (cont->opt->columns > 0)
+			output_columns = cont->opt->columns;
+		else
+		{
+			if (cont->opt->env_columns > 0)
+				output_columns = cont->opt->env_columns;
+#ifdef TIOCGWINSZ
+			else
+			{
+				struct winsize screen_size;
+
+				if (ioctl(fileno(stdout), TIOCGWINSZ, &screen_size) != -1)
+					output_columns = screen_size.ws_col;
+			}
+#endif
+		}
+
+		output_columns -= hwidth;
+
+		if (opt_border == 0)
+			output_columns -= 1;
+		else
+		{
+			output_columns -= 3;	/* -+- */
+
+			if (opt_border > 1)
+				output_columns -= 4;	/* +--+ */
+		}
+
+		if ((output_columns > 0) && (dwidth > output_columns))
+			dwidth = output_columns;
+	}
+
 	/* print records */
 	for (i = 0, ptr = cont->cells; *ptr; i++, ptr++)
 	{
@@ -1294,12 +1334,49 @@ print_aligned_vertical(const printTableContent *cont, FILE *fout)
 			if (!dcomplete)
 			{
-				if (opt_border < 2)
-					fprintf(fout, "%s\n", dlineptr[line_count].ptr);
+				if (dlineptr[line_count].width > dwidth)
+				{
+					int			offset = 0;
+					int			chars_to_output = dlineptr[line_count].width;
+
+					while (chars_to_output > 0)
+					{
+						int			target_width, bytes_to_output;
+
+						if (offset > 0)
+
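The width-budget arithmetic in the patch's first hunk can be read as a standalone function - an illustration with hypothetical names, not the patch itself: subtract the header width and the border decorations from the target width, then clamp the data width to whatever is left:

```c
/* Data-column width budget implied by the wrapping hunk above.
 * output_columns: target total width; hwidth: widest header;
 * opt_border: psql border style (0, 1, or 2); dwidth: natural data
 * width.  Returns the (possibly clamped) data width. */
int wrapped_dwidth(int output_columns, int hwidth, int opt_border, int dwidth)
{
    output_columns -= hwidth;

    if (opt_border == 0)
        output_columns -= 1;            /* single space separator */
    else
    {
        output_columns -= 3;            /* " | " between header and data */
        if (opt_border > 1)
            output_columns -= 4;        /* "| " and " |" outer borders */
    }

    if (output_columns > 0 && dwidth > output_columns)
        dwidth = output_columns;
    return dwidth;
}
```

For example, with \pset columns 70, a 5-character header, and border 1, data wraps at 62 characters, matching the 70-column sample output above.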
Re: [HACKERS] autovacuum_work_mem
On Wed, Dec 11, 2013 at 10:41 AM, Simon Riggs si...@2ndquadrant.com wrote: It looks fairly easy to estimate the memory needed for an auto vacuum, since we know the scale factor and the tuple estimate. We can then use the memory estimate to alter the scheduling of work. And/or we can use actual memory usage and block auto vac workers if they need more memory than is currently available because of other workers. We would still benefit from a new parameter in the above sketch, but it would achieve something useful in practice. That's about 2-3 days work and I know Peter can hack it. So the situation is not perfection-sought-blocking-good, this is more like a fairly poor solution being driven through when a better solution is available within the time and skills available. I don't agree with that assessment. Anything that involves changing the scheduling of autovacuum is a major project that will legitimately provoke much controversy. Extensive testing will be needed to prove that the new algorithm doesn't perform worse than the current algorithm in any important cases. I have my doubts about whether that can be accomplished in an entire release cycle, let alone 2-3 days. In contrast, the patch proposed does something that is easy to understand, clearly safe, and an improvement over what we have now. Quite apart from the amount of development time required, I think that the idea that we would use the availability of memory to schedule work is highly suspect. You haven't given any details on what you think that algorithm might look like, and I doubt that any simple solution will do. If running more autovacuum workers drives the machine into swap, then we shouldn't, but we have no way of calculating what size memory allocation will cause that to happen. But NOT running autovacuum workers isn't safe either, because it could cause table bloat that then drives the machine into swap. We have no way of knowing whether that will happen either.
So I think your contention that we have the necessary information available to make an intelligent decision is incorrect. Regardless, whether or not a more complex change is within Peter's technical capabilities is utterly irrelevant to whether we should adopt the proposed patch. Your phrasing seems to imply that you would not ask such a thing of a less-talented individual, and I think that is utterly wrong. Peter's technical acumen does not give us the right to ask him to write a more complex patch as a condition of getting a simpler one accepted. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
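For concreteness, the kind of estimate Simon alludes to might be sketched as below - assuming the autovacuum trigger formula (threshold + scale_factor * reltuples dead tuples) and lazy VACUUM's 6-byte-per-dead-TID accounting; the function name is hypothetical, and as argued above this number alone would not settle scheduling:

```c
/* Rough upper bound on the dead-tuple memory an autovacuum worker
 * needs for one table: autovacuum fires at roughly
 * threshold + scale_factor * reltuples dead tuples, and VACUUM
 * remembers one 6-byte ItemPointerData per dead tuple. */
double autovacuum_dead_tid_bytes(double reltuples, double scale_factor,
                                 double threshold)
{
    double dead_tuples = threshold + scale_factor * reltuples;

    return dead_tuples * 6.0;   /* sizeof(ItemPointerData) == 6 */
}
```

A million-row table with a 0.25 scale factor works out to about 1.5 MB of TID storage - but, per Robert's objection, knowing this bound says nothing about when running the worker is safe.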
Re: [HACKERS] stats for network traffic WIP
On Wed, Dec 11, 2013 at 11:12 PM, Peter Eisentraut pete...@gmx.net wrote: On 12/10/13, 5:08 PM, Tom Lane wrote: Having said that, I can't get very excited about this feature anyway, so I'm fine with rejecting the patch. I'm not sure that enough people care to justify any added overhead at all. The long and the short of it is that network traffic generally is what it is, for any given query workload, and so it's not clear what's the point of counting it. Also, if we add this, the next guy is going to want to add CPU statistics, memory statistics, etc. Is there a reason why you can't get this directly from the OS? I would say that it's more of a convenience to track the usage directly from the database instead of setting up OS infrastructure to store it. That said, it should be possible to do it directly at the OS level. Can we think of adding this to pgtop, though? I am just musing here. Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On 12/12/13 06:22, Tom Lane wrote: I wrote: Hm. You can only take N rows from a block if there actually are at least N rows in the block. So the sampling rule I suppose you are using is select up to N rows from each sampled block --- and that is going to favor the contents of blocks containing narrower-than-average rows. Oh, no, wait: that's backwards. (I plead insufficient caffeine.) Actually, this sampling rule discriminates *against* blocks with narrower rows. You previously argued, correctly I think, that sampling all rows on each page introduces no new bias because row width cancels out across all sampled pages. However, if you just include up to N rows from each page, then rows on pages with more than N rows have a lower probability of being selected, but there's no such bias against wider rows. This explains why you saw smaller values of i being undersampled. Had you run the test series all the way up to the max number of tuples per block, which is probably a couple hundred in this test, I think you'd have seen the bias go away again. But the takeaway point is that we have to sample all tuples per page, not just a limited number of them, if we want to change it like this. regards, tom lane Surely we want to sample a 'constant fraction' (obviously, in practice you have to sample an integral number of rows in a page!) of rows per page? The simplest way, as Tom suggests, is to use all the rows in a page. However, if you wanted the same number of rows from a greater number of pages, you could (for example) select a quarter of the rows from each page. In which case, when this is a fractional number: take the integral number of rows, plus one extra row with a probability equal to the fraction (here 0.25). Either way, if it is determined that you need N rows, then keep selecting pages at random (but never use the same page more than once) until you have at least N rows.
Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
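Gavin's fractional rule can be sketched as a small function (hypothetical names; uniform01 is a pre-drawn U(0,1) variate so the function stays deterministic and testable): take the integral part of f*n rows from an n-row page, plus one extra row with probability equal to the remainder:

```c
/* Number of rows to take from a page of n rows when sampling a
 * constant fraction f: the integral part of f*n, plus one extra row
 * with probability equal to the fractional remainder (e.g. f = 0.25,
 * n = 10 gives 2 rows, plus a 3rd half the time). */
int rows_to_sample(int n, double f, double uniform01)
{
    double target = f * n;
    int    base = (int) target;         /* integral part */

    return base + (uniform01 < target - base ? 1 : 0);
}
```

The caller would then keep drawing unused pages at random until the running total reaches the N rows required.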
Re: [HACKERS] ANALYZE sampling is too good
On 12/12/13 06:22, Tom Lane wrote: I wrote: Hm. You can only take N rows from a block if there actually are at least N rows in the block. So the sampling rule I suppose you are using is select up to N rows from each sampled block --- and that is going to favor the contents of blocks containing narrower-than-average rows. Oh, no, wait: that's backwards. (I plead insufficient caffeine.) Actually, this sampling rule discriminates *against* blocks with narrower rows. You previously argued, correctly I think, that sampling all rows on each page introduces no new bias because row width cancels out across all sampled pages. However, if you just include up to N rows from each page, then rows on pages with more than N rows have a lower probability of being selected, but there's no such bias against wider rows. This explains why you saw smaller values of i being undersampled. Had you run the test series all the way up to the max number of tuples per block, which is probably a couple hundred in this test, I think you'd have seen the bias go away again. But the takeaway point is that we have to sample all tuples per page, not just a limited number of them, if we want to change it like this. regards, tom lane Hmm... In my previous reply, which hasn't shown up yet, I realized I made a mistake! The fraction/probability could be any of 0.25, 0.50, and 0.75. Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
On Wed, Dec 11, 2013 at 10:08 AM, knizhnik knizh...@garret.ru wrote: 1. Calls in PL/pgSQL are very slow - about 1-2 microseconds on my computer. Just defining an insertion per-row trigger with an empty procedure doubles the insertion time for 6 million records - from 7 to 15 seconds. If the trigger procedure is not empty, then the time increases proportionally to the number of performed calls. In my case, inserting data and propagating it into the columnar store using a trigger takes about 80 seconds. But if I first load the data without triggers into a PostgreSQL table and then insert it into the columnar store using a load function (implemented in C), the time is 7+9=16 seconds. Yeah. For this problem, we unfortunately either have to try to use standard sql functions in such a way that supports inlining (this is a black art mostly, and fragile), or move logic out of the function and into the query via things like window functions, or just deal with the performance hit. postgres flavored SQL is pretty much the most productive language on the planet AFAIC, but the challenge is always performance, performance. Down the line, I am optimistic per-call function overhead can be optimized, probably by expanding what can be inlined somehow. The problem is that this requires cooperation from the language executors; this is not currently possible through the SPI interface, so I really don't know. Certainly I realize that plpgsql is an interpreted language. But, for example, the likewise-interpreted Python is able to do 100 times more calls per second. Unfortunately the profiler doesn't show a single bottleneck - it looks like the long call time is caused by the large overhead of initializing and resetting memory contexts and copying argument data. 2. Inefficient implementation of expanding composite type columns using the (foo()).* clause. In this case function foo() will be invoked as many times as there are fields in the returned composite type.
Even when placing the call in the FROM list (thanks to lateral joins in 9.3), PostgreSQL still sometimes performs redundant calls, which can be avoided using the hack of adding an OFFSET 1 clause. Yeah, this is a long-standing headache. LATERAL mostly deals with this, but most cases (even pre-9.3) can be worked around one way or another. 3. The 256Gb limit on shared memory segment size on Linux. I figure this will be solved fairly soon. It's a nice problem to have. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum_work_mem
On 12/11/2013 09:57 AM, Robert Haas wrote: I don't agree with that assessment. Anything that involves changing the scheduling of autovacuum is a major project that will legitimately provoke much controversy. Extensive testing will be needed to prove that the new algorithm doesn't perform worse than the current algorithm in any important cases. I have my doubts about whether that can be accomplished in an entire release cycle, let alone 2-3 days. In contrast, the patch proposed does something that is easy to understand, clearly safe, and an improvement over what we have now. +1 There is an inherent tuning and troubleshooting challenge in anything involving a feedback loop. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On Wed, Dec 11, 2013 at 6:27 AM, Simon Riggs si...@2ndquadrant.com wrote: On 11 December 2013 06:36, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I think this feature will be used in a lot of scenarios in which PITR is currently used. We have to judge which is better, we get something potential or to protect stupid. And we had better to wait author's comment... I'd say just document that it wouldn't make sense to use it for PITR. There may be some use case we can't see yet, so specifically prohibiting a use case that is not dangerous seems too much at this point. I will no doubt be reminded of these words in the future... Hi all, I tend to agree with Simon, but I confess that I don't like delaying a server with standby_mode = 'off'. The main goal of this patch is to delay the Streaming Replication, so if the slave server isn't a hot standby I think it makes no sense to delay it. Mitsumasa suggested to add StandbyModeRequested to the conditional branch to skip this situation. I agree with him! And I'll change 'recoveryDelay' (functions, variables) to 'standbyDelay'. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] logical changeset generation v6.8
On Wed, Dec 11, 2013 at 11:25 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-10 19:11:03 -0500, Robert Haas wrote: Committed #1 (again). Regarding this: + /* XXX: we could also do this unconditionally, the space is used anyway */ + if (copy_oid) + HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp)); I would like to put in a big +1 for doing that unconditionally. I didn't make that change before committing, but I think it'd be a very good idea. Patch attached. Committed with kibitzing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
On Wed, Dec 11, 2013 at 11:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 11, 2013 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: One use case is accessing a particular host when using DNS round robin to standbys in combination with SSL. Ah, interesting point. And it's not inconceivable that some application out there could be using PQhost() to retrieve the host from an existing connection object and reusing that value for a new connection, in which case redefining it to sometimes return hostaddr would break things. So I think we shouldn't do that. I think the only reasonable way to fix this is to improve the logic in psql, not turn PQhost() into a mess with no understandable definition. If that means we need to add a separate PQhostaddr() query function, so be it. We won't be able to fix the complained-of bug in back branches, but I'd rather live with that (it's basically just cosmetic anyway) than risk introducing perhaps-not-so-cosmetic bugs into other existing applications. I can't argue with that. In general, I think the definition of these query functions ought to be "what was the value of this parameter when the connection was made". As such, I'm not even sure that the pgunixsocket behavior that's in PQhost now is a good idea, much less that we should extend that hack to cover DefaultHost. Well, returning /tmp on Windows is just stupid. I don't see why we should feel bad about changing that. A bug is a bug. There is room also for a function defined as "give me a textual description of what I'm connected to", which is not meant to reflect any single connection parameter but rather the total behavior. Right now I think PQhost is on the borderline of doing this instead of just reporting the host parameter, but I think rather than pushing it across that border we'd be better off to invent a function explicitly charged with doing that.
That would give us room to do something actually meaningful with host+hostaddr cases, for instance. I think really what ought to be printed in such cases is something like "host-name (address IP-address)"; leaving out the former would be unhelpful, but leaving out the latter is outright misleading. On the other hand, I'm not sure how much of a translatability problem it'd be to wedge such a description into a larger message. Might be better to just put the logic into psql. libpq needs to expose enough functionality to make this simple for psql, but psql should be the final arbiter of the output format. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
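A toy version of that "total behavior" description might look like the following - purely illustrative, not a proposed libpq function (it returns a static buffer for brevity):

```c
#include <stdio.h>

/* Hypothetical formatter for "host-name (address IP-address)": show
 * both the host parameter and the resolved/forced address when both
 * are known, otherwise whichever one we have. */
const char *describe_connection(const char *host, const char *hostaddr)
{
    static char buf[256];

    if (host && hostaddr)
        snprintf(buf, sizeof(buf), "%s (address %s)", host, hostaddr);
    else
        snprintf(buf, sizeof(buf), "%s", host ? host : hostaddr);
    return buf;
}
```

With host set and hostaddr forced (the DNS-round-robin-plus-SSL case above), this prints both, which is exactly the case where showing only one of them misleads.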
Re: [HACKERS] [RFC] Shouldn't we remove annoying FATAL messages from server log?
Josh Berkus j...@agliodbs.com writes: However, it would really be useful to have an extra tag (in addition to the ERROR or FATAL) for "If you're seeing this message, something has gone seriously wrong on the server." Just stuff like corruption messages, backend crashes, etc. Right, we've discussed that idea elsewhere; there's a basically orthogonal classification that needs to happen. Pretty much all PANICs are high priority from a DBA's perspective, but only a subset of either FATAL or ERROR are. Somebody needs to do the legwork to determine just what kind of classification scheme we want and propose at least an initial set of ereports to be so marked. One thought I had was that we could probably consider the default behavior (in the absence of any call of an explicit criticality-marking function) to be like this: for ereport(), it's critical if a PANIC and otherwise not; for elog(), it's critical if >= ERROR level, otherwise not. The rationale for this is that we generally use elog for not-supposed-to-happen cases, so those are probably interesting. If we start getting complaints about some elog not being so interesting, we can convert it to an ereport so it can include an explicit marking call. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
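That default rule is small enough to state in code. A toy encoding follows - the level ordering mirrors elog.h, but the constant values and names here are illustrative only, not the actual server definitions:

```c
#include <stdbool.h>

/* Severity levels in ascending order, as in PostgreSQL's elog.h
 * (values shown for illustration). */
enum
{
    LOG_LEVEL = 15,
    ERROR_LEVEL = 20,
    FATAL_LEVEL = 21,
    PANIC_LEVEL = 22
};

/* Tom's proposed default: an ereport() is critical only at PANIC; an
 * elog() is critical at ERROR and above, since elog generally marks
 * not-supposed-to-happen cases. */
bool default_is_critical(int level, bool from_elog)
{
    if (from_elog)
        return level >= ERROR_LEVEL;
    return level >= PANIC_LEVEL;
}
```

Under this rule an ordinary FATAL from a client disconnect is not flagged, while an elog(ERROR) - a can't-happen path - is, which matches the alphabet-soup-of-regexes complaint upthread.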
Re: [HACKERS] Completing PL support for Event Triggers
On 12/11/13, 5:06 AM, Dimitri Fontaine wrote: Peter Eisentraut pete...@gmx.net writes: I think you are mistaken. My patch includes all changes between your v1 and v2 patch. I mistakenly remembered that we did remove all the is_event_trigger business from the plperl patch too, when it's not the case. Sorry about this confusion. My vote is for “ready for commit” then. PL/Perl was committed. Please update the commit fest entry with your plans about PL/Python. (Returned with Feedback or move to next CF or close and create a separate entry?) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stats for network traffic WIP
Atri Sharma atri.j...@gmail.com writes: On Wed, Dec 11, 2013 at 11:12 PM, Peter Eisentraut pete...@gmx.net wrote: Is there a reason why you can't get this directly from the OS? I would say that its more of a convenience to track the usage directly from the database instead of setting up OS infrastructure to store it. The thing that I'm wondering is why the database would be the right place to be measuring it at all. If you've got a network usage problem, aggregate usage across everything on the server is probably what you need to be worried about, and PG can't tell you that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
k...@rice.edu k...@rice.edu wrote: The question I have, which applies to the matview support as well, is How can we transparently substitute usage of the in-memory columnar store/matview in a SQL query?. My take on that regarding matviews is: (1) It makes no sense to start work on this without a far more sophisticated concept of matview freshness (or staleness, as some products prefer to call it). (2) Work on query rewrite to use sufficiently fresh matviews to optimize the execution of a query and work on freshness tracking are orthogonal to work on incremental maintenance. I have no plans to work on either matview freshness or rewrite, as there seems to be several years worth of work to get incremental maintenance up to a level matching other products. I welcome anyone else to take on those other projects. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Templates S03E11
On Wed, Dec 11, 2013 at 10:43 AM, Tom Lane t...@sss.pgh.pa.us wrote: So while I hear your objection to the script in catalog idea Stephen, I think we should move forward. We don't have the luxury of only applying patches where no compromise has to be made, where everyone is fully happy with the solution we find as a community. You've got that backwards. We do have the luxury of rejecting new features until people are generally satisfied that the basic design is right. There's no overlord decreeing that this must be in 9.4. I strongly agree. PostgreSQL has succeeded because we try not to do things at all until we're sure we know how to do them right. Sometimes we lag behind in features or performance as a result of that - but the upside is that when we say something now works, it does. Moreover, it means that the number of bad design decisions we're left to support off into eternity is comparatively small. Those things are of great benefit to our community. I can certainly understand Dimitri's frustration, in that he's written several versions of this patch and none have been accepted. But what that means is that none of those approaches have consensus behind them, which is another way of saying that, as a community, we really *don't* know the best way to solve this problem, and our community policy in that situation is to take no action until we do. I've certainly had my own share of disappointments about patches I've written which I believed, and in some cases still believe, to be really good work, and I'd really like to be able to force them through. But that's not how it works. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
Robert Haas robertmh...@gmail.com writes: On Wed, Dec 11, 2013 at 11:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: In general, I think the definition of these query functions ought to be what was the value of this parameter when the connection was made. As such, I'm not even sure that the pgunixsocket behavior that's in PQhost now is a good idea, much less that we should extend that hack to cover DefaultHost. Well, returning /tmp on Windows is just stupid. I don't see why we should feel bad about changing that. A bug is a bug. What I was suggesting was we should take out the pgunixsocket fallback, not make it even more complicated. That probably implies that we need still another accessor function to get the socket path. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] preserving forensic information when we freeze
On Thu, Nov 21, 2013 at 4:51 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-21 15:59:35 -0500, Robert Haas wrote: * Should HeapTupleHeaderXminFrozen also check for FrozenTransactionId? It seems quite possible that people think they've dealt with frozen xmin entirely after checking, but they still might get FrozenTransactionId back in a pg_upgraded cluster. The reason I originally wrote the patch the way I did, rather than the way that you prefer, is that it minimizes the number of places where we might perform extra tests that are known not to be needed in context. These code paths are hot. The patch as sent shouldn't really do that in any of the paths I know to be hot - it uses *RawXmin() there. If you do this sort of thing, then after macro expansion we may end up with a lot of things like: (flags & FROZEN) || (rawxid == 2) ? 2 : rawxid. I want to avoid that. But in which cases would that actually be slower? There'll be no additional code executed if the hint bits for frozen are set, and in case not it will usually save us an external function call to TransactionIdPrecedes(). Dunno. It's at least more code generation. That macro is intended, specifically, to be a test for flag bits, and I think it should do precisely that. If that's not what you want, then don't use that macro. That's a fair argument. Although there's several HeapTupleHeader* macros that muck with stuff besides infomask. Sure, but that doesn't mean they ALL have to. * Existing htup_details boolean checks contain an 'Is', but HeapTupleHeaderXminCommitted, HeapTupleHeaderXminInvalid, HeapTupleHeaderXminFrozen don't contain any verb. Not sure. We could say XminIsCommitted, XminIsInvalid, XminIsFrozen, etc. I don't particularly care for it, but I can see the argument for it. I don't have a clear preference either, I just noticed the inconsistency and wasn't sure whether it was intentional. It was intentional enough.
:-) I think once we have this we should start opportunistically try to freeze tuples during vacuum using OldestXmin instead of FreezeLimit if the page is already dirty. Separate patch, but yeah, something like that. If we have to mark the page all-visible, we might as well freeze it while we're there. We should think about how it interacts with Heikki's freeze-without-write patch though. Definitely separate yes. And I agree, it's partially moot if Heikki's patch gets in, but I am not sure it will make it into 9.4. There seems to be quite some work left. I haven't heard anything further from Heikki, so I'm thinking we should proceed with this approach. It seems to be the path of least resistance, if not essential, for making CLUSTER freeze everything automatically, a change almost everyone seems to really want. Even if we did have Heikki's stuff, making cluster freeze more aggressively is still a good argument for doing this. The pages can then be marked all-visible (something Bruce is working on) and never need to be revisited. Without this, I don't think we can get there. If we also handle the vacuum-dirtied-it-already case as you propose here, I think we'd have quite a respectable improvement in vacuum behavior for 9.4, even without Heikki's stuff. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On 12/12/13 07:22, Gavin Flower wrote: On 12/12/13 06:22, Tom Lane wrote: I wrote: Hm. You can only take N rows from a block if there actually are at least N rows in the block. So the sampling rule I suppose you are using is select up to N rows from each sampled block --- and that is going to favor the contents of blocks containing narrower-than-average rows. Oh, no, wait: that's backwards. (I plead insufficient caffeine.) Actually, this sampling rule discriminates *against* blocks with narrower rows. You previously argued, correctly I think, that sampling all rows on each page introduces no new bias because row width cancels out across all sampled pages. However, if you just include up to N rows from each page, then rows on pages with more than N rows have a lower probability of being selected, but there's no such bias against wider rows. This explains why you saw smaller values of i being undersampled. Had you run the test series all the way up to the max number of tuples per block, which is probably a couple hundred in this test, I think you'd have seen the bias go away again. But the takeaway point is that we have to sample all tuples per page, not just a limited number of them, if we want to change it like this. regards, tom lane Surely we want to sample a 'constant fraction' (obviously, in practice you have to sample an integral number of rows in a page!) of rows per page? The simplest way, as Tom suggests, is to use all the rows in a page. However, if you wanted the same number of rows from a greater number of pages, you could (for example) select a quarter of the rows from each page. In which case, when this is a fractional number: take the integral number of rows, plus one extra row with a probability equal to the fraction (here 0.25). Either way, if it is determined that you need N rows, then keep selecting pages at random (but never use the same page more than once) until you have at least N rows. 
Cheers, Gavin Yes the fraction/probability, could actually be one of: 0.25, 0.50, 0.75. But there is a bias introduced by the arithmetic average size of the rows in a page. This results in block sampling favouring large rows, as they are in a larger proportion of pages. For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes, using 400 byte pages. In the pathologically worst case, assuming maximum packing density and no page has both types: the large rows would occupy 500 pages and the smaller rows 50 pages. So if one selected 11 pages at random, you get about 10 pages of large rows and about one for small rows! In practice, it would be much less extreme - for a start, not all blocks will be fully packed, most blocks would have both types of rows, and there is usually greater variation in row size - but still a bias towards sampling larger rows. So somehow, this bias needs to be counteracted. Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
On Wed, Dec 11, 2013 at 2:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 11, 2013 at 11:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: In general, I think the definition of these query functions ought to be what was the value of this parameter when the connection was made. As such, I'm not even sure that the pgunixsocket behavior that's in PQhost now is a good idea, much less that we should extend that hack to cover DefaultHost. Well, returning /tmp on Windows is just stupid. I don't see why we should feel bad about changing that. A bug is a bug. What I was suggesting was we should take out the pgunixsocket fallback, not make it even more complicated. That probably implies that we need still another accessor function to get the socket path. Well, I guess. I have a hard time seeing whatever rejiggering we want to do in master as a reason not to back-patch that fix, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] preserving forensic information when we freeze
On 12/11/2013 09:17 PM, Robert Haas wrote: On Thu, Nov 21, 2013 at 4:51 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-21 15:59:35 -0500, Robert Haas wrote: Separate patch, but yeah, something like that. If we have to mark the page all-visible, we might as well freeze it while we're there. We should think about how it interacts with Heikki's freeze-without-write patch though. Definitely separate yes. And I agree, it's partially moot if Heikki's patch gets in, but I am not sure it will make it into 9.4. There seems to be quite some work left. I haven't heard anything further from Heikki, so I'm thinking we should proceed with this approach. +1. It seems unlikely that my patch is going to make it into 9.4. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
Robert Haas robertmh...@gmail.com writes: On Wed, Dec 11, 2013 at 2:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Well, returning /tmp on Windows is just stupid. I don't see why we should feel bad about changing that. A bug is a bug. What I was suggesting was we should take out the pgunixsocket fallback, not make it even more complicated. That probably implies that we need still another accessor function to get the socket path. Well, I guess. I have a hard time seeing whatever rejiggering we want to do in master as a reason not to back-patch that fix, though. I guess as long as the pgunixsocket thing is in there, it makes sense to substitute DefaultHost for it on Windows, but are we sure that's something to back-patch? Right now, as I was saying, PQhost is in some gray area where it's not too clear what its charter is. It's not what was the host parameter, for sure, but we haven't tried to make it an accurate description of the connection either. It's a bit less accurate on Windows than elsewhere, but do we want to risk breaking anything to only partially resolve that? More generally, if we do go over in 9.4 to the position that PQhost reports the host parameter and nothing but, I'm not sure that introducing a third behavior into the back branches is something that anybody will thank us for. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
Gavin Flower gavinflo...@archidevsys.co.nz wrote: For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes, using 400 byte pages. In the pathologically worst case, assuming maximum packing density and no page has both types: the large rows would occupy 500 pages and the smaller rows 50 pages. So if one selected 11 pages at random, you get about 10 pages of large rows and about one for small rows! With 10 * 2 = 20 large rows, and 1 * 20 = 20 small rows. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
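[Editorial aside: Kevin's correction can be checked with a quick simulation. This is my own sketch, not part of the thread; the function name and the layout (500 pages of two 200-byte rows, 50 pages of twenty 20-byte rows, from the hypothetical example upthread) are assumptions taken from that example.]

```python
import random

def expected_rows_per_sample(n_trials=20000, pages_per_sample=11, seed=1):
    # Hypothetical layout from the example: at maximum packing density,
    # 1000 rows of 200 bytes fill 500 pages (2 rows/page) and 1000 rows
    # of 20 bytes fill 50 pages (20 rows/page).
    pages = [2] * 500 + [20] * 50
    rng = random.Random(seed)
    large = small = 0
    for _ in range(n_trials):
        # Sample 11 distinct pages uniformly and take every row on them.
        for rows_in_page in rng.sample(pages, pages_per_sample):
            if rows_in_page == 2:
                large += 2
            else:
                small += 20
    return large / n_trials, small / n_trials
```

The page-count skew (about 10 large-row pages to 1 small-row page per sample) is exactly cancelled by the rows-per-page skew, so on average each sample contains 20 rows of each kind, matching Kevin's arithmetic.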
[HACKERS] 9.3 reference constraint regression
In 9.3 I can delete the parent of a parent-child relation if the child row is an uncommitted insert and I first update the parent.

USER1:
drop table child;
drop table parent;
create table parent (i int, c char(3));
create unique index parent_idx on parent (i);
insert into parent values (1, 'AAA');
create table child (i int references parent(i));

USER2:
BEGIN;
insert into child values (1);

USER1:
BEGIN;
update parent set c=lower(c);
delete from parent;
COMMIT;

USER2:
COMMIT;

Note that the problem also happens if the update is set i=i. I was expecting this update to block, as the UPDATE is on a unique index that can be used in a foreign key. The i=i update should get a UPDATE lock and not a NO KEY UPDATE lock, as I believe the c=... update does.
Re: [HACKERS] autovacuum_work_mem
On Wed, Dec 11, 2013 at 7:41 AM, Simon Riggs si...@2ndquadrant.com wrote: That's about 2-3 days work and I know Peter can hack it. So the situation is not perfection-sought-blocking-good, this is more like fairly poor solution being driven through when a better solution is available within the time and skills available. I think that that's a very optimistic assessment of the amount of work required. Even by the rose-tinted standards of software project time estimation. A ton of data is required to justify fundamental infrastructural changes like that. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
On Wed, Dec 11, 2013 at 2:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 11, 2013 at 2:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Well, returning /tmp on Windows is just stupid. I don't see why we should feel bad about changing that. A bug is a bug. What I was suggesting was we should take out the pgunixsocket fallback, not make it even more complicated. That probably implies that we need still another accessor function to get the socket path. Well, I guess. I have a hard time seeing whatever rejiggering we want to do in master as a reason not to back-patch that fix, though. I guess as long as the pgunixsocket thing is in there, it makes sense to substitute DefaultHost for it on Windows, but are we sure that's something to back-patch? Well, it seems like a clear case of returning a ridiculous value, but I'm willing to be talked out of it if someone can explain how it would break things. I guess it's possible someone could have code out there that tests for the exact value /tmp and does something based on that, but that seems a stretch - and if they did have such code, it would probably just handle it by substituting localhost anyway. Right now, as I was saying, PQhost is in some gray area where it's not too clear what its charter is. It's not what was the host parameter, for sure, but we haven't tried to make it an accurate description of the connection either. It's a bit less accurate on Windows than elsewhere, but do we want to risk breaking anything to only partially resolve that? I guess it depends on how risky we think it is. More generally, if we do go over in 9.4 to the position that PQhost reports the host parameter and nothing but, I'm not sure that introducing a third behavior into the back branches is something that anybody will thank us for. 
It doesn't seem very plausible to say that we're just going to redefine it that way, unless we're planning to bump the soversion. But maybe we should decide what we *are* going to do in master first, before deciding what to back-patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum_work_mem
On 11 December 2013 17:57, Robert Haas robertmh...@gmail.com wrote: Extensive testing will be needed to prove that the new algorithm doesn't perform worse than the current algorithm in any important cases. Agreed, but the amount of testing seems equivalent in both cases, assuming we weren't going to skip it for this patch. Let me repeat the question, so we are clear... In what circumstances will the memory usage from multiple concurrent VACUUMs become a problem? In those circumstances, reducing autovacuum_work_mem will cause more passes through indexes, dirtying more pages and elongating the problem workload. I agree that multiple concurrent VACUUMs could be a problem but this doesn't solve that, it just makes things worse. The *only* time this parameter would have any effect looks like when it will make matters worse. With considerable regret, I don't see how this solves the problem at hand. We can and should do better. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On 12/12/13 08:14, Gavin Flower wrote: On 12/12/13 07:22, Gavin Flower wrote: On 12/12/13 06:22, Tom Lane wrote: I wrote: Hm. You can only take N rows from a block if there actually are at least N rows in the block. So the sampling rule I suppose you are using is select up to N rows from each sampled block --- and that is going to favor the contents of blocks containing narrower-than-average rows. Oh, no, wait: that's backwards. (I plead insufficient caffeine.) Actually, this sampling rule discriminates *against* blocks with narrower rows. You previously argued, correctly I think, that sampling all rows on each page introduces no new bias because row width cancels out across all sampled pages. However, if you just include up to N rows from each page, then rows on pages with more than N rows have a lower probability of being selected, but there's no such bias against wider rows. This explains why you saw smaller values of i being undersampled. Had you run the test series all the way up to the max number of tuples per block, which is probably a couple hundred in this test, I think you'd have seen the bias go away again. But the takeaway point is that we have to sample all tuples per page, not just a limited number of them, if we want to change it like this. regards, tom lane Surely we want to sample a 'constant fraction' (obviously, in practice you have to sample an integral number of rows in a page!) of rows per page? The simplest way, as Tom suggests, is to use all the rows in a page. However, if you wanted the same number of rows from a greater number of pages, you could (for example) select a quarter of the rows from each page. In which case, when this is a fractional number: take the integral number of rows, plus one extra row with a probability equal to the fraction (here 0.25). Either way, if it is determined that you need N rows, then keep selecting pages at random (but never use the same page more than once) until you have at least N rows. 
Cheers, Gavin Yes the fraction/probability, could actually be one of: 0.25, 0.50, 0.75. But there is a bias introduced by the arithmetic average size of the rows in a page. This results in block sampling favouring large rows, as they are in a larger proportion of pages. For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes, using 400 byte pages. In the pathologically worst case, assuming maximum packing density and no page has both types: the large rows would occupy 500 pages and the smaller rows 50 pages. So if one selected 11 pages at random, you get about 10 pages of large rows and about one for small rows! In practice, it would be much less extreme - for a start, not all blocks will be fully packed, most blocks would have both types of rows, and there is usually greater variation in row size - but still a bias towards sampling larger rows. So somehow, this bias needs to be counteracted.

Cheers, Gavin

Actually, I just thought of a possible way to overcome the bias towards large rows.

1. Calculate (a rough estimate may be sufficient, if not too 'rough') the size of the smallest row.
2. Select a page at random (never selecting the same page twice).
3. Then select rows at random within the page (never selecting the same row twice). For each row selected, accept it with probability equal to (size of smallest row)/(size of selected row). I think you'll find that will almost completely offset the bias towards larger rows!
4. If you do not have sufficient rows, and you still have pages not yet selected, go to 2.

Note that it will be normal for some pages not to have any rows selected, especially for large tables!

Cheers, Gavin

P.S. I really need to stop thinking about this problem, and get on with my assigned project!!!
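[Editorial aside: a rough simulation of Gavin's accept-with-probability step. This is my own sketch, not from the thread; it simplifies by sampling pages with replacement and one row per draw, and reuses the hypothetical 400-byte-page layout from the example.]

```python
import random

def accept_reject_sample(n_draws=100000, seed=2):
    # Hypothetical layout: 500 pages of two 200-byte rows each,
    # 50 pages of twenty 20-byte rows each; smallest row is 20 bytes.
    pages = [[200, 200]] * 500 + [[20] * 20] * 50
    min_size = 20
    rng = random.Random(seed)
    n_large = n_small = 0
    for _ in range(n_draws):
        page = rng.choice(pages)            # step 2: random page
        size = rng.choice(page)             # step 3: random row in page
        if rng.random() < min_size / size:  # accept with min/size
            if size == 200:
                n_large += 1
            else:
                n_small += 1
    return n_large, n_small
```

Under this selection model a row's chance per draw is (1/pages) * (1/rows_per_page) * (min_size/size); with full packing, rows_per_page is page_size/size, so the size terms cancel and every row is equally likely, which is why the accepted counts for the two row classes come out roughly equal.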
Re: [HACKERS] ANALYZE sampling is too good
On Tue, Dec 10, 2013 at 4:48 PM, Peter Geoghegan p...@heroku.com wrote: Why would I even mention that to a statistician? We want guidance. But yes, I bet I could give a statistician an explanation of statistics target that they'd understand without too much trouble. Actually, I think that if we told a statistician about the statistics target, his or her response would be: why would you presume to know ahead of time what statistics target is going to be effective? I suspect that the basic problem is that it isn't adaptive. I think that if we could somehow characterize the quality of our sample as we took it, and then cease sampling when we reached a certain degree of confidence in its quality, that would be helpful. It might not even matter that the sample was clustered from various blocks. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE sampling is too good
On 12/12/13 08:31, Kevin Grittner wrote: Gavin Flower gavinflo...@archidevsys.co.nz wrote: For example, assume 1000 rows of 200 bytes and 1000 rows of 20 bytes, using 400 byte pages. In the pathologically worst case, assuming maximum packing density and no page has both types: the large rows would occupy 500 pages and the smaller rows 50 pages. So if one selected 11 pages at random, you get about 10 pages of large rows and about one for small rows! With 10 * 2 = 20 large rows, and 1 * 20 = 20 small rows. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company Sorry, I've simply come up with well argued nonsense! Kevin, you're dead right. Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] -d option for pg_isready is broken
Robert Haas robertmh...@gmail.com writes: On Wed, Dec 11, 2013 at 2:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: More generally, if we do go over in 9.4 to the position that PQhost reports the host parameter and nothing but, I'm not sure that introducing a third behavior into the back branches is something that anybody will thank us for. It doesn't seem very plausible to say that we're just going to redefine it that way, unless we're planning to bump the soversion. Well, we didn't bump the soversion (nor touch the documentation) in commit f6a756e4, which is basically what I'm suggesting we ought to revert. It was nothing but a quick hack at the time, and hindsight is saying it was a bad idea. Admittedly, it was long enough ago that there might be some grandfather status attached to the current behavior; but that argument can't be made for changing its behavior still further. But maybe we should decide what we *are* going to do in master first, before deciding what to back-patch. Right. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Templates S03E11
Robert Haas robertmh...@gmail.com writes: You've got that backwards. We do have the luxury of rejecting new features until people are generally satisfied that the basic design is right. There's no overlord decreeing that this must be in 9.4. I strongly agree. PostgreSQL has succeeded because we try not to do things at all until we're sure we know how to do them right. I still agree with the principle, or I wouldn't even try. Not in the details, because the current design passed all the usual criteria a year ago. http://www.postgresql.org/message-id/6466.1354817...@sss.pgh.pa.us I can certainly understand Dimitri's frustration, in that he's written several versions of this patch and none have been accepted. But what The design was accepted, last year. It took a year to review it, which is fair enough, only to find new problems again. Circles at their best. You just said on another thread that perfect is the enemy of good. What about applying the same line of thought to this patch? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum_work_mem
On Wed, Dec 11, 2013 at 2:37 PM, Simon Riggs si...@2ndquadrant.com wrote: On 11 December 2013 17:57, Robert Haas robertmh...@gmail.com wrote: Extensive testing will be needed to prove that the new algorithm doesn't perform worse than the current algorithm in any important cases. Agreed, but the amount of testing seems equivalent in both cases, assuming we weren't going to skip it for this patch. Let me repeat the question, so we are clear... In what circumstances will the memory usage from multiple concurrent VACUUMs become a problem? In those circumstances, reducing autovacuum_work_mem will cause more passes through indexes, dirtying more pages and elongating the problem workload. I agree that multiple concurrent VACUUMs could be a problem but this doesn't solve that, it just makes things worse. That's not the problem the patch is designed to solve. It's intended for the case where you want to allow more or less memory to autovacuum than you do for index builds. There's no principled reason that anyone should want those things to be the same. It is not difficult to imagine situations in which you would want one set to a very different value than the other. In particular it seems quite likely to me that the amount of memory appropriate for index builds might be vastly more than is needed by autovacuum. For example, in a data-warehousing environment where updates are rare but large index builds by the system's sole user are frequent, someone might want to default index builds to 64GB of RAM (especially after Noah's patch to allow huge allocations for the tuple array while sorting) but only need 256MB for autovacuum. In general, I'm reluctant to believe that Peter proposed this patch just for fun. I assume this is a real-world problem that Heroku encounters in their environment. If not, well then that's different. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
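[Editorial aside: concretely, the situation Robert describes would look something like the following postgresql.conf fragment. The values are illustrative only, echoing his data-warehouse example; autovacuum_work_mem is the GUC the patch adds, and its default of -1 falls back to maintenance_work_mem.]

```
# postgresql.conf (hypothetical values)
maintenance_work_mem = 64GB    # generous limit for manual index builds
autovacuum_work_mem = 256MB    # independent, smaller per-worker cap for
                               # autovacuum; -1 (the default) means
                               # "use maintenance_work_mem"
```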
Re: [HACKERS] autovacuum_work_mem
On 12/11/2013 11:37 AM, Simon Riggs wrote: On 11 December 2013 17:57, Robert Haas robertmh...@gmail.com wrote: Extensive testing will be needed to prove that the new algorithm doesn't perform worse than the current algorithm in any important cases. Agreed, but the amount of testing seems equivalent in both cases, assuming we weren't going to skip it for this patch. No performance testing is required for this patch. The effect of memory limits on vacuum are already well-known and well-understood. With considerable regret, I don't see how this solves the problem at hand. We can and should do better. I strongly disagree. The problem we are dealing with currently is that two resource limits which should have *always* been independent of each other are currently conflated into a single GUC variable. This forces users to remember to set maintenance_work_mem interactively every time they want to run a manual VACUUM, because the setting in postgresql.conf is needed to tune autovacuum. In other words, we are having an issue with *non-atomic data*, and this patch partially fixes that. Would it be better to have an admissions-control policy engine for launching autovacuum which takes into account available RAM, estimated costs of concurrent vacuums, current CPU activity, and which tables are in cache? Yes. And if you started on that now, you might have it ready for 9.5. And, for that matter, accepting this patch by no means blocks doing something more sophisticated in the future. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Templates S03E11
Dimitri, * Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote: Stephen Frost sfr...@snowman.net writes: The extra catalog tables which store SQL scripts in text columns is one of my main objections to the as-proposed Extension Templates. I view those scripts as a poor man's definition of database objects which are defined properly in the catalog already. I have a very hard time understanding this objection. PL/SQL functions are just a SQL script stored as-is in the catalogs. That applies the same way to any other PL language too, with scripts stored as-is in the catalogs in different languages. Sure- but in those cases only the actual function (which is, by definition, for an *interpreted* language..) is stored as text, not the definition of the function (eg: the CREATE FUNCTION statement), nor all of the metadata, dependency information, etc. Also, what you're proposing would result in having *both* in the same catalog- the canonical form defined in pg_proc and friends, and the SQL text blob in the extension template catalog, and I simply do not see value in that. So while I hear your objection to the script in catalog idea Stephen, I think we should move forward. We don't have the luxury of only applying patches where no compromise has to be made, where everyone is fully happy with the solution we find as a community. I understand that you wish to push this forward regardless of anyone's concerns. While I appreciate your frustration and the time you've spent on this, that isn't going to change my opinion of this approach. The other big issue is that there isn't an easy way to see how we could open up the ability to create extensions to non-superusers with this approach. The main proposal here is to only allow the owner of a template to install it as an extension. For superusers, we can implement the needed SET ROLE command automatically in the CREATE EXTENSION command. Is there another security issue that this “same role” approach is not solving? I don't think so. 
This isn't kind, and for that I'm sorry, but this feels, to me, like a very hand-wavey "well, I think this would solve all the problems" answer to the concerns raised. I can't answer offhand if this would really solve all of the issues because I've not tried to implement it or test it out, but I tend to doubt that it would. Thanks, Stephen
Re: [HACKERS] Reference to parent query from ANY sublink
Kevin Grittner kgri...@ymail.com wrote: I applied it to master and ran the regression tests, and one of the subselect tests failed. This query: SELECT '' AS six, f1 AS Correlated Field, f3 AS Second Field FROM SUBSELECT_TBL upper WHERE f1 IN (SELECT f2 FROM SUBSELECT_TBL WHERE CAST(upper.f2 AS float) = f3); [ ... ] during the `make check` or `make installcheck` [ ... ] is missing the last two rows. Oddly, if I go into the database later and try it, the rows show up. It's not immediately apparent to me what's wrong. Using the v2 patch, with the default statistics from table creation, the query modified with an alias of lower for the second reference, just for clarity, yields a plan which generates incorrect results:

 Hash Join  (cost=37.12..80.40 rows=442 width=12) (actual time=0.059..0.064 rows=3 loops=1)
   Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = lower.f2))
   ->  Seq Scan on subselect_tbl upper  (cost=0.00..27.70 rows=1770 width=16) (actual time=0.006..0.007 rows=8 loops=1)
   ->  Hash  (cost=34.12..34.12 rows=200 width=12) (actual time=0.020..0.020 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  HashAggregate  (cost=32.12..34.12 rows=200 width=12) (actual time=0.014..0.018 rows=6 loops=1)
               ->  Seq Scan on subselect_tbl lower  (cost=0.00..27.70 rows=1770 width=12) (actual time=0.002..0.004 rows=8 loops=1)
 Total runtime: 0.111 ms

As soon as there is a VACUUM and/or ANALYZE it generates a plan more like what the OP was hoping for:

 Hash Semi Join  (cost=1.20..2.43 rows=6 width=12) (actual time=0.031..0.036 rows=5 loops=1)
   Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = lower.f2))
   ->  Seq Scan on subselect_tbl upper  (cost=0.00..1.08 rows=8 width=16) (actual time=0.004..0.007 rows=8 loops=1)
   ->  Hash  (cost=1.08..1.08 rows=8 width=12) (actual time=0.012..0.012 rows=7 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on subselect_tbl lower  (cost=0.00..1.08 rows=8 width=12) (actual time=0.003..0.005 rows=8 loops=1)
 Total runtime: 0.074 ms

By comparison, without the patch this is the plan:

 Seq Scan on subselect_tbl upper  (cost=0.00..5.59 rows=4 width=12) (actual time=0.022..0.037 rows=5 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 3
   SubPlan 1
     ->  Seq Scan on subselect_tbl lower  (cost=0.00..1.12 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=8)
           Filter: ((upper.f2)::double precision = f3)
           Rows Removed by Filter: 4
 Total runtime: 0.066 ms

When I run the query with fresh statistics and without EXPLAIN both ways, the unpatched is consistently about 10% faster. So pulling up the subquery can yield an incorrect plan, and even when it yields the desired plan with the semi-join it is marginally slower than using the subplan, at least for this small data set. I think it's an interesting idea, but it still needs work. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers