Re: [HACKERS] Fix pgstattuple/pgstatindex to use regclass-type as the argument
(2013/07/04 3:58), Fujii Masao wrote:
> On Wed, Jun 26, 2013 at 12:39 AM, Robert Haas robertmh...@gmail.com wrote:
>> On Thu, Jun 20, 2013 at 2:32 PM, Fujii Masao masao.fu...@gmail.com wrote:
>>> Since pg_relpages(oid) doesn't exist, pg_relpages() is in the same situation as pgstatindex(), i.e., we cannot just replace pg_relpages(text) with pg_relpages(regclass) for backward compatibility. How do you think we should solve the pg_relpages() problem? Rename? Just add pg_relpages(regclass)?
>>
>> Adding a function with a new name seems likely to be smoother, since that way you don't have to worry about problems with function calls being thought ambiguous.
>
> Could you show me an example where this problem happens? For the test, I just implemented the regclass version of pg_relpages() (patch attached) and tested some cases, but I could not reproduce that problem.
>
> SELECT pg_relpages('hoge');                                       -- OK
> SELECT pg_relpages(oid) FROM pg_class WHERE relname = 'hoge';     -- OK
> SELECT pg_relpages(relname) FROM pg_class WHERE relname = 'hoge'; -- OK

In the attached patch, I cleaned up three functions to have two types of arguments each, text and regclass:

  pgstattuple(text)
  pgstattuple(regclass)
  pgstatindex(text)
  pgstatindex(regclass)
  pg_relpages(text)
  pg_relpages(regclass)

I still think a regclass argument is more appropriate than the text type for passing a relation/index name to a function, but having both argument types for each function seems to be a good choice at this moment in terms of backward compatibility.

The docs need to be updated if this change is going to be applied.

Any comments?

--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC.
http://www.uptime.jp

diff --git a/contrib/pgstattuple/Makefile b/contrib/pgstattuple/Makefile
index fc893d8..957742a 100644
--- a/contrib/pgstattuple/Makefile
+++ b/contrib/pgstattuple/Makefile
@@ -4,7 +4,7 @@
 MODULE_big = pgstattuple
 OBJS = pgstattuple.o pgstatindex.o
 
 EXTENSION = pgstattuple
-DATA = pgstattuple--1.1.sql pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
+DATA = pgstattuple--1.2.sql pgstattuple--1.0--1.1.sql pgstattuple--unpackaged--1.0.sql
 
 REGRESS = pgstattuple
diff --git a/contrib/pgstattuple/expected/pgstattuple.out b/contrib/pgstattuple/expected/pgstattuple.out
index ab28f50..eaba306 100644
--- a/contrib/pgstattuple/expected/pgstattuple.out
+++ b/contrib/pgstattuple/expected/pgstattuple.out
@@ -11,12 +11,24 @@ select * from pgstattuple('test'::text);
          0 |           0 |         0 |             0 |                0 |              0 |                  0 |          0 |            0
 (1 row)
 
+select * from pgstattuple('test'::name);
+ table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
+-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
+         0 |           0 |         0 |             0 |                0 |              0 |                  0 |          0 |            0
+(1 row)
+
 select * from pgstattuple('test'::regclass);
  table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
 -----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
          0 |           0 |         0 |             0 |                0 |              0 |                  0 |          0 |            0
 (1 row)
 
+select * from pgstattuple('test'::regclass::oid);
+ table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
+-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
+         0 |           0 |         0 |             0 |                0 |              0 |                  0 |          0 |            0
+(1 row)
+
 select * from pgstatindex('test_pkey');
  version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
 ---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 97f897e..9ec74e7 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -40,11 +40,15 @@
 extern Datum pgstatindex(PG_FUNCTION_ARGS);
+extern Datum pgstatindexbyid(PG_FUNCTION_ARGS);
 extern Datum pg_relpages(PG_FUNCTION_ARGS);
+extern Datum pg_relpagesbyid(PG_FUNCTION_ARGS);
 extern Datum pgstatginindex(PG_FUNCTION_ARGS);
 
 PG_FUNCTION_INFO_V1(pgstatindex);
+PG_FUNCTION_INFO_V1(pgstatindexbyid);
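The overloading approach the patch takes can be sketched in SQL roughly as follows. This is a hypothetical extension-script fragment, not the patch itself; only the `pg_relpagesbyid` C symbol is taken from the diff above, the rest is illustrative:

```sql
-- Sketch: keep the old text-taking function for backward compatibility,
-- and add a regclass-taking overload alongside it.
CREATE FUNCTION pg_relpages(IN relname text)
RETURNS bigint
AS 'MODULE_PATHNAME', 'pg_relpages'
LANGUAGE C STRICT;

CREATE FUNCTION pg_relpages(IN relname regclass)
RETURNS bigint
AS 'MODULE_PATHNAME', 'pg_relpagesbyid'
LANGUAGE C STRICT;

-- A quoted literal resolves to the text version, while an oid or regclass
-- value resolves to the regclass version, so existing callers keep working:
--   SELECT pg_relpages('hoge');
--   SELECT pg_relpages(oid) FROM pg_class WHERE relname = 'hoge';
```

Because the two signatures take unrelated types (text vs. regclass), overload resolution can pick one deterministically, which is why Fujii's tests above did not hit an ambiguity error.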
Re: [HACKERS] Fix pgstattuple/pgstatindex to use regclass-type as the argument
Hi Rushabh,

(2013/07/16 14:58), Rushabh Lathia wrote:
> Hello Satoshi,
> I assigned myself as the reviewer of this patch. Issue status is waiting on author.

Thank you for picking it up.

> Now looking at the discussion under the thread, it seems like we are waiting for a suggestion for the new function name, right?

Yes.

> I am wondering why we actually need a new name. Can't we just overload the same function and provide two versions of it?

I think the major reason is to avoid confusion between the old and new function arguments. My thought here is that having both argument types (text and regclass) for each function is a good choice to clean up the interfaces while keeping backward compatibility.

> In the last thread Fujii just did the same for pg_relpages, and it seems like a good-to-go approach, doesn't it? Am I missing anything here?

I just posted a revised patch to handle the issue in the three functions of the pgstattuple module. Please take a look.

Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] make dist error
I'm getting errors while executing make dist on git master head.

$ make dist
rm -rf postgresql-9.4devel*
=install=
for x in `cd . && find . \( -name CVS -prune \) -o \( -name .git -prune \) -o -print`; do \
  file=`expr X$x : 'X\./\(.*\)'`; \
  if test -d ./$file ; then \
    mkdir postgresql-9.4devel/$file && chmod 777 postgresql-9.4devel/$file; \
  else \
    ln ./$file postgresql-9.4devel/$file >/dev/null 2>&1 \
      || cp ./$file postgresql-9.4devel/$file; \
  fi || exit; \
done
make -C postgresql-9.4devel distprep
[snip]
/bin/tar chf postgresql-9.4devel.tar postgresql-9.4devel
/bin/tar: postgresql-9.4devel/src/bin/pg_controldata/pg_crc.c: File removed before we read it
/bin/tar: postgresql-9.4devel/src/bin/pg_resetxlog/pg_crc.c: File removed before we read it
/bin/tar: postgresql-9.4devel/src/backend/tcop/.#postgres.c: File removed before we read it
make: *** [postgresql-9.4devel.tar] Error 1
make: *** Deleting file `postgresql-9.4devel.tar'
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Add visibility map information to pg_freespace.
(2013/07/09 19:55), Kyotaro HORIGUCHI wrote:
> Hello, I've brought visibility map extensions for pg_freespacemap and pgstattuple.
>
> At Mon, 08 Jul 2013 16:59:05 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote in 20130708.165905.118860769.horiguchi.kyot...@lab.ntt.co.jp
>> I'll come again with the first implementation of it. And as for pg_freespacemap, I'll keep the current direction - adding a column to the present output record format of pg_freespace(). And documentation, if possible.
>
> pg_freespace_vm_v2.patch:
>
> The interface has been changed from the first patch. The version of pg_freespace() provided with vm information is named pg_freespace_with_vminfo() and shows output like the following.
>
> | postgres=# select * from pg_freespace_with_vminfo('t'::regclass) limit 10;
> |  blkno | avail | is_all_visible
> | -------+-------+----------------
> |      0 |    64 | t
> |      1 |    32 | t
> |      2 |    96 | t
> |      3 |    64 | t
> |      4 |    96 | t
> |      5 |    96 | t
> |      6 |   128 | t
> |      7 |    32 | t
> |      8 |    96 | t

I think we can simply add the is_all_visible column to the existing pg_freespace(), because adding a column would not break backward compatibility in general. Any other thoughts?

> pgstattuple_vm_v1.patch:
>
> The first version of the VM extension for pgstattuple. According to the previous discussion, the added column is named 'all_visible_percent'.
>
> | postgres=# select * from pgstattuple('t');
> | -[ RECORD 1 ]-------+---------
> | table_len           | 71770112
> | tuple_count         | 989859
> | tuple_len           | 31675488
> | tuple_percent       | 44.13
> | dead_tuple_count    | 99
> | dead_tuple_len      | 3168
> | dead_tuple_percent  | 0
> | free_space          | 31886052
> | free_percent        | 44.43
> | all_visible_percent | 99.98

It seems to be working fine. And I added a regression test for pg_freespacemap and additional test cases for pgstattuple. Please take a look.

Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC.
http://www.uptime.jp

diff --git a/contrib/pg_freespacemap/Makefile b/contrib/pg_freespacemap/Makefile
index d794df2..09d6ff8 100644
--- a/contrib/pg_freespacemap/Makefile
+++ b/contrib/pg_freespacemap/Makefile
@@ -6,6 +6,8 @@
 OBJS = pg_freespacemap.o
 
 EXTENSION = pg_freespacemap
 DATA = pg_freespacemap--1.1.sql pg_freespacemap--1.0--1.1.sql pg_freespacemap--unpackaged--1.0.sql
 
+REGRESS = pg_freespacemap
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/pg_freespacemap/expected/pg_freespacemap.out b/contrib/pg_freespacemap/expected/pg_freespacemap.out
new file mode 100644
index 000..cde954d
--- /dev/null
+++ b/contrib/pg_freespacemap/expected/pg_freespacemap.out
@@ -0,0 +1,100 @@
+create extension pg_freespacemap;
+create table t1 ( uid integer primary key, uname text not null );
+select * from pg_freespace('t1');
+ blkno | avail
+-------+-------
+(0 rows)
+
+select * from pg_freespace('t1'::regclass);
+ blkno | avail
+-------+-------
+(0 rows)
+
+select * from pg_freespace('t1', 1);
+ pg_freespace
+--------------
+            0
+(1 row)
+
+select * from pg_freespace_with_vminfo('t1');
+ blkno | avail | is_all_visible
+-------+-------+----------------
+(0 rows)
+
+select * from pg_freespace_with_vminfo('t1'::regclass);
+ blkno | avail | is_all_visible
+-------+-------+----------------
+(0 rows)
+
+insert into t1 values ( 100, 'postgresql' );
+select * from pg_freespace('t1');
+ blkno | avail
+-------+-------
+     0 |     0
+(1 row)
+
+select * from pg_freespace('t1', 1);
+ pg_freespace
+--------------
+            0
+(1 row)
+
+select * from pg_freespace_with_vminfo('t1');
+ blkno | avail | is_all_visible
+-------+-------+----------------
+     0 |     0 | f
+(1 row)
+
+select * from pg_freespace('t1_pkey');
+ blkno | avail
+-------+-------
+     0 |     0
+     1 |     0
+(2 rows)
+
+select * from pg_freespace('t1_pkey', 1);
+ pg_freespace
+--------------
+            0
+(1 row)
+
+select * from pg_freespace('t1_pkey', 2);
+ pg_freespace
+--------------
+            0
+(1 row)
+
+select * from pg_freespace_with_vminfo('t1_pkey');
+ blkno | avail | is_all_visible
+-------+-------+----------------
+     0 |     0 | f
+     1 |     0 | f
+(2 rows)
+
+vacuum t1;
+select * from pg_freespace('t1');
+ blkno | avail
+-------+-------
+     0 |  8096
+(1 row)
+
+select * from pg_freespace_with_vminfo('t1');
+ blkno | avail | is_all_visible
+-------+-------+----------------
+     0 |  8096 | t
+(1 row)
+
+select * from pg_freespace('t1_pkey');
+ blkno | avail
+-------+-------
+     0 |     0
+     1 |     0
+(2 rows)
+
+select * from pg_freespace_with_vminfo('t1_pkey');
+ blkno | avail | is_all_visible
+-------+-------+----------------
+     0 |     0 | f
+     1 |     0 | f
+(2 rows)
+
diff --git a/contrib/pg_freespacemap/sql/pg_freespacemap.sql b/contrib/pg_freespacemap/sql/pg_freespacemap.sql
new file mode 100644
index 000..79a458d
--- /dev/null
+++
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
> To clarify what state this is all in: Fabien's latest pgbench-throttle-v15.patch is the ready-for-committer version. The last two revisions just tweak the comments at this point, and his version is more correct than my last one.

Got it. I will take care of this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Fix pgstattuple/pgstatindex to use regclass-type as the argument
Hi Satoshi,

I spent some time on the revised version of the patch (pgstattuple_regclass_v2.diff) and here are my comments.

.) Patch applies cleanly on the PG master branch
.) Successful build and database creation
.) Basic test coverage included in the patch
.) make check runs cleanly

Basically, the goal of the patch is to allow specifying a relation/index with several expressions ('relname', 'schemaname.relname', and oid) in all pgstattuple functions. To achieve that, the patch introduces another version of the pgstattuple functions which takes regclass as the input argument. To keep it backward compatible, we retain the pgstattuple functions with the TEXT input argument.

In the mail thread we decided that pgstattuple(text) will be deprecated in a future release and that we need to document this, which is missing from the patch. Apart from that, a few comments in the C code explaining why there are multiple versions of the pgstattuple functions would be really helpful for future understanding.

Thanks,

On Tue, Jul 16, 2013 at 11:42 AM, Satoshi Nagayasu sn...@uptime.jp wrote:
> Hi Rushabh,
>
> (2013/07/16 14:58), Rushabh Lathia wrote:
>> Hello Satoshi,
>> I assigned myself for the reviewer of this patch. Issue status is waiting on author.
>
> Thank you for picking it up.
>
>> Now looking at the discussion under the thread it seems like we are waiting for the suggestion for the new function name, right?
>
> Yes.
>
>> I am wondering why actually we need new name? Can't we just overload the same function and provide two versions of the functions?
>
> I think the major reason is to avoid some confusion with old and new function arguments. My thought here is that having both arguments (text and regclass) for each function is a good choice to clean up interfaces while keeping backward compatibility.
>
>> In the last thread Fujii just did the same for pg_relpages and it seems like a good-to-go approach, isn't it? Am I missing anything here?
>
> I just posted a revised patch to handle the issue in three functions of the pgstattuple module. Please take a look.
>
> Regards,
> --
> Satoshi Nagayasu sn...@uptime.jp
> Uptime Technologies, LLC. http://www.uptime.jp

--
Rushabh Lathia
[HACKERS] A general Q about index
Hi

I want to know how an index is created and used. Actually, if you can show me a simple starting point, it would be great.

Regards,
Soroosh Sardari
Re: [HACKERS] changeset generation v5-01 - Patches git tree
On Sun, Jul 7, 2013 at 4:34 PM, Andres Freund and...@2ndquadrant.com wrote:
> On 2013-07-07 15:43:17 -0400, Tom Lane wrote:
>> Andres Freund and...@2ndquadrant.com writes:
>>> 3b) Add a catcache 'filter' that ensures the cache stays unique and use that for the mapping
>>> I slightly prefer 3b) because it's smaller; what are your opinions?
>>
>> This is just another variation on the theme of kluging the catcache to do something it shouldn't. You're still building a catcache on a non-unique index, and that is going to lead to trouble.
>
> I don't think the lurking dangers really are present. The index essentially *is* unique since we filter away anything non-unique. The catcache code can hardly be confused by tuples it never sees. That would even work if we started preloading catcaches by doing scans of the entire underlying relation or by caching all of a page when reading one of its tuples.
>
> I can definitely see that there are aesthetic reasons against doing 3b); that's why I've also done 3a). So I'll chalk you up as voting for that...

I also vote for (3a). I did a quick once-over of 1, 2, and 3a and they look reasonable. Barring strenuous objections, I'd like to go ahead and commit these, or perhaps an updated version of them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans
Consider this sequence of commands:

create type rowtype as (c int, d int);
create temp table t of rowtype;
\c -
drop type rowtype cascade;

Since the switch to MVCC catalog scans, it exhibits the following error about 10% of the time on my system:

CREATE TYPE
CREATE TABLE
You are now connected to database "test" as user "nm".
ERROR:  XX000: cache lookup failed for relation 17009
LOCATION:  getRelationDescription, objectaddress.c:2186

With \c, in general, you may end up executing commands under the new session before the old backend has finished exiting. For this test case specifically, the two backends' attempts to drop table t regularly overlap. The old backend would drop it within RemoveTempRelationsCallback(), and the new backend would cascade from rowtype to drop it.

findDependentObjects() deals with concurrent deletion attempts by acquiring a lock on each object it will delete, then calling systable_recheck_tuple() to determine whether another deleter was successful while the current backend was waiting for the lock. systable_recheck_tuple() uses the scan snapshot, which really only works if that snapshot is SnapshotNow or some other that changes its decision in response to concurrent transaction commits. The switch to MVCC snapshots left this mutual exclusion protocol ineffective. Let's fix this by having systable_recheck_tuple() acquire a fresh catalog MVCC snapshot and recheck against that. I believe it would also be fully safe to use SnapshotNow here; however, I'm assuming we would otherwise manage to remove SnapshotNow entirely.
Thanks,
nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 2bfe78a..2fecc88 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -362,7 +362,8 @@ systable_getnext(SysScanDesc sysscan)
  * systable_recheck_tuple --- recheck visibility of most-recently-fetched tuple
  *
  * This is useful to test whether an object was deleted while we waited to
- * acquire lock on it.
+ * acquire lock on it.  We recheck visibility in the style of SnapshotNow by
+ * checking against a fresh catalog snapshot.
  *
  * Note: we don't actually *need* the tuple to be passed in, but it's a
  * good crosscheck that the caller is interested in the right tuple.
@@ -370,30 +371,37 @@ systable_getnext(SysScanDesc sysscan)
 bool
 systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 {
+	Snapshot	freshsnap;
 	bool		result;
 
+	/*
+	 * For a scan using a non-MVCC snapshot like SnapshotSelf, we would simply
+	 * reuse the old snapshot.  So far, the only caller uses MVCC snapshots.
+	 */
+	freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan->heap_rel));
+
 	if (sysscan->irel)
 	{
 		IndexScanDesc scan = sysscan->iscan;
 
+		Assert(IsMVCCSnapshot(scan->xs_snapshot));
 		Assert(tup == &scan->xs_ctup);
 		Assert(BufferIsValid(scan->xs_cbuf));
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
 		LockBuffer(scan->xs_cbuf, BUFFER_LOCK_SHARE);
-		result = HeapTupleSatisfiesVisibility(tup, scan->xs_snapshot,
-											  scan->xs_cbuf);
+		result = HeapTupleSatisfiesVisibility(tup, freshsnap, scan->xs_cbuf);
 		LockBuffer(scan->xs_cbuf, BUFFER_LOCK_UNLOCK);
 	}
 	else
 	{
 		HeapScanDesc scan = sysscan->scan;
 
+		Assert(IsMVCCSnapshot(scan->rs_snapshot));
 		Assert(tup == &scan->rs_ctup);
 		Assert(BufferIsValid(scan->rs_cbuf));
 		/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
-		result = HeapTupleSatisfiesVisibility(tup, scan->rs_snapshot,
-											  scan->rs_cbuf);
+		result = HeapTupleSatisfiesVisibility(tup, freshsnap, scan->rs_cbuf);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
 	}
 
 	return result;
Re: [HACKERS] make dist error
On 07/16/2013 02:53 AM, Tatsuo Ishii wrote:
> I'm getting errors while executing make dist on git master head.
>
> $ make dist
> rm -rf postgresql-9.4devel*
> =install=
> for x in `cd . && find . \( -name CVS -prune \) -o \( -name .git -prune \) -o -print`; do \
>   file=`expr X$x : 'X\./\(.*\)'`; \
>   if test -d ./$file ; then \
>     mkdir postgresql-9.4devel/$file && chmod 777 postgresql-9.4devel/$file; \
>   else \
>     ln ./$file postgresql-9.4devel/$file >/dev/null 2>&1 \
>       || cp ./$file postgresql-9.4devel/$file; \
>   fi || exit; \
> done
> make -C postgresql-9.4devel distprep
> [snip]
> /bin/tar chf postgresql-9.4devel.tar postgresql-9.4devel
> /bin/tar: postgresql-9.4devel/src/bin/pg_controldata/pg_crc.c: File removed before we read it
> /bin/tar: postgresql-9.4devel/src/bin/pg_resetxlog/pg_crc.c: File removed before we read it
> /bin/tar: postgresql-9.4devel/src/backend/tcop/.#postgres.c: File removed before we read it
> make: *** [postgresql-9.4devel.tar] Error 1
> make: *** Deleting file `postgresql-9.4devel.tar'

It's working for me:

...
/bin/tar chf postgresql-9.4devel.tar postgresql-9.4devel
gzip --best -c postgresql-9.4devel.tar > postgresql-9.4devel.tar.gz
bzip2 -c postgresql-9.4devel.tar > postgresql-9.4devel.tar.bz2
rm -rf postgresql-9.4devel

It looks like your source directory isn't completely clean. Before I did this I did:

git clean -dfx

cheers

andrew
Re: [HACKERS] make dist error
> It's working for me:
>
> ...
> /bin/tar chf postgresql-9.4devel.tar postgresql-9.4devel
> gzip --best -c postgresql-9.4devel.tar > postgresql-9.4devel.tar.gz
> bzip2 -c postgresql-9.4devel.tar > postgresql-9.4devel.tar.bz2
> rm -rf postgresql-9.4devel
>
> It looks like your source directory isn't completely clean. Before I did this I did:
>
> git clean -dfx

Oh, I didn't know that make dist requires git clean. Thanks.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
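The failure mode above can be reproduced in miniature. This is a sketch, not from the thread: in a scratch git repository it shows that `git clean -dfx` removes ignored and untracked files (the kind of `.#postgres.c` editor droppings that broke the tarball build), while tracked files survive. Note that `-x` is what makes it delete ignored files too, so run it only in a tree you can fully regenerate:

```shell
#!/bin/sh
set -e
# Scratch repo with one tracked file, one ignored file, one untracked file.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo '*.o' > .gitignore
git add .gitignore
git -c user.name=demo -c user.email=demo@example.com commit -qm init
touch pg_crc.o '.#postgres.c'   # simulate build/editor droppings

# -d: directories, -f: force, -x: also ignored files, -q: quiet
git clean -dfxq

test ! -e pg_crc.o              # ignored file gone
test ! -e '.#postgres.c'        # untracked file gone
test -e .gitignore              # tracked file survives
echo CLEAN
```

Running this prints CLEAN, after which a `make dist` in a real source tree would no longer find stray files disappearing out from under tar.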
Re: [HACKERS] make dist error
On 07/16/2013 09:53 AM, Andrew Dunstan wrote:
> On 07/16/2013 02:53 AM, Tatsuo Ishii wrote:
>> I'm getting errors while executing make dist on git master head.
>>
>> $ make dist
>> rm -rf postgresql-9.4devel*
>> =install=
>> for x in `cd . && find . \( -name CVS -prune \) -o \( -name .git -prune \) -o -print`; do \
>>   file=`expr X$x : 'X\./\(.*\)'`; \
>>   if test -d ./$file ; then \
>>     mkdir postgresql-9.4devel/$file && chmod 777 postgresql-9.4devel/$file; \
>>   else \
>>     ln ./$file postgresql-9.4devel/$file >/dev/null 2>&1 \
>>       || cp ./$file postgresql-9.4devel/$file; \
>>   fi || exit; \
>> done
>> make -C postgresql-9.4devel distprep
>> [snip]
>> /bin/tar chf postgresql-9.4devel.tar postgresql-9.4devel
>> /bin/tar: postgresql-9.4devel/src/bin/pg_controldata/pg_crc.c: File removed before we read it
>> /bin/tar: postgresql-9.4devel/src/bin/pg_resetxlog/pg_crc.c: File removed before we read it
>> /bin/tar: postgresql-9.4devel/src/backend/tcop/.#postgres.c: File removed before we read it
>> make: *** [postgresql-9.4devel.tar] Error 1
>> make: *** Deleting file `postgresql-9.4devel.tar'
>
> It's working for me:
> ...
> It looks like your source directory isn't completely clean. Before I did this I did:
>
> git clean -dfx

Incidentally, the buildfarm animal guaibasaurus builds make dist every build, via a bespoke buildfarm module. See for example http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=guaibasaurus&dt=2013-07-16%2004%3A17%3A01&stg=make-dist, so we should get early notice if anything breaks it.

cheers

andrew
Re: [HACKERS] A general Q about index
soroosh sardari wrote:
> Hi
> I want to know how an index is created and used. Actually, if you can show me a simple starting point, it would be great.
> Regards,
> Soroosh Sardari

In the documentation there is a table of contents, and listed in that table is a section named "Indexes".

David J.
Re: [HACKERS] A general Q about index
David Johnston wrote:
> soroosh sardari wrote:
>> Hi
>> I want to know how an index is created and used. Actually, if you can show me a simple starting point, it would be great.
>> Regards,
>> Soroosh Sardari
>
> In the documentation there is a table of contents, and listed in that table is a section named "Indexes".
>
> David J.

Since you posted this to -hackers, you may mean that you wish to know how to program them, as opposed to using them in SQL. I have no clue in that regard.

If you did mean use in SQL, then the documentation is your friend, and you should not have posted this question to -hackers but to -general instead; you probably should have posted there to begin with anyway, and been clearer about what you mean by "created and used".

David J.
Re: [HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans
On Tue, Jul 16, 2013 at 9:50 AM, Noah Misch n...@leadboat.com wrote:
> Consider this sequence of commands:
>
> create type rowtype as (c int, d int);
> create temp table t of rowtype;
> \c -
> drop type rowtype cascade;
>
> Since the switch to MVCC catalog scans, it exhibits the following error about 10% of the time on my system:
>
> CREATE TYPE
> CREATE TABLE
> You are now connected to database "test" as user "nm".
> ERROR:  XX000: cache lookup failed for relation 17009
> LOCATION:  getRelationDescription, objectaddress.c:2186
>
> With \c, in general, you may end up executing commands under the new session before the old backend has finished exiting. For this test case specifically, the two backends' attempts to drop table t regularly overlap. The old backend would drop it within RemoveTempRelationsCallback(), and the new backend would cascade from rowtype to drop it.
>
> findDependentObjects() deals with concurrent deletion attempts by acquiring a lock on each object it will delete, then calling systable_recheck_tuple() to determine whether another deleter was successful while the current backend was waiting for the lock. systable_recheck_tuple() uses the scan snapshot, which really only works if that snapshot is SnapshotNow or some other that changes its decision in response to concurrent transaction commits. The switch to MVCC snapshots left this mutual exclusion protocol ineffective. Let's fix this by having systable_recheck_tuple() acquire a fresh catalog MVCC snapshot and recheck against that. I believe it would also be fully safe to use SnapshotNow here; however, I'm assuming we would otherwise manage to remove SnapshotNow entirely.

I recommend reworking the header comment to avoid mention of SnapshotNow, since if we get rid of SnapshotNow, the reference might not be too clear to far-future hackers.

+	/*
+	 * For a scan using a non-MVCC snapshot like SnapshotSelf, we would simply
+	 * reuse the old snapshot.  So far, the only caller uses MVCC snapshots.
+	 */
+	freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan->heap_rel));

This comment is not very clear, because it doesn't describe what the code actually does, but rather speculates about what the code could do if the intention of some future caller were different. I recommend adding Assert(IsMVCCSnapshot(scan->xs_snapshot)) and changing the comment to something like this: "For now, we don't handle the case of a non-MVCC scan snapshot. This is adequate for existing uses of this function, but might need to be changed in the future."

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans
Robert Haas robertmh...@gmail.com writes:
> On Tue, Jul 16, 2013 at 9:50 AM, Noah Misch n...@leadboat.com wrote:
>> Let's fix this by having systable_recheck_tuple() acquire a fresh catalog MVCC snapshot and recheck against that. I believe it would also be fully safe to use SnapshotNow here; however, I'm assuming we would otherwise manage to remove SnapshotNow entirely.

I agree with Robert's comments, and in addition suggest that this code needs a comment about why it's safe to use the snapshot without doing RegisterSnapshot or equivalent.

			regards, tom lane
Re: [HACKERS] make dist error
On 7/16/13 10:03 AM, Andrew Dunstan wrote:
> Incidentally, the buildfarm animal guaibasaurus builds make dist every build, via a bespoke buildfarm module. See for example http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=guaibasaurus&dt=2013-07-16%2004%3A17%3A01&stg=make-dist, so we should get early notice if anything breaks it.

Ditto for my Jenkins: http://pgci.eisentraut.org/jenkins/job/postgresql_master_dist/

So this is well covered. The requirement to clean before make dist is perhaps underdocumented.
Re: [HACKERS] Differences in WHERE clause of SELECT
On Tue, Jul 9, 2013 at 7:12 PM, Josh Berkus j...@agliodbs.com wrote:
> Like uses the operator class text_pattern_ops which doesn't include an implicit cast.

This is wrong. Casts are associated with data types, not operator classes.

> For one thing, the implicit cast is from text -> integer, not the other way around, and there is no LIKE operator for integers.

This is also wrong. There's no implicit cast between text and integer in either direction - only assignment casts.

I think the reason why the first few examples work and the last one fails is that, in the first few cases, there are integer literals and unknown literals, and the operations in question are defined for integers (and for text), so we pick the integer interpretation and call it good. But in the non-working case, the operation is defined to work only on text, and an integer argument is supplied. So we have to cast, and there's no implicit cast, ergo we fail. The point is that we use a different procedure to decide what to do with a quoted literal ('1', or '3.14159', or 'foo') than we do to decide whether it's OK to cast a value of a type we already know. For example, this fails:

rhaas=# create table foo (a int, b varchar);
CREATE TABLE
rhaas=# insert into foo values ('1', '1');
INSERT 0 1
rhaas=# select * from foo where a = b;
ERROR:  operator does not exist: integer = character varying
LINE 1: select * from foo where a = b;
                                  ^

This is just like the OP's first example (which worked) except that here there's no unknown literal, so we actually need to cast, and we refuse to do so and fail.

I continue to be of the opinion that our behavior in this area is bone-headed. It's reasonable to reject integer = character varying on the basis that we don't know whether integer or character varying comparison semantics are wanted, and the two might give different answers (think: leading zeroes), so we'd better ask the user to clarify. But '1' LIKE 1 is not ambiguous; there is only one plausible meaning for that, and we ought to adopt it, per the patch I proposed previously. Rejecting SQL that other systems happily accept is unhelpful and unfriendly and it is costing us users and mind-share. /rant

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
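For what it's worth, under the current behavior the failing comparison above can be made to work with an explicit cast, which tells the parser which comparison semantics are wanted. A sketch, continuing the example table from the message above:

```sql
-- fails: no implicit cast between integer and character varying
-- select * from foo where a = b;

-- works: cast the varchar column to integer explicitly,
-- requesting integer comparison semantics ('01' matches 1)
select * from foo where a = b::integer;

-- works: cast the integer column to text instead,
-- requesting text comparison semantics ('01' does not match '1')
select * from foo where a::text = b;
```

The two casts are not interchangeable precisely because of the leading-zeroes point made above, which is the reason the parser refuses to pick one silently.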
Re: [HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans
On 2013-07-16 09:50:07 -0400, Noah Misch wrote: With \c, in general, you may end up executing commands under the new session before the old backend has finished exiting. For this test case specifically, the two backends' attempts to drop table t regularly overlap. The old backend would drop it within RemoveTempRelationsCallback(), and the new backend would cascade from rowtype to drop it. findDependentObjects() deals with concurrent deletion attempts by acquiring a lock on each object it will delete, then calling systable_recheck_tuple() to determine whether another deleter was successful while the current backend was waiting for the lock. systable_recheck_tuple() uses the scan snapshot, which really only works if that snapshot is SnapshotNow or some other that changes its decision in response to concurrent transaction commits. The switch to MVCC snapshots left this mutual exclusion protocol ineffective. Nice catch. I wonder though, isn't that code unsafe in other ways as well? What if the pg_depend entry was rewritten in between? Consider somebody doing something like REASSIGN OWNED concurrently with a DROP. The DROP possibly will cascade to an entry which changed the owner already. And the recheck will then report that the object doesn't exist anymore and abort, since it does a simple HeapTupleSatisfiesVisibility() and doesn't follow the ctid chain if the tuple has been updated... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] SSL renegotiation
On Fri, Jul 12, 2013 at 8:51 PM, Noah Misch n...@leadboat.com wrote: On Fri, Jul 12, 2013 at 04:32:52PM -0400, Alvaro Herrera wrote: Now, should we support the 0.9.6-and-earlier mechanism? My inclination is no; even RHEL 3, the oldest supported Linux distribution, uses 0.9.7 (heck, even Red Hat Linux 9, released in 2003). To see OpenSSL 0.9.6 you need to go back to Red Hat Linux 7.2, released in 2001 and using a Linux 2.4 kernel. Surely no one in their right mind would use a current Postgres release on such an ancient animal. Agreed. The OpenSSL Project last applied a security fix to 0.9.6 over eight years ago. Compatibility with 0.9.6 has zero or negative value. +1 from me as well, if any more are needed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Differences in WHERE clause of SELECT
Robert Haas robertmh...@gmail.com writes: I continue to be of the opinion that our behavior in this area is bone-headed. It's reasonable to reject integer = character varying on the basis that we don't know whether integer or character varying comparison semantics are wanted, and the two might give different answers (think: leading zeroes), so we'd better ask the user to clarify. But '1' LIKE 1 is not ambiguous; there is only one plausible meaning for that, and we ought to adopt it, per the patch I proposed previously. Rejecting SQL that other systems happily accept is unhelpful and unfriendly and it is costing us users and mind-share. I don't agree with this opinion. '1' + '2' might be claimed to have only one plausible meaning as well, but that doesn't mean that we should accept it. There is a very fundamental distinction between numbers and strings, and we'd do our users no service by pretending there isn't. What's more, your argument for this essentially rests on the assumption that LIKE represents only one possible operator choice, which is already false today (see \do ~~) and might be even more false in future. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checking variadic any argument in parser - should be array
2013/7/15 Andrew Dunstan and...@dunslane.net: On 07/14/2013 12:28 AM, Pavel Stehule wrote: Hello 2013/7/14 Andrew Dunstan and...@dunslane.net: On 06/29/2013 03:29 PM, Pavel Stehule wrote: 5. This patch has user visibility, i.e. now we are throwing an error when the user only says VARIADIC NULL, like: select concat(variadic NULL) is NULL; Previously it was working but now we are throwing an error. Well, we are now stricter than before when using VARIADIC + ANY, so I have no issue as such. But I guess we need to document this user-visible change. I don't know exactly where, though. I searched for VARIADIC and all related documentation says it needs an array, so nothing harmful as such, so you can ignore this review comment, but I thought it worth mentioning. Yes, it is a point for possible issues in the RELEASE NOTES, I am thinking. Well, the writer of the release notes should be aware of this. And I hope he will be. So no issue. Is the behaviour change really unavoidable? Is it really what we want? Nobody seems to have picked up on this except the author and the reviewer. I'd hate us to do this and then surprise people. I'm not sure how many people are using VARIADIC any, but I have started doing so and expect to do so more, and I suspect I'm not alone. It doesn't disallow NULL - it disallows non-array types in this position, because only array-type values may appear there. Allowing any other usage would create ambiguous behaviour. So:

SELECT varfx(VARIADIC NULL);         -- is disallowed
SELECT varfx(VARIADIC NULL::text[]); -- is allowed

Quite so, I understand exactly what the defined behaviour will be. For example, I can write SELECT varfx(10,20,30), but I cannot write SELECT varfx(VARIADIC 10,20,30), because that behaviour would be undefined. Can you send me your use case where this check is unwanted, please? The only question I raised was for the NULL case. If you're not saying VARIADIC NULL then I have no issue. NULL is allowed - but it should be typed.
Anyway, nobody else seems to care much (and I suspect very few people are writing VARIADIC any functions anyway, apart from you and me). So I'll see about getting this committed shortly. exactly Regards Pavel cheers andrew
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
On Jul 14, 2013 9:46 PM, Greg Smith g...@2ndquadrant.com wrote: I updated and re-reviewed that in 2011: http://www.postgresql.org/message-id/4d31ae64.3000...@2ndquadrant.com and commented on why I think the improvement was difficult to reproduce back then. The improvement didn't follow for me either. It would take a really amazing bit of data to get me to believe write sorting code is worthwhile after that. On large systems capable of dirtying enough blocks to cause a problem, the operating system and RAID controllers are already sorting blocks. And *that* sorting is also considering concurrent read requests, which are a lot more important to an efficient schedule than anything the checkpoint process knows about. The database doesn't have nearly enough information yet to compete against OS level sorting. That reasoning makes no sense. OS level sorting can only see the writes in the time window between a PostgreSQL write and it being forced to disk. Spread checkpoints sprinkle the writes out over a long period, and the general tuning advice is to heavily bound the amount of memory the OS is willing to keep dirty. This makes the probability of scheduling adjacent writes together quite low, the merging window being limited either by dirty_bytes or dirty_expire_centisecs. The checkpointer has the best long term overview of the situation here; OS scheduling only has the short term view of outstanding read and write requests. By sorting checkpoint writes it is much more likely that adjacent blocks are visible to OS writeback at the same time and will be issued together. I gave the linked patch a shot. I tried it with pgbench scale 100 concurrency 32, postgresql shared_buffers=3GB, checkpoint_timeout=5min, checkpoint_segments=100, checkpoint_completion_target=0.5, pgdata was on a 7200RPM HDD, xlog on Intel 320 SSD, kernel settings: dirty_background_bytes = 32M, dirty_bytes = 128M.
First checkpoint on master:

wrote 209496 buffers (53.7%); 0 transaction log file(s) added, 0 removed, 26 recycled; write=314.444 s, sync=9.614 s, total=324.166 s; sync files=16, longest=9.208 s, average=0.600 s

IO while checkpointing: about 500 write iops at 5MB/s, 100% utilisation.

First checkpoint with checkpoint sorting applied:

wrote 205269 buffers (52.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=149.049 s, sync=0.386 s, total=149.559 s; sync files=39, longest=0.255 s, average=0.009 s

IO while checkpointing: about 23 write iops at 12MB/s, 10% utilisation.

Transaction processing rate for a 20min run went from 5200 to 7000. Looks to me that in this admittedly best case workload the sorting is working exactly as designed, converting mostly random IO into sequential. I have seen many real world workloads where this kind of sorting would have benefited greatly. I also did an I/O-bound test with scalefactor 100 and checkpoint_timeout 30min. 2hour average tps went from 121 to 135, but I'm not yet sure if it's repeatable or just noise. Regards, Ants Aasma -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] Differences in WHERE clause of SELECT
On Tue, Jul 16, 2013 at 4:40 PM, Robert Haas robertmh...@gmail.com wrote: But '1' LIKE 1 is not ambiguous What about '1' LIKE 01 ? What do people who write this actually expect LIKE to do? I gather they're actually treating it as a synonym for =? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic background workers
On Wed, Jul 3, 2013 at 11:15 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund escribió: Just as a datapoint, if you benchmark the numbers of forks that can be performed by a single process (i.e. postmaster) the number is easily in the 10s of thousands. Now forking that much has some scalability implications inside the kernel, but still. I'd be surprised if the actual fork is more than 5-10% of the current cost of starting a new backend. I played at having some thousands of registered bgworkers on my laptop, and there wasn't even that much load. So yeah, you can have lots of forks. Since no one seems to be objecting to this patch beyond the lack of documentation, I've added documentation and committed it, with appropriate rebasing and a few minor cleanups. One loose end is around the bgw_sighup and bgw_sigterm structure members. If you're registering a background worker for a library that is not loaded in the postmaster, you can't (safely) use these for anything, because it's possible (though maybe not likely) for the worker process to map the shared library at a different address than where they are mapped in the backend that requests the new process to be started. However, that doesn't really matter; AFAICS, you can just as well call pqsignal to set the handlers to anything you want from the main entrypoint before unblocking signals. So I'm inclined to say we should just remove bgw_sighup and bgw_sigterm altogether and tell people to do it that way. Alternatively, we could give them the same treatment that I gave bgw_main: let the user specify a function name and we'll search the appropriate DSO for it. But that's probably less convenient for anyone using this facility than just calling pqsignal() before unblocking signals, so I don't see any real reason to go that route. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans
On Tue, Jul 16, 2013 at 05:56:10PM +0200, Andres Freund wrote: On 2013-07-16 09:50:07 -0400, Noah Misch wrote: With \c, in general, you may end up executing commands under the new session before the old backend has finished exiting. For this test case specifically, the two backends' attempts to drop table t regularly overlap. The old backend would drop it within RemoveTempRelationsCallback(), and the new backend would cascade from rowtype to drop it. findDependentObjects() deals with concurrent deletion attempts by acquiring a lock on each object it will delete, then calling systable_recheck_tuple() to determine whether another deleter was successful while the current backend was waiting for the lock. systable_recheck_tuple() uses the scan snapshot, which really only works if that snapshot is SnapshotNow or some other that changes its decision in response to concurrent transaction commits. The switch to MVCC snapshots left this mutual exclusion protocol ineffective. Nice catch. I wonder though, isn't that code unsafe in other ways as well? What if the pg_depend entry was rewritten in between? Consider somebody doing something like REASSIGN OWNED concurrently with a DROP. The DROP possibly will cascade to an entry which changed the owner already. And the recheck will then report that the object doesn't exist anymore and abort since it does a simple HeapTupleSatisfiesVisibility() and doesn't follow the ctid chain if the tuple has been updated... I'm not seeing a problem with that particular route. Say we're examining a pg_depend tuple where the referencing object is a table and the referenced object is a role, the table's owner. We're dropping the role and cascading to the table. If the REASSIGN OWNED assigns the table to a different role, then we are correct to treat the dependency as gone.
If it's the same role (REASSIGN OWNED BY alice TO alice, pointless but permitted), several of the rename implementations short-circuit that case and don't change catalog entries. But even if that optimization were omitted, shdepChangeDep() will have blocked against our previously-acquired deletion lock on the role. Code that adds or updates a dependency without locking both objects of the new pg_depend tuple is buggy independently. That being said, there may well be a related mechanism that can slip past the locking here. My brain turns to mush when I ponder findDependentObjects() too thoroughly. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] SSL renegotiation
On Fri, Jul 12, 2013 at 08:51:52PM -0400, Noah Misch wrote: On Fri, Jul 12, 2013 at 04:32:52PM -0400, Alvaro Herrera wrote: Now, should we support the 0.9.6-and-earlier mechanism? My inclination is no; even RHEL 3, the oldest supported Linux distribution, uses 0.9.7 (heck, even Red Hat Linux 9, released in 2003). To see OpenSSL 0.9.6 you need to go back to Red Hat Linux 7.2, released in 2001 and using a Linux 2.4 kernel. Surely no one in their right mind would use a current Postgres release on such an ancient animal. Agreed. The OpenSSL Project last applied a security fix to 0.9.6 over eight years ago. Compatibility with 0.9.6 has zero or negative value. You've made a persuasive case that we should actively break backward compatibility here. Would that be complicated to do? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Differences in WHERE clause of SELECT
On Tue, Jul 16, 2013 at 12:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't agree with this opinion. I know. '1' + '2' might be claimed to have only one plausible meaning as well, but that doesn't mean that we should accept it. That example clearly has multiple sensible interpretations, because surely we have no way of knowing whether the user wants +(int2,int2), +(int4,int4), +(int8,int8), +(real,real), +(money,money), etc., etc. There are 42 binary + operators, of which at least 10 or so are plausible interpretations of that call. I have never proposed changing the behavior of this case - though, to be fair, if we found a way to just decide on int4, say, I bet we'd get very few complaints. There is a very fundamental distinction between numbers and strings, and we'd do our users no service by pretending there isn't. What's more, your argument for this essentially rests on the assumption that LIKE represents only one possible operator choice, which is already false today (see \do ~~) and might be even more false in future. This is a better argument, but I'm still not buying it. Sure, there are four candidate functions there, but they all perform the same logical operation. It looks to me like bpcharlike and textlike both call the same underlying function, and on a quick glance namelike appears only trivially different. Only bytealike is significantly different, in that it operates on the underlying bytes rather than the hex-escaped (or backslash-escaped) version of those bytes. It's a pretty good bet that the user did not want an implicit cast to bytea. PostgreSQL requires more casts than any other mainstream database system on the planet, and it's one of the biggest barriers to migrating from other database systems to PostgreSQL. I understand that there is a consensus in this forum to pretend that our intolerance is a virtue, but I don't agree with that consensus. A request for =(text,int) is ambiguous; we really do not know what is meant.
A request for ~~(int,text) is not ambiguous; we reject it out of pedantry, not necessity. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
On 7/16/13 12:46 PM, Ants Aasma wrote: Spread checkpoints sprinkle the writes out over a long period, and the general tuning advice is to heavily bound the amount of memory the OS is willing to keep dirty. That's arguing that you can make this feature be useful if you tune in a particular way. That's interesting, but the goal here isn't to prove the existence of some workload that a change is useful for. You can usually find a test case that validates any performance patch as helpful if you search for one. Everyone who has submitted a sorted checkpoint patch for example has found some setup where it shows significant gains. We're trying to keep performance stable across a much wider set of possibilities though. Let's talk about default parameters instead, which quickly demonstrates where your assumptions fail. The server I happen to be running pgbench tests on today has 72GB of RAM running SL6 with RedHat derived kernel 2.6.32-358.11.1. This is a very popular middle grade server configuration nowadays. There dirty_background_ratio and dirty_ratio are 10 (percent). That means that roughly 7GB of RAM can be used for write caching. Note that this is a fairly low write cache tuning compared to a survey of systems in the field--lots of people have servers with earlier kernels where these numbers can be as high as 20 or even 40% instead. The current feasible tuning for shared_buffers suggests a value of 8GB is near the upper limit, beyond which cache related overhead makes increases counterproductive. Your examples are showing 53% of shared_buffers dirty at checkpoint time; that's typical. The checkpointer is then writing out just over 4GB of data. With that background, which process here has more data to make decisions with?

- The operating system has 7GB of writes it's trying to optimize. That potentially includes backend, background writer, checkpoint, temp table, statistics, log, and WAL data. The scheduler is also considering read operations.
-The checkpointer process has 4GB of writes from rarely written shared memory it's trying to optimize. This is why if you take the opposite approach of yours today--go searching for workloads where sorting is counterproductive--those are equally easy to find. Any test of write speed I do starts with about 50 different scale/client combinations. Why do I suggest pgbench-tools as a way to do performance tests? It's because an automated sweep of client setups like it does is the minimum necessary to create enough variation in workload for changing the database's write path. It's really amazing how often doing that shows a proposed change is just shuffling the good and bad cases around. That's been the case for every sorting and fsync delay change submitted so far. I'm not even interested in testing today's submission because I tried that particular approach for a few months, twice so far, and it fell apart on just as many workloads as it helped. The checkpointer has the best long term overview of the situation here, OS scheduling only has the short term view of outstanding read and write requests. True only if shared_buffers is large compared to the OS write cache, which was not the case on the example I generated with all of a minute's work. I regularly see servers where Linux's Dirty area becomes a multiple of the dirty buffers written by a checkpoint. I can usually make that happen at will with CLUSTER and VACUUM on big tables. The idea that the checkpointer has a long-term view while the OS has a short one, that presumes a setup that I would say is possible but not common. kernel settings: dirty_background_bytes = 32M, dirty_bytes = 128M. You disclaimed this as a best case scenario. It is a low throughput / low latency tuning. That's fine, but if Postgres optimizes itself toward those cases it runs the risk of high throughput servers with large caches being detuned. 
I've posted examples before showing very low write caches like this leading to VACUUM running at 1/2 its normal speed or worse, as a simple example of where a positive change in one area can backfire badly on another workload. That particular problem was so common I updated pgbench-tools recently to track table maintenance time between tests, because that demonstrated an issue even when the TPS numbers all looked fine. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] A general Q about index
On Tue, Jul 16, 2013 at 7:00 PM, David Johnston pol...@yahoo.com wrote: David Johnston wrote: soroosh sardari wrote: Hi, I want to know how an index is created and used. Actually, if you can show me a simple starting point, it would be great. Regards, Soroosh Sardari. In the documentation there is a table of contents and listed on that table is a section named Indexes. David J. Since you posted this to -hackers you may mean you wish to know how to program them, as opposed to using them in SQL. I have no clue in that regard. If you did mean use in SQL then the documentation is your friend, and you also should not have posted this question to -hackers but to -general instead; probably should have posted there anyway to begin with and been more clear as to what you mean by created and used. David J.

Actually I mean the index in the source code; I want to know how an index is used in the backend for executing a query. Sorry, my question is a bit ambiguous. I started to read src/backend/access/index/genam.c, and I think maybe this file is not a good way to start. Let me write a bunch of questions to illustrate my point:

- How to create an index for a table on a specified field
- How an index is stored in a page
- Which module is responsible for fetching and swapping out index pages
- Is there a kind of abstract class for all variants of index?
Re: [HACKERS] Differences in WHERE clause of SELECT
On Tue, Jul 16, 2013 at 12:58 PM, Robert Haas robertmh...@gmail.com wrote: PostgreSQL requires more casts than any other mainstream database system on the planet, and it's one of the biggest barriers to migrating from other database systems to PostgreSQL. I understand that there is a consensus in this forum to pretend that our intolerance is a virtue, but I don't agree with that consensus. A request for =(text,int) is ambiguous; we really do not know what is meant. A request for ~~(int,text) is not ambiguous; we reject it out of pedantry, not necessity. You make some good points but I still wonder (I'm recalling your lpad discussion from a while back) if we're better off providing specific overloads and casts for portability pain points instead of changing casting mechanics. merlin
Re: [HACKERS] Proposal: template-ify (binary) extensions
On 07/16/2013 01:27 AM, Robert Haas wrote: Andres points out that you can install adminpack to obtain local filesystem access, and that is true. But the system administrator can also refuse to allow adminpack, and/or use selinux or other mechanisms to prevent the postgres binary from writing a file with execute permissions. I think execute permissions (on the FS) are irrelevant. It's about loading a shared library. The noexec mount option can prevent that, though. But okay, you're saying we *have* and *want* a guarantee that even a superuser cannot execute arbitrary native code via libpq (at least in default installs w/o extensions). Andres made two contrib-free suggestions: with COPY TO BINARY, you get a header prepended, which I think is sufficient to prevent a dlopen() or LoadLibrary(). Text and CSV formats of COPY escape their output, so it's hard to write \000 or other control bytes. ESCAPE and DELIMITER also have pretty restrictive requirements. So COPY doesn't seem quite good enough to write a valid DSO. His second suggestion was tuplesort tapes. tuplesort.c says: We require the first unsigned int of a stored tuple to be the total size on-tape of the tuple That's kind of a header as well. Writing a proper DSO certainly does not sound trivial, either. From a security perspective, I wouldn't want to rely on that guarantee. Postgres writes too many files to be sure none of those can be abused to write a loadable DSO, IMO. Mounting $PGDATA 'noexec' and allowing the postgres user to write only to such noexec mounts sounds like a good layer. It's independent, though - it can be used whether or not the above guarantee holds. Things aren't quite so bad if we write the bits to a file first and then dynamically load the file. That way at least noexec or similar can provide protection. But it still seems like a pretty dangerous direction. I agree now. Thanks for elaborating. 
Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Differences in WHERE clause of SELECT
On Tue, Jul 16, 2013 at 2:58 PM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Jul 16, 2013 at 12:58 PM, Robert Haas robertmh...@gmail.com wrote: PostgreSQL requires more casts than any other mainstream database system on the planet, and it's one of the biggest barriers to migrating from other database systems to PostgreSQL. I understand that there is a consensus in this forum to pretend that our intolerance is a virtue, but I don't agree with that consensus. A request for =(text,int) is ambiguous; we really do not know what is meant. A request for ~~(int,text) is not ambiguous; we reject it out of pedantry, not necessity. You make some good points but I still wonder (I'm recalling your lpad discussion from a while back) if we're better off providing specific overloads and casts for portability pain points instead of changing casting mechanics. TBH, I think we've gone too far in that direction already. We've got numerous cases where there are multiple function or operator declarations that exist only to work around shortcomings of the typecasting system. For example, we've got pg_size_pretty(bigint) and pg_size_pretty(numeric); if we adopted the patch that I proposed previously, one of those would be enough, and then it would work for integers, too. We could instead add a third version for integers, but where's the joy in that? We've currently got 10 different functions for || that include ||(anynonarray,text), ||(text,anynonarray), and ||(text,text). The first two of those are workarounds for the fact that the third one does not do what you want. And then there's concat(variadic any), which is yet a third way of hacking around the same general category of problem. The patch I proposed previously would allow concat to be declared as variadic text if we so desired, but it wouldn't help with the || case. I think that's OK; I never pretended that patch would fix everything, but I still think it's an improvement over what we have now.
We can certainly continue to play whack-a-mole and dream up a new solution every time a really intolerable variant of this problem comes up. But that doesn't seem good to me. It means that every case behaves a little different from every other case, and the whole thing is kinda arcane and hard to understand, even for hackers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: template-ify (binary) extensions
On Tue, Jul 16, 2013 at 3:14 PM, Markus Wanner mar...@bluegap.ch wrote: But okay, you're saying we *have* and *want* a guarantee that even a superuser cannot execute arbitrary native code via libpq (at least in default installs w/o extensions). Yes, that's a good way of summarizing my position. I think I'd support having an extension that allows that, although I don't think I'd want such an extension installed on any machine I administer. But I oppose having it be something the server allows by default. YMMV. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: Non-recursive processing of AND/OR lists
Hello 2013/7/15 Gurjeet Singh gurj...@singh.im: On Sun, Jul 14, 2013 at 8:27 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jul 10, 2013 at 9:02 PM, Josh Berkus j...@agliodbs.com wrote: I think it's a waste of code to try to handle bushy trees. A list is not a particularly efficient representation of the pending list; this will probably be slower than recursing in the common case. I'd suggest keeping the logic to handle left-deep trees, which I find rather elegant, but ditching the pending list. Somehow I find it hard to believe that recursing would be more efficient than processing the items right there. The recursion is not direct either; transformExprRecurse() is going to call this function again, but after a few more switch-case comparisons. Agreed that there's overhead in allocating list items, but is it more overhead than pushing functions on the call stack? Not sure, so I leave it to others who understand such things better than I do. If by common-case you mean a list of just one logical AND/OR operator, then I agree that creating and destroying a list may incur overhead that is relatively very expensive. To that end, I have altered the patch, attached, to not build a pending list until we encounter a node with root_expr_kind in a right branch. We're getting bushy-tree processing with very little extra code, but if you deem it not worthwhile or adding complexity, please feel free to rip it out. Is there going to be further discussion of this patch, or do I return it? Considering it's not been updated, nor my comments responded to, in almost two weeks, I think we return it at this point. Sorry, I didn't notice that this patch was put back in 'Waiting on Author' state.
I did some performance tests of the v5 and v6 versions, and there v5 is a little bit faster than v6, and v6 has a significantly higher stddev. But I am not sure whether my test is correct - I tested the speed of an EXPLAIN statement, with the result forwarded to /dev/null. The result of this test is probably related to the tested pattern of expressions - in this case "expr or expr or expr or expr or expr ...", 10 000 exprs.

 v |   avg (ms) | stddev
---+------------+--------
 5 |    1839.14 |  13.68
 6 |    1871.77 |  48.02

==v5 profile==
209064 43.5354 postgres equal
207849 43.2824 postgres process_equivalence
 37453  7.7992 postgres datumIsEqual
  3178  0.6618 postgres SearchCatCache
  2350  0.4894 postgres AllocSetAlloc

==v6 profile==
193251 45.3998 postgres process_equivalence
178183 41.8599 postgres equal
 30430  7.1488 postgres datumIsEqual
  2819  0.6623 postgres SearchCatCache
  1951  0.4583 postgres AllocSetAlloc

I found that the 9.4 planner is about 1% slower (for the test sent by Gurjeet) than the 9.2 planner, but that is not related to this patch. v6 is clean and all regression tests passed. Regards Pavel Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EnterpriseDB Inc.
Re: [HACKERS] Differences in WHERE clause of SELECT
Robert Haas robertmh...@gmail.com wrote: We can certainly continue to play whack-a-mole and dream up a new solution every time a really intolerable variant of this problem comes up. But that doesn't seem good to me. It means that every case behaves a little differently from every other case, and the whole thing is kinda arcane and hard to understand, even for hackers. If you're building up a list of things that generate errors in PostgreSQL but not other DBMS products, make sure you have this:

test=# create table t(d date);
CREATE TABLE
test=# insert into t values (NULL);
INSERT 0 1
test=# insert into t values (COALESCE(NULL, NULL));
ERROR:  column "d" is of type date but expression is of type text
LINE 1: insert into t values (COALESCE(NULL, NULL));
                              ^
HINT:  You will need to rewrite or cast the expression.

From a user perspective, it's hard to explain why COALESCE(NULL, NULL) fails in a location where a bare NULL works. From the perspective of those working on the code, and looking at the problem from the inside out, it seems sane; but that's the only perspective from which it does. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_memory_barrier() doesn't compile, let alone work, for me
On Sun, Jul 14, 2013 at 09:26:38PM -0400, Robert Haas wrote: I'm pretty sure we've got latent memory-ordering risks in our existing code which we just haven't detected and fixed yet. Consider, for example, this exciting code from GetNewTransactionId:

    myproc->subxids.xids[nxids] = xid;
    mypgxact->nxids = nxids + 1;

I don't believe that's technically safe even on an architecture like x86, because the compiler could decide to reorder those assignments. Of course there is probably no reason to do so, and even if it does you'd have to get really unlucky to see a user-visible failure, and if you did you'd probably misguess the cause. You're probably right. Note that it's not even just the compiler that might reorder them; the CPU, cache subsystem, and memory bus all play their part in memory reordering. x86 is pretty forgiving, which is why it works. I found this to be a really good explanation of all the things that can go wrong with memory ordering. It also explains why, in the long run, memory barriers are not optimal. http://herbsutter.com/2013/02/11/atomic-weapons-the-c-memory-model-and-modern-hardware/ That talk discusses how the hardware world is converging on SC [1] as the memory model to use. And C11/C++11 atomics will implement this for the programmer. With these you can actually make guarantees. For example, by marking mypgxact->nxids as an atomic type, the compiler will emit all the necessary markings to let the CPU know what you want, so everything works the way you expect it to. Even on arcane architectures. No explicit barriers needed. Unfortunately, it won't help on compilers that don't support it. [1] http://en.wikipedia.org/wiki/Sequential_consistency There are tools where you put code in and verify it does what you want. With this one you can put test programs in and it can tell you all possible results due to memory reordering.
http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/help.html Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
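As a concrete illustration of the fix discussed above, here is a minimal standalone sketch of how C11 atomics would address the GetNewTransactionId reordering hazard. The names mirror the quoted snippet but this is not backend code; the `add_subxid`/`read_subxids` functions and `MAX_XIDS` bound are illustrative assumptions:

```c
#include <stdatomic.h>

#define MAX_XIDS 64

static int xids[MAX_XIDS];
static atomic_int nxids;

void add_subxid(int xid)
{
    int n = atomic_load_explicit(&nxids, memory_order_relaxed);

    xids[n] = xid;
    /*
     * Release store: neither the compiler nor the CPU may move the
     * xids[] assignment above past this point, so a reader that
     * observes the new count also observes the new xid.
     */
    atomic_store_explicit(&nxids, n + 1, memory_order_release);
}

int read_subxids(int *out)
{
    /* Acquire load: pairs with the release store in add_subxid(). */
    int n = atomic_load_explicit(&nxids, memory_order_acquire);

    for (int i = 0; i < n; i++)
        out[i] = xids[i];
    return n;
}
```

The release/acquire pair gives exactly the ordering guarantee the quoted code silently relies on, without a full memory barrier.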
Re: [HACKERS] Proposal - Support for National Characters functionality
On Mon, Jul 15, 2013 at 05:11:40PM +0900, Tatsuo Ishii wrote: Does support for alternative multi-byte encodings have something to do with the Han unification controversy? I don't know terribly much about this, so apologies if that's just wrong. There's a famous problem regarding conversion between Unicode and other encodings, such as Shift Jis. There are lots of discussions on this. Here is the one from Microsoft: http://support.microsoft.com/kb/170559/EN-US Apart from Shift-JIS not being well defined (it's more a family of encodings), it has the unusual feature of providing multiple ways to encode the same character. This is not even a Han unification issue; those have largely been addressed. For example, the square-root symbol exists twice (0x8795 and 0x81E3), as do many other mathematical symbols. Here's the code page, which you can browse online: http://msdn.microsoft.com/en-us/goglobal/cc305152 Which means that to be round-trippable, Unicode would have to double those characters, but this would make it hard/impossible to round-trip with any other character set that had those characters. No easy solution here. Something that has been done before [1] is to map the doubles to the custom area of the unicode space (0xe000-0x). It gives you round-trip support at the expense of having to handle those characters yourself. But since postgres doesn't do anything meaningful with unicode characters this might be acceptable. [1] Python does a similar trick to handle filenames coming from disk in an unknown encoding: http://docs.python.org/3/howto/unicode.html#files-in-an-unknown-encoding Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/
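The "park the duplicates in the custom area" trick described above can be sketched as follows. This is a hypothetical two-entry conversion table, not a real Shift-JIS codec; treating 0x81E3 as the canonical square root and 0x8795 as the duplicate is an assumption made purely for illustration:

```c
/*
 * Hypothetical round-trip scheme for duplicated Shift-JIS code points.
 * The canonical variant maps to the ordinary Unicode character; the
 * duplicate maps into the private-use area (starting at U+E000), so
 * that decoding followed by encoding reproduces the original bytes.
 */
#define SJIS_SQRT_CANONICAL 0x81E3  /* assumed canonical square root */
#define SJIS_SQRT_DUPLICATE 0x8795  /* assumed duplicate square root */

unsigned int sjis_to_unicode(unsigned int sjis)
{
    if (sjis == SJIS_SQRT_CANONICAL)
        return 0x221A;              /* U+221A SQUARE ROOT */
    if (sjis == SJIS_SQRT_DUPLICATE)
        return 0xE000;              /* park the duplicate in the PUA */
    return 0xFFFD;                  /* other characters: out of scope */
}

unsigned int unicode_to_sjis(unsigned int uc)
{
    if (uc == 0x221A)
        return SJIS_SQRT_CANONICAL;
    if (uc == 0xE000)
        return SJIS_SQRT_DUPLICATE; /* PUA code point round-trips */
    return 0;                       /* other characters: out of scope */
}
```

The cost, as the mail notes, is that PUA code points carry no meaning to anyone else, so the application must handle them itself.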
Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)
On 07/08/2013 04:59 PM, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Well, Tom opined in http://www.postgresql.org/message-id/23249.1370878...@sss.pgh.pa.us that the current patch is okay. I have a mild opinion that it should instead print only SHR_LOCK when both bits are set, and one of the others when only one of them is set. But I don't have a strong opinion about this, and since Tom disagrees with me, feel free to exercise your own (Jeff's) judgement. FWIW, I think that's exactly what I did in the preliminary 9.3 patch that I committed to pg_filedump a few weeks ago. Could you take a look at what's there now and see if that's what you meant? So, is this getting committed today, or do we bounce it? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 07/11/2013 09:28 AM, Greg Stark wrote: Neither of these address Tom's concerns about API changes and future flexibility. I was assigned this patch in the reviewers list and my inclination would be to take it, but I wasn't about to overrule Tom. If he says he's ok with it then I'm fine going ahead and reviewing the code. If I still have commit bits I could even commit it. API changes? I can't find that issue in the discussion. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)
Josh Berkus j...@agliodbs.com writes: On 07/08/2013 04:59 PM, Tom Lane wrote: FWIW, I think that's exactly what I did in the preliminary 9.3 patch that I committed to pg_filedump a few weeks ago. Could you take a look at what's there now and see if that's what you meant? So, is this getting committed today, or do we bounce it? I was hoping for a comment from Alvaro, but wouldn't have gotten to committing it today in any case. IMO this patch doesn't really belong in the commitfest queue, since pg_filedump isn't part of the community distribution. regards, tom lane
Re: [Review] Re: [HACKERS] minor patch submission: CREATE CAST ... AS EXPLICIT
On 07/09/2013 01:10 AM, Fabien COELHO wrote: Where are we with this patch? Fabien, are you going to submit an updated version which addresses the objections, or should I mark it Returned With Feedback? There is no need for an updated patch. I addressed the objections with words, not code :-) So, Tom, Robert, Cedric: can we have a verdict? Commit or no? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [Review] Re: [HACKERS] minor patch submission: CREATE CAST ... AS EXPLICIT
Josh Berkus j...@agliodbs.com writes: On 07/09/2013 01:10 AM, Fabien COELHO wrote: Where are we with this patch? Fabien, are you going to submit an updated version which addresses the objections, or should I mark it Returned With Feedback? There is no need for an updated patch. I addressed the objections with words, not code :-) So, Tom, Robert, Cedric: can we have a verdict? Commit or no? My vote is still no, because of (1) the keyword-creep issue, and (2) the fact that this is proposing to invent non-standard syntax for functionality that's in the SQL standard. regards, tom lane
[HACKERS] [9.4 CF 1] SDO: Patches for Review
Hackers, As the Commitfest is past deadline, here is the disposition of 15 patches which were still marked Needs Review:

Needs Review
============
The following patches have not had even one substantial review. Please someone pick them up and review them ASAP:
- Row-Level Security - KaiGai Kohei
- Performance Improvement by reducing WAL for Update Operation - Amit Kapila/Hari Babu
- Fix pgstattuple/pgstatindex to use regclass-type as the argument - Satoshi Nagayasu
- Revive Line Type - Peter Eisentraut

In Limbo
========
These two patches are still under active review as of today, so we are waiting on the outcome of that before marking them one way or the other:
- Lag Lead Window Functions Can Ignore Nulls - Nicholas White
- SQL Command to edit postgresql.conf (ALTER SYSTEM patch) - Amit Kapila

I couldn't determine what status the following patch had:
- CREATE CAST ... AS EXPLICIT - Fabien Coelho

Returned, Please Resubmit
=========================
Finally, during the commitfest we promise each submitter one serious review. In most cases, we try to do as much review as is required to get the patches into committable state. However, it is now the end date of the commitfest, and the following patches did get one serious review, even if they were updated after the review. The authors of these patches are strongly encouraged to resubmit them for the September Commitfest, or to contact me and I will transfer the patches.
- Add support to IF NOT EXISTS to others CREATE statements - Fabrizio de Royes Mello
- TransactionIdIsInProgress() procno cache - Simon Riggs
- Patch to compute Max LSN of Data Pages (from 2012-11) - Amit Kapila/Hari Babu
- fail-back without fresh backup - Samrat Revagade, Sawada Masahiko
- updated emacs configuration - Peter Eisentraut
- Add more regression tests for ROLE (USER) - Robins Tharakan
- Add more regression tests for SCHEMA - Robins Tharakan
- Add more regression tests for SEQUENCE - Robins Tharakan
-- Josh Berkus PostgreSQL Experts Inc.
http://pgexperts.com
Re: [Review] Re: [HACKERS] minor patch submission: CREATE CAST ... AS EXPLICIT
On 07/16/2013 03:12 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 07/09/2013 01:10 AM, Fabien COELHO wrote: Where are we with this patch? Fabien, are you going to submit an updated version which addresses the objections, or should I mark it Returned With Feedback? There is no need for an updated patch. I addressed the objections with words, not code :-) So, Tom, Robert, Cedric: can we have a verdict? Commit or no? My vote is still no, because of (1) the keyword-creep issue, and (2) the fact that this is proposing to invent non-standard syntax for functionality that's in the SQL standard. Ok, marking Returned with Feedback. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] [9.4 CF 1]SDO: Patches awaiting Author
Hackers, The following patches were Waiting on Author as of today. As such, most have been marked Returned with Feedback. Authors are encouraged to update their patches and submit to the next Commitfest. There are a few exceptions to Returned with Feedback, though:

Could Not Determine Status:
- logical changeset generation v5 - Andres Freund
- Pluggable toast compression - Andres Freund

Didn't Belong in CF:
- pg_filedump 9.3: checksums (and a few other fixes)

The rest, all Returned with Feedback:
- Extension templates - Dimitri Fontaine
- COPY tuning: allow batched inserts with SERIAL/default nextval() - Simon Riggs
- Non-recursive processing of AND/OR lists - Gurjeet Singh
- Add basic regression tests for SET xxx - Robins Tharakan
- Add basic regression tests for DISCARD - Robins Tharakan
- Remove useless USE_PGXS support in contrib - Peter Eisentraut
- Add visibility map information to pg_freespace - Kyotaro Horiguchi
- Reduce maximum error in tuples estimation after vacuum - Kyotaro Horiguchi
- Make recovery.conf parameters into GUCs - Simon Riggs, et al
- transforms - Peter Eisentraut
-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] XLogInsert scaling, revisited
On Tue, Jul 16, 2013 at 2:24 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jul 8, 2013 at 6:16 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Ok, I've committed this patch now. Finally, phew! I found that this patch causes an assertion failure. When I set up a simple replication environment and promoted the standby before executing any transaction on the master, I got the following assertion failure.

2013-07-16 02:22:06 JST sby1 LOG: received promote request
2013-07-16 02:22:06 JST sby1 FATAL: terminating walreceiver process due to administrator command
2013-07-16 02:22:06 JST sby1 LOG: redo done at 0/2F0
2013-07-16 02:22:06 JST sby1 LOG: selected new timeline ID: 2
hrk:head-pgsql postgres$ 2013-07-16 02:22:06 JST sby1 LOG: archive recovery complete
TRAP: FailedAssertion(!(readOff == (XLogCtl->xlblocks[firstIdx] - 8192) % ((uint32) (16 * 1024 * 1024))), File: xlog.c, Line: 7048)
2013-07-16 02:22:12 JST sby1 LOG: startup process (PID 37115) was terminated by signal 6: Abort trap
2013-07-16 02:22:12 JST sby1 LOG: terminating any other active server processes

Note that this is also reproducible even when trying to recover only from archives, without streaming replication. Regards, -- Michael
Re: [HACKERS] SSL renegotiation
On Tue, Jul 16, 2013 at 10:41:44AM -0700, David Fetter wrote: On Fri, Jul 12, 2013 at 08:51:52PM -0400, Noah Misch wrote: Agreed. The OpenSSL Project last applied a security fix to 0.9.6 over eight years ago. Compatibility with 0.9.6 has zero or negative value. You've made a persuasive case that we should actively break backward compatibility here. Would that be complicated to do? Nope. If Alvaro's code change builds under 0.9.6, malfunctioning only at runtime, I suspect we would add a configure-time version check and possibly a runtime one as well. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] findDependentObjects() mutual exclusion vs. MVCC catalog scans
On Tue, Jul 16, 2013 at 11:27:02AM -0400, Robert Haas wrote: I recommend reworking the header comment to avoid mention of SnapshotNow, since if we get rid of SnapshotNow, the reference might not be too clear to far-future hackers.

+ /*
+  * For a scan using a non-MVCC snapshot like SnapshotSelf, we would simply
+  * reuse the old snapshot. So far, the only caller uses MVCC snapshots.
+  */
+ freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan->heap_rel));

This comment is not very clear, because it doesn't describe what the code actually does, but rather speculates about what the code could do if the intention of some future caller were different. I recommend adding Assert(IsMVCCSnapshot(scan->xs_snapshot)) and changing the comment to something like this: For now, we don't handle the case of a non-MVCC scan snapshot. This is adequate for existing uses of this function, but might need to be changed in the future. On Tue, Jul 16, 2013 at 11:35:48AM -0400, Tom Lane wrote: I agree with Robert's comments, and in addition suggest that this code needs a comment about why it's safe to use the snapshot without doing RegisterSnapshot or equivalent. Committed with hopefully-better comments. Thanks. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
On Mon, Jul 15, 2013 at 11:43:04AM -0700, David Fetter wrote: On Sun, Jul 14, 2013 at 10:15:12PM -0400, Noah Misch wrote: See attached patch revisions. The first patch edits find_minmax_aggs_walker() per my comments just now. The second is an update of your FILTER patch with the changes to which I alluded above; it applies atop the first patch. Would you verify that I didn't ruin anything? Barring problems, will commit. Tested your changes. They pass regression, etc. :) Committed. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
[HACKERS] Adding optionally commit number in PG_VERSION_STR
Hi all, It happens that I work occasionally on multiple builds based on different stable branches at the same time to check fixes that need to be backpatched, and I tend to easily lose track of which version a given build is based on (Duh!). There is of course the version number up to the 3rd digit available (for example 9.2.4, 9.3beta2, etc.), but as a developer I think that it would be helpful to include the commit ID in PG_VERSION_STR to get a better reference on exactly what the development build is based on. This could be controlled by an additional flag in ./configure.in called something like --enable-version-commit, of course disabled by default. If enabled, PG_VERSION_STR would be generated with the new information. configure would also return an error when this flag is enabled if git is either not found, or if the repository where configure is run is not a native git repository. Thoughts? -- Michael
Re: [HACKERS] Listen/notify across clusters
On 07/15/2013 04:10 PM, Greg Jaskiewicz wrote: In terms of features, apart from separating LISTEN so that it can actually be used on standbys, wouldn't it be a matter of including the notifications in the WAL stream, as simple packets? This would guarantee the same behaviour as on the master. I guess one problem is to implement writing to the WAL with the smallest possible performance hit. As far as I can see there are two possible approaches: either write to WAL when NOTIFY is run, or write to WAL on commit. The former seems more in line with how commands in PostgreSQL usually work. There shouldn't be any major problems with implementing LISTEN on the slaves, since LISTEN is done in memory. I feel like I, as a beginner to the codebase, am missing something here, because while this is a fair bit of work, it does not seem too hard to implement. Andreas -- Andreas Karlsson
Re: [HACKERS] Adding optionally commit number in PG_VERSION_STR
Michael Paquier michael.paqu...@gmail.com writes: It happens that I work occasionally on multiple builds based on different stable branches at the same time to check fixes that need to be backpatched, and I tend to easily lose track on which version the build I created is based on (Duh!). There is of course the version number up to the 3rd digit available (for example 9.2.4, 9.3beta2, etc.), but as a developer I think that it would be helpful to include the commit ID in PG_VERSION_STR to get a better reference on exactly what the development build is based on. This could be controlled by an additional flag in ./configure.in called something like --enable-version-commit, of course disabled by default. If enabled, PG_VERSION_STR would be generated with the new information. configure would also return an error when this flag is enabled if git is either not found, or if the repository where configure is not a native git repository. Personally, I'd find that pretty useless, because the build I'm running has typically got uncommitted changes in it. Also, there isn't any good way to set PG_VERSION_STR except at configure time, which means the common practice of doing (at most) make clean before rebuilding would not result in an updated string even if you had committed. regards, tom lane
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
On Tue, Jul 16, 2013 at 9:17 PM, Greg Smith g...@2ndquadrant.com wrote: On 7/16/13 12:46 PM, Ants Aasma wrote: Spread checkpoints sprinkle the writes out over a long period, and the general tuning advice is to heavily bound the amount of memory the OS is willing to keep dirty. That's arguing that you can make this feature be useful if you tune in a particular way. That's interesting, but the goal here isn't to prove the existence of some workload that a change is useful for. You can usually find a test case that validates any performance patch as helpful if you search for one. Everyone who has submitted a sorted checkpoint patch, for example, has found some setup where it shows significant gains. We're trying to keep performance stable across a much wider set of possibilities though. Let's talk about default parameters instead, which quickly demonstrates where your assumptions fail. The server I happen to be running pgbench tests on today has 72GB of RAM running SL6 with RedHat derived kernel 2.6.32-358.11.1. This is a very popular middle grade server configuration nowadays. There dirty_background_ratio and dirty_ratio are 10 (percent). That means that roughly 7GB of RAM can be used for write caching. Note that this is a fairly low write cache tuning compared to a survey of systems in the field--lots of people have servers with earlier kernels where these numbers can be as high as 20 or even 40% instead. The current feasible tuning for shared_buffers suggests a value of 8GB is near the upper limit, beyond which cache related overhead makes increases counterproductive. Your examples are showing 53% of shared_buffers dirty at checkpoint time; that's typical. The checkpointer is then writing out just over 4GB of data. With that background, which process here has more data to make decisions with?

- The operating system has 7GB of writes it's trying to optimize.
That potentially includes backend, background writer, checkpoint, temp table, statistics, log, and WAL data. The scheduler is also considering read operations.
- The checkpointer process has 4GB of writes from rarely written shared memory it's trying to optimize.

Actually I was arguing that the reasoning that the OS will take care of the sorting does not apply in reasonably common cases. My point is that the OS isn't able to optimize the writes because spread checkpoints trickle the writes out to the OS in random order over a long time. If OS writeback behavior is left in the default configuration, it will start writing out data before the checkpoint write phase ends (due to dirty_expire_centisecs); this will miss write-combining opportunities that would arise if we sorted the data before dumping it to the OS dirty buffers. I'm not arguing that we try to bypass OS I/O scheduling decisions; I'm arguing that by arranging checkpoint writes in logical order we will make pages visible to the I/O scheduler in a way that will lead to more efficient writes. Also I think that you are overestimating the capabilities of the OS I/O scheduler. At least for Linux, the I/O scheduler does not see pages in the dirty list - only pages for which writeback has been initiated. In the default configuration this means up to 128 read and 128 write I/Os are considered. The writes are picked by basically doing round robin on files with dirty pages and doing a clocksweep scan for a chunk of pages from each. So in reality there is practically no benefit in having the OS do the reordering, while there is the issue that flushing a large amount of dirty pages at once does very nasty things to query latency by overloading all of the I/O queues.

This is why if you take the opposite approach of yours today--go searching for workloads where sorting is counterproductive--those are equally easy to find. Any test of write speed I do starts with about 50 different scale/client combinations.
Why do I suggest pgbench-tools as a way to do performance tests? It's because an automated sweep of client setups like it does is the minimum necessary to create enough variation in workload for changing the database's write path. It's really amazing how often doing that shows a proposed change is just shuffling the good and bad cases around. That's been the case for every sorting and fsync delay change submitted so far. I'm not even interested in testing today's submission because I tried that particular approach for a few months, twice so far, and it fell apart on just as many workloads as it helped. As you know running a full suite of write benchmarks takes a very long time, with results often being inconclusive (noise is greater than effect we are trying to measure). This is why I'm interested which workloads you suspect might fall apart from this patch - because I can't think of any. Worst case would be that the OS fully absorbs all checkpoint writes before writing anything out, so the sorting is useless waste of CPU and memory. The CPU
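The "sorted checkpoint" idea argued over above amounts to ordering dirty buffers by (relation, fork, block) before issuing writes, so that sequential blocks of the same file reach the OS adjacently and can be combined. A minimal sketch of that ordering step follows; the `DirtyBuf` fields are illustrative stand-ins, not the actual PostgreSQL BufferTag layout:

```c
#include <stdlib.h>

/* Illustrative identifier for one dirty buffer awaiting checkpoint write. */
typedef struct DirtyBuf
{
    unsigned int relnode;   /* which relation file */
    unsigned int forknum;   /* which fork of that relation */
    unsigned int blocknum;  /* block offset within the file */
} DirtyBuf;

/* Order by file first, then by block, so per-file writes are sequential. */
static int
dirtybuf_cmp(const void *a, const void *b)
{
    const DirtyBuf *x = (const DirtyBuf *) a;
    const DirtyBuf *y = (const DirtyBuf *) b;

    if (x->relnode != y->relnode)
        return x->relnode < y->relnode ? -1 : 1;
    if (x->forknum != y->forknum)
        return x->forknum < y->forknum ? -1 : 1;
    if (x->blocknum != y->blocknum)
        return x->blocknum < y->blocknum ? -1 : 1;
    return 0;
}

void
sort_checkpoint_writes(DirtyBuf *bufs, size_t n)
{
    qsort(bufs, n, sizeof(DirtyBuf), dirtybuf_cmp);
}
```

Whether this helps or hurts is exactly the disputed point: the sort itself is cheap, but the benchmarking burden is in showing the reordered write stream behaves well across many workloads.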
Re: [HACKERS] A general Q about index
On Tue, Jul 16, 2013 at 7:00 PM, David Johnston pol...@yahoo.com wrote: David Johnston wrote: soroosh sardari wrote: Hi I want to know how an index is created and used. Actually, if you can show me a simple starting point, it would be great. Regards, Soroosh Sardari In the documentation there is a table of contents, and listed on that table is a section named Indexes. David J. Since you posted this to -hackers you may mean you wish to know how to program them as opposed to use them in SQL. I have no clue in this regard. If you did mean use in SQL then the documentation is your friend, and you also should not have posted this question to -hackers but to -general instead; probably should have posted there anyway to begin with and been more clear as to what you mean by created and used. David J.

Actually, I mean the index in the source code; I want to know how an index is used in the backend for execution of a query. Sorry, my question is a bit ambiguous. I started to read src/backend/access/index/genam.c, and I think maybe this file is not a good place to start. Let me write a bunch of questions to illustrate my point:
- How is an index created for a table with a specified field?
- How is an index stored in a page?
- What module is responsible for fetching and swapping out index pages?
- Is there a kind of abstract class for all variants of index?
Re: [HACKERS] A general Q about index
On 07/16/2013 11:09 PM, soroosh sardari wrote: Actually, I mean the index in the source code; I want to know how an index is used in the backend for execution of a query. Sorry, my question is a bit ambiguous. I started to read src/backend/access/index/genam.c, and I think maybe this file is not a good place to start. Let me write a bunch of questions to illustrate my point:
- How is an index created for a table with a specified field?
- How is an index stored in a page?
- What module is responsible for fetching and swapping out index pages?
- Is there a kind of abstract class for all variants of index?

Start with the documentation:
http://www.postgresql.org/docs/current/static/internals.html
particularly:
http://www.postgresql.org/docs/current/static/indexam.html
http://www.postgresql.org/docs/current/static/gist.html
http://www.postgresql.org/docs/current/static/catalog-pg-index.html
http://www.postgresql.org/docs/current/static/catalog-pg-am.html

That should give you the groundwork to understand the source code a bit better. Then feel free to return with specific questions you don't feel are explained by the documentation or code comments and aren't obvious from the code. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services