Re: [HACKERS] Prefered Types
On 04.05.2011 0:00, Alvaro Herrera wrote:
> Excerpts from Зотов Роман's message of mar may 03 16:31:31 -0300 2011:
>> but here we can see a problem like F(smallint), F(integer) but a call like F(float); I wouldn't like it to fail.
> I think this particular example would be a mistake, because that cast would be lossy, so you want to invoke it only when the user explicitly selects it. The other way around would be fine, I think, that is, F(float), F(float8) and the call is F(int).

As I see it, I should not have to write versions of a function for Float, String, and many other argument types when my function takes an INT argument... and if the caller doesn't want to think about types, he can't use your strong types. Why shouldn't this work the same way as assignment??? Why are implicit and assignment casts different??? I know only implicit and explicit casts, and I think implicit = assignment.

PS: This patch is needed because in any case we must calculate the preferred type more smartly. Yes, this patch is 1/10 of a full solution, but it's a first step!!!

> Well, if the other 9/10 were clear, there would be no discussion. The problem is that the missing bits have not been designed and thus we don't know if this 1/10 will be useful to them. We need to find a complete design before committing to any initial portion which may turn out to be bogus down the road.

Yes, but as long as you think that "update table1 set IntField = FloatField" is valid while "Select FuncWithIntArg(FloatArg)" is not, you have no problems with the current solution, because it works the same :)

--
With respect,
Roman Vladimirovich Zotov
Head of the Development Department, ZAO NPO Konsultant
Ivanovo, ul. Palekhskaya 10
tel./fax: (4932) 41-01-21
mailto: zo...@oe-it.ru

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extreme bloating of intarray GiST indexes
I have another hypothesis about the cause of the index bloat. AFAIK, the vacuum procedure on GiST doesn't give any storage-utilization guarantee. For example, if only one live item is in some page, then only that one item will be left in the page; i.e. there is no index reorganization during vacuum. If there aren't many inserts into such pages in the future, they will stay bloated.

With best regards, Alexander Korotkov.
[HACKERS] Pull up aggregate subquery
I sometimes wonder if we could pull up an aggregate subquery in the optimizer. My typical problem is: consider two relations, medium-size M and large-size L. L has a reference column to M's primary key. Say,

create table size_m as select i as id, repeat(i::text, i % 100) as val from generate_series(1, 2) i;
create table size_l as select i as id, m.id as m_id, repeat(i::text, i % 100) as val from generate_series(1, 10) i, size_m m where i.i / 10 = m.id;

Now, you want to query M under some condition, joined with an aggregate over L grouped by M's primary key.

select m_id, sum_len from size_m m inner join (select m_id, sum(length(val)) as sum_len from size_l group by m_id) l on m.id = l.m_id where val = '1';

The generated plan is:

                                 QUERY PLAN
Nested Loop (cost=36116.92..38339.67 rows=50 width=235) (actual time=440.679..1039.964 rows=1 loops=1)
  Join Filter: (m.id = size_l.m_id)
  ->  Seq Scan on size_m m (cost=0.00..897.00 rows=1 width=227) (actual time=0.021..16.698 rows=1 loops=1)
        Filter: (val = '1'::text)
  ->  GroupAggregate (cost=36116.92..37217.09 rows=10026 width=248) (actual time=440.651..1013.704 rows=1 loops=1)
        ->  Sort (cost=36116.92..36366.90 rows=1 width=248) (actual time=440.619..593.062 rows=1 loops=1)
              Sort Key: size_l.m_id
              Sort Method: external sort  Disk: 25736kB
              ->  Seq Scan on size_l (cost=0.00..4565.91 rows=1 width=248) (actual time=0.011..138.243 rows=1 loops=1)
Total runtime: 1044.039 ms
(10 rows)

Note that most of the result of the aggregate is discarded when joined against M, because M produces only a small output after the filter on M.val. If we could filter M first and filter L by M's result before running the aggregate, the response might get dramatically faster. If you filter by M.id instead of M.val, the optimizer is smart enough to push the condition down into L, which is then filtered before the aggregate.
select m_id, sum_len from size_m m inner join (select m_id, sum(length(val)) as sum_len from size_l group by m_id) l on m.id = l.m_id where id = 1;

                                 QUERY PLAN
Nested Loop (cost=0.00..5713.02 rows=2 width=235) (actual time=72.121..82.364 rows=1 loops=1)
  ->  Seq Scan on size_m m (cost=0.00..897.00 rows=1 width=227) (actual time=0.028..10.252 rows=1 loops=1)
        Filter: (id = 1)
  ->  GroupAggregate (cost=0.00..4815.98 rows=2 width=248) (actual time=72.065..72.067 rows=1 loops=1)
        ->  Seq Scan on size_l (cost=0.00..4815.89 rows=10 width=248) (actual time=0.051..71.968 rows=10 loops=1)
              Filter: (m_id = 1)
Total runtime: 82.525 ms
(7 rows)

This seems to be the benefit of EquivalenceClasses: 1 = M.id = L.m_id is implied, and the optimizer adds the implicit constant filter L.m_id = 1 to the plan tree. In contrast, in the former case M.val = '1' doesn't imply any condition on L.m_id. That's fair enough. However, I think we can filter L by L.m_id before the aggregate, because L.m_id is in the group clause as well as the join condition, and M.id is unique in M. In such cases, the query can be transformed into something like:

GroupAggregate
  ->  NestLoop (L.m_id = M.id)
        ->  SeqScan L
        ->  SeqScan M (filter: M.val = '1')

This transformation can be done by pulling up the aggregate Query in pull_up_subqueries(). Currently the optimizer doesn't pull up any query that contains an aggregate, but as shown above, in some cases we can do it. Attached is a WIP proof-of-concept patch to do it. I know it breaks general queries, but it performs the transformation described above. I suppose the missing piece is adding the condition of when it is safe to pull up the aggregate, beyond how it is done in the patch.
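For comparison, the transformed plan shape corresponds to rewriting the first query by hand — a sketch against the size_m/size_l tables defined above (the rewrite relies on size_m.id being unique, as noted):

```sql
-- Manual rewrite of the original query: filter M first, then aggregate.
-- Because m.id is unique and l.m_id is both the join key and the GROUP BY
-- key, grouping after the join returns the same rows as joining against
-- the pre-aggregated subquery.
select m.id as m_id, sum(length(l.val)) as sum_len
from size_m m
join size_l l on l.m_id = m.id
where m.val = '1'
group by m.id;
```

This is essentially what the proposed pull-up would let the planner consider automatically.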
db1=# explain select m_id, sum_len from size_m m inner join (select m_id, sum(length(val)) as sum_len from size_l group by m_id) l on m.id = l.m_id where val = '1';

                                 QUERY PLAN
GroupAggregate (cost=6712.96..6713.16 rows=10 width=471) (actual time=125.496..125.499 rows=1 loops=1)
  ->  Sort (cost=6712.96..6712.99 rows=10 width=471) (actual time=125.228..125.288 rows=10 loops=1)
        Sort Key: size_l.m_id
        Sort Method: quicksort  Memory: 25kB
        ->  Nested Loop (cost=0.00..6712.80 rows=10 width=471) (actual time=0.142..125.089 rows=10 loops=1)
              Join Filter: (m.id = size_l.m_id)
              ->  Seq Scan on size_m m (cost=0.00..897.00 rows=1 width=227) (actual time=0.037..8.956 rows=1 loops=1)
                    Filter: (val = '1'::text)
              ->  Seq Scan on size_l
Re: [HACKERS] Predicate locking
Greg Smith wrote:
> My point was just that no one has really done that here yet: provided an example showing SSI serialization working as a substitute for predicate locking in this sort of use case. I trust that the theory is sound here, and yet I'd still like to see that demonstrated.

Fair enough. You can find examples where there are no false positives or false negatives in the src/test/isolation directory in any checkout of the source code. Dan will be presenting the results of a set of DBT-2 runs at PGCon, which might help. I've been gradually building up a set of examples at: http://wiki.postgresql.org/wiki/SSI That set is incomplete so far due to a scarcity of round tuits, but if you want to suggest any particular tests, or add any (it's a Wiki), I welcome the input. With all that going on, and having mentioned that Wiki page on this thread, I didn't think posting examples to this list was useful, but could be persuaded otherwise.

-Kevin
Re: [HACKERS] full_page_writes makes no difference?
Hi,

On 05/04/2011 03:46 AM, Tian Luo wrote:
> No matter I turn on or turn off the full_page_writes, I always observe 8192-byte writes of log data for simple write operations (write/update).

How did you measure that? A single transaction doing a single write, I guess. Have you ever tried multiple transactions with a simple write operation each, and checked how much WAL that spits out per transaction? As I understand it, dirty blocks are written to disk as soon as feasible. After all, that helps crash recovery. With a basically idle system, "as soon as feasible" might be pretty soon. However, put your (disk sub-)system under load and "as soon as feasible" might take a while.

> But according to the document, when this is off, it could speed up operations but may cause problems during recovery. So, I guess this is because it writes less when the option is turned off. However, this contradicts my observations.

I think you didn't trigger the savings. It's about writing full pages on the first write to a block after a checkpoint. Did you monitor checkpoint times of Postgres in your tests?

> If I am not missing anything, I find that the writes of log data go through function XLogWrite in source file backend/access/transam/xlog.c. In this file, log data are written with the following code:
>
>     from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
>     nbytes = npages * (Size) XLOG_BLCKSZ;
>     if (write(openLogFile, from, nbytes) != nbytes)
>     {
>         ...
>     }
>
> So, nbytes should always be a multiple of XLOG_BLCKSZ, which in the default case is 8192.

That observation seems correct.

Regards

Markus Wanner
Re: [HACKERS] full_page_writes makes no difference?
On Wed, May 4, 2011 at 7:16 AM, Tian Luo jackro...@gmail.com wrote:
> Hi guys, No matter I turn on or turn off the full_page_writes, I always observe 8192-byte writes of log data for simple write operations (write/update).

Not sure how you measured it, but ISTM that the correct GUC to play with is fsync. If that's turned off, the WAL buffers won't be fsynced to the disk at every commit. But that would mean reduced reliability in case of a database crash.

> But according to the document, when this is off, it could speed up operations but may cause problems during recovery. So, I guess this is because it writes less when the option is turned off. However, this contradicts my observations.

When full_page_writes is turned off, the full page won't be backed up in the WAL record after the first modification following a checkpoint. So yes, it can reduce the amount of WAL written to the disk.

Thanks, Pavan

-- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] full_page_writes makes no difference?
On Wed, May 4, 2011 at 5:46 PM, Pavan Deolasee pavan.deola...@gmail.com wrote:
> On Wed, May 4, 2011 at 7:16 AM, Tian Luo jackro...@gmail.com wrote:
>> Hi guys, No matter I turn on or turn off the full_page_writes, I always observe 8192-byte writes of log data for simple write operations (write/update).
> Not sure how you measured it, but ISTM that the correct GUC to play with is fsync. If that's turned off, the WAL buffers won't be fsynced to the disk at every commit. But that would mean reduced reliability in case of a database crash.

And I should have added that post 8.3, we also have a user-settable parameter called synchronous_commit. Normally, the database must write WAL up to the commit record to stable storage when a transaction commits, to ensure that there is no data loss in case of a database crash. But if synchronous_commit is turned off, the database might delay writing the WAL buffers to the disk, thus reducing write activity, but at an increased risk of data loss. http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT

Thanks, Pavan

-- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] branching for 9.2devel
On May 3, 2011, at 11:10 PM, Andrew Dunstan wrote: On 05/03/2011 09:53 PM, David Blewett wrote: On Tue, May 3, 2011 at 9:51 PM, David Blewett da...@dawninglight.net wrote: This seems like a pretty good idea, but maybe it'd be easiest to take it a step further and add an automatic pgindent-ified patch is created when a patch is added to the commitfest app? That should read: ... but maybe it'd be easiest to take it a step further and have an additional, automatically created patch file that is run through pgindent when a patch is added to the commitfest app. You can't indent patches, only patched files. And that's the problem with this happy scheme. For it to work at all sanely we'd need to keep the committed code that the patch is to be applied against strictly pgindent clean, presumably via some automated process such as a commit hook. That's been suggested in the past, but hasn't met with universal approval, IIRC. What if this hypothetical tool pulled the latest source, made a copy of that source, applied the patch to the copy, pg_indented the original AND the copy, and then diff'd? I think that would give you a properly indented patch. The context lines in the patch would have the wrong indentation, but I think patch is pretty smart about dealing with that (or at least can be told to ignore whitespace differences). -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Prefered Types
On May 4, 2011, at 12:00 AM, Зотов Роман wrote:
>>> F(smallint), F(integer) but a call like F(float); I wouldn't like it to fail.
>> I think this particular example would be a mistake, because that cast would be lossy, so you want to invoke it only when the user explicitly selects it. The other way around would be fine, I think, that is, F(float), F(float8) and the call is F(int).
> As I see it, I should not have to write versions of a function for Float, String, and many other argument types when my function takes an INT argument... and if the caller doesn't want to think about types, he can't use your strong types. Why shouldn't this work the same way as assignment??? Why are implicit and assignment casts different??? I know only implicit and explicit casts, and I think implicit = assignment.

I was ready to educate Зотов on why that was a bad idea until I read...

> Yes, but as long as you think that "update table1 set IntField = FloatField" is valid while "Select FuncWithIntArg(FloatArg)" is not, you have no problems with the current solution, because it works the same :)

Sure enough, this works: UPDATE ... SET int_field = ( SELECT float_field FROM ... );

Which begs the question... why do we allow on-assignment casting of a float to an int? I would think that should be explicit only... If we are going to allow assignment casting of float to int, then I'm hard-pressed to see why we wouldn't allow you to call an int function with a float value, assuming there wasn't a more suitable cast available.

-- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Prefered Types
Jim Nasby j...@nasby.net writes: Which begs the question... why do we allow on assignment casting of a float to an int? Because the SQL standard requires it. In any case, the user's intent in such a case is perfectly clear. The reasons for not allowing assignment casts to happen in expression contexts are (1) it would often provide us with too many ways to resolve an operator or function call, leading to ambiguous operator failures; (2) it would be surprising for the parser to silently choose a cast that causes information loss. Neither of these arguments applies to "assign this value to an integer column". It's true that if you have exactly one function named f, and it takes an int, then f(float) could be considered clear and unambiguous. But designing this behavior around only the easy cases is not going to work. regards, tom lane
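The distinction Tom describes can be seen directly — a sketch, with the table and function names made up for illustration:

```sql
-- Assignment context: the float8 -> int cast is applied silently.
CREATE TABLE t1 (intfield int);
INSERT INTO t1 VALUES (0);
UPDATE t1 SET intfield = 2.7::float8;   -- allowed: assignment cast

-- Expression context: the same cast is not considered during function
-- resolution, so this fails unless f(float8) exists or the caller casts.
CREATE FUNCTION f(int) RETURNS int LANGUAGE sql AS 'SELECT $1';
SELECT f(2.7::float8);        -- ERROR: function f(double precision) does not exist
SELECT f(2.7::float8::int);   -- fine with an explicit cast
```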
Re: [HACKERS] branching for 9.2devel
On Wed, May 4, 2011 at 12:19 AM, Tom Lane t...@sss.pgh.pa.us wrote: Mind you, I've read more than enough horribly-formatted patches to wish that we could do something about this. But I doubt that a mechanical reformatting pass before reviewing will be a net plus. It wouldn't hurt to have the option. It would also be nice if we could come to some conclusions on how to handle $SUBJECT. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] patch for new feature: Buffer Cache Hibernation
Hi,

I am working on a new feature, `Buffer Cache Hibernation', which enables postgres to keep a higher cache hit ratio even when just started. Postgres usually starts with a ZERO buffer cache. By saving the buffer cache data structures into hibernation files just before shutdown, and loading them at startup, postgres can start operations with the saved buffer cache in the same condition as just before the last shutdown. Here is the patch for 9.0.3 (also tested on 8.4.7): http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch

The patch includes the following.
- At shutdown, the buffer cache data structures (such as BufferDescriptors, BufferBlocks and StrategyControl) are saved into hibernation files.
- At startup, the buffer cache data structures are loaded from the hibernation files and the buffer lookup hashtable is set up based on the buffer descriptors.
- The above functions are enabled by specifying `enable_buffer_cache_hibernation=on' in postgresql.conf.

Any comments are welcome, and I would very much appreciate the patch being merged into the source tree. Have fun and thanks!
Re: [HACKERS] DLL export with mingw-w64: currently a no-op
On 05/04/2011 01:25 AM, Johann 'Myrkraverk' Oskarsson wrote:
> On Tue, 03 May 2011 12:40:28 -, Andrew Dunstan and...@dunslane.net wrote:
>> Our Makefiles use dlltool and dllwrap to create DLLs. If you used our recommended build method, pgxs would do lots of the work for you. I'm not sure why you asked the mingw-w64 team about building a Postgres extension - they are unlikely to know anything much about our build system.
> However, they do know about the mingw build tools. In particular, from http://oldwiki.mingw.org/index.php/dllwrap: "dllwrap is a tool to build DLLs. It seems to be deprecated in favour of gcc's -shared option, but some projects still use it. SQLite, for example." Armed with this information, it may be prudent to review the DLL build process in PGXS. For the record: I originally asked the mingw-w64 team for help to link against the 64-bit JVM.DLL, for which there was no corresponding .def file. Google cache: http://webcache.googleusercontent.com/search?q=cache:tZkweT_cNQQJ:oldwiki.mingw.org/index.php/dllwrap+mingw+dllwrapcd=1hl=isct=clnkgl=isclient=operasource=www.google.is

OK, but if we want to get rid of using dllwrap we'd need a complete patch for it. Just changing the definition of the macro without changing the rest isn't going to cut it, I think. But it might well be worth doing.

cheers

andrew
Re: [HACKERS] Extreme bloating of intarray GiST indexes
Alexander Korotkov aekorot...@gmail.com writes:
> I have another hypothesis about the cause of the index bloat. AFAIK, the vacuum procedure on GiST doesn't give any storage-utilization guarantee. For example, if only one live item is in some page, then only that one item will be left in the page; i.e. there is no index reorganization during vacuum. If there aren't many inserts into such pages in the future, they will stay bloated.

Possibly, but the same is true of btree indexes, and we very seldom see cases where that's a serious issue. In any case, this is all just speculation without evidence --- we need to see actual data to figure out what's going on.

regards, tom lane
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
On 05/04/2011 10:10 AM, Mitsuru IWASAKI wrote:
> Hi, I am working on new feature `Buffer Cache Hibernation' which enables postgres to keep higher cache hit ratio even just started. Postgres usually starts with ZERO buffer cache. By saving the buffer cache data structure into hibernation files just before shutdown, and loading them at startup, postgres can start operations with the saved buffer cache as the same condition as just before the last shutdown. Here is the patch for 9.0.3 (also tested on 8.4.7) http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch
> The patch includes the following. - At shutdown, buffer cache data structure (such as BufferDescriptors, BufferBlocks and StrategyControl) is saved into hibernation files. - At startup, buffer cache data structure is loaded from hibernation files and buffer lookup hashtable is setup based on buffer descriptors. - Above functions are enabled by specifying `enable_buffer_cache_hibernation=on' in postgresql.conf.
> Any comments are welcome and I would very much appreciate merging the patch in source tree.

That sounds cool. Please a) make sure your patch is up to date against the latest source in git, and b) submit it to the next commitfest at https://commitfest.postgresql.org/action/commitfest_view?id=10 We don't backport features, and 9.1 is closed for features now, so the earliest release this could be used in is 9.2.

cheers

andrew
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
On Wed, May 4, 2011 at 3:10 PM, Mitsuru IWASAKI iwas...@jp.freebsd.org wrote: Postgres usually starts with ZERO buffer cache. By saving the buffer cache data structure into hibernation files just before shutdown, and loading them at startup, postgres can start operations with the saved buffer cache as the same condition as just before the last shutdown. Offhand this seems pretty handy for benchmarks where it would help get reproducible results. -- greg
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
Mitsuru IWASAKI iwas...@jp.freebsd.org writes: Postgres usually starts with ZERO buffer cache. By saving the buffer cache data structure into hibernation files just before shutdown, and loading them at startup, postgres can start operations with the saved buffer cache as the same condition as just before the last shutdown. This seems like a lot of complication for rather dubious gain. What happens when the DBA changes the shared_buffers setting, for instance? How do you protect against the cached buffers getting out-of-sync with the actual disk files (especially during recovery scenarios)? What about crash-induced corruption in the cache file itself (consider the not-unlikely possibility that init will kill the database before it's had time to dump all the buffers during a system shutdown)? Do you have any proof that writing out a few GB of buffers and then reading them back in is actually much cheaper than letting the database re-read the data from the disk files? regards, tom lane
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
Excerpts from Tom Lane's message of mié may 04 12:44:36 -0300 2011:
> This seems like a lot of complication for rather dubious gain. What happens when the DBA changes the shared_buffers setting, for instance? How do you protect against the cached buffers getting out-of-sync with the actual disk files (especially during recovery scenarios)? What about crash-induced corruption in the cache file itself (consider the not-unlikely possibility that init will kill the database before it's had time to dump all the buffers during a system shutdown)? Do you have any proof that writing out a few GB of buffers and then reading them back in is actually much cheaper than letting the database re-read the data from the disk files?

I thought the idea wasn't to copy the entire buffer but only a descriptor, so that the buffer would be loaded from the original page. If shared_buffers changes, there's no problem. If the new setting is smaller, then the last pages would just not be copied, and would have to be read from disk the first time they are accessed. If the new setting is larger, then the last few buffers would remain unused until requested. As for gain, I have heard of test setups requiring hours of runtime in order to prime the buffer cache. Crash safety would have to be researched, sure. Maybe only do it on a clean shutdown.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] full_page_writes makes no difference?
On Tue, May 3, 2011 at 6:46 PM, Tian Luo jackro...@gmail.com wrote:
> Hi guys, No matter I turn on or turn off the full_page_writes, I always observe 8192-byte writes of log data for simple write operations (write/update). But according to the document, when this is off, it could speed up operations but may cause problems during recovery. So, I guess this is because it writes less when the option is turned off. However, this contradicts my observations. If I am not missing anything, I find that the writes of log data go through function XLogWrite in source file backend/access/transam/xlog.c. In this file, log data are written with the following code:
>
>     from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
>     nbytes = npages * (Size) XLOG_BLCKSZ;
>     if (write(openLogFile, from, nbytes) != nbytes)
>     {
>         ...
>     }
>
> So, nbytes should always be a multiple of XLOG_BLCKSZ, which in the default case is 8192. My question is, if it always writes full pages no matter full_page_writes is on or off, what is the difference?

The "full pages" refer to the shared_buffers pages, not the xlog pages. The question it answers is: does the full shared_buffers page get injected into the xlog, or just a diff of it? If you look at the offset of the xlog write, you will see that it is writing 8192 bytes to the same offset over and over again. In my hands, using pgbench -T 300 -c 1, I get about 16 transactions each with an 8192-byte xlog write to the same offset before moving to the next xlog block. But immediately after a checkpoint, I get only 1 or 2 writes to the same offset before moving to the next one, due to full-page writes taking up so much more room in the xlog stream.

Cheers, Jeff
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
2011/5/4 Greg Stark gsst...@mit.edu:
> On Wed, May 4, 2011 at 3:10 PM, Mitsuru IWASAKI iwas...@jp.freebsd.org wrote:
>> Postgres usually starts with ZERO buffer cache. By saving the buffer cache data structure into hibernation files just before shutdown, and loading them at startup, postgres can start operations with the saved buffer cache as the same condition as just before the last shutdown.
> Offhand this seems pretty handy for benchmarks where it would help get reproducible results.

It could have an option to force it on or off at postgres start. That could help in benchmark scenarios.

-- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
Re: [HACKERS] Pull up aggregate subquery
Hitoshi Harada umi.tan...@gmail.com writes: I sometimes wonder if we could pull up aggregate query in the optimizer. I don't have time to look at this right now, but please add to the upcoming commitfest. regards, tom lane
Re: [HACKERS] Extension Packaging
Hey folks, I'd kind of like to get this issue nailed down soon so I can update the PGXN HOWTO and illustrate a generally agreed-upon best practice for extension developers. How *do* we want people to use versions in their extension? Thanks, David On Apr 28, 2011, at 2:16 PM, David E. Wheeler wrote: On Apr 28, 2011, at 7:04 AM, Tom Lane wrote: I think what we're discussing here is bug-fix revisions that don't affect the SQL declarations for the extension. Presumably, that means a change in the C code, so the shared library is the right place to keep the revision number. A version number in the control file seems to carry a nontrivial risk of being out of sync with the actual code in the shared library. But that's exactly where it is stored right now. What's not clear to me is whether to just suggest that extension authors who care about this should provide a foo_version() function, or to try to standardize it a bit more than that. Please, if those are the choices, go with the latter. If you leave it to extension authors, they'll all have different names and different return types, and will thus be worthless to most folks wanting a generalized way to see what versions of extensions they have installed. Hell, I already regret that pgtap_version() returns NUMERIC. Which reminds me, I might change it in a future version. Then it's *really* inconsistent, isn't it? One point worth thinking about is that not all extensions will have a shared library at all --- SQL-only extensions have been mentioned several times as an important use case. For those, there's no such thing as an update that doesn't change the script file, and we shouldn't try to impose a requirement of providing a lower-level revision number. No, but there are new releases without code changes. I've been making releases that tweak documentation and the Makefile (for 9.1 support) but not the code. Should the extension in this case get a new version or not? 
Look, I read this thread this morning carefully, but I have to say I don't really understand it. Considering that there was consensus on not requiring any format, meaning, or mandated sort ordering of versions, there's suddenly quite a lot of discussion of the meaning and format, if not sort ordering. So maybe it's half-assed. Maybe the version can be anything but the revision must be an integer. Maybe there's a `pg_extension_version($extension_name)` function that returns ARRAY[$version, $revision], and the revision is set in the control file but not included in the version or in the upgrade file names. I think I can live with that. But, hell, you're halfway to mandating the meaning by doing this. Will we have to go the rest of the way in the future? Best, David
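For concreteness, the kind of convention being debated might look like this — purely illustrative; the extension name `foo`, its version numbers, and the function shape are all made up:

```sql
-- foo.control carries the version of the SQL interface:
--   default_version = '1.2'
--   relocatable = true

-- A per-extension version function exposing the finer-grained code
-- revision, one of the ad-hoc options discussed above. Returning text
-- (rather than, say, NUMERIC) sidesteps the pgtap_version() regret.
CREATE OR REPLACE FUNCTION foo_version()
RETURNS text
LANGUAGE sql IMMUTABLE
AS $$ SELECT '1.2.3'::text $$;
```

The tension in the thread is exactly whether this split (SQL-level version in the control file, code revision elsewhere) should be standardized or left to each author.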
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
On Wed, May 4, 2011 at 4:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Do you have any proof that writing out a few GB of buffers and then reading them back in is actually much cheaper than letting the database re-read the data from the disk files? I believe he's just writing out the meta data. Ie, which blocks to re-reread from the disk files. -- greg
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
Alvaro Herrera wrote: As for gain, I have heard of test setups requiring hours of runtime in order to prime the buffer cache. And production ones too. I have multiple customers where a server restart is almost a planned multi-hour downtime. The system may be back up, but for a couple of hours performance is so terrible it's barely usable. You can watch the MB/s ramp up as the more random data fills in over time; getting that taken care of in a larger block more amenable to elevator sorting would be a huge help. I never bothered with this particular idea though because shared_buffers is only a portion of the important data. Cedric's pgfincore code digs into the OS cache, too, which can then save enough to be really useful here. And that's already got a snapshot/restore feature. The slides at http://www.pgcon.org/2010/schedule/events/261.en.html have a useful intro to that; pages 30 through 34 are the neat ones. That provides some other neat APIs for preloading popular data into cache too. I'd rather work on getting something like that into core than adding something that targets only shared_buffers. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
On Wed, May 4, 2011 at 7:10 AM, Mitsuru IWASAKI iwas...@jp.freebsd.org wrote: Hi, I am working on a new feature, `Buffer Cache Hibernation', which enables postgres to keep a higher cache hit ratio even when just started. Postgres usually starts with ZERO buffer cache. By saving the buffer cache data structure into hibernation files just before shutdown, and loading them at startup, postgres can start operations with the saved buffer cache in the same condition as just before the last shutdown. Here is the patch for 9.0.3 (also tested on 8.4.7): http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch The patch includes the following.
- At shutdown, the buffer cache data structure (such as BufferDescriptors, BufferBlocks and StrategyControl) is saved into hibernation files.
- At startup, the buffer cache data structure is loaded from hibernation files and the buffer lookup hashtable is set up based on the buffer descriptors.
- The above functions are enabled by specifying `enable_buffer_cache_hibernation=on' in postgresql.conf.
Any comments are welcome and I would very much appreciate merging the patch into the source tree. Have fun and thanks! It applies and builds against head with offsets and some fuzz. It fails make check, but apparently only because src/test/regress/expected/rangefuncs.out needs to be updated to include the new setting. (Although all the other enable% settings are for the planner, so making a new setting with that prefix that does something else might be undesirable.) I think that pgfincore (http://pgfoundry.org/projects/pgfincore/) provides similar functionality. Are you familiar with that? If so, could you contrast your approach with that one? Cheers, Jeff
[HACKERS] VARIANT / ANYTYPE datatype
Hello, A customer came to us with this request: a way to store any data in a column. We've gone back and forth trying to determine reasonable implementation restrictions, safety and useful semantics for them. I note that this has been requested in the past: http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php and both Oracle and MS-SQL have it and apparently people find them useful. I didn't find any indication that SQL contains anything resembling this. The main idea is to be able to store column values in an audit table like this:
change_time timestamptz
table_name name
column_name name
old_value variant
new_value variant
So per-column changes, which is much more convenient than the regular idea of storing the whole NEW and/or OLD record(s). Currently, they use text for old_value and new_value, but this is, of course, not very satisfactory. My thought was that a variant type would store the datum as typid + data, so that it would be possible to identify the datatype stored in each column/row and interpret it adequately, calling the appropriate output function etc. On input it would be limited to come only from inside the system, not from the outside world, as that would have obvious security implications; so it'd be similar to pg_node_tree in that regard. Now this has obvious limitations: first, any query that tries to extract data would need to include a cast of the variant value to the appropriate type, so that the type can be resolved early. Thus, trying to extract rows of different types would be forbidden. Also, there would be a security problem with a datum storing something whose datatype later changes (consider a user-defined record type or things like that). My first reaction was to do something like CREATE TYPE foo VARIANT OF (int, text, timestamptz); and then you could declare old_value with type foo, which would only allow values of the declared types.
This makes it easy to catalogue used types in any variant, and thus easy to restrict modifying or dropping types that are used in some variant. However, this idea was rejected by the customer due to the unusability: you would have to remember to edit the variant to add the new type anytime you added a new column to a table, which would be cumbersome. What the customer suggested was to have a new fork, which stores type OIDs of datatypes used in the variant. Then when a type is to be altered, all forks would be scanned to determine if the type is used, and raise an error if so. I rejected that idea as unworkable. However, as a middle ground we agreed that we could allow a declared variant to store any pinned type without restrictions; those can't be changed or dropped so there's no safety concern. Storing other types (user-defined types, records and so on) would require some safety net. Before spending too much time exploring a detailed design, I'd like to hear some opinions on the basic idea. -- Álvaro Herrera alvhe...@alvh.no-ip.org
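A minimal SQL sketch of the intended usage, assuming a hypothetical variant pseudo-type and the cast-on-read restriction described above (none of this syntax exists yet):

```sql
-- Hypothetical: audit table storing per-column old/new values as variants.
CREATE TABLE audit_log (
    change_time timestamptz NOT NULL DEFAULT now(),
    table_name  name        NOT NULL,
    column_name name        NOT NULL,
    old_value   variant,
    new_value   variant
);

-- Reads must cast early so the type can be resolved at plan time;
-- extracting rows of different underlying types in one result is forbidden.
SELECT change_time, old_value::integer AS old, new_value::integer AS new
FROM audit_log
WHERE table_name = 'accounts' AND column_name = 'balance';
```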
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
All, I thought that Dimitri had already implemented this using Fincore. It's linux-only, but that should work well enough to test the general concept. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] VARIANT / ANYTYPE datatype
Alvaro Herrera alvhe...@alvh.no-ip.org wrote: A customer came to us with this request: a way to store any data in a column. +1 More than once (usually in replication, interface, or audit situations) I've had to create a table with one column each of a number of different data types, only one of which will be used in each row. It would make life much simpler for programming such things if an unknown type could be stored. I could cast the types in the rows based on the related identifier column, but it would be even nicer if they could be read unknown and pg_typeof or something similar could be used to control flow based on the type. As one example, only *last night* I created the following table as part of the development of our shop's next generation of software. (Yes, I know about Slony, Londiste, and others -- we need to roll something out which integrates with existing systems, without production disruption, over the next few years. This does require a custom solution.)
Table public.DbTranOpValue
     Column     |      Type       | Modifiers
----------------+-----------------+-----------
 countyNo       | CountyNoT       | not null
 backendPid     | integer         | not null
 tranStart      | TimestampT      | not null
 logRecordSeqNo | LogRecordSeqNoT | not null
 columnName     | ColumnNameT     | not null
 isAfter        | BooleanT        | not null
 textValue      | text            |
 intValue       | bigint          |
 numericValue   | numeric         |
 binaryValue    | bytea           |
 booleanValue   | BooleanT        |
 dateValue      | DateT           |
 timeValue      | TimeT           |
 timestampValue | TimestampT      |
Indexes:
    DbTranOpValue_pkey PRIMARY KEY, btree (backendPid, tranStart, logRecordSeqNo, columnName)
I would much rather have had a value column of unknown type. -Kevin
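The one-column-per-type workaround Kevin describes typically gets read back with a type dispatch over the populated column, along these lines (a sketch using his column names; the CASE arms shown are illustrative, one per typed column):

```sql
-- Sketch: collapse the per-type columns back into one textual value.
-- Only one of the typed columns is expected to be non-null per row.
SELECT "columnName",
       CASE
           WHEN "textValue"      IS NOT NULL THEN "textValue"
           WHEN "intValue"       IS NOT NULL THEN "intValue"::text
           WHEN "numericValue"   IS NOT NULL THEN "numericValue"::text
           WHEN "timestampValue" IS NOT NULL THEN "timestampValue"::text
           -- ... remaining typed columns ...
       END AS value
FROM "DbTranOpValue";
```

A variant column would replace both the fan-out of typed columns and this coalescing read-back query.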
Re: [HACKERS] branching for 9.2devel
You can't indent patches, only patched files. And that's the problem with this happy scheme. For it to work at all sanely we'd need to keep the committed code that the patch is to be applied against strictly pgindent clean, presumably via some automated process such as a commit hook. That's been suggested in the past, but hasn't met with universal approval, IIRC. Well, there is another solution to this, which is to use Git branches and forks instead of mailing around patches. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] branching for 9.2devel
On Wed, May 4, 2011 at 19:21, Josh Berkus j...@agliodbs.com wrote: You can't indent patches, only patched files. And that's the problem with this happy scheme. For it to work at all sanely we'd need to keep the committed code that the patch is to be applied against strictly pgindent clean, presumably via some automated process such as a commit hook. That's been suggested in the past, but hasn't met with universal approval, IIRC. Well, there is another solution to this, which is to use Git branches and forks instead of mailing around patches. That makes no difference to this problem, really. If the committer (or reviewer) has to reindent it anyway, you can just as well do "git checkout work; patch -p1 < /where/ever; pgindent; git diff" as "git remote add somewhere; git fetch somewhere; git checkout work --track somewhere/something; pgindent; git diff". There are some reasons why using git branches and forks is nice, but they don't solve this problem. Or are you saying there should be an automated service where you registered your git url + branch and then it would pull that branch, run pgindent for you, and then republish it somewhere? Not sure how big a win that is in the end, plus it's going to fail as soon as you get a conflict anywhere anyway... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] branching for 9.2devel
On Wed, May 4, 2011 at 1:21 PM, Josh Berkus j...@agliodbs.com wrote: You can't indent patches, only patched files. And that's the problem with this happy scheme. For it to work at all sanely we'd need to keep the committed code that the patch is to be applied against strictly pgindent clean, presumably via some automated process such as a commit hook. That's been suggested in the past, but hasn't met with universal approval, IIRC. Well, there is another solution to this, which is to use Git branches and forks instead of mailing around patches. Shouldn't it be as simple as keeping a git clone of trunk up to date, applying the patch, running pgindent and emitting the resulting diff? Once it's been generated, just run git reset --hard to clean out all local changes. -- Thanks, David Blewett
Re: [HACKERS] VARIANT / ANYTYPE datatype
Alvaro Herrera alvhe...@alvh.no-ip.org writes: A customer came to us with this request: a way to store any data in a column. We've gone back and forth trying to determine reasonable implementation restrictions, safety and useful semantics for them. Yes, it seems rather messy. The main idea is to be able to store column values in an audit table like this: old_value variant new_value variant Currently, they use text for old_value and new_value, but this is, of course, not very satisfactory. Just out of curiosity, what actual functionality gain would ensue over just using text? It seems like doing anything useful with the audit table contents would still require casting the column to text, or the moral equivalent of that. If we did have a concept of ANYTYPE, or really we'd need ANYTYPE[], it could perhaps be used in pg_statistic, in place of the rather ugly anyarray hack that's there now. But I note that nothing above the level of C code can do anything very useful with the contents of pg_statistic, and I'm not sure that having an official type would change that. However, as a middle ground we agreed that we could allow a declared variant to store any pinned type without restrictions; those can't be changed or dropped so there's no safety concern. If you're going to accept that there are restrictions, I don't see that there is a good argument against your thought of a declared union type. At least then it's clear what the restrictions are. I am firmly against exposing the notion of pinned types in any user-visible SQL semantics, and even more firmly against the idea of creating fundamental functionality that only works for built-in types and can't be extended to add-on types. regards, tom lane
Re: [HACKERS] Extreme bloating of intarray GiST indexes
Josh Berkus j...@agliodbs.com writes: (c) we *think* that other columns of the table, including other indexed columns, are changed much more frequently than the intarray column is. Currently doing analysis on that. Yeah, I noticed that your statistics for the table showed far more updates than insertions or deletions. If the intarray itself didn't change often, that would result in lots of duplicate entries being made in the index. They'd get cleaned by vacuum eventually, but maybe not fast enough to avoid the one-live-tuple-per-page syndrome that Alexander was speculating about. regards, tom lane
Re: [HACKERS] Unfriendly handling of pg_hba SSL options with SSL off
On mån, 2011-04-25 at 19:18 -0400, Tom Lane wrote: Hm, does that mean we have consensus on treating it as an error? Regarding the patch you committed: I would avoid hardcoding postgresql.conf in error or hint messages, since we don't know whether that's the actual name of the file. No other message has that file name hardcoded.
Re: [HACKERS] Extreme bloating of intarray GiST indexes
Josh Berkus j...@agliodbs.com writes: Can you provide the data in the column that's indexed? Attached. This is for the index which is 90% free space. I tried loading this data in fresh, and then creating a gist__intbig_ops index on it. I got these pgstattuple numbers (in 8.4.8):
table_len          | 8806400
tuple_count        | 15005
tuple_len          | 4081360
tuple_percent      | 46.35
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 4088100
free_percent       | 46.42
On the other hand, loading the data with a pre-existing empty index gave
table_len          | 7798784
tuple_count        | 15005
tuple_len          | 4081360
tuple_percent      | 52.33
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 3183672
free_percent       | 40.82
Neither of those numbers are great, and it's a bit surprising that CREATE INDEX produces a result notably worse than incremental loading; but still a darn sight better than 90% free space. So I think probably the update pattern has a lot to do with this. * If you didn't notice earlier, it's a partial index. The two columns which determine the partial index change more often than the intarray column. Yeah, with only about half of the table actually indexed, since you showed only 7786 index entries in your results. But unless there's reason to think the indexed and unindexed entries are substantially different in the intarray column, this is probably not very relevant. * We've also determined some other unusual patterns from watching the application: (a) the listings table is a very wide table, with about 60 columns (b) whenever the table gets updated, the application code updates these 60 columns in 4 sections. So there's 4 updates to the same row, in a single transaction. Hmm. That is going to lead to four dead index entries for every live one (unless some of the updates are HOT, which won't happen if you're changing any indexed columns). VACUUM will get back the space eventually, but not before you've caused some index bloat.
I tried doing something similar to my test table:
contrib_regression=# alter table listings add column junk int;
ALTER TABLE
contrib_regression=# create index li on listings(junk);
CREATE INDEX
contrib_regression=# begin;
BEGIN
contrib_regression=# update listings set junk=1;
UPDATE 15005
contrib_regression=# update listings set junk=2;
UPDATE 15005
contrib_regression=# update listings set junk=3;
UPDATE 15005
contrib_regression=# update listings set junk=4;
UPDATE 15005
contrib_regression=# commit;
COMMIT
contrib_regression=# vacuum listings;
VACUUM
and then got these pgstattuple numbers:
table_len          | 39460864
tuple_count        | 15005
tuple_len          | 4081360
tuple_percent      | 10.34
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 32923872
free_percent       | 83.43
which is up in the same ballpark with your problem. Now probably your client's app is not updating all rows at once, but still this is a pretty wasteful update pattern. Is there a reason not to update all the columns in a single update? If you can't change the app, I'd suggest more aggressive autovacuuming as the least painful fix. regards, tom lane
Re: [HACKERS] Unfriendly handling of pg_hba SSL options with SSL off
Peter Eisentraut pete...@gmx.net writes: On mån, 2011-04-25 at 19:18 -0400, Tom Lane wrote: Hm, does that mean we have consensus on treating it as an error? Regarding the patch you committed: I would avoid hardcoding postgresql.conf in error or hint messages, since we don't know whether that's the actual name of the file. No other message has that file name hardcoded. Fair enough --- do you have a proposal for alternate wording? regards, tom lane
Re: [HACKERS] VARIANT / ANYTYPE datatype
Alvaro Herrera wrote: A customer came to us with this request: a way to store any data in a column. We've gone back and forth trying to determine reasonable implementation restrictions, safety and useful semantics for them. I note that this has been requested in the past: http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php and both Oracle and MS-SQL have it and apparently people find them useful. I didn't find any indication that SQL contains anything resembling this. I see VARIANT/ANYTYPE as the most general case of supporting union types, which could, say, have more specific examples of "allow any number or date here but nothing else". If VARIANT is supported, unions in general ought to be also. The most effective way of supporting VARIANT or union types in general is having an implementation where, in the general case, each value in the database knows its own data type rather than the data type being provided by a context such as what table column it is in. For example, rather than storing a data value directly, we store a 2-attribute struct naming a data type and pointing to or holding the data value. See how SQLite works as an example of how VARIANTs or unions could work, although that on its own would need to be made more comprehensive for Pg. I claim ignorance as to how Pg currently implements these matters. Where VARIANT/union types are supported by default, declaring more specific types is just a type constraint and an optimization. Of course, when we know the type of a column/etc isn't going to be VARIANT or some other union type, then a simple optimization allows us to just store the value and have its type provided by context rather than the struct. -- Darren Duncan
Re: [HACKERS] VARIANT / ANYTYPE datatype
On Wed, May 4, 2011 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@alvh.no-ip.org writes: A customer came to us with this request: a way to store any data in a column. We've gone back and forth trying to determine reasonable implementation restrictions, safety and useful semantics for them. Yes, it seems rather messy. The main idea is to be able to store column values in an audit table like this: old_value variant new_value variant Currently, they use text for old_value and new_value, but this is, of course, not very satisfactory. Just out of curiosity, what actual functionality gain would ensue over just using text? It seems like doing anything useful with the audit table contents would still require casting the column to text, or the moral equivalent of that. The problem with text is that for composite/table types you lose the ability to cast back when the structure changes. Most of the auditing implementations I've done cast new/old to text and drop it directly into a single column on the audit record. It works ok, but you have to messily update the text later when the table changes. For non-variant composites you can add columns down the line and it works ok in dependent records without too much fuss. I think though that getting this to work such that type dependency is via row/column instead of just column is going to be tough. Outside of composites, I don't see many advantages vs the text route; performance, maybe? merlin
Re: [HACKERS] Extreme bloating of intarray GiST indexes
On 5/4/11 11:29 AM, Tom Lane wrote: which is up in the same ballpark with your problem. Now probably your client's app is not updating all rows at once, but still this is a pretty wasteful update pattern. Is there a reason not to update all the columns in a single update? Yeah, really crappy application code. Discussing it with app developers now ... If you can't change the app, I'd suggest more aggressive autovacuuming as the least painful fix. Will test that. It's not clear that vacuuming is helping at all. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Extreme bloating of intarray GiST indexes
Josh Berkus j...@agliodbs.com writes: On 5/4/11 11:29 AM, Tom Lane wrote: If you can't change the app, I'd suggest more aggressive autovacuuming as the least painful fix. Will test that. It's not clear that vacuuming is helping at all. Well, you realize of course that you need a REINDEX to get the index size back down to a sane range. Autovacuum may or may not be able to keep it from creeping back up ... but vacuum definitely won't remove existing bloat. regards, tom lane
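The remediation Tom describes would look roughly like this; the index name is hypothetical and the per-table autovacuum settings are illustrative values, not recommendations:

```sql
-- One-time fix: rebuild the bloated GiST index (name is hypothetical).
REINDEX INDEX listings_intarray_idx;

-- Ongoing: vacuum this table more aggressively than the global defaults,
-- so dead index entries are reclaimed before bloat accumulates.
ALTER TABLE listings SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold    = 100
);
```

Note that REINDEX takes an exclusive lock on the table, so on a busy system the usual workaround is to build a replacement index with CREATE INDEX CONCURRENTLY and drop the old one.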
Re: [HACKERS] [COMMITTERS] pgsql: Message style cleanup
On ons, 2011-05-04 at 20:32 +0200, Magnus Hagander wrote: Is this part really correct?
- fprintf(stderr, _("%s: final receive failed: %s\n"),
+ fprintf(stderr, _("%s: final receive failed: %s"),
all other msgs have \n... I see you removed the final dot in many of the other messages, did you just accidentally grab one too many? (There are a couple of other examples of removing the \n though, so perhaps it was intentional? But they're all preceding disconnect_and_exit() so I don't see why it shouldn't be consistent?) The result of PQerrorMessage() already ends in \n. With that in mind, maybe I missed something?
Re: [HACKERS] [COMMITTERS] pgsql: Message style cleanup
On Wed, May 4, 2011 at 21:09, Peter Eisentraut pete...@gmx.net wrote: On ons, 2011-05-04 at 20:32 +0200, Magnus Hagander wrote: Is this part really correct? - fprintf(stderr, _(%s: final receive failed: %s\n), + fprintf(stderr, _(%s: final receive failed: %s), all other msgs have \n.. I see you removed the final dot in many of the other messages, did you just accidentally grab one too many? (There are a couple of other examples of removing the \n though, so perhaps it was intentional? But they're all preceeding disconnect_and_exit() so I don't see why it shouldn't be consistent? The result of PQerrorMessage() already ends in \n. With that in mind, maybe I missed something? Oh, d'uh. I'm not paying attention. Sorry about the noise. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Prefered Types
On Tue, May 3, 2011 at 3:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: Alvaro Herrera alvhe...@commandprompt.com writes: The interesting discussion is what happens next. To me, this is all related to this previous discussion: http://archives.postgresql.org/pgsql-hackers/2010-09/msg00232.php Yeah, there doesn't seem like much point unless we have a clear idea what we're going to do with the change. BTW, it occurs to me to wonder whether, instead of making types be more or less preferred, we should attack the issue from a different direction and assign preferred-ness ratings to casts. That seems to be more or less the direction that Robert was considering in the above-linked thread. I'm not sure it's better than putting the ratings on types --- in particular, neither viewpoint seems to offer a really clean answer about what to do when trying to resolve a multiple-argument function in which one possible resolution offers a more-preferred conversion for one argument but a less-preferred conversion for another one. But it's an alternative we ought to think about before betting all the chips on generalizing typispreferred. Personally I've always felt that the typispreferred mechanism was a bit of a wart; changing it from a bool to an int won't improve that, it'll just make it a more complicated wart. Casts have already got a standards-blessed notion that some are more equal than others, so maybe attaching preferredness ratings to them will be less of a wart. Not sure about it though. I think this is a pretty good analysis. One of the big, fat problems with typispreferred is that it totally falls apart when more than two types are involved. For example, given a call f(int2), we can't decide between f(int4) and f(int8), but it seems pretty clear (to me, at least) that we should prefer to promote as little as possible and should therefore pick f(int4). 
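The int2 case Robert describes is easy to reproduce (a sketch; f is a throwaway function name):

```sql
-- Two candidates, both reachable from int2 by implicit cast,
-- and neither type is "preferred" in the numeric category:
CREATE FUNCTION f(int4) RETURNS text AS $$ SELECT 'int4' $$ LANGUAGE sql;
CREATE FUNCTION f(int8) RETURNS text AS $$ SELECT 'int8' $$ LANGUAGE sql;

-- The call cannot be resolved; this fails with an error along the lines of
--   ERROR:  function f(smallint) is not unique
SELECT f(1::int2);
```

Dropping one of the two functions makes the call succeed, since the remaining candidate is then chosen via the implicit int2 cast.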
The problem is less acute with string-like data types because there are only two typcategory-S data types that get much use: text and varchar. But add a third type to the mix (varchar2...) or start playing around with functions that are defined for name and bpchar but not text or some such thing, and things get sticky. Generalizing typispreferred to an integer definitely helps with these cases, assuming anyway that you are dealing mostly with built-in types, or that the extensions you are using can somehow agree among themselves on reasonable weighting values. But it is not a perfect solution either, because it can really only handle pretty linear topologies. It's reasonable to suppose that the integer types are ordered int2 -> int4 -> int8 -> numeric and that the floating point types are ordered float4 -> float8 (-> numeric?), but I think the two hierarchies are pretty much incomparable, and an integer typispreferred won't handle that very well, unless we make the two groups separate categories, but arguably numeric belongs in both groups so that doesn't really seem to work very well either. Certainly from a theoretical perspective there's no reason why you couldn't have A -> B -> X and C -> D -> X, with A-C, A-D, B-C, and B-D incomparable. It almost feels like you need a graph to model it properly, which perhaps argues for your idea of attaching weights to the casts. But there are some problems with that, too. In particular, it would be nice to be able to hook in new types with a minimum of fuss. For example, say we add a new string type, like citext, via an extension. Right now, we need to add casts not only from citext to text, but also from citext to all the things to which text has casts, if we really want citext to behave like text. That solution works OK for the first extension type we load in, but as soon as you add any nonstandard casts from text to other things (perhaps yet another extension type of some kind), it starts to get a bit leaky.
In some sense it feels like it'd be nice to be able to walk the graph - if an implicit cast from A to B is OK, and an implicit cast from B to C is OK, perhaps an implicit cast from A to C is also OK. But that seems awfully expensive to do at runtime, and it'd introduce some strange behavior particularly with the way we have the reg* -> oid and oid -> reg* casts set up.
select a.castsource::regtype, a.casttarget::regtype, b.casttarget::regtype
from pg_cast a, pg_cast b
where a.casttarget = b.castsource
  and a.castcontext = 'i'
  and b.castcontext = 'i'
  and not exists (select 1 from pg_cast x
                  where x.castsource = a.castsource
                    and x.casttarget = b.casttarget
                    and x.castcontext = 'i')
  and a.castsource <> b.casttarget;
It's not clear to me whether in any of this there is a solution to the problem of int2 being a second-class citizen. Perhaps we could add casts from int4 and int8 back to int2, and make it less-preferred than all of the other integer types, but I'm not sure what else that would break. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
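Robert's query only checks two-step chains. The full transitive closure of the implicit-cast graph can be computed against pg_cast with a recursive CTE (8.4+), reporting every multi-step implicit path that lacks a direct implicit cast - a sketch:

```sql
-- Transitive closure of implicit casts; UNION (not UNION ALL) terminates
-- the recursion even though oid <-> reg* forms cycles in the graph.
WITH RECURSIVE reachable(src, dst) AS (
    SELECT castsource, casttarget
    FROM pg_cast
    WHERE castcontext = 'i'
  UNION
    SELECT r.src, c.casttarget
    FROM reachable r
    JOIN pg_cast c ON c.castsource = r.dst AND c.castcontext = 'i'
)
SELECT r.src::regtype AS castsource, r.dst::regtype AS casttarget
FROM reachable r
WHERE r.src <> r.dst
  AND NOT EXISTS (SELECT 1 FROM pg_cast x
                  WHERE x.castsource = r.src
                    AND x.casttarget = r.dst
                    AND x.castcontext = 'i');
```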
Re: [HACKERS] VARIANT / ANYTYPE datatype
On 05/04/2011 01:36 PM, Tom Lane wrote: The main idea is to be able to store column values in an audit table like this: old_value variant new_value variant Currently, they use text for old_value and new_value, but this is, of course, not very satisfactory. Just out of curiosity, what actual functionality gain would ensue over just using text? It seems like doing anything useful with the audit table contents would still require casting the column to text, or the moral equivalent of that. Yeah, I've been down this road once or twice, and I think that's the $64 question. I wrote a custom audit app two or three years ago. After several iterations the customer and I found that using an hstore for the old/new (or old record / changeset, which is what we actually use) was the most suitable for our use. I think if we did this we'd need to add some sort of is_type() and typeof() functions for variant objects. cheers andrew
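For reference, the old-record/changeset approach Andrew mentions can be sketched with the 9.0+ hstore module (table, trigger, and function names here are illustrative, not his actual implementation):

```sql
CREATE EXTENSION hstore;  -- 9.1 syntax; in 9.0, run the contrib script instead

CREATE TABLE audit_log (
    changed_at timestamptz NOT NULL DEFAULT now(),
    table_name name        NOT NULL,
    old_row    hstore,  -- the full OLD record
    changeset  hstore   -- only the columns whose values changed
);

CREATE FUNCTION audit_trigger() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, old_row, changeset)
    VALUES (TG_TABLE_NAME,
            hstore(OLD),
            -- hstore minus deletes pairs matching in key AND value,
            -- so unchanged columns drop out of the changeset:
            hstore(NEW) - hstore(OLD));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

Like the text approach, everything is stored as text inside the hstore, but the per-column keys make it queryable without parsing.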
Re: [HACKERS] Predicate locking
On Tue, May 3, 2011 at 10:07 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: ... on a toy table with contrived values. How different is this from the often-asked question about why a query against a four-line table is not using the index they expect, and how can we expect it to scale if it doesn't? I agree that it's not unreasonable for someone to ask either question. If my response falls short, I'm game to try again. I guess what surprises me about this a bit is that we have to predicate-lock the whole table even if we're not actually looking at all the rows. I can sort of see why that's necessary, but I'm a bit fuzzy on the details, and it does seem a little unfortunate in this instance... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate locking
Robert Haas robertmh...@gmail.com wrote: On Tue, May 3, 2011 at 10:07 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: ... on a toy table with contrived values. How different is this from the often-asked question about why a query against a four-line table is not using the index they expect, and how can we expect it to scale if it doesn't? I agree that it's not unreasonable for someone to ask either question. If my response falls short, I'm game to try again. I guess what surprises me about this a bit is that we have to predicate-lock the whole table even if we're not actually looking at all the rows. I can sort of see why that's necessary, but I'm a bit fuzzy on the details, and it does seem a little unfortunate in this instance... Well, as far as I can tell, every production-quality database with predicate locking models the predicates based on the rows actually accessed. Until now, that has been every popular SQL database except PostgreSQL and Oracle. That makes predicate locking sensitive to the plan chosen. It was because of this that I thought it might be wise to include a bump to the seq_page_cost and/or cpu_tuple_cost for plans inside a serializable transaction. This would encourage indexed access rather than a table scan at an earlier threshold, thereby reducing false positive serialization failures. At the time the suggestion got a rather cool reception. Is it time to reconsider that? On the other hand, as a shop where we're probably going to set default_transaction_isolation = serializable in our postgresql.conf files and include trigger checks that we're running at that level, we can just boost those globally. That may also work for others. Once I wrap up these changes to our replication system I'm in the middle of coding, I'll see about getting all our development machines onto 9.1beta with default serialization and see how much trouble our apps have. 
Even on our development machines we run with a copy of real data from a circuit court county database. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Prefered Types
Robert Haas robertmh...@gmail.com writes: ... One of the big, fat problems with typispreferred is that it totally falls apart when more than two types are involved. For example, given a call f(int2), we can't decide between f(int4) and f(int8), but it seems pretty clear (to me, at least) that we should prefer to promote as little as possible and should therefore pick f(int4). Yeah. If your mental model is one of least promotion, you really cannot express that at all with a preferred type concept, even if the ratings are integers and not bools. On the other hand, it does seem possible to attach a cost or distance metric to casts and get some reasonably intuitive behavior that way. If you check the archives I believe we've discussed this before using the distance terminology. It still falls down though on the question of what to prefer when there are several combinations of multiple casts to choose between. And as you say it's not entirely clear how well either approach generalizes to after-the-fact insertion of new types/casts in the hierarchy. Perhaps it would be adequate to allow automatic resolution of an overloading conflict only when one of the available alternatives dominates all others, ie, none of the argument positions requires a longer distance cast than is used in that position by any other available alternative. I'm just throwing that out as a possibility, I haven't tried it. It's not clear to me whether in any of this there is a solution to the problem of int2 being a second-class citizen. I've always felt that the basic problem int2 has got is that the parser initially types integer-looking constants as int4 or larger, even if they'd fit in int2. If it typed them as int2 then the unintuitive behaviors would largely go away, without any need for allowing implicit down-casting from int4 to int2. 
I actually tried that once, probably close to ten years ago, and it blew up real good because many cases that formerly were considered an exact match no longer were, and the parser started making some pretty surprising (or at least not backwards compatible) resolution choices. Maybe with a more controllable type-promotion mechanism we could get better results there. BTW, not to rain on the parade or anything, but I'll bet that rejiggering anything at all here will result in whining that puts the 8.3-era removal of a few implicit casts to shame. If the new behavior is really significantly better *for users* then we can probably withstand the complaints; but if it's just marginal improvements or just improves life for hypothetical future extensions, it's not going to fly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
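Tom's "dominates all others" idea can be sketched as follows. This is a hypothetical illustration, not PostgreSQL resolution code: the cast distances, candidate names, and signatures are all invented. Resolution succeeds only when one candidate requires no longer-distance cast than every other candidate in every argument position:

```python
def resolve(call_args, candidates, distance):
    """candidates: list of (name, parameter_type_tuple).
    distance(arg_type, param_type) -> int cost, or None if no
    implicit cast exists. Returns the single dominating candidate's
    name, or None if resolution is ambiguous or impossible."""
    costed = []
    for name, sig in candidates:
        if len(sig) != len(call_args):
            continue
        ds = [distance(a, p) for a, p in zip(call_args, sig)]
        if all(d is not None for d in ds):
            costed.append((name, ds))
    # A candidate wins only if it is no worse than every other
    # candidate in every argument position.
    winners = [
        (name, ds) for name, ds in costed
        if all(all(d <= od for d, od in zip(ds, ods)) for _, ods in costed)
    ]
    return winners[0][0] if len(winners) == 1 else None

# Invented distances: int2 -> int4 is "closer" than int2 -> int8,
# so f(int2) picks f_int4 -- the least-promotion outcome.
DIST = {("int2", "int2"): 0, ("int2", "int4"): 1, ("int2", "int8"): 2}
cands = [("f_int4", ("int4",)), ("f_int8", ("int8",))]
print(resolve(("int2",), cands, lambda a, p: DIST.get((a, p))))  # f_int4
```

Note that with multiple arguments the domination test can fail even when every candidate is reachable, which matches Tom's caveat about several combinations of casts: if one candidate is cheaper in position 1 and another is cheaper in position 2, neither dominates and the call stays ambiguous.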
Re: [HACKERS] GSoC 2011: Fast GiST index build
While studying the existing GiST code I ran into a question. The gistFindCorrectParent function has a branch with the following comment:

/*
 * awful!!, we need search tree to find parent ... , but before we
 * should release all old parent
 */

Can you provide an example of a case in which this branch is taken? With best regards, Alexander Korotkov.
Re: [HACKERS] VARIANT / ANYTYPE datatype
On Wed, May 4, 2011 at 2:55 PM, Andrew Dunstan and...@dunslane.net wrote: On 05/04/2011 01:36 PM, Tom Lane wrote: The main idea is to be able to store column values in an audit table like this: old_value variant new_value variant Currently, they use text for old_value and new_value, but this is, of course, not very satisfactory. Just out of curiosity, what actual functionality gain would ensue over just using text? It seems like doing anything useful with the audit table contents would still require casting the column to text, or the moral equivalent of that. Yeah, I've been down this road once or twice, and I think that's the $64 question. I wrote a custom audit app two or three years ago. After several iterations the customer and I found that using an hstore for the old/new (or old record / changeset, which is what we actually use) was the most suitable for our use. yeah -- +1 on that method. I think it's really the right way to go with the recent hstore enhancements. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM
On Wed, May 4, 2011 at 3:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jaime Casanova ja...@2ndquadrant.com writes: I want to propose the addition of a new field in IDENTIFY_SYSTEM: xlogversion, which will carry XLOG_PAGE_MAGIC from primary. The idea of sending that info is to allow us to know if the xlog page version of two different major versions are compatible or not. Currently pg_upgrade requires the primary to be taken down, That's *intentional*. The notion of WAL-shipping-replication compatibility between two different major versions is insane on its face. They will not have compatible system catalog contents. You might get perfect replication of the master's catalogs, but the slave wouldn't be able to interpret them. That's exactly how hard in place upgrade was to begin with. Considering how valuable this would be, it seems worth it to pursue this. The reason we have XLOG_PAGE_MAGIC is really more the opposite: to prevent people from trying to recover across a minor version update in which we had to break XLOG compatibility. I don't recall right now if that's ever actually happened, but it definitely could. If that is true, then allowing this patch will allow us to detect that incompatibility when the standby connects to the master, and explain the issue in a useful error message. Otherwise we will just barf on the magic value. Having access to these details might make it possible to upgrade. They could be inferred, but it would be better to have the full data so we can take an informed decision about whether or not it is possible. So even if people don't believe in the rationale behind the patch, would allowing it harm anything at this point? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Predicate locking
I wrote: On the other hand, as a shop where we're probably going to set default_transaction_isolation = serializable in our postgresql.conf files and include trigger checks that we're running at that level, we can just boost those globally. That may also work for others. Just as a quick experiment I took Greg's example and tried it with different costs, and thereby eliminated the false positives for this particular example, all the way down to a 5-row table!

On the first client:

set random_page_cost = 0.2;
set cpu_tuple_cost = 0.05;
drop table t;
create table t (id bigint, value bigint);
insert into t(id,value) (select s,1 from generate_series(1,5) as s);
create index t_idx on t(id);
begin transaction;
set transaction isolation level serializable;
select * from t where id = 2;
insert into t (id, value) values (-2, 1);

Execute this on the second client:

set random_page_cost = 0.2;
set cpu_tuple_cost = 0.05;
begin transaction;
set transaction isolation level serializable;
select * from t where id = 3;
insert into t (id, value) values (-3, 0);
commit;

Then go back to the first client and commit -- no problem. I make no representation that these are great numbers for any particular workload; it's just meant as a quick illustration that these behaviors are tunable. With serializable transactions, it probably is reasonable to figure that the cost of a sequential scan or of reading a tuple includes the cost of some percentage of transactions being rolled back and restarted. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011: Alvaro Herrera alvhe...@alvh.no-ip.org writes: The main idea is to be able to store column values in an audit table like this:

old_value  variant
new_value  variant

Currently, they use text for old_value and new_value, but this is, of course, not very satisfactory. Just out of curiosity, what actual functionality gain would ensue over just using text? It seems like doing anything useful with the audit table contents would still require casting the column to text, or the moral equivalent of that. Storage efficiency. These people have really huge databases; small changes in how tight things are packed makes a large difference for them. (For example, we developed a type to store SHA-2 digests in a more compact way than bytea mainly because of this reason. Also, at some time they also wanted to apply compression to hstore keys and values.) As someone commented downthread, they also want to have things such as a typeof operator. It could be used in (say) a plpgsql function to choose different branches of code. Things go wild when you think about using VARIANT as a function parameter type. Given how complex our current function resolution rules are, I'm not really sold on making this work at all, so initially I'm aiming at just raising an error in that case. If we did have a concept of ANYTYPE, or really we'd need ANYTYPE[], it could perhaps be used in pg_statistic, in place of the rather ugly anyarray hack that's there now. But I note that nothing above the level of C code can do anything very useful with the contents of pg_statistic, and I'm not sure that having an official type would change that. Well, we could get rid of ugly hacks that are in various places in the backend code to cope with this. Not sure how useful it'd be for doing things at the SQL level. 
However, as a middle ground we agreed that we could allow a declared variant to store any pinned type without restrictions; those can't be changed or dropped so there's no safety concern. If you're going to accept that there are restrictions, I don't see that there is a good argument against your thought of a declared union type. At least then it's clear what the restrictions are. I am firmly against exposing the notion of pinned types in any user-visible SQL semantics, and even more firmly against the idea of creating fundamental functionality that only works for built-in types and can't be extended to add-on types. The argument was it's too cumbersome to use (which makes sense: users are certain to forget to add the new type to the declared union when they add a new column to the table, possibly causing a run-time error if the audit trigger is in place and attempts to load a new datum into the log table.) I understand the reluctancy to use pinned-ness in a user-visible way. Back to the drawing board. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011: Just out of curiosity, what actual functionality gain would ensue over just using text? It seems like doing anything useful with the audit table contents would still require casting the column to text, or the moral equivalent of that. Storage efficiency. These people have really huge databases; small changes in how tight things are packed makes a large difference for them. (For example, we developed a type to store SHA-2 digests in a more compact way than bytea mainly because of this reason. Also, at some time they also wanted to apply compression to hstore keys and values.) Hmm. The prototypical case for this would probably be a 4-byte int, which if you add an OID to it so you can resolve the type is going to take 8 bytes, plus you are going to need a length word because there is really no alternative to the VARIANT type being varlena overall, which makes it 9 bytes if you're lucky on alignment and up to 16 if you're not. That is not shorter than the average length of the text representation of an int. The numbers don't seem a lot better for 8-byte quantities like int8, float8, or timestamp. It might be marginally worthwhile for timestamp, but surely this is a huge amount of effort to substitute for thinking of a more compact text representation for timestamps. Pardon me for being unconvinced. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
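Tom's arithmetic can be reproduced with a quick back-of-the-envelope sketch. The assumptions are spelled out here and are illustrative only, not PostgreSQL source: a 1-byte short varlena header (4 bytes otherwise), a 4-byte type OID before the datum, and worst-case padding out to the next 8-byte alignment boundary:

```python
def variant_size(payload_bytes, short_header=True, align=8):
    """Estimate on-disk size of a hypothetical VARIANT datum:
    varlena header + 4-byte type OID + payload, and the same
    rounded up to the next alignment boundary (worst case)."""
    header = 1 if short_header else 4
    unaligned = header + 4 + payload_bytes
    padded = -(-unaligned // align) * align  # ceiling division, then scale
    return unaligned, padded

# int4 payload: 9 bytes if you're lucky on alignment, 16 if you're not,
# matching the numbers in the mail above.
print(variant_size(4))  # (9, 16)
```

For comparison, the text representation of a typical int fits in well under 9 bytes even counting its own varlena header, which is the point being made: the tagged binary form does not obviously win for small scalars.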
Re: [HACKERS] DLL export with mingw-w64: currently a no-op
On Wed, 04 May 2011 15:11:57 -, Andrew Dunstan and...@dunslane.net wrote: On 05/04/2011 01:25 AM, Johann 'Myrkraverk' Oskarsson wrote: On Tue, 03 May 2011 12:40:28 -, Andrew Dunstan and...@dunslane.net wrote: Our Makefiles use dlltool and dllwrap to create DLLs. If you used our recommended build method pgxs would do lots of the work for you. dllwrap is a tool to build DLLs. It seems to be deprecated in favour of the gcc -shared option, but some projects still use it. SQLite, for example. Armed with this information, it may be prudent to review the DLL build process in PGXS. OK, but if we want to get rid of using dllwrap we'd need a complete patch for it. Just changing the definition of the macro without changing the rest isn't going to cut it, I think. But it might well be worth doing. You're right. And now that I know a little bit more about how to build DLLs and what's involved I can take a look at this, sometime in the not-too-distant future. Until next time, have fun! -- Johann Oskarsson  http://www.2ndquadrant.com/  PostgreSQL Development, 24x7 Support, Training and Services  Blog: http://my.opera.com/myrkraverk/blog/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
On May 4, 2011, at 3:04 PM, Alvaro Herrera wrote: As someone commented downthread, they also want to have things such as a typeof operator. It could be used in (say) a plpgsql function to choose different branches of code. FWIW, pg_typeof(any) has been in core since 9.0. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
On 05/04/2011 07:05 PM, Tom Lane wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011: Just out of curiosity, what actual functionality gain would ensue over just using text? It seems like doing anything useful with the audit table contents would still require casting the column to text, or the moral equivalent of that. Storage efficiency. These people have really huge databases; small changes in how tight things are packed makes a large difference for them. (For example, we developed a type to store SHA-2 digests in a more compact way than bytea mainly because of this reason. Also, at some time they also wanted to apply compression to hstore keys and values.) Hmm. The prototypical case for this would probably be a 4-byte int, which if you add an OID to it so you can resolve the type is going to take 8 bytes, plus you are going to need a length word because there is really no alternative to the VARIANT type being varlena overall, which makes it 9 bytes if you're lucky on alignment and up to 16 if you're not. That is not shorter than the average length of the text representation of an int. The numbers don't seem a lot better for 8-byte quantities like int8, float8, or timestamp. It might be marginally worthwhile for timestamp, but surely this is a huge amount of effort to substitute for thinking of a more compact text representation for timestamps. Pardon me for being unconvinced. I'm far from convinced that storing deltas per column rather than per record is a win anyway. I don't have hard numbers to hand, but my vague recollection is that my tests showed it to be a design that used more space. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Some surprising precedence behavior in PG's grammar
While looking at the grammar's operator-precedence declarations in connection with a recent pgsql-docs question, it struck me that this declaration is a foot-gun waiting to go off:

%nonassoc IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */

The only terminal that we actually need to attach precedence to for IS NULL and related productions is IS. The others are just listed there to save attaching explicit %prec declarations to those productions. This seems like a bad idea, because attaching a precedence to a terminal symbol that doesn't absolutely have to have one is just asking for trouble: it can cause bison to accept grammars that would better have provoked a shift/reduce error, and to silently resolve the ambiguity in a way that you maybe didn't expect. So I thought to myself that it'd be better to remove the unnecessary precedence markings, and tried, with the attached patch. And behold, I got a shift/reduce conflict:

state 2788

 1569 b_expr: b_expr qual_Op . b_expr
 1571       | b_expr qual_Op . NULL_P

    NULL_P  shift, and go to state 1010
    NULL_P  [reduce using rule 1571 (b_expr)]

So the god of unintended consequences has been here already. What this is telling us is that in examples such as

CREATE TABLE foo (f1 int DEFAULT 10 %% NULL);

it is not immediately clear to bison whether to shift upon seeing the NULL (which leads to a parse tree that says %% is an infix operator with arguments 10 and NULL), or to reduce (which leads to a parse tree that says that %% is a postfix operator with argument 10, and NULL is a column declaration constraint separate from the DEFAULT expression). If you try the experiment, you find out that the first interpretation is preferred by the current grammar:

ERROR: operator does not exist: integer %% unknown

Now, this is probably a good thing, because NULL as a column declaration constraint is not SQL standard (only NOT NULL is), so we're resolving the ambiguity in a way that's more likely to be SQL-compatible. 
But it could be surprising to somebody who expected the other behavior, especially since this seemingly-closely-related command is parsed the other way:

CREATE TABLE foo (f1 int DEFAULT 10 %% NOT NULL);
ERROR: operator does not exist: integer %%

And the reason for that difference in behavior is that NOT has a declared precedence lower than POSTFIXOP, whereas NULL has a declared precedence that's higher. That comparison determines how bison resolves the shift/reduce conflict. The fact that this behavior falls out of a precedence declaration that's seemingly quite unrelated, and was surely not designed with this case in mind, is a perfect example of why I say that precedence declarations are hazardous. So I'd still like to get rid of the precedence markings for TRUE_P, FALSE_P, and UNKNOWN, and will do so unless somebody has a really good reason not to. (It looks like we could avoid marking ZONE, too.) But I would be happier if we could also not mark NULL, because that's surely used in a lot of other places, and could easily bite us a lot harder than this. Can anyone think of an alternative way to resolve this particular conflict without the blunt instrument of a precedence marking? regards, tom lane

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 933a1a2..2fb0418 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*** static void SplitColQualList(List *qualL
*** 614,620 ***
  %left      Op OPERATOR    /* multi-character ops and user-defined operators */
  %nonassoc  NOTNULL
  %nonassoc  ISNULL
! %nonassoc  IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */
  %left      '+' '-'
  %left      '*' '/' '%'
  %left      '^'
--- 614,620 ----
  %left      Op OPERATOR    /* multi-character ops and user-defined operators */
  %nonassoc  NOTNULL
  %nonassoc  ISNULL
! %nonassoc  IS             /* sets precedence for IS NULL, etc */
  %left      '+' '-'
  %left      '*' '/' '%'
  %left      '^'
*** a_expr: c_expr { $$ = $1; }
*** 9887,9893 ***
             *  a ISNULL
             *  a NOTNULL
             */
!           | a_expr IS NULL_P
              {
                  NullTest *n = makeNode(NullTest);
                  n->arg = (Expr *) $1;
--- 9887,9893 ----
             *  a ISNULL
             *  a NOTNULL
             */
!           | a_expr IS NULL_P        %prec IS
              {
                  NullTest *n = makeNode(NullTest);
                  n->arg = (Expr *) $1;
*** a_expr: c_expr { $$ = $1; }
*** 9901,9907 ***
                  n->nulltesttype = IS_NULL;
                  $$ = (Node *)n;
              }
!           | a_expr IS NOT NULL_P
              {
                  NullTest *n = makeNode(NullTest);
                  n->arg = (Expr *) $1;
--- 9901,9907 ----
                  n->nulltesttype = IS_NULL;
                  $$ = (Node *)n;
              }
!           | a_expr IS NOT NULL_P    %prec IS
              {
                  NullTest *n = makeNode(NullTest);
                  n->arg = (Expr *) $1;
*** a_expr: c_expr { $$ = $1; }
*** 9919,9960 ***
              { $$ = (Node
Re: [HACKERS] VARIANT / ANYTYPE datatype
David E. Wheeler da...@kineticode.com writes: On May 4, 2011, at 3:04 PM, Alvaro Herrera wrote: As someone commented downthread, they also want to have things such as a typeof operator. It could be used in (say) a plpgsql function to choose different branches of code. FWIW, pg_typeof(any) has been in core since 9.0. But note that pg_typeof() would surely say VARIANT if applied to such a datatype. You'd need some other, VARIANT-specific function that knew enough to dig into the value at runtime and get the OID therein. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Some surprising precedence behavior in PG's grammar
On 05/04/2011 07:39 PM, Tom Lane wrote: While looking at the grammar's operator-precedence declarations in connection with a recent pgsql-docs question, it struck me that this declaration is a foot-gun waiting to go off: %nonassoc IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */ The only terminal that we actually need to attach precedence to for IS NULL and related productions is IS. The others are just listed there to save attaching explicit %prec declarations to those productions. This seems like a bad idea, because attaching a precedence to a terminal symbol that doesn't absolutely have to have one is just asking for trouble: it can cause bison to accept grammars that would better have provoked a shift/reduce error, and to silently resolve the ambiguity in a way that you maybe didn't expect. So I thought to myself that it'd be better to remove the unnecessary precedence markings, and tried, with the attached patch. And behold, I got a shift/reduce conflict: state 2788 1569 b_expr: b_expr qual_Op . b_expr 1571 | b_expr qual_Op . NULL_P shift, and go to state 1010 NULL_P[reduce using rule 1571 (b_expr)] So the god of unintended consequences has been here already. What this is telling us is that in examples such as CREATE TABLE foo (f1 int DEFAULT 10 %% NULL); it is not immediately clear to bison whether to shift upon seeing the NULL (which leads to a parse tree that says %% is an infix operator with arguments 10 and NULL), or to reduce (which leads to a parse tree that says that %% is a postfix operator with argument 10, and NULL is a column declaration constraint separate from the DEFAULT expression). If you try the experiment, you find out that the first interpretation is preferred by the current grammar: ERROR: operator does not exist: integer %% unknown Yeah, IIRC the default action for a shift/reduce conflict is to shift, as it's usually the right thing to do. 
Now, this is probably a good thing, because NULL as a column declaration constraint is not SQL standard (only NOT NULL is), so we're resolving the ambiguity in a way that's more likely to be SQL-compatible. But it could be surprising to somebody who expected the other behavior, especially since this seemingly-closely-related command is parsed the other way: CREATE TABLE foo (f1 int DEFAULT 10 %% NOT NULL); ERROR: operator does not exist: integer %% And the reason for that difference in behavior is that NOT has a declared precedence lower than POSTFIXOP, whereas NULL has a declared precedence that's higher. That comparison determines how bison resolves the shift/reduce conflict. The fact that this behavior falls out of a precedence declaration that's seemingly quite unrelated, and was surely not designed with this case in mind, is a perfect example of why I say that precedence declarations are hazardous. So I'd still like to get rid of the precedence markings for TRUE_P, FALSE_P, and UNKNOWN, and will do so unless somebody has a really good reason not to. (It looks like we could avoid marking ZONE, too.) But I would be happier if we could also not mark NULL, because that's surely used in a lot of other places, and could easily bite us a lot harder than this. Can anyone think of an alternative way to resolve this particular conflict without the blunt instrument of a precedence marking? My bison-fu is a bit rusty, but years ago I could do this stuff in my sleep. I'll be surprised if there isn't a way. If we do need a precedence setting for NULL_P, then I think it should probably be on its own and not sharing one with IS. If you don't solve it soon I'll take a look after I clear a couple of higher priority items from my list. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
Excerpts from Tom Lane's message of mié may 04 20:05:54 -0300 2011: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011: Just out of curiosity, what actual functionality gain would ensue over just using text? It seems like doing anything useful with the audit table contents would still require casting the column to text, or the moral equivalent of that. Storage efficiency. These people have really huge databases; small changes in how tight things are packed makes a large difference for them. (For example, we developed a type to store SHA-2 digests in a more compact way than bytea mainly because of this reason. Also, at some time they also wanted to apply compression to hstore keys and values.) Hmm. The prototypical case for this would probably be a 4-byte int, which if you add an OID to it so you can resolve the type is going to take 8 bytes, plus you are going to need a length word because there is really no alternative to the VARIANT type being varlena overall, which makes it 9 bytes if you're lucky on alignment and up to 16 if you're not. Hmm, I was hoping that we could skip making it varlena at least in some cases ... perhaps that's hopeless, in which case we'll have to reassess the whole idea. Still there's the new functionality argument, though. As a followup idea there exists the desire to store records as records and not text representation of same (given differing record types, of course), for which it'd be more worthwhile. It might be marginally worthwhile for timestamp, but surely this is a huge amount of effort to substitute for thinking of a more compact text representation for timestamps. Surely if it's just for timestamps it'd be a waste of time. Pardon me for being unconvinced. :-) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. 
PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Some surprising precedence behavior in PG's grammar
Andrew Dunstan and...@dunslane.net writes: On 05/04/2011 07:39 PM, Tom Lane wrote: If you try the experiment, you find out that the first interpretation is preferred by the current grammar: ERROR: operator does not exist: integer %% unknown Yeah, IIRC the default action for a shift/reduce conflict is to shift, as it's usually the right thing to do. Well, there's nothing default about it: we've got precedence declarations that specifically tell it to do that. I'm just disturbed because this isn't what those precedences were meant to do. I would be happier if we could also not mark NULL, because that's surely used in a lot of other places, and could easily bite us a lot harder than this. Can anyone think of an alternative way to resolve this particular conflict without the blunt instrument of a precedence marking? My bison-fu is a bit rusty, but years ago I could do this stuff in my sleep. I'll be surprised if there isn't a way. If we do need a precedence setting for NULL_P, then I think it should probably be on its own and not sharing one with IS. Yeah, I was thinking that too. If we put %prec on the IS [NOT] NULL productions then there is no need for NULL_P to have exactly its current precedence; anything above POSTFIXOP would preserve the current behavior in the DEFAULT ... NULL case. (And if we decided we wanted to flip that behavior, anything below POSTFIXOP would do that.) But it would still make life safer for future grammar hacking if it didn't have precedence at all. BTW, I wonder why NOTNULL and ISNULL have their own precedence levels, rather than being made to act exactly like IS [NOT] NULL ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
Alvaro Herrera alvhe...@commandprompt.com writes:
> As a followup idea there exists the desire to store records as records
> and not text representation of same (given differing record types, of
> course), for which it'd be more worthwhile.

Maybe. The conventional wisdom is that text representation of data is more compact than PG's internal representation by a significant factor --- our FAQ says up to 5x, in fact. I know that that's including row overhead and indexes and so on, but I still don't find it to be a given that you're going to win on space with this sort of trick.

Another point here is that once the values in question get to be any significant number of bytes apiece, both text and the proposed VARIANT representation could be subject to compression, which would greatly reduce any nominal advantage of one over the other. Any representational inefficiency in either would just be grist for the compressor.

			regards, tom lane
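The size argument above can be made concrete with a toy calculation: a fixed-width binary field costs the same number of bytes regardless of the value, while a text rendering costs roughly one byte per digit plus a short length header. The sketch below is purely illustrative; the one-byte header and the helper names are assumptions, not PostgreSQL's actual storage layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Toy model, NOT PG's on-disk format: a text rendering of a number costs
 * about one byte per decimal digit plus a length header (assumed to be
 * 1 byte here, as for small varlena datums), while a fixed-width binary
 * field costs sizeof(int64_t) no matter what the value is.
 */
static size_t
text_size(int64_t v)
{
    char buf[32];

    /* 1-byte header assumption + the decimal digits */
    return 1 + (size_t) snprintf(buf, sizeof(buf), "%lld", (long long) v);
}

static size_t
binary_size(void)
{
    return sizeof(int64_t);     /* always 8 bytes */
}
```

For small values the text form wins (3 bytes vs. 8 for the value 42), which is why the space advantage of a binary VARIANT is not a given.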
Re: [HACKERS] Some surprising precedence behavior in PG's grammar
On Thu, May 5, 2011 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> So I'd still like to get rid of the precedence markings for TRUE_P,
> FALSE_P, and UNKNOWN, and will do so unless somebody has a really good
> reason not to. (It looks like we could avoid marking ZONE, too.) But I
> would be happier if we could also not mark NULL, because that's surely
> used in a lot of other places, and could easily bite us a lot harder
> than this. Can anyone think of an alternative way to resolve this
> particular conflict without the blunt instrument of a precedence
> marking?

Isn't there already some gadget which forces postfix operators to be discouraged compared to some other interpretation in other cases? That would be the opposite of the current interpretation though, which you said you preferred.

--
greg
[HACKERS] patch: fix race in SSI's CheckTargetForConflictsIn
While running some benchmarks to test SSI performance, I found a race condition that's capable of causing a segfault. A patch is attached.

The bug is in CheckTargetForConflictsIn, which scans the list of SIREAD locks on a lock target when it's modified. There's an optimization in there where the writing transaction will remove a SIREAD lock that it holds itself, because it's being replaced with a (stronger) write lock. To do that, it needs to drop its shared lwlocks and reacquire them in exclusive mode. The existing code deals with concurrent modifications in that interval by redoing checks. However, it misses the case where some other transaction removes all remaining locks on the target, and proceeds to remove the lock target itself.

The attached patch fixes this by deferring the SIREAD lock removal until the end of the function. At that point, there isn't any need to worry about concurrent updates to the target's lock list. The resulting code is also simpler.

Dan

--
Dan R. K. Ports	MIT CSAIL	http://drkp.net/

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 3309d07..c274bca 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3905,6 +3905,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 	LWLockId	partitionLock;
 	PREDICATELOCKTARGET *target;
 	PREDICATELOCK *predlock;
+	PREDICATELOCK *mypredlock = NULL;
+	PREDICATELOCKTAG mypredlocktag;
 
 	Assert(MySerializableXact != InvalidSerializableXact);
@@ -3950,139 +3952,21 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		if (sxact == MySerializableXact)
 		{
 			/*
-			 * If we're getting a write lock on the tuple and we're not in a
-			 * subtransaction, we don't need a predicate (SIREAD) lock. We
-			 * can't use this optimization within a subtransaction because the
-			 * subtransaction could be rolled back, and we would be left
-			 * without any lock at the top level.
+			 * If we're getting a write lock on a tuple, we don't need
+			 * a predicate (SIREAD) lock on the same tuple. We can
+			 * safely remove our SIREAD lock, but we'll defer doing so
+			 * until after the loop because that requires upgrading to
+			 * an exclusive partition lock.
 			 *
-			 * At this point our transaction already has an ExclusiveRowLock
-			 * on the relation, so we are OK to drop the predicate lock on the
-			 * tuple, if found, without fearing that another write against the
-			 * tuple will occur before the MVCC information makes it to the
-			 * buffer.
+			 * We can't use this optimization within a subtransaction
+			 * because the subtransaction could roll back, and we
+			 * would be left without any lock at the top level.
 			 */
 			if (!IsSubTransaction() && GET_PREDICATELOCKTARGETTAG_OFFSET(*targettag))
 			{
-				uint32		predlockhashcode;
-				PREDICATELOCKTARGET *rmtarget = NULL;
-				PREDICATELOCK *rmpredlock;
-				LOCALPREDICATELOCK *locallock,
-						   *rmlocallock;
-
-				/*
-				 * This is a tuple on which we have a tuple predicate lock. We
-				 * only have shared LW locks now; release those, and get
-				 * exclusive locks only while we modify things.
-				 */
-				LWLockRelease(SerializableXactHashLock);
-				LWLockRelease(partitionLock);
-				LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
-				LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-				LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
-
-				/*
-				 * Remove the predicate lock from shared memory, if it wasn't
-				 * removed while the locks were released. One way that could
-				 * happen is from autovacuum cleaning up an index.
-				 */
-				predlockhashcode = PredicateLockHashCodeFromTargetHashCode
-					(&(predlock->tag), targettaghash);
-				rmpredlock = (PREDICATELOCK *)
-					hash_search_with_hash_value(PredicateLockHash,
-												&(predlock->tag),
-												predlockhashcode,
-												HASH_FIND, NULL);
-				if (rmpredlock)
-				{
-					Assert(rmpredlock == predlock);
-
-					SHMQueueDelete(&(predlock->targetLink));
-					SHMQueueDelete(&(predlock->xactLink));
-
-					rmpredlock = (PREDICATELOCK *)
-						hash_search_with_hash_value(PredicateLockHash,
-													&(predlock->tag),
-													predlockhashcode,
-													HASH_REMOVE, NULL);
-					Assert(rmpredlock == predlock);
-
-					RemoveTargetIfNoLongerUsed(target, targettaghash);
-
-					LWLockRelease(SerializableXactHashLock);
-					LWLockRelease(partitionLock);
-					LWLockRelease(SerializablePredicateLockListLock);
-
-					locallock = (LOCALPREDICATELOCK *)
-						hash_search_with_hash_value(LocalPredicateLockHash,
-													targettag, targettaghash,
-													HASH_FIND, NULL);
-
-					/*
-					 * Remove entry in local lock table if it exists and has
-					 * no children. It's OK if it doesn't exist; that means
-					 * the lock was transferred to a new target by a different
-					 * backend.
-					 */
-					if (locallock != NULL)
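In simplified form, the deferral pattern the patch describes — remember your own SIREAD lock during the scan and unlink it only after the loop finishes, instead of dropping and reacquiring locks mid-scan — looks like this. The structures and function names below are invented for illustration; they are not PostgreSQL's actual ones.

```c
#include <stddef.h>

/* Hypothetical, simplified model of a per-target lock list. */
typedef struct PredLock
{
    int              owner;     /* which transaction holds it */
    struct PredLock *next;
} PredLock;

/*
 * Scan the list, count conflicts with other owners, and *return* (rather
 * than remove) the entry owned by `me`.  The caller removes it only once
 * the scan is done and an exclusive lock can be taken safely -- touching
 * the list mid-scan is what exposed the race in the original code.
 */
static PredLock *
scan_and_defer(PredLock **head, int me, int *conflicts)
{
    PredLock *mine = NULL;

    *conflicts = 0;
    for (PredLock *p = *head; p != NULL; p = p->next)
    {
        if (p->owner == me)
            mine = p;           /* defer removal until after the loop */
        else
            (*conflicts)++;     /* would flag a rw-conflict here */
    }
    return mine;
}

/*
 * Unlink `victim` from the list; in the real code this happens under an
 * exclusive partition lock acquired after the scan completes.
 */
static void
remove_lock(PredLock **head, PredLock *victim)
{
    for (PredLock **pp = head; *pp != NULL; pp = &(*pp)->next)
    {
        if (*pp == victim)
        {
            *pp = victim->next;
            return;
        }
    }
}
```

Because nothing is unlinked during the scan, no other backend can delete the target out from under us while we juggle lock modes.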
[HACKERS] Enhancing online recovery in SR mode
Hi,

I have just committed changes to enhance online recovery in streaming replication mode. Pgpool-II used to have to restart its child processes to recognize a failback node. Now pgpool-II is a little bit smarter and does not need to restart children. This means that client sessions started before online recovery can continue while online recovery is in progress.

The trick is to keep a local backend status cache in each pgpool child: the process copies the backend status into local memory, and thus continues to disregard the failback node until the session ends. Once the session ends, the process exits with status 1, and the pgpool-II main process starts a new one, which will recognize the failback node.

Note that this enhancement applies to pcp_attach_node as well.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
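The session-local status cache described above can be sketched as follows. The names, array layout, and status codes are hypothetical stand-ins for pgpool-II's real shared-memory structures, not its actual API.

```c
#include <string.h>

/*
 * Hypothetical sketch of the trick: each child copies the shared backend
 * status into local memory at session start and consults only the copy,
 * so a node attached (failed back) mid-session is ignored until the
 * session ends and the child is replaced.
 */
#define MAX_BACKENDS 8

typedef enum { NODE_DOWN = 0, NODE_UP = 1 } NodeStatus;

/* stand-in for the shared-memory status array */
static NodeStatus shared_status[MAX_BACKENDS];

/* per-process cache, snapshotted once per session */
static NodeStatus local_status[MAX_BACKENDS];

static void
session_start(void)
{
    /* snapshot shared state; later failbacks won't be seen */
    memcpy(local_status, shared_status, sizeof(local_status));
}

static int
node_is_usable(int node)
{
    return local_status[node] == NODE_UP;
}
```

The payoff is that the shared array can change underneath a running session without invalidating any routing decisions the child has already made.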
Re: [HACKERS] Enhancing online recovery in SR mode
Oops. This should have been posted to pgpool-hackers. Sorry for noise.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Some surprising precedence behavior in PG's grammar
Greg Stark gsst...@mit.edu writes:
> Isn't there already some gadget which forces postfix operators to be
> discouraged compared to some other interpretation in other cases?

Yeah. I'm not unhappy with the current grammar's behavior in this case. What's bothering me is that the implementation seems likely to create surprising/unexpected behaviors after future grammar changes.

			regards, tom lane
Re: [HACKERS] Some surprising precedence behavior in PG's grammar
On Thu, May 5, 2011 at 4:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Greg Stark gsst...@mit.edu writes:
>> Isn't there already some gadget which forces postfix operators to be
>> discouraged compared to some other interpretation in other cases?
> Yeah. I'm not unhappy with the current grammar's behavior in this case.
> What's bothering me is that the implementation seems likely to create
> surprising/unexpected behaviors after future grammar changes.

I do wonder how much we really gain from having postfix operators. Other than ! I've never seen one, and of course anyone who wanted to use one could just as easily use a prefix operator. In practice I think most unary operators are just special cases of binary operators anyway, and often once you have the binary operator it's clearer to just use that. A *lot* of grammar conflicts we've had to worry about would go away if we didn't have postfix operators.

--
greg