Re: [HACKERS] Wal sync odirect
On 07/21/2013 10:01 PM, Миша Тюрин wrote:
> Hi, list. Here is my proposal: I would like to suggest O_DIRECT for WAL sync when wal_level is higher than minimal. In my case, WAL traffic is up to 1 GB per 2-3 minutes, but the disk hardware has a 2 GB BBU cache (or maybe an SSD under the WAL), so it would be better if WAL traffic could not harm OS memory eviction. And I do not use streaming; my archive command may read WAL directly, without the OS cache. Just an opinion - I have not done any tests yet, but I am still seeing some memory eviction anomaly.

PostgreSQL already uses O_DIRECT for WAL writes if you use O_SYNC mode for WAL writes. See the comments in src/include/access/xlogdefs.h (search for O_DIRECT). You should also examine src/backend/access/transam/xlog.c, particularly the function get_sync_bit(...)

Try doing some tests with pg_test_fsync, see how performance looks. If your theory is right and WAL traffic is putting pressure on kernel write buffers, using fsync=open_datasync - which should be the default on Linux - may help.

I'd recommend doing some detailed tracing and performance measurements before trying to proceed further.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
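To make the dependency Craig mentions concrete, here is a hedged paraphrase of the decision get_sync_bit() makes: O_DIRECT is only added for the open-and-sync methods, and only when nothing (archiver, walsender, wal_level != minimal) will need to re-read the WAL through the OS cache. The constant names and the wal_read_back_later flag below are simplified stand-ins for the real GUC plumbing, so treat this as a sketch, not the actual xlog.c code:

    #define _GNU_SOURCE             /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdbool.h>

    enum { SYNC_METHOD_FSYNC, SYNC_METHOD_OPEN_SYNC, SYNC_METHOD_OPEN_DSYNC };

    /* Hedged paraphrase of xlog.c's get_sync_bit(); simplified names. */
    static int
    get_sync_bit_sketch(int sync_method, bool wal_read_back_later)
    {
        /* If the WAL will be re-read later, avoid bypassing the OS cache. */
        int o_direct_flag = wal_read_back_later ? 0 : O_DIRECT;

        switch (sync_method)
        {
            case SYNC_METHOD_FSYNC:      /* buffered write, fsync later */
                return 0;
            case SYNC_METHOD_OPEN_SYNC:  /* fsync=open_sync */
                return O_SYNC | o_direct_flag;
            case SYNC_METHOD_OPEN_DSYNC: /* fsync=open_datasync */
                return O_DSYNC | o_direct_flag;
            default:
                return 0;
        }
    }

Note how this matches Craig's point: with plain fsync/fdatasync methods no O_DIRECT bit is ever returned, which is why only the O_SYNC/O_DSYNC modes get direct I/O today.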
Re: [HACKERS] Proposal: template-ify (binary) extensions
On 07/22/2013 12:11 AM, Hannu Krosing wrote:
>> Dropping this barrier by installing an untrusted PL (or equally insecure extensions), an attacker with superuser rights can trivially gain root.
> Could you elaborate? This is equivalent to claiming that any linux user can trivially gain root.

Uh. Sorry, you're of course right: the attacker can only gain postgres rights in that case. Thanks for correcting. The point still holds. It's another layer that an attacker would have to overcome.

> You already mentioned untrusted PL languages, and I don't see any difference between offering PL/pythonu and PL/C on security grounds, really.

I agree.

>> However, this also means that any kind of solution it offers is not a good one for the security conscious sysadmin.
> This is usually the case with a security conscious sysadmin - they very seldom want to install anything.

Exactly. That's why I'm favoring solutions that don't require any extension and keep the guarantee of preventing arbitrary native code.

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reduce maximum error in tuples estimation after vacuum.
Hello,

I'm very sorry to take your time on this mess.

On Jul 2013 16:06:11 +0530, Amit Kapila amit.kap...@huawei.com wrote in 014201ce7bc6$f71eb950$e55c2bf0$@kap...@huawei.com:
> I understood your patch's algorithm, but I still have a doubt in my mind: if the next analyze can correct the estimates, why would that not be sufficient? Please refer to my last mail for an analysis of the same: http://www.postgresql.org/message-id/000601ce77ad$7d3388e0$779a9aa0$@kapila@huawei.com

Hmm. I've reconfirmed what happened on my test set. As a result, the misestimation with dead_tup = 0 which I thought I had observed has turned out to be an illusion. Tuple number estimation is working as expected.

I withdraw this patch.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Wal sync odirect
I am talking about wal_level higher than MINIMAL, i.e. wal_level != minimal:

http://doxygen.postgresql.org/xlogdefs_8h_source.html

48  * Because O_DIRECT bypasses the kernel buffers, and because we never
49  * read those buffers except during crash recovery or if wal_level != minimal

> PostgreSQL already uses O_DIRECT for WAL writes if you use O_SYNC mode for WAL writes. See the comments in src/include/access/xlogdefs.h (search for O_DIRECT). You should also examine src/backend/access/transam/xlog.c, particularly the function get_sync_bit(...)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [HACKERS] Wal sync odirect
On 07/22/2013 03:30 PM, Миша Тюрин wrote:
> I am talking about wal_level higher than MINIMAL.

OK, so you want to be able to force O_DIRECT for wal_level = archive? I guess that makes sense if you expect the archive_command to read the file out of the RAID controller's write cache before it gets flushed, and your archive_command can also use direct I/O to avoid pulling it into cache.

You already know where to change to start experimenting with this. What exactly are you trying to ask?

I don't see any risk in forcing O_DIRECT for higher wal_level, but I'm not an expert in WAL and recovery. I'd recommend testing on a non-critical PostgreSQL instance.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
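On the archive side of the idea, a hedged sketch of how an archive_command helper might read a segment and then ask the kernel to drop it from the page cache. Everything here (function name, buffer size, error handling) is illustrative, not part of PostgreSQL:

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Copy a WAL segment to an already-open destination, then advise the
     * kernel that the source pages will not be needed again, so archiving
     * does not evict more useful data from the OS cache.
     */
    static int
    copy_wal_segment_uncached(const char *path, int dest_fd)
    {
        int     fd = open(path, O_RDONLY);
        char    buf[65536];
        ssize_t n = 0;

        if (fd < 0)
            return -1;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
        {
            if (write(dest_fd, buf, (size_t) n) != n)
                break;
        }
        /* Drop the cached pages for this file; ignore advisory failure. */
        (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
        return (n < 0) ? -1 : 0;
    }

This only addresses the cache pollution caused by the archiver's reads; the O_DIRECT question for the WAL writer itself is separate.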
Re: [HACKERS] Adding new joining alghoritm to postgresql
On 07/19/2013 09:47 PM, tubadzin wrote:
> Hi. I'm a little confused. 1. I have the source code of version 9.2.4 from http://www.postgresql.org/ftp/source/ 2. I want to add a new algorithm to index nested loop join, merge join and hash join. I have the executor directory in src containing nodeHash.c, nodeHashjoin.c, nodeMergejoin.c and nodeNestloop.c. 3. After changes, I want to compile postgresql and use it. 4. Problem is: a) I do not know which library is responsible for this functionality. I understand that I have to compile src and replace the library (I don't know which library) in the path where PostgreSQL is installed: C:\Program Files (x86)\PostgreSQL\9.2 b) I don't know how to use the files/library (which library?) with Visual Studio 2010 and how to compile it.

Start here: http://www.postgresql.org/docs/current/static/install-windows-full.html

You don't need to install all the dependencies when you're just compiling a copy for testing. You might find this tool I wrote a while ago interesting; it tries to automate downloading and compiling dependencies, creation of config.pl, etc: https://github.com/2ndQuadrant/pg_build_win

Once you've successfully compiled PostgreSQL, start reading the planner/executor sources. You will find this documentation quite useful when trying to understand the code: http://www.postgresql.org/docs/current/static/internals.html as well as: http://www.postgresql.org/files/developer/tour.pdf http://momjian.us/main/presentations/internals.html

Working with the query planner and adding node types is NOT the easiest way to get started with the PostgreSQL source code, though! You will have a lot of learning ahead of you. Consider trying to explain in greater detail what your idea is. See if anybody here has already tried it, and make sure you're not exploring a dead end. Get ideas and suggestions on how to approach the problem before you start work on it.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On 07/22/2013 12:17 PM, Quan Zongliang wrote:
> Hi hackers, I tried to improve performance when the database is Chinese. Under openSUSE, create index on a table with 54996 rows: locale=C, 140ms; locale=zh_CN, 985ms. I think the function strcoll() of Linux is too slow. So I made a new utf8-to-GB18030 map and store the Chinese order in it, and do not call strcoll(). With my modified code, the same operation with locale=zh_CN takes 203ms.

It might be worth looking at gcc's strcoll() implementation. See if it performs better when you use the latest gcc, and if not, try to improve gcc's strcoll().

I'd be interested in seeing a test case for this that shows that the results of your new collation are exactly the same as the original strcoll()-based approach.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
(2013/07/19 22:48), Greg Smith wrote:
> On 7/19/13 3:53 AM, KONDO Mitsumasa wrote:
>> Recently, a user who thinks system availability is important uses a synchronous replication cluster.
> If your argument for why it's OK to ignore bounding crash recovery on the master is that it's possible to failover to a standby, I don't think that is acceptable. PostgreSQL users certainly won't like it.

OK. I will also test recovery time. However, I am considering better practice now, and I will test it with a new patch.

>> I want you to read especially the points at lines 631, 651, and 656. MAX_WRITEBACK_PAGES is 1024 (1024 * 4096 bytes).
> You should read http://www.westnet.com/~gsmith/content/linux-pdflush.htm to realize everything you're telling me about the writeback code and its congestion logic I knew back in 2007. The situation is even worse than you describe, because this section of Linux has gone through multiple, major revisions since then. You can't just say "here is the writeback source code"; you have to reference each of the commonly deployed versions of the writeback feature to tell how this is going to play out if released. There are four major ones I pay attention to: the old kernel style as seen in RHEL5/2.6.18 (that's what my 2007 paper discussed), the similar code but with very different defaults in 2.6.22, the writeback method/tuning in RHEL6/Debian Squeeze/2.6.32, and then there are newer kernels. (The newer ones separate out into a few branches too; I haven't mapped those as carefully yet.)

The part of the writeback source code which I indicated is almost the same as the community kernel (2.6.32.61). I also read Linux kernel 3.9.7, and it is almost the same in this part. I think that fs-writeback.c is easier than xlog.c; it is only 1309 lines. I think the Linux distributions differ only in tuning parameters, but share the same programming logic. Do you think I need to read the Debian kernel source code? I will read part of this code, because it is only scores of lines at most.

> There are some examples of what really bad checkpoints look like in http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/WriteStuff-PGCon2011.pdf if you want to see some of them. That's the talk I did around the same time I was trying out spreading the database fsync calls out over a longer period.

Does it occur on the ext3 or ext4 file systems? I think this is a bug in XFS. If an fsync call doesn't return, that indicates we cannot write WAL and transactions cannot return their commits. It is a serious problem. My fsync patch only sleeps after a successful return from fsync, and the maximum sleep time is set to 10 seconds. It does not make this problem worse.

> When I did that, checkpoints became even less predictable, and that was a major reason behind why I rejected the approach. I think your suggestion will have the same problem. You just aren't generating test cases with really large write workloads yet to see it. You also don't seem afraid of how exceeding the checkpoint timeout is a very bad thing yet.

I think it is important to find out why this problem was caused. We should try to find which program has the bug or problem.

>> In addition, if you set a large value of checkpoint_timeout or checkpoint_completion_target, you have said that performance is improved, but is it true in all cases?
> The timeout, yes. Throughput is always improved by increasing checkpoint_timeout. Fewer checkpoints per unit of time increase efficiency. Fewer writes of the most heavily accessed buffers happen per transaction. It is faster because you are doing less work, which on average is always faster than doing more work; doing less work usually beats doing more work, even doing it smarter. If you want to see how much work per transaction a test is doing, track the number of buffers written at the beginning/end of your test via pg_stat_bgwriter. Tests that delay checkpoints will show a lower total number of writes per transaction. That seems more efficient, but it's efficiency mainly gained by ignoring checkpoint_timeout.

OK. In the next test, I will try it.

>> When the checkpoint completion target is actually enlarged, performance may fall in some cases. I think this is because the last fsync has become heavy owing to writing too slowly.
> I think you're confusing throughput and latency here. Increasing the checkpoint timeout, or to a lesser extent the completion target, on average increases throughput. It results in less work, and the more/less work amount is much more important than worrying about scheduler details. No matter how efficient a given write is, whether you've sorted it across elevator horizon boundary A or boundary B, it's better not to do it at all.

I think the fsync which takes the longest time, or continues for a long while, blocks other transactions. And my patch not only improves throughput but also realizes stable response times during the fsync phase of a checkpoint.

> By the way: if you have a theory like "the last fsync having become heavy" for why something is
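Kondo's description above ("only sleep after a successful fsync, maximum sleep time 10 seconds") could look roughly like the sketch below. The rule for choosing the nap length, here the duration of the previous fsync, is an assumption made for illustration, not the submitted patch's exact logic:

    #include <time.h>
    #include <unistd.h>

    #define MAX_FSYNC_NAP_US (10L * 1000000L)   /* the stated 10 second cap */

    /*
     * Nap only after an fsync() that returned success, so one heavy fsync
     * does not monopolize the I/O queue during a checkpoint.
     */
    static void
    checkpoint_fsync_with_nap(int fd, long prev_fsync_duration_us)
    {
        if (fsync(fd) == 0 && prev_fsync_duration_us > 0)
        {
            long            nap_us = prev_fsync_duration_us;
            struct timespec ts;

            if (nap_us > MAX_FSYNC_NAP_US)
                nap_us = MAX_FSYNC_NAP_US;
            ts.tv_sec = nap_us / 1000000L;
            ts.tv_nsec = (nap_us % 1000000L) * 1000L;
            nanosleep(&ts, NULL);
        }
    }

Greg's objection applies exactly here: every nap pushes the checkpoint end time out, so under a heavy write load this risks exceeding checkpoint_timeout.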
Re: [HACKERS] improve Chinese locale performance
On 07/22/2013 03:54 PM, Craig Ringer wrote:
> On 07/22/2013 12:17 PM, Quan Zongliang wrote:
>> Hi hackers, I tried to improve performance when the database is Chinese. Under openSUSE, create index on a table with 54996 rows: locale=C, 140ms; locale=zh_CN, 985ms. I think the function strcoll() of Linux is too slow. So I made a new utf8-to-GB18030 map and store the Chinese order in it, and do not call strcoll(). With my modified code, the same operation with locale=zh_CN takes 203ms.
> It might be worth looking at gcc's strcoll() implementation. See if it performs better when you use the latest gcc, and if not, try to improve gcc's strcoll().
> I'd be interested in seeing a test case for this that shows that the results of your new collation are exactly the same as the original strcoll()-based approach.

They are not exactly the same. I found some errors in gcc's strcoll() when ordering Chinese characters, because there are lots of special characters in Chinese; gcc's strcoll() does not consider this, or misses some of them.

Yes, the best way is to improve gcc's strcoll(). But I don't know how.

Thanks, Quan Zongliang

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
(2013/07/21 4:37), Heikki Linnakangas wrote:
> Mitsumasa-san, since you have the test rig ready, could you try the attached patch please? It scans the buffer cache several times, writing out all the dirty buffers for segment A first, then fsyncs it, then all dirty buffers for segment B, and so on. The patch is ugly, but if it proves to be helpful, we can spend the time to clean it up.

Thank you! It is interesting code; I will test it. By the way, my company's colleagues are helping us with testing. If you have other ideas, please send me a patch or methods.

Best regards,
--
Mitsumasa KONDO

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
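A hedged sketch of the strategy Heikki describes, with all types and helpers as hypothetical stand-ins for the real buffer manager:

    #include <unistd.h>

    typedef struct { int fd; } Segment;
    typedef struct { int dirty; int segno; } PoolBuffer;

    /* Stand-in for a real smgr write of one dirty buffer. */
    static void write_buffer(PoolBuffer *b) { b->dirty = 0; }

    /*
     * Several passes over the buffer cache: write out the dirty buffers
     * belonging to one file segment, fsync that segment, then move on.
     * Grouping the writes keeps each fsync's pending-data set small.
     */
    static void
    checkpoint_segment_at_a_time(Segment *segs, int nsegs,
                                 PoolBuffer *pool, int npool)
    {
        for (int s = 0; s < nsegs; s++)
        {
            for (int b = 0; b < npool; b++)
                if (pool[b].dirty && pool[b].segno == s)
                    write_buffer(&pool[b]);
            fsync(segs[s].fd);  /* sync segment s before starting s + 1 */
        }
    }

The trade-off being tested is the repeated scans of the buffer cache (one per segment) against fsyncs that each have far less dirty data queued behind them.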
[HACKERS] small typo in src/backend/access/transam/xlog.c
Hi,

In BootStrapXLOG(void):

	 * to seed it other than the system clock value...)  The upper half of the
	 * uint64 value is just the tv_sec part, while the lower half is the XOR
	 * of tv_sec and tv_usec.  This is to ensure that we don't lose uniqueness
	 * unnecessarily if "uint64" is really only 32 bits wide.  A person
	 * knowing this encoding can determine the initialization time of the
	 * installation, which could perhaps be useful sometimes.
	 */
	gettimeofday(&tv, NULL);
	sysidentifier = ((uint64) tv.tv_sec) << 32;
	sysidentifier |= (uint32) (tv.tv_sec | tv.tv_usec);

should be

	sysidentifier |= (uint32) (tv.tv_sec ^ tv.tv_usec);

Regards
Didier
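A quick standalone demonstration of why the comment's promised XOR differs from the OR actually written: OR saturates bits toward 1, so nearby timestamps can collide, while XOR flips the output whenever any input bit flips. The values are arbitrary examples:

    #include <stdio.h>

    int main(void)
    {
        unsigned sec   = 0x51ed2b7f;    /* example tv_sec */
        unsigned usec1 = 999999;        /* 0x000f423f */
        unsigned usec2 = 999998;        /* 0x000f423e: differs in one bit */

        /* With '|', both usec values produce the same mixed result... */
        printf("or : %08x vs %08x\n", sec | usec1, sec | usec2);
        /* ...with '^', the flipped input bit flips the output. */
        printf("xor: %08x vs %08x\n", sec ^ usec1, sec ^ usec2);
        return 0;
    }

So with '|' the lower half of the sysidentifier loses uniqueness exactly in the way the comment says it is trying to avoid.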
[HACKERS] enum-ify resource manager's xl_info values
Hi,

Would somebody object to making the rmgr's info value #defines, like:

/* XLOG info values for XLOG rmgr */
#define XLOG_CHECKPOINT_SHUTDOWN	0x00
#define XLOG_CHECKPOINT_ONLINE		0x10
#define XLOG_NOOP					0x20
#define XLOG_NEXTOID				0x30
#define XLOG_SWITCH					0x40
#define XLOG_BACKUP_END				0x50
#define XLOG_PARAMETER_CHANGE		0x60
#define XLOG_RESTORE_POINT			0x70
#define XLOG_FPW_CHANGE				0x80
#define XLOG_END_OF_RECOVERY		0x90
#define XLOG_FPI					0xA0

into enums? We already have a bunch of places looking at those values, and if/when changeset extraction makes it in, it's going to be one more. Having the compiler tell you where a new value should have been added also helps.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
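As a concrete sketch of the proposal, the same values could be wrapped in a named enum; the type name XLogRmgrInfo is invented here purely for illustration and is not taken from any patch:

    /* Values and mask semantics unchanged; only the declaration differs. */
    typedef enum XLogRmgrInfo
    {
        XLOG_CHECKPOINT_SHUTDOWN = 0x00,
        XLOG_CHECKPOINT_ONLINE   = 0x10,
        XLOG_NOOP                = 0x20,
        XLOG_NEXTOID             = 0x30,
        XLOG_SWITCH              = 0x40,
        XLOG_BACKUP_END          = 0x50,
        XLOG_PARAMETER_CHANGE    = 0x60,
        XLOG_RESTORE_POINT       = 0x70,
        XLOG_FPW_CHANGE          = 0x80,
        XLOG_END_OF_RECOVERY     = 0x90,
        XLOG_FPI                 = 0xA0
    } XLogRmgrInfo;

With this, a switch over an XLogRmgrInfo value that omits a case can trigger a compiler warning (e.g. gcc's -Wswitch), which is the benefit Andres is after.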
Re: [HACKERS] Using ini file to setup replication
On Fri, Jul 19, 2013 at 6:53 PM, Andres Freund and...@2ndquadrant.com wrote:
> So you can just do stuff like: server.foo.grand_unified_config = value.

It looks good to me too. When the server parses values written in postgresql.conf, it handles those parameters as an item-list value; after that, those parameters overwrite the corresponding values. So we can allocate the area for those values dynamically when the server first parses them (of course, we should count the number of servers). But I think that when server settings are not grouped by server name, parsing is a little difficult: we would have to check whether the server name has already been registered every time. For example:

postgresql.conf
---------------
server.AAA.wal_sender_timeout = 60
server.BBB.synchronous_transfer = data_flush
server.AAA.synchronous_transfer = all
---------------

What do you think? And if a parameter (e.g., wal_sender_timeout) is not set in the configuration file (postgresql.conf) as above, what value should we set the parameter to? And how do we handle the original wal_sender_timeout: will it remain, or be deleted?

Regards,

---
Sawada Masahiko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On 7/22/13 3:54 AM, Craig Ringer wrote: It might be worth looking at gcc's strcoll() implementation. See if it performs better when you use the latest gcc, and if not try to improve gcc's strcoll() . I think part of the problem is that we call strcoll for each comparison, instead of doing strxfrm once for each datum and then just strcmp for each comparison. That is effectively equivalent to what the proposal implements. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
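A minimal sketch of the strxfrm-then-strcmp pattern Peter describes, assuming the caller has already selected a collation with setlocale(LC_COLLATE, ...) and sorts an array of pre-computed keys. This is illustrative only; PostgreSQL's actual sort path is more involved:

    #include <stdlib.h>
    #include <string.h>

    /* Transform a string once into its collation key under LC_COLLATE. */
    static char *
    make_sort_key(const char *s)
    {
        size_t  n = strxfrm(NULL, s, 0);    /* ask for the required size */
        char   *key = malloc(n + 1);

        if (key)
            strxfrm(key, s, n + 1);
        return key;
    }

    /* qsort comparator over pre-computed keys: no strcoll() per compare. */
    static int
    cmp_keys(const void *a, const void *b)
    {
        return strcmp(*(char *const *) a, *(char *const *) b);
    }

A sort then pays one strxfrm() per datum up front instead of one strcoll() per comparison, which is O(n) locale work instead of O(n log n).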
Re: [HACKERS] REINDEX checking of index constraints
On Sun, Jul 21, 2013 at 11:30:54AM -0700, Josh Berkus wrote: Noah, Attached patch just restores the old behavior. Would it be worth preserving the ability to fix an index consistency problem with a REINDEX independent from related heap consistency problems such as duplicate keys? I would love to have two versions of REINDEX, one which validated and one which didn't. Maybe a ( validate off ) type check? +1 There are reasons to reindex that do not involve its validity and it would be great to not need to visit the heap for that. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enum-ify resource manager's xl_info values
Andres Freund and...@2ndquadrant.com writes:
> Would somebody object to making the rmgr's info value #defines into enums?

I think that will create more problems than it fixes. For one thing, the same field is used to store values that would need to be multiple independent enum types; and we also store additional bits in that field.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enum-ify resource manager's xl_info values
On 2013-07-22 08:53:53 -0400, Tom Lane wrote:
> Andres Freund and...@2ndquadrant.com writes:
>> Would somebody object to making the rmgr's info value #defines into enums?
> I think that will create more problems than it fixes. For one thing, the same field is used to store values that would need to be multiple independent enum types; and we also store additional bits in that field.

Oh, I don't want to change the definition of XLogRecord or such. I just want to make the series of #defines an enum, so you can write something like:

	info = record->xl_info & ~XLR_INFO_MASK;
	switch ((XLogXactRecordType) info)
	{
		case ...;
	}

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using ini file to setup replication
On Fri, Jul 19, 2013 at 2:20 PM, Samrat Revagade revagade.sam...@gmail.com wrote:
> for example: if I want to configure 2 servers then it will add 6 lines; for 3 servers, 9; for 4, 12. Settings for a particular server will be (considering the way of setting values for conf parameters):
> guc = value
> standby_name = 'AAA'
> synchronous_transfer = commit
> wal_sender_timeout = 60

I have a feeling Samrat and Sawada-san have some good use cases where this extra syntax could be a big step up in expressiveness. But I'm having a hard time figuring out exactly what they have in mind. If either of you could explain in more detail how the extra syntax would apply to your use case, and how it would let you do something that you can't already do, it might be helpful.

I'm assuming the idea is something like having a single config file which can work for the server regardless of whether it's acting as the primary or standby, and then being able to switch roles by switching a single flag which would control which parts of the config file were applied. But I'm having a hard time seeing how exactly that would work.

-- greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [HACKERS] Wal sync odirect
On Monday, July 22, 2013 09:39:50, Craig Ringer wrote:
> On 07/22/2013 03:30 PM, Миша Тюрин wrote:
>> I am talking about wal_level higher than MINIMAL.
> OK, so you want to be able to force O_DIRECT for wal_level = archive? I guess that makes sense if you expect the archive_command to read the file out of the RAID controller's write cache before it gets flushed, and your archive_command can also use direct I/O to avoid pulling it into cache. You already know where to change to start experimenting with this. What exactly are you trying to ask? I don't see any risk in forcing O_DIRECT for higher wal_level, but I'm not an expert in WAL and recovery. I'd recommend testing on a non-critical PostgreSQL instance.

IIRC there is also a fadvise() call to flush the buffer cache when using 'minimal', but not when using archiving of WAL. I'm unsure how this has been tuned with the streaming replication addition. See xlog.c|h.

--
Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
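The call Cédric is remembering lives in xlog.c's XLogFileClose(). A hedged paraphrase of it, with the boolean parameter standing in for the real XLogIsNeeded() test and the surrounding error handling omitted:

    #include <fcntl.h>
    #include <stdbool.h>
    #include <unistd.h>

    /*
     * When this server will not re-read the finished WAL segment
     * (wal_level = minimal: no archiving, no streaming), ask the kernel
     * to drop it from the page cache as the file is closed.
     */
    static void
    xlog_file_close_sketch(int fd, bool wal_needed_again)
    {
        if (!wal_needed_again)
            (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
    }

This is consistent with the thread's premise: at higher wal_level the segment stays cached precisely because the archiver or a walsender is expected to read it back.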
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Greg Stark st...@mit.edu writes: I do find the logic and variable names a bit confusing so I'm tempted to add some comments based on my initial confusion. And I'm tempted to add an ordinalityAttNum field to the executor node so we don't need to make these odd scanslot means this unless we have ordinality in which case it means that and funcslot means this logic. I haven't read this patch, but that comment scares the heck out of me. Even if the patch isn't broken today, it will be tomorrow, if it's making random changes like that in data structure semantics. Also, if you're confused, so will be everybody else who has to deal with the code later --- so I don't think fixing the comments and variable names is optional. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] changeset generation v5-01 - Patches git tree
On Tue, Jul 16, 2013 at 9:00 AM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 7, 2013 at 4:34 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-07-07 15:43:17 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: 3b) Add catcache 'filter' that ensures the cache stays unique and use that for the mapping I slightly prefer 3b) because it's smaller, what's your opinions? This is just another variation on the theme of kluging the catcache to do something it shouldn't. You're still building a catcache on a non-unique index, and that is going to lead to trouble. I don't think the lurking dangers really are present. The index essentially *is* unique since we filter away anything non-unique. The catcache code hardly can be confused by tuples it never sees. That would even work if we started preloading catcaches by doing scans of the entire underlying relation or by caching all of a page when reading one of its tuples. I can definitely see that there are aesthetical reasons against doing 3b), that's why I've also done 3a). So I'll chalk you up to voting for that... I also vote for (3a). I did a quick once over of 1, 2, and 3a and they look reasonable. Barring strenuous objections, I'd like to go ahead and commit these, or perhaps an updated version of them. Hearing no objections, I have done this. Per off-list discussion with Andres, I also included patch 4, which gives us regression test coverage for this code, and have fixed a few bugs and a bunch of stylistic things that bugged me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
So the more I look at this patch the fewer things I want to change in it. I've several times thought I should make an improvement and then realized I was really just making unnecessary tweaks that didn't really make much difference.

It seems a shame that the node has to call the function and get back a slot only to laboriously copy over all the attributes into a new slot. Worse, it's actually storing all the original tuples in a tuplestore without the ordinality and adding in the ordinality on output. This works because the FunctionScan only supports rescan, not mark/restore, so it can easily recalculate them consistently if the tuplestore is rescanned. I looked into what it would take to get the ordinality added on the original scan and it would have to go so deep that it doesn't really seem worthwhile.

I do find the logic and variable names a bit confusing, so I'm tempted to add some comments based on my initial confusion. And I'm tempted to add an ordinalityAttNum field to the executor node so we don't need this odd "scanslot means this unless we have ordinality, in which case it means that, and funcslot means this" logic. That has the side benefit that if the executor node ever wants to add more attributes it won't have the assumption that the last column is the ordinality baked in. I think it'll make the code a bit clearer.

Also, I really think the test cases are excessive. They're mostly repeatedly testing the same functionality over and over in cases that are superficially different but I'm fairly certain just invoke the same codepaths. This will just be an annoyance if we make later changes that require adjusting the output. Notably, the test that covers how the view definition appears in pg_views scares me. We bash around the formatting rules for view definition outputs pretty regularly. On the other hand, it is nice to have regression tests that make sure these cases are covered. There's been more than one bug in the past caused by omitting updates to these functions. I'm leaning towards leaving it in, but in the long run we probably need a better solution for this.

Oh, and I'm definitely leaning towards naming the column "ordinality". Given we name columns things like "generate_series" and "sum" etc., there doesn't seem to be a good reason to avoid doing that here.

All in all though, I feel like I'm looking for trouble. It's not a very complex patch and is definitely basically correct. Who should get credit for it? David? Andrew? And who reviewed it? Dean? Anyone else?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Attached is an update that I think sorts out all of the documentation concerns. I broke this section into paragraphs now that it's getting so long too. The only code change is that this now labels the controversial lag here "average rate limit schedule lag". That way, if someone wants to introduce other measures of rate limit lag, like a more transaction-oriented one, you might call that "average rate limit transaction lag" and tell the two apart.

The rewritten documentation here tries to communicate that there is a schedule that acts like it was pre-computed at the start of each client too. It's not ever adjusted based on what individual transactions do. I also noted the way this can cause schedule lag for some time after a slow transaction finishes, since that's the main issue observed so far.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
new file mode 100644
index 2ad8f0b..4e6b608
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
*** int unlogged_tables = 0;
*** 137,142 ****
--- 137,148 ----
  double	sample_rate = 0.0;

  /*
+  * When threads are throttled to a given rate limit, this is the target delay
+  * to reach that rate in usec. 0 is the default and means no throttling.
+  */
+ int64	throttle_delay = 0;
+
+ /*
   * tablespace selection
   */
  char	*tablespace = NULL;
*** typedef struct
*** 202,212 ****
--- 208,220 ----
  	int		listen;			/* 0 indicates that an async query has been
  							 * sent */
  	int		sleeping;		/* 1 indicates that the client is napping */
+ 	bool	throttling;		/* whether nap is for throttling */
  	int64	until;			/* napping until (usec) */
  	Variable   *variables;	/* array of variable definitions */
  	int		nvariables;
  	instr_time	txn_begin;	/* used for measuring transaction latencies */
  	instr_time	stmt_begin;	/* used for measuring statement latencies */
+ 	bool	is_throttled;	/* whether transaction throttling is done */
  	int		use_file;		/* index in sql_files for this client */
  	bool	prepared[MAX_FILES];
  } CState;
*** typedef struct
*** 224,229 ****
--- 232,240 ----
  	instr_time *exec_elapsed;	/* time spent executing cmds (per Command) */
  	int		   *exec_count;		/* number of cmd executions (per Command) */
  	unsigned short random_state[3];		/* separate randomness for each thread */
+ 	int64	throttle_trigger;	/* previous/next throttling (us) */
+ 	int64	throttle_lag;		/* total transaction lag behind throttling */
+ 	int64	throttle_lag_max;	/* max transaction lag */
  } TState;

  #define INVALID_THREAD		((pthread_t) 0)
*** typedef struct
*** 232,237 ****
--- 243,250 ----
  {
  	instr_time	conn_time;
  	int			xacts;
+ 	int64		throttle_lag;
+ 	int64		throttle_lag_max;
  } TResult;

  /*
*** usage(void)
*** 356,361 ****
--- 369,375 ----
  	 "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
  	 "  -P, --progress=NUM       show thread progress report every NUM seconds\n"
  	 "  -r, --report-latencies   report average latency per command\n"
+ 	 "  -R, --rate SPEC          target rate in transactions per second\n"
  	 "  -s, --scale=NUM          report this scale factor in output\n"
  	 "  -S, --select-only        perform SELECT-only transactions\n"
  	 "  -t, --transactions       number of transactions each client runs\n"
*** doCustom(TState *thread, CState *st, ins
*** 898,914 ****
  {
  	PGresult   *res;
  	Command   **commands;

  top:
  	commands = sql_files[st->use_file];

  	if (st->sleeping)
  	{							/* are we sleeping? */
  		instr_time	now;

  		INSTR_TIME_SET_CURRENT(now);
! 		if (st->until <= INSTR_TIME_GET_MICROSEC(now))
  			st->sleeping = 0;	/* Done sleeping, go ahead with next command */
  		else
  			return true;		/* Still sleeping, nothing to do here */
  	}
--- 912,973 ----
  {
  	PGresult   *res;
  	Command   **commands;
+ 	bool		trans_needs_throttle = false;

  top:
  	commands =
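Since the attached diff is cut off just before the sampling logic, here is a hedged sketch of the inverse transform sampling the truncated comment refers to. pgbench itself uses its per-thread random state rather than rand(), so this is illustrative only:

    #include <math.h>
    #include <stdint.h>
    #include <stdlib.h>

    /*
     * Draw an exponentially distributed inter-arrival time (mean =
     * throttle_delay microseconds) via inverse transform sampling, so
     * transaction *starts* form a Poisson process.  The next start is
     * computed from the previous *scheduled* start, never from when the
     * previous transaction actually finished.
     */
    static int64_t
    next_scheduled_start(int64_t prev_scheduled_start,
                         int64_t throttle_delay_us)
    {
        /* uniform in (0, 1]; shifted to avoid log(0) */
        double  u = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
        int64_t wait = (int64_t) (throttle_delay_us * -log(u));

        return prev_scheduled_start + wait;
    }

Because the schedule never consults actual finish times, a slow transaction leaves later transactions lagging behind their pre-computed start times, which is exactly the "schedule lag" the documentation paragraphs above describe.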
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On Mon, Jul 22, 2013 at 4:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> I haven't read this patch, but that comment scares the heck out of me. Even if the patch isn't broken today, it will be tomorrow, if it's making random changes like that in data structure semantics.

It's not making random changes. It's just that it has two code paths: in one it has the function scan write directly to the scan slot, and in the other it has the function write to a different slot and copies the fields over to the scan slot. It's actually doing the right thing; it's just hard to tell that at first (imho) because it's using the scan slots to determine which case applies instead of having a flag or something to drive the decision.

> Also, if you're confused, so will be everybody else who has to deal with the code later --- so I don't think fixing the comments and variable names is optional.

Well, that's the main benefit of having someone review the code, in my opinion. I'm no smarter than David or Andrew who wrote the code, and there's no guarantee I'll spot any bugs. But at least I can observe myself and see where I have difficulty following the logic, which the original author is inherently not in a position to do. Honestly, this is pretty straightforward and well written, so I'm just being especially careful, not having committed anything recently.

-- greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On Mon, Jul 22, 2013 at 12:50 PM, Peter Eisentraut pete...@gmx.net wrote:
> I think part of the problem is that we call strcoll for each comparison, instead of doing strxfrm once for each datum and then just strcmp for each comparison. That is effectively equivalent to what the proposal implements.

Fwiw, I used to be a big proponent of using strxfrm. But upon further analysis I realized it was a really difficult tradeoff. strxfrm saves potentially a lot of CPU cost, but at the expense of expanding the size of the sort key. If the sort spills to disk, or even if it's just memory-bandwidth limited, it might actually be slower than doing the additional CPU work of calling strcoll. It's hard to see how to decide in advance which way will be faster. I suspect strxfrm is still the better bet, especially for complex, large-character-set locales like Chinese. strcoll might still win by a large margin on simple mostly-ASCII character sets.

-- greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
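A tiny standalone probe of the key-size expansion Greg describes; the sample string and the environment-selected locale are arbitrary:

    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *s = "example sort datum";

        /* Use whatever collation the environment selects. */
        setlocale(LC_COLLATE, "");
        /* strxfrm(NULL, s, 0) reports the key size without writing it. */
        printf("raw: %zu bytes, sort key: %zu bytes\n",
               strlen(s), strxfrm(NULL, s, 0));
        return 0;
    }

Running this under zh_CN versus C makes the trade-off measurable: the larger the key relative to the datum, the more a strxfrm-based sort pays in memory bandwidth and potential disk spill.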
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Greg Smith wrote: [doc patch attached]

Thanks. I didn't look at the code, but while trying to read the docs:

> + <para>
> +  High rate limit schedule lag values, that is values not small with
> +  respect to the actual transaction latency, indicate that something is
> +  amiss in the throttling process.

I couldn't really parse the above. Of the first six words, which one is a verb? Is there a noun that needs to be plural? Is there a word that shouldn't be there? ... Oh, I think it makes sense if I assume that "rate limit schedule lag" is a single concept .. but if so, that phrase seems too many words for it. (So when the RLSL values are high, this indicates a problem. Is that what the above means?) Also, it took me a while to understand what "values not small" means. I think there must be a way to phrase this that's easier to understand.

> +  High lag can highlight a subtle problem there even if the target rate
> +  limit is met in the end. One possible cause of schedule lage is
> +  insufficient pgbench threads to handle all of the clients.

typo "lage" above.

> +  To improve that, consider reducing the number of clients, increasing
> +  the number of threads in pgbench, or running pgbench on a separate
> +  host. Another possibility is that the database is not keeping up with
> +  the load at some point. When that happens, you will have to reduce the
> +  expected transaction rate to lower schedule lag.
> + </para>

Thanks

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Hello Greg,

Thanks for the improvement! I have a small reservation about "finish/end time schedule" in the second paragraph, or maybe there is something that I do not understand. There is no schedule for finishing anything; only start times are scheduled, so I wish the text could avoid suggesting that finish times are scheduled.

> The rate is targeted by starting transactions along a Poisson-distributed schedule time line. The expected finish time schedule moves forward based on when the client first started, not when the previous transaction ended.

"finish time schedule" -> "start time schedule"

> That approach means that when transactions go past their original scheduled end time, it is possible for later ones to catch up again.

-> "That approach means that long transactions can result in later transactions being late with respect to the schedule, while short transactions make it possible for late ones to catch up again."

Would you be ok with that?

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Hello Alvaro,

> Thanks. I didn't look at the code, but while trying to read the docs:
>> + <para>
>> +  High rate limit schedule lag values, that is values not small with
>> +  respect to the actual transaction latency, indicate that something is
>> +  amiss in the throttling process.
> I couldn't really parse the above. Of the first six words, which one is a verb?

None. "High values of the time lag measured with respect to the rate limit schedule."

> Is there a noun that needs to be plural? Is there a word that shouldn't be there?

I do not think so.

> ... Oh, I think it makes sense if I assume that "rate limit schedule lag" is a single concept .. but if so, that phrase seems too many words for it. (So when the RLSL values are high, this indicates a problem. Is that what the above means?)

Yep!

> Also, it took me a while to understand what "values not small" means. I think there must be a way to phrase this that's easier to understand.

That's what we are trying to do, but we still need to be precise. With fewer words it seems more understandable, but previous versions suggested that the meaning was ambiguous; that is, people put their own intuitive definition of lag on it, which resulted in surprises at the measures and cumulative behavior. The alternative was either to change what is measured, but I insisted that this measure is the useful one, or to try to reduce the ambiguity of the documentation; the result is the effort by Greg and myself, with you helping to debug :-)

>> High lag can highlight a subtle problem there even if the target rate limit is met in the end.

I'm fine with that, if it is clear from the context that the lag we're talking about is the one defined in the preceding paragraph. Greg, what do you think?

>> One possible cause of schedule lage is insufficient pgbench threads to handle all of the clients.
> typo "lage" above.

Indeed.

>> To improve that, consider reducing the number of clients, increasing the number of threads in pgbench, or running pgbench on a separate host. Another possibility is that the database is not keeping up with the load at some point. When that happens, you will have to reduce the expected transaction rate to lower schedule lag.

Thanks for your help!

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Greg Stark said:
> So the more I look at this patch the fewer things I want to change in it. I've several times thought I should make an improvement and then realized I was really just making unnecessary tweaks that didn't really make much difference.

It's almost as though we actually thought about these things when writing the patch...

> I looked into what it would take to get the ordinality added on the original scan and it would have to go so deep that it doesn't really seem worthwhile.

A design goal was that no SRF implementation should be affected by the change, since there are dozens of C-language SRFs in contrib and third-party modules. The existence of materialize mode prevents us from changing the structure of the tuplestore, since we're not the ones allocating it. The only other approach that seemed possible was to have the tuplestore code itself add an ordinality, which it would have to do unconditionally, since for materialize-mode SRFs it would have no way to know if one was required. Doing it in FunctionScan when projecting out tuples seemed much, much cleaner.

> I do find the logic and variable names a bit confusing so I'm tempted to add some comments based on my initial confusion. And I'm tempted to add an ordinalityAttNum field to the executor node so we don't need this odd "scanslot means this unless we have ordinality, in which case it means that, and funcslot means this" logic. That has the side benefit that if the executor node ever wants to add more attributes it won't have the assumption that the last column is the ordinality baked in. I think it'll make the code a bit clearer.

I admit the (one single) use of checking func_slot for nullity rather than checking the funcordinality flag is a micro-optimization (choosing to fetch the value we know we will need rather than a value which has no other purpose). I thought the comments there were sufficient; perhaps you could indicate what isn't clear? Having the ordinality be the last column is required by spec; I'm sure we could think of pg-specific extensions that would change that, but why complicate the code now?

> Also, I really think the test cases are excessive. They're mostly repeatedly testing the same functionality over and over in cases that are superficially different but I'm fairly certain just invoke the same codepaths. This will just be an annoyance if we make later changes that require adjusting the output.

The majority of the tests are adding an ordinality version to existing test cases that test a number of combinations of language, SETOF, and base vs. composite type. These do exercise various different code paths in expandRTE and thereabouts. One thing I did do is dike out and replace the entire existing test sequence that was commented as testing ExecReScanFunctionScan, because many of the tests in it did not in fact invoke any rescans and probably hadn't done since 7.4.

> Notably the test that covers how the view definition appears in pg_views scares me. We bash around the formatting rules for view definition outputs pretty regularly. On the other hand it is nice to have regression tests that make sure these cases are covered. There's been more than one bug in the past caused by omitting updating these functions. I'm leaning towards leaving it in but in the long run we probably need a better solution for this.

There are a dozen of those kinds of tests scattered through the regression tests (though many use pg_get_viewdef directly rather than pg_views). While it might be worth centralizing them somewhere, that's really a separate issue from this patch, since it also affects aggregates.sql, window.sql, and with.sql.

> Oh, and I'm definitely leaning towards naming the column "ordinality". Given we name columns things like "generate_series" and "sum" etc there doesn't seem to be good reason to avoid doing that here.

I've already set out why I object to this.

> All in all though I feel like I'm looking for trouble. It's not a very complex patch and is definitely basically correct. Who should get credit for it? David? Andrew? And who reviewed it? Dean? Anyone else?

It was a joint project between David and myself. Credit to Dean for the thorough review.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On 07/22/2013 12:49 PM, Greg Stark wrote: On Mon, Jul 22, 2013 at 12:50 PM, Peter Eisentraut pete...@gmx.net wrote: I think part of the problem is that we call strcoll for each comparison, instead of doing strxfrm once for each datum and then just strcmp for each comparison. That is effectively equivalent to what the proposal implements. Fwiw I used to be a big proponent of using strxfrm. But upon further analysis I realized it was a real difficult tradeoff. strxrfm saves potentially a lot of cpu cost but at the expense of expanding the size of the sort key. If the sort spills to disk or even if it's just memory bandwidth limited it might actually be slower than doing the additional cpu work of calling strcoll. It's hard to see how to decide in advance which way will be faster. I suspect strxfrm is still the better bet, especially for complex large character set based locales like Chinese. strcoll might still win by a large margin on simple mostly-ascii character sets. Perhaps we need a bit of performance testing to prove the point. Maybe the behaviour should be locale-dependent. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Very minor update with V19 here, to reflect Alvaro's comments. The tricky part now reads like this:

  High rate limit schedule lag values, that is lag values that are large
  compared to the actual transaction latency, indicate that something is
  amiss in the throttling process. High schedule lag can highlight a subtle
  problem there even if the target rate limit is met in the end.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
new file mode 100644
index 2ad8f0b..4e6b608
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
*** int unlogged_tables = 0;
*** 137,142 ****
--- 137,148 ----
  double	sample_rate = 0.0;

  /*
+  * When threads are throttled to a given rate limit, this is the target delay
+  * to reach that rate in usec. 0 is the default and means no throttling.
+  */
+ int64	throttle_delay = 0;
+
+ /*
   * tablespace selection
   */
  char	*tablespace = NULL;
*** typedef struct
*** 202,212 ****
--- 208,220 ----
  	int		listen;			/* 0 indicates that an async query has been
  							 * sent */
  	int		sleeping;		/* 1 indicates that the client is napping */
+ 	bool	throttling;		/* whether nap is for throttling */
  	int64	until;			/* napping until (usec) */
  	Variable   *variables;	/* array of variable definitions */
  	int		nvariables;
  	instr_time	txn_begin;	/* used for measuring transaction latencies */
  	instr_time	stmt_begin;	/* used for measuring statement latencies */
+ 	bool	is_throttled;	/* whether transaction throttling is done */
  	int		use_file;		/* index in sql_files for this client */
  	bool	prepared[MAX_FILES];
  } CState;
*** typedef struct
*** 224,229 ****
--- 232,240 ----
  	instr_time *exec_elapsed;	/* time spent executing cmds (per Command) */
  	int		   *exec_count;		/* number of cmd executions (per Command) */
  	unsigned short random_state[3];		/* separate randomness for each thread */
+ 	int64	throttle_trigger;	/* previous/next throttling (us) */
+ 	int64	throttle_lag;		/* total transaction lag behind throttling */
+ 	int64	throttle_lag_max;	/* max transaction lag */
  } TState;

  #define INVALID_THREAD		((pthread_t) 0)
*** typedef struct
*** 232,237 ****
--- 243,250 ----
  {
  	instr_time	conn_time;
  	int			xacts;
+ 	int64		throttle_lag;
+ 	int64		throttle_lag_max;
  } TResult;

  /*
*** usage(void)
*** 356,361 ****
--- 369,375 ----
  	 "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
  	 "  -P, --progress=NUM       show thread progress report every NUM seconds\n"
  	 "  -r, --report-latencies   report average latency per command\n"
+ 	 "  -R, --rate SPEC          target rate in transactions per second\n"
  	 "  -s, --scale=NUM          report this scale factor in output\n"
  	 "  -S, --select-only        perform SELECT-only transactions\n"
  	 "  -t, --transactions       number of transactions each client runs\n"
*** doCustom(TState *thread, CState *st, ins
*** 898,914 ****
  {
  	PGresult   *res;
  	Command   **commands;

  top:
  	commands = sql_files[st->use_file];

  	if (st->sleeping)
  	{							/* are we sleeping? */
  		instr_time	now;

  		INSTR_TIME_SET_CURRENT(now);
! 		if (st->until <= INSTR_TIME_GET_MICROSEC(now))
  			st->sleeping = 0;	/* Done sleeping, go ahead with next command */
  		else
  			return true;		/* Still sleeping, nothing to do here */
  	}
--- 912,973 ----
  {
  	PGresult   *res;
  	Command   **commands;
+ 	bool		trans_needs_throttle = false;

  top:
  	commands = sql_files[st->use_file];

+ 	/*
+ 	 * Handle throttling once per transaction by sleeping. It is simpler
+ 	 * to do this here rather than at the end, because so much complicated
+ 	 * logic happens below when statements finish.
+ 	 */
+ 	if (throttle_delay && !st->is_throttled)
+ 	{
+ 		/*
+ 		 * Use inverse transform sampling to randomly generate a delay, such
+ 		 * that the
Re: [HACKERS] changeset generation v5-01 - Patches git tree
On Fri, Jun 14, 2013 at 6:51 PM, Andres Freund and...@2ndquadrant.com wrote:
> The git tree is at: git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4 http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4
> On 2013-06-15 00:48:17 +0200, Andres Freund wrote:
>> Overview of the attached patches:
>> 0001: indirect toast tuples; required but submitted independently
>> 0002: functions for testing; not required
>> 0003: (tablespace, filenode) syscache; required
>> 0004: RelationMapFilenodeToOid: required, simple
>> 0005: pg_relation_by_filenode() function; not required but useful
>> 0006: Introduce InvalidCommandId: required, simple
>> 0007: Adjust Satisfies* interface: required, mechanical
>> 0008: Allow walsender to attach to a database: required, needs review
>> 0009: New GetOldestXmin() parameter; required, pretty boring
>> 0010: Log xl_running_xact regularly in the bgwriter: required
>> 0011: make fsync_fname() public; required, needs to be in a different file
>> 0012: Relcache support for a Relation's primary key: required
>> 0013: Actual changeset extraction; required
>> 0014: Output plugin demo; not required (except for testing) but useful
>> 0015: Add pg_receivellog program: not required but useful
>> 0016: Add test_logical_decoding extension; not required, but contains the tests for the feature. Uses 0014
>> 0017: Snapshot building docs; not required

I've now also committed patch #7 from this series. My earlier commit fulfilled the needs of patches #3, #4, and #5; and somewhat longer ago I committed #1. I am not entirely convinced of the necessity or desirability of patch #6, but as of now I haven't studied the issues closely. Patch #2 does not seem useful in isolation; it adds new regression-testing stuff but doesn't use it anywhere.

I doubt that any of the remaining patches (#8-#17) can be applied separately without understanding the shape of the whole patch set, so I think I, or someone else, will need to set aside more time for detailed review before proceeding further with this patch set. I suggest that we close out the CommitFest entry for this patch set one way or another, as there is no way we're going to get the whole thing done under the auspices of CF1. I'll try to find some more time to spend on this relatively soon, but I think this is about as far as I can take this today.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
On Mon, Jul 22, 2013 at 01:49:39PM -0400, Greg Smith wrote: Very minor update with V19 here, to reflect Alvaro's comments. The tricky part now reads like this: High rate limit schedule lag values, High values of the rate limit schedule lag measurement? that is lag values that are large compared to the actual transaction latency, indicate that something is amiss in the throttling process. High schedule lag can highlight a subtle problem there even if the target rate limit is met in the end. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign Tables as Partitions
On Fri, Jul 19, 2013 at 1:46 PM, David Fetter da...@fetter.org wrote:
>> This functionality was actually present in the original submission for foreign tables. I ripped it out before commit because I didn't think all of the interactions with other commands had been adequately considered. But I think they did consider it to some degree, which this patch does not.
> A ref to the patch as submitted / patch as applied would help a lot :)

The patch as committed is 0d692a0dc9f0e532c67c577187fe5d7d323cb95b. I suspect you can find the patch as submitted by going and figuring out which CommitFest that was part of and working it out from there.

> Were there any particular things you managed to break with the patch as submitted? Places you thought would be soft but didn't poke at?

I think there was *some* discussion of this at the time. The basic issue was to make sure all the commands that could encounter foreign tables with this change, but not without it, behaved sanely. This would include all variants of ALTER TABLE as well as ANALYZE and any other utility commands that might view it as within their remit to recurse to child tables.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Expression indexes and dependecies
Hello,

While doing some tests, I observed that expression indexes can malfunction if the underlying expression changes. For example, say I define a function foo() as:

CREATE OR REPLACE FUNCTION foo(a integer) RETURNS integer AS $$
BEGIN
  RETURN $1 + 1;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

I then create a table, an expression index on the table, and insert a few rows:

CREATE TABLE test (a int, b char(20));
CREATE UNIQUE INDEX testindx ON test(foo(a));
INSERT INTO test VALUES (generate_series(1,1), 'bar');

A query such as the following would return its result using the expression index:

SET enable_seqscan TO off;
SELECT * FROM test WHERE foo(a) = 100;

It will return the row with a = 99, since foo() is defined to return (a + 1).

If I now REPLACE the function definition with something else, say to return (a + 2):

CREATE OR REPLACE FUNCTION foo(a integer) RETURNS integer AS $$
BEGIN
  RETURN $1 + 2;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

I get no errors/warnings, but the index and the new function definition are now out of sync. So the above query will still return the same result, though the row with (a = 99) no longer satisfies the current definition of function foo().

Perhaps this is a known behaviour/limitation, but I could not find it in the documentation. I wonder if it makes sense to check for dependencies during function alteration and complain. Or are there other reasons why we can't do that, making it a much larger problem than what I'm imagining?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
The v3 patch applies perfectly here now. Attached is a spreadsheet with test results from two platforms, a Mac laptop and a Linux server. I used systems with high disk speed because that seemed like a worst case for this improvement. The actual improvement for shrinking WAL should be even better on a system with slower disks. There are enough problems with the "hundred tiny fields" results that I think this is not quite ready for commit yet. More comments on that below. This seems close though, close enough that I am planning to follow up to see if the slow-disk results are better. Reviewing the wal-update-testsuite.sh test program, I think the only case missing that would be useful to add is "ten tiny fields, one changed". I think that one is interesting to highlight because it's what benchmark programs like pgbench do very often. The GUC added by the program looks like this:

postgres=# show wal_update_compression_ratio;
 wal_update_compression_ratio
------------------------------
 25

Is it possible to add a setting here that disables the feature altogether? That always makes it easier to consider a commit, knowing people can roll back the change if it makes performance worse. That would make performance testing easier too. wal-update-testsuite.sh takes as long as 13 minutes; that's long enough that I'd like the easier-to-automate comparison that GUC-based disabling adds. If that's not practical to do given the intrusiveness of the code, it's not really necessary. I haven't looked at the change enough to be sure how hard this is. On the Mac, the only case that seems to have a slowdown now is "hundred tiny fields, half nulled". It would be nice to understand just what is going on with that one. I got some ugly results in "two short fields, no change" too, along with a couple of other weird results, but I think those were testing procedure issues that can be ignored. The pgbench throttle work I did recently highlights that I can't really make a Mac quiet/consistent for benchmarking very well. Note that I ran all of the Mac tests with assertions on, to try and catch platform-specific bugs. The Linux ones used the default build parameters. On Linux, "hundred tiny fields, half nulled" was also by far the worst performing one, with a 30% increase in duration despite the 14% drop in WAL. Exactly what's going on there really needs to be investigated before this seems safe to commit. All of the "hundred tiny fields" cases seem pretty bad on Linux, with the rest of them running about an 11% duration increase. This doesn't seem ready to commit for this CF, but the number of problem cases is getting pretty small now. Now that I've gotten more familiar with the test programs and the feature, I can run more performance tests on this at any time really. If updates addressing the trouble cases are ready from Amit or Hari before the next CF, send them out and I can look at them without waiting until that one starts. This is a very promising looking performance feature. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com WAL-lz-v3.xls Description: MS-Excel spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
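[To be explicit about what is being asked for, something along these lines, where both the GUC name and its values are hypothetical -- the posted patch only exposes wal_update_compression_ratio:]

    SET wal_update_compression = off;  -- hypothetical: log full row images, as today
    SET wal_update_compression = on;   -- hypothetical: delta-encode updated rows

A boolean switch like this makes A/B comparisons scriptable: the same test run can be repeated with the feature on and off against an otherwise identical build.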
Re: [HACKERS] Proposal: template-ify (binary) extensions
Markus Wanner mar...@bluegap.ch writes: - per-installation (not even per-cluster) DSO availability Not sure what the issue is here, but I agree that should be possible. For any extension where the new package version ships the same .so file name, you can only have one module on the whole system for a given PostgreSQL major version. The extensions are per-database, the DSO modules are per major-version (not even per-cluster). That's the trade-off we currently need to make to be able to ship with the current security protections we're talking about. Requiring the sysadmin to install the extensions there, too, seems justified to me. Sounds like good advice, anyways. Yes, installing the same extensions on any standby as the ones installed on the primary should be *required*. My idea to enforce that would be to use Event Triggers on the standby, but how to run them when standby replay currently isn't opening a session is quite a problem. The other way to address that is of course PL/C. lo_import() won't write a file for LOAD to load. An untrusted PL (or any other extension allowing the superuser to do that) is currently required to do that. Ok, here's the fully worked out example:

~# select lo_import('/path/to/local/on/the/client/adminpack.so');
 lo_import
-----------
     82153
(1 row)

~# select lo_export(82153, '/tmp/foo.so');
 lo_export
-----------
         1
(1 row)

~# LOAD '/tmp/foo.so';
LOAD

~# CREATE FUNCTION pg_catalog.pg_file_write(text, text, bool)
     RETURNS bigint
     AS '/tmp/foo.so', 'pg_file_write'
     LANGUAGE C VOLATILE STRICT;
CREATE FUNCTION

Or to put it another way: Trusted PLs exist for a good reason. And some people just value security a lot and want that separation of roles. Yeah, security is important, but it needs to be addressing real threats to have any value whatsoever. We're not talking about adding new big holes, as extensions containing modules use LANGUAGE C in their scripts and thus are already restricted to superusers. That part would not change. As someone mentioned previously, $PGDATA may well be mounted noexec, so that seems to be a bad choice. I don't understand. You want a good default security policy in place, and you're saying that using PGDATA allows for a really easy policy to be implemented by people who don't want the feature. It seems to me to be exactly what you want; why would that be a bad choice then? - upgrading an extension at the OS level Once you've done that, any new backend will load the newer module (DSO file), so you have to be real quick if installing a hot fix in production, and the SQL definition must be changed to match the new module version… I agree, that's a problem. Alternatively, we could solve that problem the other way around: Rather than template-ify the DSO, we could instead turn the objects created by the SQL scripts into something that's more linked to the script. Something that would reload as soon as the file on disk changes. I think that's the wrong way to do it, because you generally want more control over those two steps (preparing the template then instantiating it), and you're proposing to completely lose that control and have them be a single step instead. Use case: shipping a new bunch of plproxy functions to 256 nodes. You want to push the templates everywhere, then just do the extension update as fast as possible so that it appears to be about 'in sync'. Pro-tip: you can use a plproxy function that runs the ALTER EXTENSION UPDATE step using the RUN ON ALL facility. 
(Note how this would make out-of-line extensions a lot closer to the in-line variant your recent patch adds? With the dependency between template and instantiation?) Those dependencies are almost gone now, except for being able to deal with DROP TEMPLATE FOR EXTENSION and guaranteeing we can actually pg_restore our pg_dump script. I've been changing the implementation to be what you proposed because I think it makes more sense (thanks!), and the regression tests reflect that. My github branch is up to date with the latest changes and I'm soon to send an updated patch that will be really as ready for commit as possible without a committer review. An attacker having access to a libpq connection with superuser rights cannot currently run arbitrary native code. He can try a DOS by I think I showed a simple way to do that above in this very email. Well, an extension can certainly perform something akin to pg_dlopen(). And thus load pretty much any code it wants. However, I don't think an extension can hook into CREATE EXTENSION, yet. When you call a function, its MODULE_PATHNAME is getting loaded by the backend. You would need to ship a dumb DSO module file so that this works and arrange for the real DSO module file to be loaded via a hook or an event trigger or something else (e.g. local_preload_libraries). I don't see a way to be
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2013-07-19 10:40:01 +0530, Hari Babu wrote: On Friday, July 19, 2013 4:11 AM Greg Smith wrote: On 7/9/13 12:09 AM, Amit Kapila wrote: I think the first thing to verify is whether the results posted can be validated in some other environment setup by another person. The test case used is posted at the link below: http://www.postgresql.org/message-id/51366323.8070...@vmware.com That seems easy enough to do here; Heikki's test script is excellent. The latest patch Hari posted on July 2 has one hunk that doesn't apply anymore now. The HEAD code change from Heikki is correct. During the patch rebase onto the latest PG LZ optimization code, the above code change was missed. Apart from the above change, some more changes were made in the patch. FWIW I don't like this approach very much: * I'd be very surprised if this doesn't make WAL replay of update-heavy workloads slower by at least a factor of 2. * It makes data recovery from WAL *noticeably* harder, since data corruption now is carried forward and you need the old data to decode new data. * It makes changeset extraction either more expensive or it would have to be disabled there. I think my primary issue is that philosophically/architecturally I am of the opinion that a WAL record should make sense on its own, without depending on heap data. And this patch loses that. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Sun, Jul 21, 2013 at 12:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello all, I very often use slightly adjusted psql system queries to detect the TOP N sized tables. I think this can be useful for all users, so I propose a few new commands:

\dts [N|size] ... show the N largest tables | show tables larger than size, ordered by size
\dis [N|size] ... show the N largest indexes | show indexes larger than size, ordered by size
\dtst [N|size] ... show the N largest tables by total size | show tables whose total size is larger than size, ordered by total size
\dtr [N] ... show the N largest tables (ordered by rows)

example:

\dts 10 -- top 10 tables ordered by size
\dts 10MB -- tables larger than 10MB ordered by size

 Schema |    Name     | Type  |    Owner    | Size
--------+-------------+-------+-------------+-------
 public | eshop_users | table | eshop_owner | 16 kB
 public | zamestnanci | table | eshop_owner | 16 kB

What do you think about this proposal? Comments, notes? I think our \d commands are an inscrutable morass of indecipherable gobbledygook as it is, and this is only one more step down the road to complete insanity. :-( Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On 7/22/13 12:58 PM, Amit kapila wrote: As per discussion, the updated patch contains the following changes:

1. Regression tests for ALTER SYSTEM are removed
2. Parsed the auto file automatically after parsing postgresql.conf
3. Removed addition of the include directive in postgresql.conf
4. Removed error handling for parsing errors

These changes have shrunk the diff down to 1411 lines of code. I'd like to identify which committer might take this on at this point. In a few respects this is Ready For Committer now, because the committer who takes this on is going to get a strong vote on how to resolve most of the remaining fundamental issues:

- Is this the point to finally commit to the config directory approach?
- If this does set the config directory usage precedent, is the name used for that appropriate?

Whoever suggested the change from conf.d to config made an error IMHO. Every example I've found of other projects doing this style of config refactoring picked either conf.d or a unique, no-two-are-alike name. I'd rather not see Postgres add yet another unique one. (I think the 'postgresql' part of postgresql.auto.conf as the name of the file is redundant too--what else would be in the Postgres config directory but postgresql files?--but that's not a major issue) This could definitely use a round of committer-level review of how the GUC handling is being done now too. That part of the code seems to have settled, and things like using the new validate_conf_option could be committed even with other parts still being discussed. Exactly how to best break this out into useful commits is another decision that really needs some input from the potential committer though. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
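[For readers following the thread, the command being added looks like this in use -- a sketch of the intended behavior, with the value written to the auto file and picked up on reload:]

    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    SELECT pg_reload_conf();   -- apply the new value without a restart

The appeal of the design is that the setting survives in postgresql.auto.conf, parsed after postgresql.conf, so it wins over the hand-edited file without modifying it.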
Re: [HACKERS] proposal - psql - show longest tables
2013/7/22 Robert Haas robertmh...@gmail.com: On Sun, Jul 21, 2013 at 12:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello all, I very often use slightly adjusted psql system queries to detect the TOP N sized tables. I think this can be useful for all users, so I propose a few new commands:

\dts [N|size] ... show the N largest tables | show tables larger than size, ordered by size
\dis [N|size] ... show the N largest indexes | show indexes larger than size, ordered by size
\dtst [N|size] ... show the N largest tables by total size | show tables whose total size is larger than size, ordered by total size
\dtr [N] ... show the N largest tables (ordered by rows)

example:

\dts 10 -- top 10 tables ordered by size
\dts 10MB -- tables larger than 10MB ordered by size

 Schema |    Name     | Type  |    Owner    | Size
--------+-------------+-------+-------------+-------
 public | eshop_users | table | eshop_owner | 16 kB
 public | zamestnanci | table | eshop_owner | 16 kB

What do you think about this proposal? Comments, notes? I think our \d commands are an inscrutable morass of indecipherable gobbledygook as it is, and this is only one more step down the road to complete insanity. :-( These commands are targeted at advanced users, and four chars should not be a problem. It has the same advantages and disadvantages as the vim UI: it is fast for advanced users, and strange for beginners :(. Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. Do you have any tip? Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Mon, Jul 22, 2013 at 2:03 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 21, 2013 at 12:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello all, I very often use slightly adjusted psql system queries to detect the TOP N sized tables. I think this can be useful for all users, so I propose a few new commands:

\dts [N|size] ... show the N largest tables | show tables larger than size, ordered by size
\dis [N|size] ... show the N largest indexes | show indexes larger than size, ordered by size
\dtst [N|size] ... show the N largest tables by total size | show tables whose total size is larger than size, ordered by total size
\dtr [N] ... show the N largest tables (ordered by rows)

example:

\dts 10 -- top 10 tables ordered by size
\dts 10MB -- tables larger than 10MB ordered by size

 Schema |    Name     | Type  |    Owner    | Size
--------+-------------+-------+-------------+-------
 public | eshop_users | table | eshop_owner | 16 kB
 public | zamestnanci | table | eshop_owner | 16 kB

What do you think about this proposal? Comments, notes? I think our \d commands are an inscrutable morass of indecipherable gobbledygook as it is, and this is only one more step down the road to complete insanity. :-( Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. These types of administrative tasks should be implemented as stored procedures or functions, not as enhancements to psql. That way non-psql clients can leverage them and you can integrate them into other queries. Another advantage is that they can be implemented as an extension. SELECT * from top5(); is a little more of a pain to type. But with my psql-fu I can cut that down with \i if I'm so inclined. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
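[A minimal sketch of the kind of helper Merlin is describing; the function name, default, and column set are mine, not from the thread:]

    CREATE FUNCTION top_tables(how_many int DEFAULT 5)
    RETURNS TABLE (schema_name name, table_name name, total_size text) AS $$
        SELECT ns.nspname, c.relname,
               pg_size_pretty(pg_total_relation_size(c.oid))
        FROM pg_class c
        JOIN pg_namespace ns ON ns.oid = c.relnamespace
        WHERE c.relkind = 'r'
          AND ns.nspname NOT IN ('pg_catalog', 'information_schema')
        ORDER BY pg_total_relation_size(c.oid) DESC
        LIMIT how_many;
    $$ LANGUAGE sql STABLE;

    SELECT * FROM top_tables(10);

Because it is plain SQL, the same helper works from any client, can be joined against other catalog queries, and could be packaged as an extension.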
Re: [HACKERS] [COMMITTERS] pgsql: Allow background workers to be started dynamically.
On Sat, Jul 20, 2013 at 12:39 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I don't have a problem with getting rid of those, it's easy enough to register them inside the worker and it's safe since we start with blocked signals. I seem to remember some discussion about why they were added but I can't find a reference anymore. Alvaro, do you remember? I left them there because it was easy; but they were absolutely necessary only until we decided that we would start the worker's main function with signals blocked. I don't think there's any serious reason not to remove them now. All right, done. FWIW, I think starting the worker's main with signals blocked was definitely the right decision. I think we have consensus to back-patch the other API changes as well. I'll work up a patch for that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Mon, Jul 22, 2013 at 3:13 PM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Jul 22, 2013 at 2:03 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 21, 2013 at 12:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello all, I very often use slightly adjusted psql system queries to detect the TOP N sized tables. I think this can be useful for all users, so I propose a few new commands:

\dts [N|size] ... show the N largest tables | show tables larger than size, ordered by size
\dis [N|size] ... show the N largest indexes | show indexes larger than size, ordered by size
\dtst [N|size] ... show the N largest tables by total size | show tables whose total size is larger than size, ordered by total size
\dtr [N] ... show the N largest tables (ordered by rows)

example:

\dts 10 -- top 10 tables ordered by size
\dts 10MB -- tables larger than 10MB ordered by size

 Schema |    Name     | Type  |    Owner    | Size
--------+-------------+-------+-------------+-------
 public | eshop_users | table | eshop_owner | 16 kB
 public | zamestnanci | table | eshop_owner | 16 kB

What do you think about this proposal? Comments, notes? I think our \d commands are an inscrutable morass of indecipherable gobbledygook as it is, and this is only one more step down the road to complete insanity. :-( Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. These types of administrative tasks should be implemented as stored procedures or functions, not as enhancements to psql. That way non-psql clients can leverage them and you can integrate them into other queries. Another advantage is that they can be implemented as an extension. SELECT * from top5(); is a little more of a pain to type. But with my psql-fu I can cut that down with \i if I'm so inclined. Yeah, I think that's a very reasonable approach to this kind of problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
Greg Smith escribió: On 7/22/13 12:58 PM, Amit kapila wrote: As per discussion, the updated patch contains the following changes:

1. Regression tests for ALTER SYSTEM are removed
2. Parsed the auto file automatically after parsing postgresql.conf
3. Removed addition of the include directive in postgresql.conf
4. Removed error handling for parsing errors

These changes have shrunk the diff down to 1411 lines of code. Nice. I'd like to identify which committer might take this on at this point. I'm considering taking a stab at it soonish, FWIW, if no other committer steps up. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Tom Lane said: I haven't read this patch, but that comment scares the heck out of me. Even if the patch isn't broken today, it will be tomorrow, if it's making random changes like that in data structure semantics. Also, if you're confused, so will be everybody else who has to deal with the code later --- so I don't think fixing the comments and variable names is optional. I must admit to finding all of this criticism of unread code a bit bizarre. There are no random changes in data structure semantics. All that happens is that FunctionScan, in the ordinality case, has two tupdescs to deal with: the one for the function return, and the one for FunctionScan's own scan type. Likewise two slots, one of each type. Absolutely no liberties are taken with any of the semantics. However, since the scan structure already has a place for the scan result slot, the extra slot that we allocate for this case is the function result, func_slot, while in the non-ordinality case, we use the scan result slot for the function result too. [Greg: we just found a bug (actually two, one code + one docs); I think David just posted the fixed version. And ironically, the bug in the code has nothing to do with all of this discussion.] Here, to hopefully end the issue, is the new version of FunctionNext, which is the core of the whole patch (everything else is just setup for this). If anyone wants to point out exactly what is unclear, or which changes any semantics improperly, then please do indicate exactly what you are referring to.

/*
 * FunctionNext
 *
 *    This is a workhorse for ExecFunctionScan
 */
static TupleTableSlot *
FunctionNext(FunctionScanState *node)
{
    EState     *estate;
    ScanDirection direction;
    Tuplestorestate *tuplestorestate;
    TupleTableSlot *scanslot;
    TupleTableSlot *funcslot;

    if (node->func_slot)
    {
        /*
         * ORDINALITY case: FUNCSLOT is the function return,
         * SCANSLOT the scan result
         */
        funcslot = node->func_slot;
        scanslot = node->ss.ss_ScanTupleSlot;
    }
    else
    {
        funcslot = node->ss.ss_ScanTupleSlot;
        scanslot = NULL;
    }

    /*
     * get information from the estate and scan state
     */
    estate = node->ss.ps.state;
    direction = estate->es_direction;

    tuplestorestate = node->tuplestorestate;

    /*
     * If first time through, read all tuples from function and put them in a
     * tuplestore. Subsequent calls just fetch tuples from tuplestore.
     */
    if (tuplestorestate == NULL)
    {
        node->tuplestorestate = tuplestorestate =
            ExecMakeTableFunctionResult(node->funcexpr,
                                        node->ss.ps.ps_ExprContext,
                                        node->func_tupdesc,
                                        node->eflags & EXEC_FLAG_BACKWARD);
    }

    /*
     * Get the next tuple from tuplestore. Return NULL if no more tuples.
     */
    (void) tuplestore_gettupleslot(tuplestorestate,
                                   ScanDirectionIsForward(direction),
                                   false,
                                   funcslot);

    if (!scanslot)
        return funcslot;

    /*
     * we're doing ordinality, so we copy the values from the function return
     * slot to the (distinct) scan slot. We can do this because the lifetimes
     * of the values in each slot are the same; until we reset the scan or
     * fetch the next tuple, both will be valid.
     */
    ExecClearTuple(scanslot);

    if (ScanDirectionIsForward(direction))
        node->ordinal++;
    else
        node->ordinal--;

    if (!TupIsNull(funcslot))
    {
        int         natts = funcslot->tts_tupleDescriptor->natts;
        int         i;

        slot_getallattrs(funcslot);

        for (i = 0; i < natts; ++i)
        {
            scanslot->tts_values[i] = funcslot->tts_values[i];
            scanslot->tts_isnull[i] = funcslot->tts_isnull[i];
        }

        scanslot->tts_values[natts] = Int64GetDatumFast(node->ordinal);
        scanslot->tts_isnull[natts] = false;

        ExecStoreVirtualTuple(scanslot);
    }

    return scanslot;
}

-- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
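[For readers who haven't followed the thread, the SQL-level behavior this code implements looks like this -- a sketch using the patch's syntax; committed behavior may differ in detail:]

    SELECT * FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(elem, ord);
     elem | ord
    ------+-----
     a    |   1
     b    |   2
     c    |   3
    (3 rows)

The ordinality column is exactly the extra attribute that FunctionNext appends to the scan slot after copying the function-result attributes across.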
Re: [HACKERS] Proposal: template-ify (binary) extensions
Dimitri, On 07/22/2013 08:44 PM, Dimitri Fontaine wrote: That's the trade-off we currently need to make to be able to ship with the current security protections we're talking about. Anything wrong with shipping postgis-1.5.so and postgis-2.0.so, as we do for Debian? Ok, here's the fully worked out example:

~# select lo_import('/path/to/local/on/the/client/adminpack.so');
 lo_import
-----------
     82153
(1 row)

~# select lo_export(82153, '/tmp/foo.so');
 lo_export
-----------
         1
(1 row)

~# LOAD '/tmp/foo.so';
LOAD

~# CREATE FUNCTION pg_catalog.pg_file_write(text, text, bool)
     RETURNS bigint
     AS '/tmp/foo.so', 'pg_file_write'
     LANGUAGE C VOLATILE STRICT;
CREATE FUNCTION

Good point. Note that these lo_import() and lo_export() methods - unlike the client counterparts - read and write the server's file system. So, your example is incomplete in that SELECT lo_import() doesn't load data from the client into the database. But there are many other ways to do that. Nonetheless, your point holds: lo_export() is a way for a superuser to write arbitrary files on the server's file system via libpq. Or to put it another way: Trusted PLs exist for a good reason. And some people just value security a lot and want that separation of roles. Yeah, security is important, but it needs to be addressing real threats to have any value whatsoever. We're not talking about adding new big holes, as extensions containing modules use LANGUAGE C in their scripts and thus are already restricted to superusers. That part would not change. I agree, it's not a big hole. But with security, every additional layer counts. I think the next step is to clarify whether or not we want to provide that guarantee. As someone mentioned previously, $PGDATA may well be mounted noexec, so that seems to be a bad choice. I don't understand. You want a good default security policy in place, and you're saying that using PGDATA allows for a really easy policy to be implemented by people who don't want the feature. It seems to me to be exactly what you want; why would that be a bad choice then? I'm just saying that mounting $PGDATA with noexec makes $PGDATA unusable to load libraries from; i.e. pg_dlopen() just outright fails. And noexec is another protective layer. Building a solution that requires $PGDATA to be mounted with exec permissions means that very solution won't work with that protective layer. Which is a bad thing. Alternatively, we could solve that problem the other way around: Rather than template-ify the DSO, we could instead turn the objects created by the SQL scripts into something that's more linked to the script. Something that would reload as soon as the file on disk changes. I think that's the wrong way to do it, because you generally want more control over those two steps (preparing the template then instantiating it), and you're proposing to completely lose that control and have them be a single step instead. Doesn't versioning take care of that? I.e. if you've got version 1.0 created in your databases and then install 1.1 system-wide, that shouldn't immediately instantiate. Use case: shipping a new bunch of plproxy functions to 256 nodes. You want to push the templates everywhere, then just do the extension update as fast as possible so that it appears to be about 'in sync'. Pro-tip: you can use a plproxy function that runs the ALTER EXTENSION UPDATE step using the RUN ON ALL facility. Contrary use case: you actually *want* to *replace* the created version installed, as it's a plain security fix that's backwards compatible. 
A plain 'apt-get upgrade' would do the job. Anyways, I think this is bike-shedding: the question of which mental model fits best doesn't matter too much here. I'm concerned about being consistent with whatever mental model we choose, and about an implementation that works with whatever security standards we want to adhere to. (Note how this would make out-of-line extensions a lot closer to the in-line variant your recent patch adds? With the dependency between template and instantiation?) Those dependencies are almost gone now except for being able to deal with DROP TEMPLATE FOR EXTENSION and guaranteeing we can actually pg_restore our pg_dump script. I appreciate the bug fixes. However, there still is at least one dependency. And that might be a good thing, depending on which mental model you follow. I've been changing the implementation to be what you proposed because I think it's making more sense (thanks!), and regression tests are reflecting that. My github branch is up-to-date with the last changes and I'm soon to send an updated patch that will be really as ready for commit as possible without a committer review. Good, thanks. An attacker having access to a libpq connection with superuser rights cannot currently run arbitrary native code. He can try a DOS by I think
Re: [HACKERS] proposal - psql - show longest tables
2013/7/22 Merlin Moncure mmonc...@gmail.com: On Mon, Jul 22, 2013 at 2:03 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 21, 2013 at 12:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello all, I very often use slightly adjusted psql system queries to detect the TOP N sized tables. I think this can be useful for all users, so I propose a few new commands:

\dts [N|size] ... show the N largest tables | show tables larger than size, ordered by size
\dis [N|size] ... show the N largest indexes | show indexes larger than size, ordered by size
\dtst [N|size] ... show the N largest tables by total size | show tables whose total size is larger than size, ordered by total size
\dtr [N] ... show the N largest tables (ordered by rows)

example:

\dts 10 -- top 10 tables ordered by size
\dts 10MB -- tables larger than 10MB ordered by size

 Schema |    Name     | Type  |    Owner    | Size
--------+-------------+-------+-------------+-------
 public | eshop_users | table | eshop_owner | 16 kB
 public | zamestnanci | table | eshop_owner | 16 kB

What do you think about this proposal? Comments, notes? I think our \d commands are an inscrutable morass of indecipherable gobbledygook as it is, and this is only one more step down the road to complete insanity. :-( Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. These types of administrative tasks should be implemented as stored procedures or functions, not as enhancements to psql. That way non-psql clients can leverage them and you can integrate them into other queries. Another advantage is that they can be implemented as an extension. SELECT * from top5(); It is not a problem for any advanced user to write these queries. But it is hard to use a parametrized query inside psql. Is a little more of a pain to type. But with my psql-fu I can cut that down with \i if I'm so inclined. you cannot use parameters - then I would have to keep prepared files like top10, top20, ..., which is not too friendly. Regards Pavel merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] RangeTblEntry.joinaliasvars representation change
In http://www.postgresql.org/message-id/cak_s-g3-fwveer1c0idvtz0745-7ryifi8whbzcnmsn+hwc...@mail.gmail.com it's pointed out that commit 2ffa740b was a few bricks shy of a load, because it failed to cope with the possibility of a joinaliasvars item containing an implicit coercion. That's not too hard to fix by adding a strip_implicit_coercions() call, but while testing it I realized that there's a second bug in the same place: the code also fails to cope with a Const arising from a dropped input-relation column. (The comment claiming that this can't happen is flat wrong, because we *have* passed the view query through AcquireRewriteLocks().) I fixed that too, and improved the comment in parsenodes.h that led me to overlook implicit coercions in the first place, as per the attached WIP patch. I then went looking for other places that might be assuming too much about what is in joinaliasvars lists, and found several pre-existing bugs :-(. The nastiest one is in flatten_join_alias_vars_mutator's code to expand a whole-row reference, which supposes that if it finds a Const item then that must represent a dropped column. That would be true in the parser or rewriter, but at this point in planning we have already done subquery pullup, which means we could find anything including a Const there. The code would thus mistake a pulled-up constant subquery output for a dropped column. An example in the regression database is

explain verbose select j from
  (int4_tbl join (select q1 as f1, q2 as z, 42 as c from int8_tbl) ss using(f1)) j;
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=1.11..2.23 rows=5 width=16)
   Output: ROW(int8_tbl.q1, int8_tbl.q2)
   Hash Cond: (int8_tbl.q1 = int4_tbl.f1)
   ->  Seq Scan on public.int8_tbl  (cost=0.00..1.05 rows=5 width=16)
         Output: int8_tbl.q1, int8_tbl.q2
   ->  Hash  (cost=1.05..1.05 rows=5 width=4)
         Output: int4_tbl.f1
         ->  Seq Scan on public.int4_tbl  (cost=0.00..1.05 rows=5 width=4)
               Output: int4_tbl.f1
(9 rows)

The 42 has disappeared entirely from the ROW() construct :-(. This can be reproduced in all active branches. After some reflection I think that the best fix is to redefine AcquireRewriteLocks' processing of dropped columns so that it puts an actual null pointer, not a consed-up Const, into the joinaliasvars list item. This is reliably distinguishable from anything we might pull up from a subquery output, and it doesn't really lose information since the Const was completely phony anyway. (I think the original design envisioned having the Const carrying column datatype info, but we abandoned that idea upon realizing that the dropped column might be of a dropped type. The phony Const is currently always of type INT4.) This definitional change will not affect any rules-on-disk since the dropped-column substitution is only done on rule trees when they are loaded back into memory. A risk we'd be taking with this change is that any extension code that looks at post-rewriter joinaliasvars lists might be confused. However, I'm not sure why extension code would be looking at those. In any case, I can't see a different back-patchable change that would reduce such a risk; we have to change the representation *somehow* if we're going to distinguish these cases correctly. Thoughts? 
regards, tom lane diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 40b565a11d6ddf40b3cd8e692e625efe1f6dc3a7..3292278774f15851f3ad7ae000139278c3f94fd8 100644 *** a/src/backend/utils/adt/ruleutils.c --- b/src/backend/utils/adt/ruleutils.c *** typedef struct *** 235,240 --- 235,241 * child RTE's attno and rightattnos[i] is zero; and conversely for a * column of the right child. But for merged columns produced by JOIN * USING/NATURAL JOIN, both leftattnos[i] and rightattnos[i] are nonzero. + * Also, if the column has been dropped, both are zero. * * If it's a JOIN USING, usingNames holds the alias names selected for the * merged columns (these might be different from the original USING list, *** set_join_column_names(deparse_namespace *** 3053,3058 --- 3054,3066 char *colname = colinfo-colnames[i]; char *real_colname; + /* Ignore dropped column (only possible for non-merged column) */ + if (colinfo-leftattnos[i] == 0 colinfo-rightattnos[i] == 0) + { + Assert(colname == NULL); + continue; + } + /* Get the child column name */ if (colinfo-leftattnos[i] 0) real_colname = leftcolinfo-colnames[colinfo-leftattnos[i] - 1]; *** set_join_column_names(deparse_namespace *** 3061,3075 else { /* We're joining system columns --- use eref name */ ! real_colname = (char *) list_nth(rte-eref-colnames, i); ! } ! ! /* Ignore dropped
Re: [HACKERS] proposal - psql - show longest tables
On 07/22/2013 03:11 PM, Pavel Stehule wrote: 2013/7/22 Robert Haas robertmh...@gmail.com: Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. Do you have any tip? I agree with Robert. My tip is this: when you're in a hole, the first thing to do is to stop digging. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Allow background workers to be started dynamically.
On Mon, Jul 22, 2013 at 3:16 PM, Robert Haas robertmh...@gmail.com wrote: I think we have consensus to back-patch the other API changes as well. I'll work up a patch for that. Pushed that as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC13 proposal - extend RETURNING syntax
I've noticed problem with UPDATE ... FROM statement. Fix in new version. Regards, Karol diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml index 90b9208..eba35f0 100644 --- a/doc/src/sgml/ref/update.sgml +++ b/doc/src/sgml/ref/update.sgml @@ -194,12 +194,27 @@ UPDATE [ ONLY ] replaceable class=PARAMETERtable_name/replaceable [ * ] [ termreplaceable class=PARAMETERoutput_expression/replaceable/term listitem para - An expression to be computed and returned by the commandUPDATE/ - command after each row is updated. The expression can use any - column names of the table named by replaceable class=PARAMETERtable_name/replaceable - or table(s) listed in literalFROM/. - Write literal*/ to return all columns. + An expression to be computed and returned by the + commandUPDATE/ command either before or after (prefixed with + literalBEFORE./literal and literalAFTER./literal, + respectively) each row is updated. The expression can use any + column names of the table named by replaceable + class=PARAMETERtable_name/replaceable or table(s) listed in + literalFROM/. Write literalAFTER.*/literal to return all + columns after the update. Write literalBEFORE.*/literal for all + columns before the update. Write literal*/literal to return all + columns after update and all triggers fired (these values are in table + after command). You may combine BEFORE, AFTER and raw columns in the + expression. /para + warningpara + Mixing table names or aliases named before or after with the + above will result in confusion and suffering. If you happen to + have a table called literalbefore/literal or + literalafter/literal, alias it to something else when using + RETURNING. + /para/warning + /listitem /varlistentry @@ -287,15 +302,16 @@ UPDATE weather SET temp_lo = temp_lo+1, temp_hi = temp_lo+15, prcp = DEFAULT /para para - Perform the same operation and return the updated entries: + Perform the same operation and return information on the changed entries: programlisting UPDATE weather SET temp_lo = temp_lo+1, temp_hi = temp_lo+15, prcp = DEFAULT WHERE city = 'San Francisco' AND date = '2003-07-03' - RETURNING temp_lo, temp_hi, prcp; + RETURNING temp_lo AS new_low, temp_hi AS new_high, BEFORE.temp_hi/BEFORE.temp_low AS old_ratio, AFTER.temp_hi/AFTER.temp_low AS new_ratio prcp; /programlisting /para + para Use the alternative column-list syntax to do the same update: programlisting diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index d86e9ad..fafd311 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -2335,7 +2335,7 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo) TupleTableSlot * ExecBRUpdateTriggers(EState *estate, EPQState *epqstate, ResultRelInfo *relinfo, - ItemPointer tupleid, TupleTableSlot *slot) + ItemPointer tupleid, TupleTableSlot *slot, TupleTableSlot **planSlot) { TriggerDesc *trigdesc = relinfo-ri_TrigDesc; HeapTuple slottuple = ExecMaterializeSlot(slot); @@ -2381,6 +2381,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate, if (newSlot != NULL) { slot = ExecFilterJunk(relinfo-ri_junkFilter, newSlot); + *planSlot = newSlot; slottuple = ExecMaterializeSlot(slot); newtuple = slottuple; } diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 15f5dcc..06ebaf3 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -603,7 +603,7 @@ ExecUpdate(ItemPointer tupleid, resultRelInfo-ri_TrigDesc-trig_update_before_row) { slot 
= ExecBRUpdateTriggers(estate, epqstate, resultRelInfo, - tupleid, slot); + tupleid, slot, planSlot); if (slot == NULL) /* do nothing */ return NULL; @@ -737,6 +737,7 @@ lreplace:; hufd.xmax); if (!TupIsNull(epqslot)) { + planSlot = epqslot; *tupleid = hufd.ctid; slot = ExecFilterJunk(resultRelInfo-ri_junkFilter, epqslot); tuple = ExecMaterializeSlot(slot); diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index a896d76..409b4d1 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -1928,6 +1928,7 @@ range_table_walker(List *rtable, { case RTE_RELATION: case RTE_CTE: + case RTE_BEFORE: /* nothing to do */ break; case RTE_SUBQUERY: @@ -2642,6 +2643,7 @@ range_table_mutator(List *rtable, { case RTE_RELATION: case RTE_CTE: + case RTE_BEFORE: /* we don't bother to copy eref, aliases, etc; OK? */ break; case RTE_SUBQUERY: diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 48cd9dc..79b03af 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -2366,6 +2366,7 @@
Re: [HACKERS] small typo in src/backend/access/transam/xlog.c
On Mon, Jul 22, 2013 at 6:45 AM, didier did...@gmail.com wrote: Hi, in void BootStrapXLOG(void):

    /*
     * ... to seed it other than the system clock value...) The upper half of the
     * uint64 value is just the tv_sec part, while the lower half is the XOR
     * of tv_sec and tv_usec. This is to ensure that we don't lose uniqueness
     * unnecessarily if uint64 is really only 32 bits wide. A person
     * knowing this encoding can determine the initialization time of the
     * installation, which could perhaps be useful sometimes.
     */
    gettimeofday(&tv, NULL);
    sysidentifier = ((uint64) tv.tv_sec) << 32;
    sysidentifier |= (uint32) (tv.tv_sec | tv.tv_usec);

should be

    sysidentifier |= (uint32) (tv.tv_sec ^ tv.tv_usec);

And why is that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] small typo in src/backend/access/transam/xlog.c
On 2013-07-22 15:55:46 -0400, Robert Haas wrote: On Mon, Jul 22, 2013 at 6:45 AM, didier did...@gmail.com wrote: Hi, in void BootStrapXLOG(void):

    /*
     * ... to seed it other than the system clock value...) The upper half of the
     * uint64 value is just the tv_sec part, while the lower half is the XOR
     * of tv_sec and tv_usec. This is to ensure that we don't lose uniqueness
     * unnecessarily if uint64 is really only 32 bits wide. A person
     * knowing this encoding can determine the initialization time of the
     * installation, which could perhaps be useful sometimes.
     */
    gettimeofday(&tv, NULL);
    sysidentifier = ((uint64) tv.tv_sec) << 32;
    sysidentifier |= (uint32) (tv.tv_sec | tv.tv_usec);

should be

    sysidentifier |= (uint32) (tv.tv_sec ^ tv.tv_usec);

And why is that? The comment above says so: "while the lower half is the XOR of tv_sec and tv_usec". I don't think it really matters; the bitwise OR has the tendency to collect too many set bits, but ... who cares? On the other hand, changing it is easy and shouldn't cause any problems. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
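[Andres's point is easy to see from SQL, using PostgreSQL's bitwise operators (# is XOR); the sample values are illustrative stand-ins for tv_sec and tv_usec:]

    SELECT to_hex(1374500000 | 654321) AS with_or,
           to_hex(1374500000 # 654321) AS with_xor;
    -- OR can only set bits, so across many runs the low half drifts toward
    -- all-ones and distinct timestamps collapse together; XOR keeps the
    -- microsecond variation visible in the identifier.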
Re: [HACKERS] enum-ify resource manager's xl_info values
On 7/22/13 7:21 AM, Andres Freund wrote: Would somebody object to making the rmgr's info value #defines like: I'm suspicious of enums that are assigned specific values. Enums should stand by themselves; they shouldn't be a symbolic layer on top of some other numbering or bit-fiddling scheme. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new row-level lock error messages
Peter Eisentraut wrote: I would suggest that these changes be undone, except that the old SELECT FOR ... be replaced by a dynamic string that reverse-parses the LockingClause to provide the actual clause that was used. Here's a patch for this. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 839ed9d..8efb94b 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -26,6 +26,7 @@ #include optimizer/prep.h #include optimizer/restrictinfo.h #include optimizer/var.h +#include parser/analyze.h #include rewrite/rewriteManip.h #include utils/lsyscache.h @@ -883,7 +884,10 @@ make_outerjoininfo(PlannerInfo *root, (jointype == JOIN_FULL bms_is_member(rc-rti, left_rels))) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(row-level locks cannot be applied to the nullable side of an outer join))); + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(%s cannot be applied to the nullable side of an outer join, + LCS_asString(rc-strength; } sjinfo-syn_lefthand = left_rels; diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 01e2fa3..9ff8050 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1107,7 +1107,11 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) if (parse-rowMarks) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(row-level locks are not allowed with UNION/INTERSECT/EXCEPT))); + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(%s is not allowed with UNION/INTERSECT/EXCEPT, + LCS_asString(((RowMarkClause *) + linitial(parse-rowMarks))-strength; /* * Calculate pathkeys that represent result ordering requirements diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 16ff234..39036fb 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -1221,7 +1221,11 @@ transformValuesClause(ParseState *pstate, SelectStmt *stmt) if (stmt-lockingClause) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(SELECT FOR UPDATE/SHARE cannot be applied to VALUES))); + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(%s cannot be applied to VALUES, + LCS_asString(((LockingClause *) + linitial(stmt-lockingClause))-strength; qry-rtable = pstate-p_rtable; qry-jointree = makeFromExpr(pstate-p_joinlist, NULL); @@ -1312,7 +1316,11 @@ transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt) if (lockingClause) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(SELECT FOR UPDATE/SHARE is not allowed with UNION/INTERSECT/EXCEPT))); + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(%s is not allowed with UNION/INTERSECT/EXCEPT, + LCS_asString(((LockingClause *) + linitial(lockingClause))-strength; /* Process the WITH clause independently of all else */ if (withClause) @@ -1506,7 +1514,11 @@ transformSetOperationTree(ParseState *pstate, SelectStmt *stmt, if (stmt-lockingClause) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(SELECT FOR UPDATE/SHARE is not allowed with UNION/INTERSECT/EXCEPT))); + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(%s is not allowed with UNION/INTERSECT/EXCEPT, + LCS_asString(((LockingClause *) + 
linitial(stmt-lockingClause))-strength; /* * If an internal node of a set-op tree has ORDER BY, LIMIT, FOR UPDATE, @@ -2063,21 +2075,33 @@ transformDeclareCursorStmt(ParseState *pstate, DeclareCursorStmt *stmt) if (result-rowMarks != NIL (stmt-options CURSOR_OPT_HOLD)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(DECLARE CURSOR WITH HOLD ... FOR UPDATE/SHARE is not supported), + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(DECLARE CURSOR WITH HOLD ... %s is not supported, + LCS_asString(((RowMarkClause *) + linitial(result-rowMarks))-strength)), errdetail(Holdable cursors must be READ ONLY.))); /* FOR UPDATE and SCROLL are not compatible */ if (result-rowMarks != NIL (stmt-options CURSOR_OPT_SCROLL)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg(DECLARE SCROLL CURSOR ... FOR UPDATE/SHARE is not supported), + /*-- + translator: %s is a SQL row locking clause such as FOR UPDATE */ + errmsg(DECLARE SCROLL CURSOR ... %s is not supported, + LCS_asString(((RowMarkClause
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 7/22/13 2:57 PM, Andres Freund wrote: * I'd be very surprised if this doesn't make WAL replay of update-heavy workloads slower by at least a factor of 2. I was thinking about what a benchmark of WAL replay would look like last year. I don't think that data is captured very well yet, and it should be. My idea was to break the benchmark into two pieces. One would take a base backup, then run a series of tests and archive the resulting WAL. I doubt you can make a useful benchmark here without a usefully populated database; that's why the base backup step is needed. The first useful result then is to measure how long commit/archiving took and the WAL volume, which is what's done by the test harness for this program. Then the resulting backup would be set up for replay. Tarring up the backup and WAL archive could even give you a repeatable test set for runs where only the replay side changes. Then the main number that's useful, total replay time, would be measured. The main thing I wanted this for wasn't code changes; it was to benchmark configuration changes. I'd like to be able to answer questions like which I/O scheduler is best for a standby in a way that has real test data behind it. The same approach should be useful for answering your concerns about the replay performance impact of this change too. * It makes changeset extraction either more expensive or it would have to be disabled there. That argues that, if this is committed at all, the ability to turn it off that I was asking about would be necessary. It sounds like this *could* work like how minimal WAL archiving levels allow optimizations that are disabled at higher ones--like the COPY into a truncated/new table cheat. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
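[A sketch of the capture half of that idea, using the backup functions available in this era; it assumes archive_mode = on and a working archive_command:]

    SELECT pg_start_backup('replay-bench', true);
    -- ... run the update-heavy workload being measured ...
    SELECT pg_stop_backup();
    -- Tar up the base backup plus the WAL archived between the two calls;
    -- total replay time of that fixed WAL set is the headline number.

Freezing both the base backup and the WAL archive is what makes replay runs repeatable across configuration (or patch) variations.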
Re: [HACKERS] getting rid of SnapshotNow
On Fri, Jul 19, 2013 at 1:20 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: 4. If we use GetActiveSnapshot, all the comments about about a fresh MVCC snapshot still apply. However, the snapshot in question could be even more stale, especially in repeatable read or serializable mode. However, this might be thought a more consistent behavior than what we have now. And I'm guessing that this function is typically run as its own transaction, so in practice this doesn't seem much different from an MVCC snapshot, only cheaper. At the moment, I dislike #2 and slightly prefer #4 to #3. +1 for #4, and if we ever need more then we can provide a non-default way to get at #2. OK, done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Mon, Jul 22, 2013 at 12:40 PM, Andrew Dunstan and...@dunslane.net wrote: On 07/22/2013 03:11 PM, Pavel Stehule wrote: 2013/7/22 Robert Haas robertmh...@gmail.com: Rather than just continuing to add more impossible-to-remember syntax, we really need a better design here. Do you have any tip? I agree with Robert. My tip is this: when you're in a hole, the first thing to do is to stop digging. I don't think that Pavel believes himself to be in a hole. After setting up my .psqlrc file as I normally do, I could do this: :rtsize limit 10; But it doesn't have the 'MB' feature, and if I want to help someone else I first have to explain to them how to set their .psqlrc the same as mine, which is rather a bummer. Is looking for the biggest tables a common enough thing that it should be available to everyone, without needing customization first? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
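[The sort of .psqlrc setup being described might look like this; the variable name rtsize is Jeff's, while the query body is a guess of mine:]

    \set rtsize 'SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC'
    -- then, interactively:
    :rtsize limit 10;

psql expands :rtsize to the stored query text, so whatever follows it (here a LIMIT clause) is appended to the statement before it is sent to the server.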
Re: [HACKERS] [v9.4] row level security
On 7/20/13 10:08 AM, Kohei KaiGai wrote: With that change to expand_targetlist(), the change to getrelid() may be unnecessary, and then also the new rowsec_relid field in RangeTblEntry may not be needed. Hmm. I didn't have this idea. It seems to me fair enough and kills necessity to enhance RangeTblEntry and getrelid() indeed. I try to fix up this implementation according to your suggestion. Great, there's one useful bit of feedback for you then, and that seems to address Tom's getrelid concern too. For the active CommitFest, I don't see any place we can go with this right now except for Returned with Feedback. We really need more reviewers willing to put a significant amount of time into going through this code. Anyone who would like to see RLS committed should consider how you can get resources to support a skilled PostgreSQL reviewer spending time on it. (This is a bit much to expect new reviewers to chew on usefully) I've been working on that here, but I don't have anything I can publicly commit to yet. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Expression indexes and dependencies
Pavan Deolasee escribió: Hello, While doing some tests, I observed that expression indexes can malfunction if the underlying expression changes. [...] Perhaps this is a known behaviour/limitation, but I could not find it in the documentation. But I wonder if it makes sense to check for dependencies during function alteration and complain. Or are there other reasons why we can't do that, and is it a much larger problem than what I'm imagining? This is a tough problem. The dependency mechanism has no way to keep track of this kind of dependency; all it does is prevent the function from being dropped altogether, but preventing it from acquiring a conflicting definition is outside its charter. One way to attack this would be registering dependencies of a new kind on functions used by index expressions. Then CREATE OR REPLACE FUNCTION could reject alteration of such functions. I don't know if we care enough about this case. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
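[In the meantime, a DBA can at least enumerate the indexes that depend on a function before redefining it, via pg_depend -- a sketch; substitute the signature of the function in question:]

    SELECT c.oid::regclass AS index_name
    FROM pg_depend d
    JOIN pg_class c ON c.oid = d.objid AND c.relkind = 'i'
    WHERE d.classid = 'pg_class'::regclass
      AND d.refclassid = 'pg_proc'::regclass
      AND d.refobjid = 'foo(integer)'::regprocedure;
    -- each index listed can then be rebuilt with REINDEX once the
    -- new definition is in place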
Re: [HACKERS] Expression indexes and dependencies
On Mon, Jul 22, 2013 at 6:04 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Pavan Deolasee escribió: Hello, While doing some tests, I observed that expression indexes can malfunction if the underlying expression changes. [...] Perhaps this is a known behaviour/limitation, but I could not find it in the documentation. But I wonder if it makes sense to check for dependencies during function alteration and complain. Or are there other reasons why we can't do that, and is it a much larger problem than what I'm imagining? This is a tough problem. The dependency mechanism has no way to keep track of this kind of dependency; all it does is prevent the function from being dropped altogether, but preventing it from acquiring a conflicting definition is outside its charter. One way to attack this would be registering dependencies of a new kind on functions used by index expressions. Then CREATE OR REPLACE FUNCTION could reject alteration of such functions. I don't know if we care enough about this case. What about issuing a warning and leaving it to the DBA to reindex? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Expression indexes and dependencies
On 2013-07-22 17:04:06 -0400, Alvaro Herrera wrote: Pavan Deolasee escribió: Hello, While doing some tests, I observed that expression indexes can malfunction if the underlying expression changes. [...] Perhaps this is a known behaviour/limitation, but I could not find that in the documentation. But I wonder if it makes sense to check for dependencies during function alteration and complain. Or are there other reasons why we can't do that, and it's a much larger problem than what I'm imagining? This is a tough problem. The dependency mechanism has no way to keep track of this kind of dependency; all it does is prevent the function from being dropped altogether, but preventing it from acquiring a conflicting definition is outside its charter. One way to attack this would be registering dependencies of a new kind on functions used by index expressions. Then CREATE OR REPLACE FUNCTION could reject alteration for such functions. I don't know if we care enough about this case. I think changing the results of an immutable function violates the contract enough to make this the user's fault. Also, the other solutions seem hard to achieve ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Comma Comma Comma 8601
Hackers, According to [Wikipedia](https://en.wikipedia.org/wiki/ISO_8601#Times): Decimal fractions may also be added to any of the three time elements. A decimal mark, either a comma or a dot (without any preference as stated in resolution 10 of the 22nd General Conference CGPM in 2003,[11] but with a preference for a comma according to ISO 8601:2004)[12] is used as a separator between the time element and its fraction. A fraction may only be added to the lowest order time element in the representation. To denote 14 hours, 30 and one half minutes, do not include a seconds figure. Represent it as 14:30,5, 1430,5, 14:30.5, or 1430.5. There is no limit on the number of decimal places for the decimal fraction. However, the number of decimal places needs to be agreed to by the communicating parties. I assume that the Postgres project has no interest in supporting the input of whack times like “14:30,5”, “1430,5”, “14:30.5”, or “1430.5”, right? I mean, that’s just bizarre, amirite? But I do wonder if the comma should be allowed for fractional seconds, since the spec says it is preferred (and often used in Javaland, I’m told). As in “14:30:50,232”. Thoughts? Best, David PS: Apologies if you somehow ended up with a bad 80s pop song in your head. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
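A quick psql sketch of the two separator forms under discussion (illustrative only; exact behavior depends on the server version, and the comma form is precisely the open question here):

-- Dot as the decimal mark: already accepted today
SELECT '14:30:50.232'::time;   -- 14:30:50.232

-- Comma as the decimal mark, ISO 8601:2004's stated preference:
-- today's parser treats comma as a separator/noise symbol, so do not
-- expect this to parse as a fractional second without a code change
SELECT '14:30:50,232'::time;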
Re: [HACKERS] proposal - psql - show longest tables
Pavel Stehule pavel.steh...@gmail.com writes: SELECT * from top5(); $ TABLE top5; -- add a view on top of the SRF you cannot use parameters - then I have to have prepared files like top10, top20, ... what is not too friendly The SRF could be using custom GUCs so that you can parametrize it, or just even classic parameters… $ TABLE top(5); -- needs a patch to accept SRF here… $ TABLE top LIMIT 5; $ SET top.limit = 5; $ TABLE top; Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
On 2013-07-17 10:11:28 -0400, Tom Lane wrote: Kevin Grittner kgri...@postgresql.org writes: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. Jagarundi still tells that story. At least something like the following patch seems to be required. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 46149ee..09ea344 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -141,6 +141,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	List	   *actions;
 	Query	   *dataQuery;
 	Oid			tableSpace;
+	Oid			owner;
 	Oid			OIDNewHeap;
 	DestReceiver *dest;
 	bool		concurrent;
@@ -238,6 +239,8 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	else
 		tableSpace = matviewRel->rd_rel->reltablespace;

+	owner = matviewRel->rd_rel->relowner;
+
 	heap_close(matviewRel, NoLock);

 	/* Create the transient table that will receive the regenerated data. */
@@ -247,8 +250,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,

 	/* Generate the data, if wanted. */
 	if (!stmt->skipData)
-		refresh_matview_datafill(dest, dataQuery, queryString,
-								 matviewRel->rd_rel->relowner);
+		refresh_matview_datafill(dest, dataQuery, queryString, owner);

 	/* Make the matview match the newly generated data. */
 	if (concurrent)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] InvokeObjectPostAlterHook() vs. CommandCounterIncrement()
On Sun, Jul 21, 2013 at 4:44 AM, Ants Aasma ants.aa...@eesti.ee wrote: On Jul 21, 2013 4:06 AM, Noah Misch n...@leadboat.com wrote: If these hooks will need to apply to a larger operation, I think that mandates a different means to reliably expose the before/after object states. I haven't checked the code to see how it would fit the API, but what about taking a snapshot before altering and passing this to the hook. Would there be other issues besides performance? If the snapshot is taken only when there is a hook present then the performance can be fixed later. I had the idea of finding a way to pass either the old tuple, or perhaps just the TID of the old tuple. Not sure if passing a snapshot is better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
On Mon, Jul 22, 2013 at 6:01 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-07-17 10:11:28 -0400, Tom Lane wrote: Kevin Grittner kgri...@postgresql.org writes: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. Jagarundi still tells that story. At least something like the following patch seems to be required. Thanks, I was noticing that breakage today, too. I committed this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Mon, Jul 22, 2013 at 02:44:59PM -0700, Dimitri Fontaine wrote: Pavel Stehule pavel.steh...@gmail.com writes: SELECT * from top5(); $ TABLE top5; -- add a view on top of the SRF you cannot use parameters - then I have to have prepared files like top10, top20, ... what is not too friendly The SRF could be using custom GUCs so that you can parametrize it, or just even classic parameters… $ TABLE top(5); -- needs a patch to accept SRF here… Andrew Gierth will probably be posting a design patch for something similar soon :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal - psql - show longest tables
On Mon, Jul 22, 2013 at 03:55:33PM -0700, David Fetter wrote: On Mon, Jul 22, 2013 at 02:44:59PM -0700, Dimitri Fontaine wrote: Pavel Stehule pavel.steh...@gmail.com writes: SELECT * from top5(); $ TABLE top5; -- add a view on top of the SRF you cannot use parameters - then I have to have prepared files like top10, top20, ... what is not too friendly The SRF could be using custom GUCs so that you can parametrize it, or just even classic parameters… $ TABLE top(5); -- needs a patch to accept SRF here… Andrew Gierth will probably be posting a design patch for something similar soon :) Probably not :P Andrew will be posting a design and patch if and when he decides it's appropriate to do so. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Andres Freund and...@2ndquadrant.com writes: On 2013-07-17 10:11:28 -0400, Tom Lane wrote: The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. Jagarundi still tells that story. Uh, no. Jagarundi was perfectly happy for several build cycles after I committed 405a468. The buildfarm history shows that it started complaining again after this change: f01d1ae Add infrastructure for mapping relfilenodes to relation OIDs At least something like the following patch seems to be required. This does look like a necessary change, but I suspect there is also something rotten in f01d1ae. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] The Commitfest Slacker List
On 6/24/13 12:57 PM, Josh Berkus wrote: Maciej is correct that this policy also belongs on the how to submit a patch wiki page. I will remedy that. I just reviewed and heavily updated the new section you added to https://wiki.postgresql.org/wiki/Submitting_a_Patch That included the idea that the reviewed patch should be similar in size/scope to the submitted one, as well as a slightly deeper discussion of how to fit this work into a feature review quote. I find myself needing to explain this whole subject to potential feature sponsors enough that I've tried a few ways of describing it. The closest analog I've found so far is the way carbon offset work is accounted for. I sometimes refer to the mutual review as an offsetting review in conversation, and I threw that term into the guidelines as well. As far as motivating new reviewers goes, let's talk about positive feedback. Anything that complicates the release notes is a non-starter because that resource is tightly controlled by a small number of people, and it's trying to satisfy a lot of purposes. What I would like to see is an official but simple Review Leaderboard for each release instead. Each time someone writes a review and adds it to the CF app with a Review entry, the account that entered it gets a point. Sum the points at the end and there's your weighted list for T-shirt prizes. It should be possible to get that count with a trivial SELECT query out of the CF database, and then produce a simple HTML table at the end of each CF or release. Anything that takes more work than that, and anything that takes *any* time that must come from a committer, is too much accounting. This idea would be a bit unfair to people who review big patches instead of little ones. But an approach that disproportionately rewards new contributors working on small things isn't that terrible. As long as the review tests whether the code compiles and passes the regression tests, that's good enough to deserve a point. I'd be happy if we suddenly had a problem where people were doing only that to try to game their leaderboard ranking. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
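A sketch of the kind of trivial query meant here, against a hypothetical CF app schema (the table and column names are invented for illustration; the real schema will differ):

-- Hypothetical: one row per review entered in the CF app
SELECT reviewer, count(*) AS points
FROM commitfest_review          -- invented table name
GROUP BY reviewer
ORDER BY points DESC;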
Re: [HACKERS] Comma Comma Comma 8601
David E. Wheeler da...@justatheory.com writes: But I do wonder if the comma should be allowed for fractional seconds, since the spec says it is preferred (and often used in Javaland, I'm told). As in 14:30:50,232. Thoughts? Does that create any ambiguities against formats we already support? I'm worried about examples like this one: select 'monday, july 22, 22:30 2013'::timestamptz; timestamptz 2013-07-22 22:30:00-04 (1 row) Right now I'm pretty sure that the datetime parser treats comma as a noise symbol. If that stops being true, we're likely to break some applications out there (admittedly, possibly rather strange ones, but still ...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
On 2013-07-22 19:09:13 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-07-17 10:11:28 -0400, Tom Lane wrote: The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. Jagarundi still tells that story. Uh, no. Jagarundi was perfectly happy for several build cycles after I committed 405a468. The buildfarm history shows that it started complaining again after this change: f01d1ae Add infrastructure for mapping relfilenodes to relation OIDs Huh? That's rather strange. No code from that patch is even executed until the alter_table regression tests way later. So I can't see how that patch could cause the matview error, which is pretty clearly using a freed relcache entry. I think this may be a timing issue. At least something like the following patch seems to be required. This does look like a necessary change, but I suspect there is also something rotten in f01d1ae. The alter table failure later on seems to be ERROR: relation tmp already exists, which just seems to be a consequence of incomplete cleanup due to the earlier crashes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Expression indexes and dependencies
Andres Freund and...@2ndquadrant.com writes: On 2013-07-22 17:04:06 -0400, Alvaro Herrera wrote: One way to attack this would be registering dependencies of a new kind on functions used by index expressions. Then CREATE OR REPLACE FUNCTION could reject alteration for such functions. I don't know if we care enough about this case. I think changing the results of an immutable function violates the contract enough to make this the user's fault. Also, the other solutions seem hard to achieve ;) Yeah. Prohibiting any change at all would likely be a cure worse than the disease, but we don't have the tools to analyze more finely than that. And what if the index uses function A which calls function B, and you change function B? I'd be in favor of adding a note to the CREATE INDEX man page pointing out that if you change the behavior of an immutable function, any bad consequences for indexes are on your own head, and a REINDEX would be advisable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
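A minimal sketch of the hazard being discussed (object names made up for illustration):

-- A function declared IMMUTABLE, used in an expression index:
CREATE FUNCTION f(x int) RETURNS int IMMUTABLE LANGUAGE sql
    AS 'SELECT x + 1';
CREATE TABLE t (a int);
INSERT INTO t SELECT generate_series(1, 1000);
CREATE INDEX t_f_idx ON t (f(a));

-- Changing the function's results breaks the contract; the index still
-- contains values computed with the old definition:
CREATE OR REPLACE FUNCTION f(x int) RETURNS int IMMUTABLE LANGUAGE sql
    AS 'SELECT x + 2';
SET enable_seqscan = off;
SELECT * FROM t WHERE f(a) = 10;   -- may return a = 9 (old f), not a = 8

-- Per the advice above, the fix is on the user's head:
REINDEX INDEX t_f_idx;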
Re: [HACKERS] small typo in src/backend/access/transam/xlog.c
Andres Freund and...@2ndquadrant.com writes: On 2013-07-22 15:55:46 -0400, Robert Haas wrote: And why is that? The comment above tells: while the lower half is the XOR of tv_sec and tv_usec. Yeah, the code doesn't match the comment; this mistake seems to be aboriginal. I don't think it really matters. The bitwise OR has the tendency to collect too many set bits, but ... who cares? This is making the value less unique than intended, so I think it's worth doing something about. However, it's also worth noting that the intended behavior (as described by the comment) was designed to allow for the possibility that uint64 is really only 32 bits --- a possibility we stopped supporting several versions ago. So rather than just quickly s/|/^/, maybe we should step back and think about whether we want to change the sysid generation algorithm altogether. We could for instance keep the high half as tv_sec, while making the low half be something like (tv_usec << 12) | (getpid() & 0xfff). This would restore the intended ability to reverse-engineer the exact creation time from the sysidentifier, and also add a little more uniqueness by way of the creating process's PID. (Note tv_usec must fit in 20 bits.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
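A sketch of what that proposed layout could look like as standalone C (hypothetical; the real logic lives in xlog.c and the function name here is illustrative):

#include <stdint.h>
#include <sys/time.h>
#include <unistd.h>

/* Proposed layout: high 32 bits = tv_sec; next 20 bits = tv_usec
 * (always < 1000000, so it fits in 20 bits); low 12 bits = PID. */
static uint64_t
generate_system_identifier(void)
{
    struct timeval tv;
    uint64_t sysid;

    gettimeofday(&tv, NULL);
    sysid = ((uint64_t) tv.tv_sec) << 32;
    sysid |= ((uint64_t) tv.tv_usec) << 12;
    sysid |= (uint64_t) (getpid() & 0xfff);
    return sysid;
}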
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
All, Christophe just discovered something with include files which is going to cause issues with ALTER SYSTEM SET. So, take as a hypothetical that you use the default postgresql.conf file, which sets shared_buffers = 32MB. Instead of editing this file, you do ALTER SYSTEM SET shared_buffers = '1GB', which updates config/postgresql.auto.conf. Then you restart PostgreSQL. Everything is hunky-dory, until a later occasion where you *reload* PostgreSQL. Then postgres hits the first shared_buffers = 32MB (in postgresql.conf), says "I can't change shared_buffers on a reload", and aborts before ever getting to postgresql.auto.conf. This is a problem that exists now with includes in postgresql.conf, but having ALTER SYSTEM SET will cause more users to hit it. Seems like we'll need to fix it before releasing 9.4. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
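A sketch of the failure sequence (paths and values illustrative, mirroring the scenario above):

# postgresql.conf -- read first, both at startup and on reload
shared_buffers = 32MB

# config/postgresql.auto.conf -- written by ALTER SYSTEM SET, read last
shared_buffers = '1GB'

-- The steps that set up the trap:
ALTER SYSTEM SET shared_buffers = '1GB';
-- restart: fine, postgresql.auto.conf wins and you run with 1GB
-- reload (SIGHUP): the config reader trips over 32MB in postgresql.conf
-- first, complains that shared_buffers cannot change on reload, and
-- never reaches postgresql.auto.conf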
Re: [HACKERS] proposal - psql - show longest tables
Jeff Janes jeff.ja...@gmail.com writes: Is looking for the biggest tables a common enough thing that it should be available to everyone, without needing custom customization? I don't really think so. It's surely not much harder than select relname, pg_relation_size(oid) from pg_class order by 2 desc; Moreover, the people who need this likely don't need it as a psql command, but rather as something available to monitoring tools. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
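For anyone who does want it prettier, the same one-liner dressed up a bit (standard catalog functions only):

SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_relation_size(oid) DESC
LIMIT 5;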
Re: [HACKERS] proposal - psql - show longest tables
On 07/22/2013 04:26 PM, Jeff Janes wrote: On Mon, Jul 22, 2013 at 12:40 PM, Andrew Dunstan and...@dunslane.net wrote: On 07/22/2013 03:11 PM, Pavel Stehule wrote: 2013/7/22 Robert Haas robertmh...@gmail.com: Rather than just continuing to add more imposible-to-remember syntax, we really need a better design here. do you have any tip? I agree with Robert. My tip is this: when you're in a hole, the first thing to do is to stop digging. I don't think that Pavel believes himself to be in a hole. I'm not suggesting he's in a hole - I'm suggesting we are. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Very minor update with V19 here, to reflect Alvaro's comments. The tricky part now reads like this: High rate limit schedule lag values, that is lag values that are large compared to the actual transaction latency, indicate that something is amiss in the throttling process. High schedule lag can highlight a subtle problem there even if the target rate limit is met in the end. I have committed this along with a slight modification: I changed --rate rate to --rate=rate to follow the option style of pgbench. Also, I have removed a space in --progress= sec in the doc, which was probably mistakenly added by a previous commit. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
Josh Berkus j...@agliodbs.com writes: Christophe just discovered something with include files which is going to cause issues with ALTER SYSTEM SET. So, take as a hypothetical that you use the default postgresql.conf file, which sets shared_buffers = 32MB. Instead of editing this file, you do ALTER SYSTEM SET shared_buffers = '1GB', which updates config/postgresql.auto.conf. Then you restart PostgreSQL. Everything is hunky-dory, until a later occasion where you *reload* postgresql. Everything was already *not* hunky-dory as soon as you did that, since a SIGHUP would have had the same problem. I'd be inclined to think that ALTER SYSTEM SET should not be allowed to modify any PGC_POSTMASTER parameters. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
On 2013-07-23 00:01:50 +0200, Andres Freund wrote: On 2013-07-17 10:11:28 -0400, Tom Lane wrote: Kevin Grittner kgri...@postgresql.org writes: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. The buildfarm members that use -DCLOBBER_CACHE_ALWAYS say this patch is broken. Jagarundi still tells that story. At least something like the following patch seems to be required. Hm. From a quick look, there seem to be more things that need improvement. First, I have my doubts about the general modus operandi of CONCURRENTLY here. I think in many, many cases it would be faster to simply swap the newly built heap + indexes in concurrently. I think I have seen that mentioned in the review while mostly skipping the discussion, but still. That would be easy enough as of Robert's mvcc patch.
* Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance be done a good bit earlier? We're executing queries before that.
* The loop over indexes in refresh_by_match_merge should index_open(ExclusiveLock) the indexes initially instead of searching the syscache manually. They are opened inside the loop in many cases anyway. There probably aren't any hazards currently, but I am not even sure about that. The index_open() in the loop at the very least processes the invalidation messages other backends send... I'd even suggest using BuildIndexInfo() or such on the indexes; then you could use ->ii_Expressions et al instead of peeking into indkeys by hand.
* All not explicitly looked up operators should be qualified using OPERATOR(). It's easy to define your own = operator for tid and set the search_path to public,pg_catalog. Now, the whole thing is restricted to table owners afaics, making this decidedly less dicey, but still. If anybody actually uses a search_path like the above you can easily hurt them.
* The same seems to hold true for the y = x and y.* IS DISTINCT FROM x.* operations.
* (y.*) = (x.*) also needs to do proper operator lookups.
* OpenMatViewIncrementalMaintenance() et al seem to be underdocumented.
* Why is it even allowed that matview_maintenance_depth is > 1? Unless I miss something, there doesn't seem to be support for a recursive refresh, whatever that would mean?
* INSERT and DELETE should also alias the table names, to make sure there cannot be issues with the nested queries.
* I'd strongly suggest more descriptive table aliases than x, y, d. Those make the statements rather hard to parse.
* SPI_exec() is deprecated in favor of SPI_execute()
* ISTM deferred uniqueness checks and similar things should be enforced to run in refresh_by_match_merge()
I think this patch/feature needs a good bit of work... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
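To spell out the OPERATOR() point with a self-contained example (a generic illustration, not the actual matview code):

-- A user-defined "=" for tid that can capture an unqualified lookup:
CREATE FUNCTION public.always_true(tid, tid) RETURNS bool
    LANGUAGE sql AS 'SELECT true';
CREATE OPERATOR public.= (
    leftarg = tid, rightarg = tid, procedure = public.always_true);

SET search_path = public, pg_catalog;

-- Unqualified: resolves to public.= because it comes first in search_path
SELECT '(0,1)'::tid = '(0,2)'::tid;                       -- true (!)
-- Qualified: always the built-in operator, immune to search_path games
SELECT '(0,1)'::tid OPERATOR(pg_catalog.=) '(0,2)'::tid;  -- false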
Re: [HACKERS] [9.4 CF 1] The Commitfest Slacker List
On 7/3/13 7:25 PM, Bruce Momjian wrote: The extrapolation of Josh's approach is that committers have to do work that the community wants to maintain their commit rights, but their commit rights are helping the community, so why would people care if you take them away --- you only hurt the community further by doing so. The main problem with having inactive committers (which I don't intend to include the important subject matter committers, who I'll get into at the end here) is that they skew the public information about who commits in a counterproductive way. People visit https://wiki.postgresql.org/wiki/Committers, see that list of names, and then make conclusions based on its content. And some of those conclusions are wrong because the data is inconsistent. The standards listed are applied when new committers are promoted, but they are not applied symmetrically to remove ones who no longer qualify. The #1 obstacle to my getting more time to work on core PostgreSQL is that companies presume regular submitters who are also non-committers don't do a very good job. If you are competent and have a consistent track record of contributions to an open source project, the project would make you a committer, right? Conversely, if you've been contributing for a while but aren't a committer, the most likely explanation is that your work quality is poor. That is a completely reasonable viewpoint based on how most open source projects work. The really terrible part is that it means the longer you've been submitting things, the *less* competent you're assumed to be. When I tell people I've been submitting things since 2007 but am not a committer, the only logical explanation is that my submissions must suck very hard, right? From that perspective, people who are listed as committers but don't actively do work for the project are causing me a serious problem. When someone who rarely commits can obviously qualify, that *proves* to casual observers that the bar for PostgreSQL committers is actually very low. That's the message the project is inadvertently sending by leaving committers on there if they stop working actively. The main thing I'd like to see is having the committer list, and its associated guidelines, updated to reflect that there are subject matter experts committing too. That would pull them out of any "what have you done for me lately?" computations, and possibly open up a way to get more of them usefully. Here are the first two obvious labels like that: Michael Meskes (meskes) - embedded SQL Teodor Sigaev (teodor) - inverted indexes When even Josh Berkus doesn't know all of this information, it's clearly way too obscure to expect the rest of the world to figure it out. It also boggles my mind that there isn't already an entry like this on there too: Thom Brown - documentation Each time Thom passes through yet another correction patch that is committed with no change, I find it downright bizarre that a community with such limited committer resources wastes their time with that gatekeeping. The standards for nominating committers seem based on whether they can commit just about anything. I think it's more important to consider whether people are trusted to keep commits within their known area(s) of expertise. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX checking of index constraints
On Sun, Jul 21, 2013 at 01:47:00PM -0700, Josh Berkus wrote: On 07/21/2013 11:30 AM, Josh Berkus wrote: Attached patch just restores the old behavior. Would it be worth preserving the ability to fix an index consistency problem with a REINDEX independent from related heap consistency problems such as duplicate keys? I would love to have two versions of REINDEX, one which validated and one which didn't. Maybe a ( validate off ) type check? Cancel this. I just did some tests, and the amount of time required for the validation (at least, in a simple two-column table test) is 10% of the time required to reindex in general. At that difference, we don't need two options. Unless you're asking if we want a command to check the index validity without rebuilding it? That might be more valuable ... I meant to ask whether, instead of reverting the accidental behavior change, we should do something like leave the behavior and change the documentation instead. I personally vote no, but that alternative seemed credible enough to justify mentioning it. Something more radical, like a new UI, would be a separate patch. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Ok, since Atri posted our work-so-far and there's not been much comment, I'll outline here my proposed plan of attack. Rather than, as in the WIP patch, using the agg finalfn to validate the split between normal args and ORDER BY args, I propose this: Firstly, as in the WIP patch, func(a) within group (order by b) is looked up as though it were func(a,b). The result must be marked as an ordered set function. A new pg_aggregate integer column, aggordnargs (?), must be equal to the number of normal args (i.e. (a) in this case). Note that this may be 0; one can see a legitimate use case for mode() within group (order by anyelement) for example. The finalfn must be defined to have the same signature, so its args are processed as if it were func_final(a,b) - but only a dummy arg is passed for b. (Similar to the case for window functions.) Resolution of polymorphic parameters and result types therefore works as normal. For hypothetical set functions we add a special case, aggordnargs=-1, for which both the aggregate and the finalfn must be defined as (variadic any) and parse analysis detects this case and unifies the types of the normal args with those of the ORDER BY args. I propose this new syntax:

create aggregate func(argtypes...) within group (argtypes...)
(
    [ STYPE = ... , ]
    [ SORTOP = ... , ]
    [ INITCOND = ... , ]
    FINALFUNC = func_final
);

Ordered set functions will typically not need STYPE etc., but hypothetical set functions will be declared as, e.g.:

create aggregate rank(variadic any) within group (variadic any)
(
    STYPE = boolean,
    INITCOND = 'f',
    SORTOP = ,
    FINALFUNC = rank_hypothetical_final
);

(I'm open to comment as to whether to simply overload the aggsortop column in pg_aggregate or add a new one. I'm inclined to do the latter.) The idea here is that a column of type STYPE will be appended to the list of columns to be sorted, using SORTOP as sort operator, and all input rows will have this column initialized to the INITCOND value. This is to make it easy to implement rank() and friends by simply inserting the hypothetical row into the sort, with a true value to flag it, and finding its position in the sort result. (Better that than comparing the hypothetical row against the whole group.) (Security caveat: it will be necessary for the finalfn in such cases to validate that the additional column exists with the right type. Producing the wrong result is acceptable if the values in it are unexpected; crashing is not.) Any comment before we get back to coding? -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
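For readers unfamiliar with the feature, a usage sketch of what this design is meant to enable (syntax per the SQL standard; which functions actually ship depends on the final patch):

-- Ordered-set aggregate: the most common value of x
SELECT mode() WITHIN GROUP (ORDER BY x) FROM t;

-- Hypothetical-set aggregate: where would 42 rank among the x values?
SELECT rank(42) WITHIN GROUP (ORDER BY x) FROM t;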
Re: [HACKERS] [PATCH] Revive line type
On 6/26/13 9:34 PM, Peter Eisentraut wrote: Still wondering whether to use a A,B,C-based output format per Tom's comment. Wouldn't it also be helpful to remove "The points used in the output are not necessarily the points used on input" by making that not true? There are three obvious ways you might output a line:
-Math class expectations of slope-intercept form: y = mx + b. Intercept forms don't handle both horizontal and vertical lines though, so that's easily out.
-Pair of points. A casual observer might get lucky and observe putting a pair of points in and getting the same pair back again, then incorrectly assume that's normally the case. Seems better to never make that possible in the infinite line case. I'm reminded of how row output usually being in order gives a bad impression about ordering guarantees in SQL.
-Ax + By + C = 0.
Outputting that third one, when it's also the internal form, handles any type of line; will show that any assumptions about individual points being preserved are wrong; and avoids rounding errors too. The only downside seems to be that bounded lines are easier to show with a pair of points. I think that suggests an alternate, secondary output format would be useful though, rather than that a different default one is a good idea. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
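A sketch of what the third option could look like at the psql level (assuming the revived type settles on {A,B,C} output for Ax + By + C = 0; exact syntax depends on what finally gets committed):

-- Two-point input form for the infinite line through (0,0) and (1,1):
SELECT '[(0,0),(1,1)]'::line;
-- Coefficient input form; both statements would print the same
-- normalized coefficients, e.g. {1,-1,0} for the line y = x:
SELECT '{1,-1,0}'::line;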
Re: [HACKERS] Preventing tuple-table leakage in plpgsql
On Sun, Jul 21, 2013 at 12:40:38PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: Reasonable enough. Code that does use subtransactions will need to be more careful than before to manually free tuple tables in the non-error case. Failure to do so has been creating a leak that lasts until SPI_finish(), but it will now be able to cause a transaction-lifespan leak. Hmm ... good point. The other plan I'd been considering was to add explicit tracking inside spi.c of all tuple tables created within the current procedure, and then have AtEOSubXact_SPI flush any that were created inside a failed subxact. The tables would still be children of the procCxt and thus could not be leaked past SPI_finish. When you suggested attaching to subtransaction contexts I thought that would let us get away without this additional bookkeeping logic, but maybe we should bite the bullet and add the extra logic. Is there reason to believe we wouldn't eventually find a half dozen other allocations calling for similar bespoke treatment? Does something make tuple tables special among memory allocations, or are they just the garden-variety allocation that happens to bother the test case at hand? A change that's meant to remove leak risks really shouldn't be introducing other, new leak risks. Yes; that gives one pause. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Preventing tuple-table leakage in plpgsql
Noah Misch n...@leadboat.com writes: On Sun, Jul 21, 2013 at 12:40:38PM -0400, Tom Lane wrote: Hmm ... good point. The other plan I'd been considering was to add explicit tracking inside spi.c of all tuple tables created within the current procedure, and then have AtEOSubXact_SPI flush any that were created inside a failed subxact. Is there reason to believe we wouldn't eventually find a half dozen other allocations calling for similar bespoke treatment? Does something make tuple tables special among memory allocations, or are they just the garden-variety allocation that happens to bother the test case at hand? It's hard to speculate about the memory management habits of third-party SPI-using code. But in plpgsql, the convention is that random bits of memory should be allocated in a short-term context separate from the SPI procCxt --- typically, the estate->eval_econtext expression context, which exec_stmt_block already takes care to clean up when catching an exception. So the problem is that that doesn't work for tuple tables, which have procCxt lifespan. The fact that they tend to be big (at least 8K apiece) compounds the issue. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
After pushes from a few people, the remaining submissions are now waiting for commit. I updated each of those to have the latest info in the CF app, and tried to identify which committers have already looked at them.
Access to calls stack from GET DIAGNOSTICS statement: ?
Add more regression tests mostly for ALTER OPERATOR FAMILY .. ADD/DROP: ? Robert Haas gave a quick review last week that's generated an update since. But that looks like it was just a quick scan for issues.
Remove unused targets from plan: Alvaro? (He reviewed it already)
SQL Command to edit postgresql.conf (ALTER SYSTEM patch): Alvaro volunteered to look at this if no one else gets to it first.
UNNEST() (and other functions) WITH ORDINALITY: Greg Stark
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeTblEntry.joinaliasvars representation change
I wrote: After some reflection I think that the best fix is to redefine AcquireRewriteLocks' processing of dropped columns so that it puts an actual null pointer, not a consed-up Const, into the joinaliasvars list item. Here's a complete patch along that line. Possibly worthy of note is that I chose to keep expandRTE's API the same as before, ie, it returns a NULL Const for a dropped column (when include_dropped is true). I had intended to change it to return a null pointer to match the change in the underlying data structure, but found that most of the callers passing include_dropped = true need the Consts, because they're going to construct a RowExpr that has to have null constants for the dropped columns. In HEAD, I'm a bit tempted to move strip_implicit_coercions into nodes/nodeFuncs.c, so that we don't have the ugliness of the parser calling an optimizer subroutine. But I propose applying this to back branches as-is. regards, tom lane

diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index 7eaf8d27bf08bf5dd1776d203876adb8396c73b3..5f736ad6c4060ff4d7dabb3844a04185e01fa3ef 100644
*** a/src/backend/optimizer/util/var.c
--- b/src/backend/optimizer/util/var.c
*************** flatten_join_alias_vars_mutator(Node *no
*** 654,660 ****
  			newvar = (Node *) lfirst(lv);
  			attnum++;
  			/* Ignore dropped columns */
! 			if (IsA(newvar, Const))
  				continue;
  			newvar = copyObject(newvar);
--- 654,660 ----
  			newvar = (Node *) lfirst(lv);
  			attnum++;
  			/* Ignore dropped columns */
! 			if (newvar == NULL)
  				continue;
  			newvar = copyObject(newvar);
*************** flatten_join_alias_vars_mutator(Node *no
*** 687,692 ****
--- 687,693 ----
  		/* Expand join alias reference */
  		Assert(var->varattno > 0);
  		newvar = (Node *) list_nth(rte->joinaliasvars, var->varattno - 1);
+ 		Assert(newvar != NULL);
  		newvar = copyObject(newvar);

  		/*
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index a9254c8c3a2e33b7c293ef51c53c78a797b1d4f1..42de89f510190877b1f6fa357efb08c81eb7acc9 100644
*** a/src/backend/parser/parse_relation.c
--- b/src/backend/parser/parse_relation.c
***************
*** 24,29 ****
--- 24,30 ----
  #include "funcapi.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
+ #include "optimizer/clauses.h"
  #include "parser/parsetree.h"
  #include "parser/parse_relation.h"
  #include "parser/parse_type.h"
*************** markRTEForSelectPriv(ParseState *pstate,
*** 749,762 ****
  			 * The aliasvar could be either a Var or a COALESCE expression,
  			 * but in the latter case we should already have marked the two
  			 * referent variables as being selected, due to their use in the
! 			 * JOIN clause.  So we need only be concerned with the simple Var
! 			 * case.
  			 */
  			Var		   *aliasvar;

  			Assert(col > 0 && col <= list_length(rte->joinaliasvars));
  			aliasvar = (Var *) list_nth(rte->joinaliasvars, col - 1);
! 			if (IsA(aliasvar, Var))
  				markVarForSelectPriv(pstate, aliasvar, NULL);
  		}
  	}
--- 750,764 ----
  			 * The aliasvar could be either a Var or a COALESCE expression,
  			 * but in the latter case we should already have marked the two
  			 * referent variables as being selected, due to their use in the
! 			 * JOIN clause.  So we need only be concerned with the Var case.
! 			 * But we do need to drill down through implicit coercions.
  			 */
  			Var		   *aliasvar;

  			Assert(col > 0 && col <= list_length(rte->joinaliasvars));
  			aliasvar = (Var *) list_nth(rte->joinaliasvars, col - 1);
! 			aliasvar = (Var *) strip_implicit_coercions((Node *) aliasvar);
! 			if (aliasvar && IsA(aliasvar, Var))
  				markVarForSelectPriv(pstate, aliasvar, NULL);
  		}
  	}
*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 1841,1850 ****
  				 * deleted columns in the join; but we have to check since
  				 * this routine is also used by the rewriter, and joins
  				 * found in stored rules might have join columns for
! 				 * since-deleted columns.  This will be signaled by a NULL
! 				 * Const in the alias-vars list.
  				 */
! 				if (IsA(avar, Const))
  				{
  					if (include_dropped)
  					{
--- 1843,1852 ----
  				 * deleted columns in the join; but we have to check since
  				 * this routine is also used by the rewriter, and joins
  				 * found in stored rules might have join columns for
! 				 * since-deleted columns.  This will be signaled by a null
! 				 * pointer in the alias-vars list.
  				 */
! 				if (avar == NULL)
  				{
  					if (include_dropped)
  					{
*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 1852,1859 ****
  						*colnames = lappend(*colnames,
  											makeString(pstrdup("")));
  					if (colvars)
  						*colvars = lappend(*colvars,
! 										   copyObject(avar));
  				}
  				continue;
  			}
--- 1854,1869 ----
  					*colnames =
Re: [HACKERS] [9.4 CF 1] And then there were 5
Greg Smith g...@2ndquadrant.com writes: Remove unused targets from plan: Alvaro? (He reviewed it already) Really I should do that one, but it seems like all my available cycles have been going into bug fixing lately :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
On 7/22/13 4:52 AM, KONDO Mitsumasa wrote: The writeback source code I indicated is almost the same as the community kernel (2.6.32.61). I also read Linux kernel 3.9.7, but this part is almost the same. The main source code difference comes from going back to the RedHat 5 kernel, which means 2.6.18. For many of these versions, you are right that it is only the tuning parameters that were changed in newer versions. Optimizing performance for the old RHEL5 kernel isn't the most important thing, but it's helpful to know the things it does very badly. My fsync patch only sleeps when fsync returns success, and the maximum sleep time is set to 10 seconds. It does not make this problem worse. It's easy to have hundreds of relations that are getting fsync calls during a checkpoint. If you have 100 relations getting a 10 second sleep each, you could potentially delay checkpoints by 17 minutes this way. I regularly see systems where shared_buffers=8GB and there are 200 to 400 relation segments that need a sync during a checkpoint. This is the biggest problem with your submission. Once you give up following the checkpoint schedule carefully, it is very easy to end up with large checkpoint deadline misses on production servers. If someone thinks they are doing a checkpoint every 5 minutes, but your patch makes them take 20 minutes instead, that is bad. They will not expect that a crash might have to replay that much activity before the server is useful again. You also don't yet seem to appreciate how bad a thing exceeding the checkpoint timeout is. I think it is important to understand why this problem was caused. We should try to find which program has the bug or problem. The checkpointer process is the problem. There's no filesystem bug or complicated issues involved in many of the bad cases. Here is a simple example that shows how the toughest problem cases happen:
-64GB of RAM
-10% dirty_background_ratio = 6GB of dirty writes = 6144MB
-2MB/s random I/O when concurrent reads are heavy
-3072 seconds to clear the cache = 51 minutes
That's how you get to an example like the one in my slides: LOG: checkpoint complete: wrote 33282 buffers (3.2%); 0 transaction log file(s) added, 60 removed, 129 recycled; write=228.848 s, sync=4628.879 s, total=4858.859 s It's very hard to do better on these, and I don't expect any change to help this a lot. But I don't want to see a change committed that makes this sort of checkpoint 17 minutes longer if there's 100 relations involved either. My patch not only improvement of throughput but also realize stable response time at fsync phase in checkpoint. The main reason your patch improves latency and throughput is that it makes checkpoints farther apart. That's why I drew you a graph showing how the time between checkpoints lined up perfectly with TPS. If it was only a small problem it would be worth considering, but I think it's likely to end up with these 15+ minute delays I've outlined here instead. And I surveyed the ext3 file system. I wouldn't worry too much about the problems ext3 has. Like the old RHEL5 kernel I was commenting about above, there are a lot of ext3 systems out there. But we can't do a lot about getting good performance from them. It's only important to test that you're not making them a lot worse with a change. My system block size is 4096, but 8192 or more seems to be better. It will decrease the number of inodes and give larger sequential disk fields.
I normally increase read-ahead on Linux systems to get faster speed on sequential disk throughput. Changing the block size might work better in some cases, but not many people are willing to do that. Read-ahead is very easy to change at any time. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
Tom Lane wrote: Greg Smith g...@2ndquadrant.com writes: Remove unused targets from plan: Alvaro? (He reviewed it already) Really I should do that one, but it seems like all my available cycles have been going into bug fixing lately :-( Yeah, I only did a first pass over that patch and was expecting you would pick it up. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto explain target tables
On 07/21/2013 10:42 PM, Миша Тюрин wrote: hi, list, again. the next proposal for auto explain. one would be happy if one could set a list of target tables and indexes. sometimes it is very hard to detect who is using your indexes. but turning total logging on under thousands of transactions per second does not seem like a nice idea because of the size of the resulting log files (cpu utilization might not be so critical) That sounds like a good idea - and since auto_explain is a contrib module, it could potentially be a way to get into PostgreSQL development. The code is in contrib/auto_explain/auto_explain.c . The challenge here is that planned queries don't just touch one table or index. You'd need to walk the query plan (a graph of Node structures) to determine which table(s) and index(es) are touched by the query; there's something in the sources for it already (I haven't checked). You'll also need a way to supply the list of tables/indexes you are interested in to the extension. The simplest way to start with is likely to be reading a separate file from the data dir that contains one relation name per line. Integrating it into the postgresql.conf GUC machinery is harder; you'll need a way to parse a list of tables from a GUC (maybe you can re-use the search_path code for this?) or some other way to handle your need for a multi-valued setting. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX checking of index constraints
Noah Misch wrote: I meant to ask whether, instead of reverting the accidental behavior change, we should do something like leave the behavior and change the documentation instead. I personally vote no, but that alternative seemed credible enough to justify mentioning it. Something more radical, like a new UI, would be a separate patch. separate patch++. I agree some use cases probably justify new UI. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto explain target tables
On 07/23/2013 11:24 AM, Craig Ringer wrote: On 07/21/2013 10:42 PM, Миша Тюрин wrote: hi, list, again. the next proposal into auto explain. one would be happy if could set list of target tables and indexes. sometimes it is very hard to detect who is using your indexes. but turn total logging on under thousands transactions per seconds is not seems like nice idea couse size of resulting log files (cpu utilization might not be so critical) That sounds like a good idea - and since auto_explain is a contrib module, it could potentially be a way to get into PostgreSQL development. The code is in contrib/auto_explain/auto_explain.c . The challenge here is that planned queries don't just touch one table or index. You'd need to walk the query plan (a graph of Node structures) to determine which table(s) and index(es) are touched by the query there's something in the sources for it already (I haven't checked). Showing that I'm still very much learning this area myself, a bit more looking around found: http://www.postgresql.org/docs/current/static/querytree.html which makes it clear that the range table for the query will contain what you want. I suspect you'll need to find CTEs and subqueries and extract the relations they touch; I haven't yet checked to see whether they're aggregated into the top level range table, but suspect not. You'll find the code in src/backend/nodes/print.c informative and useful, though not directly reusable. Reading src/include/nodes/primnodes.h would be a good idea; it shows how subqueries, etc, are represented. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Design proposal: fsync absorb linear slider
Recently I've been dismissing a lot of suggested changes to checkpoint fsync timing without suggesting an alternative. I have a simple one in mind that captures the biggest problem I see: that the number of backend and checkpoint writes to a file are not connected at all. We know that a 1GB relation segment can take a really long time to write out. That could include up to 128 changed 8K pages, and we allow all of them to get dirty before any are forced to disk with fsync. Rather than second-guess the I/O scheduling, I'd like to take this on directly by recognizing that the size of the problem is proportional to the number of writes to a segment. If you turned off fsync absorption altogether, you'd be at an extreme that allows only 1 write before fsync. That's low latency for each write, but terrible throughput. The maximum throughput case of 128 writes has the terrible latency we get reports about. But what if that trade-off was just a straight, linear slider going from 1 to 128? Just move it to the latency vs. throughput position you want, and see how that works out. The implementation I had in mind was this one:
-Add an absorption_count to the fsync queue.
-Add a new latency vs. throughput GUC, which I'll call max_segment_absorb here. Its default value is -1 (or 0), which corresponds to ignoring this new behavior.
-Whenever the background writer absorbs a fsync call for a relation that's already in the queue, increment the absorption count.
-When max_segment_absorb > 0, have the background writer scan for relations where absorption_count > max_segment_absorb. When it finds one, call fsync on that segment.
Note that it's possible for this simple scheme to be fooled when writes are actually touching a small number of pages. A process that constantly overwrites the same page is the worst case here. Overwrite it 128 times, and this method would assume you've dirtied every page, while only 1 will actually go to disk when you call fsync. It's possible to track this better. The count mechanism could be replaced with a bitmap of the 128 blocks, so that absorbs set a bit instead of incrementing a count. My gut feel is that this is more complexity than is really necessary here. If in fact the fsync is slimmer than expected, paying for it too much isn't the worst problem to have here. I'd like to build this myself, but if someone else wants to take a shot at it I won't mind. Just be aware the review is the big part here. I should be honest about one thing: I have zero incentive to actually work on this. The moderate amount of sponsorship money I've raised for 9.4 so far isn't getting anywhere near this work. The checkpoint patch review I have been doing recently is coming out of my weekend volunteer time. And I can't get too excited about making this my volunteer effort when I consider what the resulting credit will look like. Coding is by far the smallest part of work like this, first behind coming up with the design in the first place. And both of those are way, way behind how long review benchmarking takes on something like this. The way credit is distributed for this sort of feature puts coding first, design not credited at all, and maybe you'll see some small review credit for benchmarks. That's completely backwards from the actual work ratio. If all I'm getting out of something is credit, I'd at least like it to be an appropriate amount of it.
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
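A rough C sketch of the queue-side bookkeeping described above (the names are illustrative, not the actual md.c structures, and FlushOneSegment is a made-up helper):

#include "postgres.h"
#include "storage/relfilenode.h"

int max_segment_absorb = -1;    /* proposed GUC; <= 0 disables the slider */

typedef struct PendingFsyncEntry
{
    RelFileNode rnode;              /* which relation */
    int         segno;              /* which 1GB segment */
    int         absorption_count;   /* writes absorbed since last fsync */
} PendingFsyncEntry;

static void FlushOneSegment(PendingFsyncEntry *entry);  /* made-up helper */

/* Called when a backend fsync request is absorbed for a segment that is
 * already sitting in the queue. */
static void
absorb_fsync_request(PendingFsyncEntry *entry)
{
    entry->absorption_count++;

    /* Slider behavior: once enough writes pile up on one segment,
     * fsync it early instead of letting the whole segment go dirty. */
    if (max_segment_absorb > 0 &&
        entry->absorption_count > max_segment_absorb)
    {
        FlushOneSegment(entry);
        entry->absorption_count = 0;
    }
}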
Re: [HACKERS] Auto explain target tables
On 07/23/2013 11:41 AM, Craig Ringer wrote: On 07/23/2013 11:24 AM, Craig Ringer wrote: On 07/21/2013 10:42 PM, Миша Тюрин wrote: hi, list, again. the next proposal into auto explain. one would be happy if could set list of target tables and indexes. sometimes it is very hard to detect who is using your indexes. but turn total logging on under thousands transactions per seconds is not seems like nice idea couse size of resulting log files (cpu utilization might not be so critical) That sounds like a good idea - and since auto_explain is a contrib module, it could potentially be a way to get into PostgreSQL development. The code is in contrib/auto_explain/auto_explain.c . The challenge here is that planned queries don't just touch one table or index. You'd need to walk the query plan (a graph of Node structures) to determine which table(s) and index(es) are touched by the query there's something in the sources for it already (I haven't checked). Showing that I'm still very much learning this area myself, a bit more looking around found: http://www.postgresql.org/docs/current/static/querytree.html which makes it clear that the range table for the query will contain what you want. I suspect you'll need to find CTEs and subqueries and extract the relations they touch; I haven't yet checked to see whether they're aggregated into the top level range table, but suspect not. Of course, having said that I was then curious enough to go digging. It's trivial to dump the parse tree or query plan tree, as documented in the above linked documentation, and doing so clearly shows that subqueries and CTEs have their own range tables. So you'll have to walk the node tree looking for range tables and check to see whether any of the range table entries match one of the tables you're looking for. It's possible that this is a completely backwards approach; if so, hopefully one of the more experienced people here will correct me with the simple and obvious way to do it. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
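A rough C sketch of the executor-hook side of this, auto_explain style (target_relids and the helper name are invented; note also that by executor time the planner has, if memory serves, already flattened subquery range tables into PlannedStmt->rtable, which simplifies the walk):

#include "postgres.h"
#include "executor/executor.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"
#include "nodes/plannodes.h"

/* Invented helper: does this query touch any relation on our list? */
static bool
query_touches_target(QueryDesc *queryDesc, List *target_relids)
{
    ListCell   *lc;

    foreach(lc, queryDesc->plannedstmt->rtable)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

        if (rte->rtekind == RTE_RELATION &&
            list_member_oid(target_relids, rte->relid))
            return true;
    }
    return false;
}

/* An ExecutorEnd hook could then log the plan only for matching queries. */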
Re: [HACKERS] proposal - psql - show longest tables
I agree with Robert. My tip is this: when you're in a hole, the first thing to do is to stop digging. I don't think that Pavel believes himself to be in a hole. After setting up my .psqlrc file as I normally do, I could do this: :rtsize limit 10; But it doesn't have the 'MB' feature, and if I want to help someone else I first have to explain to them how to set their .psqlrc file the same as mine, which is rather a bummer. Is looking for the biggest tables a common enough thing that it should be available to everyone, without needing custom customization? I think our psql interface is not complete without this feature in this area. But, sure, it is my opinion only. Everybody knows the \dt command, and then they would only need to learn one more char, s or r. At this time, we use (in GoodData) a patch that changes the order for \dt+ from alphabetical to size-ordered. But it is too limited. Regards Pavel Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
On 7/22/13 10:48 PM, Tom Lane wrote: Greg Smith g...@2ndquadrant.com writes: Remove unused targets from plan: Alvaro? (He reviewed it already) Really I should do that one, but it seems like all my available cycles have been going into bug fixing lately :-( I just put you down as the committer on it, given Alvaro waved that way too. A further look at recent ALTER OPERATOR FAMILY changes shows that Robert has been working on those quite a bit. I'm putting him down as the committer on this last one. If the CF drags on a bit due to committer time being spent on bugs, given the date here I think that's OK. I would put next week as a loose target for finishing these off. I worry a lot less about committers keeping to their schedules than when there are 50 regular contributors still moving around. That leaves only 1 patch where I have no idea who will commit it: access to calls stack from GET DIAGNOSTICS statement -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers