Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
Tom Lane-2 wrote:
> Unfortunately, while testing it I noticed that there's a potentially fatal
> backwards-compatibility problem, namely that the COPY n status gets printed
> on stdout, which is the same place that COPY OUT data is going. While this
> isn't such a big problem for interactive use, usages like this one are
> pretty popular:
>
>     psql -c 'copy mytable to stdout' mydatabase | some-program
>
> With the patch, COPY n gets included in the data sent to some-program,
> which never happened before and is surely not what the user wants. The
> same if the -c string uses \copy.
>
> There are several things we could do about this:
>
> 1. Treat this as a non-backwards-compatible change, and document that
> people have to use -q if they don't want the COPY tag in the output. I'm
> not sure this is acceptable.

I've mostly used COPY TO with files, and so wouldn't mind if STDOUT had the COPY n sent to it, as long as the target file contains just the copy contents.

> 2. Kluge ProcessResult so that it continues to not pass back a PGresult
> for the COPY TO STDOUT case, or does so only in limited circumstances
> (perhaps only if isatty(stdout), for instance). The main problem with this
> is that people will test by sending output to a TTY and see the COPY n.

Although if it can be done consistently, then you minimize backward incompatibility and encourage people to enforce quiet mode while the command runs...

> 3. Modify PrintQueryStatus so that command status goes to stderr not
> stdout. While this is probably how it should've been done in the first
> place, this would be a far more severe compatibility break than #1. (For
> one thing, there are probably scripts out there that think that any output
> to stderr is an error message.) I'm afraid this one is definitely not
> acceptable, though it would be by far the cleanest solution were it not
> for compatibility concerns.

Yes, it's a moot point, but I'm not sure it would be best anyway.

> 4. As #3, but print the command status to stderr only if it's COPY n,
> otherwise to stdout. This is a smaller compatibility break than #3, but
> still a break, since COPY status was formerly issued to stdout in non
> TO STDOUT/FROM STDIN cases. (Note that PrintQueryStatus can't tell whether
> it was COPY TO STDOUT rather than any other kind of COPY; if we want that
> to factor into the behavior, we need ProcessResult to do it.)

Since we are considering stderr, my (admittedly inexperienced) gut says that using stderr for this is generally undesirable, especially given our existing precedent. stdout is typically the correct target, and the existing quiet-mode toggle provides sufficient control for typical needs.

> 5. Give up on the print-the-tag aspect of the change, and just fix the
> wrong-line-number issue (so we'd still introduce the copyStream variable,
> but not change how PGresults are passed around).
>
> I'm inclined to think #2 is the best answer if we can't stomach #1. But
> the exact rule for when to print a COPY OUT result probably still requires
> some debate. Or maybe someone has another idea?
>
> Also, I'm thinking we should back-patch the aspects of the patch needed to
> fix the wrong-line-number issue. That appears to have been introduced in
> 9.2; older versions of PG get the above example right.
>
> Comments?

I'd like COPY TO anything but STDOUT to emit a COPY n on STDOUT - unless suppressed by -q(uiet). Document that COPY TO STDOUT does not emit COPY n, because STDOUT is already assigned for data and so is not available for notifications. Since COPY is more typically used for ETL than a bare SELECT, in addition to the backward-compatibility concerns, this default behavior seems reasonable.

Would it be possible to store the n somewhere and provide a command - like GET DIAGNOSTICS in pl/pgsql - if the user really wants to know how many rows were sent to STDOUT? I doubt this is even useful in the typical use-case for COPY TO STDOUT, but figured I'd toss the idea out there.

Is there anything, besides a desire for consistency, that anyone has or can put forth as a use-case for COPY TO STDOUT emitting COPY n on STDOUT as well? If you are going to view the content inline, and also want a quick count, ISTM you would be more likely to use SELECT to take advantage of all its pretty-print features. If we really need to cater to this use, then maybe a --loud-copy-to-stdout switch can be provided to override its default quiet mode.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/COPY-table-FROM-STDIN-doesn-t-show-count-tag-tp5775018p5795611.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
2014-03-12 7:10 GMT+01:00 David Johnston pol...@yahoo.com:

> [... quoted message trimmed; it repeats the message above in full ...]
>
> I'd like COPY TO anything but STDOUT to emit a COPY n on STDOUT - unless
> suppressed by -q(uiet).

+1

This information can be really interesting, and sometimes important, when people have no idea how long their tables are.

Regards
Pavel
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
Thanks for your efforts!

>                   Head     patched   Diff
> Select - 500K     772ms    2659ms    -200%
> Insert - 400K     3429ms   1948ms    43%   (I am not sure how it improved in this case)
> delete - 200K     2066ms   3978ms    -92%
> update - 200K     3915ms   5899ms    -50%
>
> This patch shows how the custom scan can be used very well, but the patch
> itself has some performance problems which need to be investigated. I
> attached the test script file used for the performance test.

First of all, it seems to me your test case has too small a data set, one that allows all the data to be held in memory - roughly, 500K records of 200 bytes will consume about 100MB. Your configuration allocates 512MB of shared_buffers, and about 3GB of OS-level page cache is available. (Note that Linux uses free memory as disk cache adaptively.) This cache is designed to hide the latency of disk accesses, so this test case does not fit its intention. (Also, the primary purpose of this module is a demonstration of heap_page_prune_hook to hook vacuuming, so simple code was preferred over a more complicated implementation with better performance.)

I could reproduce the overall trend: the non-cached scan is faster than the cached scan if the buffer is in memory. Probably, it comes from the cost of walking down the T-tree index using the ctid per reference. The performance penalty around UPDATE and DELETE likely comes from trigger invocation per row. I could observe a small performance gain on INSERT. It's strange for me, also. :-(

On the other hand, the discussion around the custom-plan interface affects this module because it uses that API as its foundation. Please wait a few days for me to rebase the cache_scan module onto the newer custom-plan interface that I submitted just a moment ago.

Also, is it really necessary to tune the performance in this example module of the heap_page_prune_hook? Even though I have a few ideas to improve the cache performance, like insertion of multiple rows at once or a local chunk copy instead of a T-tree walk-down, I'm not sure whether it is productive in the current v9.4 timeframe. ;-(

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com

-----Original Message-----
From: Haribabu Kommi
Sent: Wednesday, March 12, 2014 1:14 PM
To: Kohei KaiGai
Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

> I ran the performance test on the cache scan patch; the readings are
> quoted at the top of this message.
>
> Configuration:
>   shared_buffers          - 512MB
>   cache_scan.num_blocks   - 600
>   checkpoint_segments     - 255
>
> Machine:
>   OS     - CentOS 6.4
>   CPU    - 4 core, 2.5 GHz
>   Memory - 4GB
>
> [... remainder trimmed; it repeats the table and summary quoted above ...]

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Wed, Mar 12, 2014 at 5:26 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
> First of all, it seems to me your test case has too small a data set, one
> that allows all the data to be held in memory - roughly, 500K records of
> 200 bytes will consume about 100MB. Your configuration allocates 512MB of
> shared_buffers, and about 3GB of OS-level page cache is available. (Note
> that Linux uses free memory as disk cache adaptively.)
> [...]

Thanks for the information, and a small correction: the total number of records is 5 million. The select operation selects 500K records. The total table size is around 1GB.

Once I get your new patch re-based on the custom scan patch, I will test the performance again by increasing my database size beyond the RAM size. I will also make sure that the memory available for disk cache is smaller.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty
On Tue, Mar 11, 2014 at 2:59 PM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Mon, Mar 10, 2014 at 1:13 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
>> On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila amit.kapil...@gmail.com wrote:
>>> On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing ji...@fast.au.fujitsu.com wrote:
>>>> Enclosed is the patch to implement the requirement to issue a log
>>>> message suggesting VACUUM FULL if a table is nearly empty. The
>>>> requirement comes from the PostgreSQL TODO list.
>>>
>>> I think it would be better if we can use some existing stats to issue the
>>> warning message rather than traversing the FSM for all pages. For
>>> example, after vacuuming a page in lazy_scan_heap(), we update the free
>>> space for the page. You can refer to the below line in lazy_scan_heap():
>>>
>>>     freespace = PageGetHeapFreeSpace(page);
>>>
>>> Now it might be possible that we might not get the freespace info easily,
>>> as it is not accumulated for previous vacuums. In case there is no viable
>>> way to get it through vacuum stats, we are already updating the FSM after
>>> vacuum by FreeSpaceMapVacuum(), where I think it should be possible to
>>> get the freespace.
>>
>> Yes, this way it works without extra penalty. But the problem is how to
>> calculate the free space which is left in the skipped pages because of the
>> visibility bit.
>
> One way could be by extrapolating (vac_estimate_reltuples) like we do for
> some other stats, but I am not sure if we can get correct estimates. The
> main reason is that if you observe that code path, all the decisions are
> mainly done on the basis of vacrelstats. I have not checked in detail
> whether this purpose can be achieved by using any other stats; maybe you
> can look into it once.

I checked the vac_estimate_reltuples() function, but was not able to find a proper way to identify the free space.

> By the way, have you checked whether FreeSpaceMapVacuum() can serve your
> purpose? This call already traverses the FSM in depth-first order to
> update the freespace. So maybe this call, or a wrapper on it that also
> returns the total freespace apart from updating it, can serve the need.

Thanks for the information. We can get the table free space by writing some wrapper or by modifying the FreeSpaceMapVacuum() function a little. This way it will not add any extra overhead in identifying whether the table is almost empty or not.

Regards,
Hari Babu
Fujitsu Australia
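For a rough idea of the statistic being discussed, the same "how empty is this table" number can be prototyped from SQL with the stock pg_freespacemap contrib module. This is only a sketch of the concept (the table name my_table is a placeholder), not the mechanism the patch itself would use, which would live inside vacuum:

```sql
-- Estimate how empty a table is by summing per-page free space from the FSM.
-- Requires the pg_freespacemap contrib extension; 'my_table' is a placeholder.
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

SELECT sum(avail)                   AS fsm_free_bytes,
       pg_relation_size('my_table') AS table_bytes,
       round(100.0 * sum(avail) / nullif(pg_relation_size('my_table'), 0), 1)
           AS pct_free
FROM pg_freespace('my_table');
```

Note that, as discussed above, the FSM only reflects pages vacuum actually visited, so pages skipped via the visibility map are not accounted for.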
Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire
On Tue, Mar 11, 2014 at 2:32 PM, Christian Kruse christ...@2ndquadrant.com wrote:
> On 11/03/14 13:23, Amit Kapila wrote:
>> Could you please once check (if you are comfortable doing so) wherever
>> this patch is passing the tuple, whether it is okay to pass it based on
>> visibility rules; else I will verify it once again.
>
> I think I got all places, but it would be nice to have a confirmation.

How about in IndexBuildHeapScan()? This also uses SnapshotAny to scan the heap data. Normally in this function, passing the tuple to the lock routine should not be a problem, as an exclusive lock on the relation is ensured before reaching this stage, but the below comment in this function leads me to think that there can be a problem during a system catalog scan.

/*
 * Since caller should hold ShareLock or better, normally
 * the only way to see this is if it was inserted earlier
 * in our own transaction. However, it can happen in
 * system catalogs, since we tend to release write lock
 * before commit there. Give a warning if neither case
 * applies.
 */

I could not immediately think of a test case which can validate it, but this is certainly a point to ponder. Do you think it is safe to pass the tuple to XactLockTableWaitWithInfo() in this function?

I think the other things in your patch are now good; just these tuple visibility validations are tricky, and it is taking time to validate the paths, because these get called in nested paths where in a few cases even dirty or SnapshotAny scans seem to be safe w.r.t. displaying the tuple. Anyway, today I have checked most paths; maybe I will take one more look with a fresh mind and then pass it on to a committer.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] db_user_namespace a temporary measure
On Mar 12, 2014 1:46 AM, Josh Berkus j...@agliodbs.com wrote:
> On 03/11/2014 06:57 AM, Tom Lane wrote:
>> Mind you, I wouldn't be unhappy to see it go away; it's a kluge and
>> always has been. I'm just expecting lots of push-back if we try. And it's
>> kind of hard to resist push-back when you don't have a substitute to
>> offer.
>
> Yeah, what we really need is encapsulated per-DB users and local
> superusers. I think everyone agrees that this is the goal, but nobody
> wants to put in the work to implement a generalized solution.

Encapsulated would probably be the doable part. But local superuser? Given that a superuser can load and run binaries, how would you propose to restrict that superuser from doing anything they want? And if you don't need that functionality, then how is it really different from being the database owner?

/Magnus
Re: [HACKERS] The case against multixact GUCs
Josh Berkus wrote:
> What makes these GUCs worse is that nobody knows how to set them; nobody
> on this list and nobody in the field. Heck, I doubt 1 in 1000 of our users
> (or 1 in 10 people on this list) know what a multixact *is*.

I won't contend your first statement, but multixacts are explained in the documentation:
http://www.postgresql.org/docs/9.3/static/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND

Yours,
Laurenz Albe
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
On 03/10/2014 09:44 PM, Robert Haas wrote:
> On Mon, Apr 30, 2012 at 1:34 PM, Noah Misch n...@leadboat.com wrote:
>> When GIN changes a metapage, we WAL-log its ex-header content and never
>> use a backup block. This reduces WAL volume since the vast majority of
>> the metapage is unused. However, ginRedoUpdateMetapage() only restores
>> the WAL-logged content if the metapage LSN predates the WAL record LSN.
>> If a metapage write tore and updated the LSN but not the other content,
>> we would fail to complete the update. Instead, unconditionally
>> reinitialize the metapage similar to how _bt_restore_meta() handles the
>> situation.
>>
>> I found this problem by code reading and did not attempt to build a test
>> case illustrating its practical consequences. It's possible that there's
>> no problem in practice on account of some reason I haven't contemplated.
>
> The attached patch doesn't apply any more, but it looks like this issue
> still exists.

Fixed.

- Heikki
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
From: Alvaro Herrera alvhe...@2ndquadrant.com
> MauMau escribió:
>> Do you know how I can reply to an email which was deleted locally? I
>> thought I could download an old mail by clicking the raw link and
>> importing it into the mailer. However, it requires username/password
>> input, and it seems to be different from the one for editing the
>> CommitFest. I couldn't find how to authenticate myself.
>
> The box that asks for the password tells you what the user/password is. I
> think it's something like archives/archives or similar. The password is
> there only to keep spammers out, not to have any real auth.

Thank you, the user/password was certainly displayed in the box -- archives/antispam.

The raw link only gave the mail in text format. I hoped to import the mail into Windows Mail on Windows Vista, but I couldn't.

Regards
MauMau
Re: [HACKERS] Is SPI safe to use in multi-threaded PL/Java?
From: Tom Lane t...@sss.pgh.pa.us
> That, more or less. There is exactly zero provision in the Postgres code
> for multiple threads to exist inside a backend process. It's possible
> that PL/Java manages to completely insulate the Java world from the C
> world, so that the C code never sees more than one thread. But any
> leakage at all in that abstraction is probably going to cause bugs; and
> as I said, we (PG hackers) are not going to consider such bugs to be our
> problem.
>
> On platforms where the standard libc supports threading (which is most,
> these days), I'd be particularly worried about leakage along the path
> java -> libc -> postgres. If libc becomes aware that there are multiple
> threads executing inside the process, it's likely to change behaviors.

I see... even Tom-san is suspicious about PL/Java's design, or the use of SPI from code linked with libpthread.so. I'll communicate this to the PL/Java community.

Regards
MauMau
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
On Wed, Mar 12, 2014 at 4:23 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> The attached patch doesn't apply any more, but it looks like this issue
>> still exists.
>
> Fixed.

Did you forget to push?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Torn page hazard in ginRedoUpdateMetapage()
On 03/12/2014 02:05 PM, Robert Haas wrote:
> On Wed, Mar 12, 2014 at 4:23 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>>> The attached patch doesn't apply any more, but it looks like this issue
>>> still exists.
>>
>> Fixed.
>
> Did you forget to push?

Yep. Pushed now.

- Heikki
Re: [HACKERS] [PATCH] Store Extension Options
On Tue, Mar 11, 2014 at 8:42 PM, Simon Riggs si...@2ndquadrant.com wrote:
> On 11 March 2014 18:33, Tom Lane t...@sss.pgh.pa.us wrote:
>> Simon Riggs si...@2ndquadrant.com writes:
>>> -1 to *requiring* validation for table-level options for exactly the
>>> same reasons we no longer validate custom GUCs.
>>
>> Well, that is an interesting analogy, but I'm not sure how much it
>> applies here. In the case of a GUC, you can fairly easily validate it
>> once the module does get loaded (and before the module actually tries to
>> do anything with it). I don't see how that's going to work for table
>> options. I trust nobody is seriously proposing that on module load,
>> we're going to scan the whole of pg_class looking to see if there are
>> incorrect settings. (Even if we did, what would we do about it? Not try
>> to force a pg_class update, for sure. And what if the module is loading
>> into the postmaster thanks to a preload spec?)
>
> Thank goodness for that. Strict validation does seem scary.
>
>> I don't really think partial validation makes sense. We could just
>> remove the whole topic, and tell extension authors that it's up to them
>> to defend themselves against bizarre values stored for their table
>> options. But I'm wondering if there's really so much use-case for a
>> feature like that. DBAs are fairly used to the idea that if you put crap
>> data in the database then bad things happen.
>
> We provide the table, they provide the data. Validation is possible, but
> not enforced as essential. (Except in terms of the datatype - but then we
> are also validating data to specific types here.) So I think that DBAs
> will also cope rather well with table-level options without us nannying
> them.
>
> There is nothing more annoying than needing to run scripts in a specific
> sequence to make them work, or dumps that fail because certain modules
> aren't loaded yet (or cannot ever be so). And maybe the DBA wants to
> annotate tables based on a design and then later move to implement
> modules that take advantage of the annotation.
>
> Having an option be set and yet be unvalidated and/or unused is no more
> annoying than having a column in a table that is known incorrect and/or
> not accessed. Searching for badly set options needs to be possible, even
> easy, but hard validation can cause problems. And if we try and force it,
> what's to stop people from using a dummy validator just to circumvent the
> strictness?

Then I think my patch is more in line with these conclusions, apart from the adjustments suggested by Tom Lane and mentioned by Alvaro Herrera [1]. Am I correct?

[1] http://www.postgresql.org/message-id/20140307205649.gf4...@eldon.alvh.no-ip.org

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog sobre TI: http://fabriziomello.blogspot.com
Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
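For concreteness, the kind of usage being debated looks roughly like this under the proposed patch. The syntax, extension namespace, and option name here are purely illustrative assumptions, not committed behavior:

```sql
-- Illustrative only: a table-level option in an extension's namespace,
-- stored even though nothing validates it until the extension is loaded.
-- Both 'my_ext' and 'compression_level' are hypothetical.
ALTER TABLE measurements SET (my_ext.compression_level = '4');

-- A DBA can later search for badly set options via pg_class.reloptions:
SELECT relname, reloptions
FROM pg_class
WHERE reloptions::text LIKE '%my_ext.%';
```

This also shows why "searching for badly set options needs to be possible": the stored options are visible in the catalog regardless of whether any module ever validates them.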
Re: [HACKERS] db_user_namespace a temporary measure
Tom Lane wrote:
> Andrew Dunstan and...@dunslane.net writes:
>> On 03/11/2014 09:37 PM, Tom Lane wrote:
>>> In particular, I'd like to see an exclusion that prevents local users
>>> from having the same name as any global user, so that we don't have
>>> ambiguity in GRANT and similar commands. This doesn't seem simple to
>>> enforce (if we supported partial indexes on system catalogs, it would
>>> be ...) but surely this representation is more amenable to enforcing it
>>> than the existing one.
>>
>> Should be workable if you're creating a local name - just check against
>> the list of global roles. Concurrent creations won't be safe without
>> some sort of locking scheme.
>
> A unique index would be a lot better way of plugging that hole than a
> system-wide lock on user creation. But not sure how to define a unique
> index that allows (joe, db1) to coexist with (joe, db2) but not with
> (joe, 0).

Isn't this just a case of creating a suitable operator and an exclusion constraint? Defining the constraint in BKI might require extra infrastructure, but it should be possible.

--
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
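One way to get the (joe, db1)/(joe, db2)/(joe, 0) semantics on an ordinary table - purely a sketch of the exclusion-constraint idea, not something that works on pg_authid - is to model the database scope as a range, so the built-in range overlap operator does the conflicting. It needs the btree_gist contrib module for the equality part:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Sketch: a per-database user is a single-point range, a global user is the
-- unbounded range. Overlapping ranges then conflict exactly as desired.
CREATE TABLE app_role (
    rolname text NOT NULL,
    dbids   int4range NOT NULL,  -- [db,db] for local, '(,)' for global
    EXCLUDE USING gist (rolname WITH =, dbids WITH &&)
);

INSERT INTO app_role VALUES ('joe', int4range(1, 1, '[]'));  -- joe in db1
INSERT INTO app_role VALUES ('joe', int4range(2, 2, '[]'));  -- joe in db2: allowed
INSERT INTO app_role VALUES ('joe', '(,)');                  -- global joe: rejected
```

A custom "conflicts" operator with its own GiST opclass, as Alvaro suggests, would achieve the same thing without the range encoding; the catalog/BKI side is the harder part either way.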
[HACKERS] git-review: linking commits to review discussion in git
From: Murtuza Mukadam murtuza.i.muka...@gmail.com Date: Sat, Mar 8, 2014 at 1:56 PM Subject: Re: [HACKERS] git-review: linking commits to review discussion in git To: Heikki Linnakangas hlinnakan...@vmware.com Cc: pgsql-hackers@postgresql.org Hi Heikki We have linked git commits and reviews in a web interface. If you enter a commit hash, you will be redirected to the email archive of the peer review discussion: http://cesel.encs.concordia.ca/git-reviewed-tracker.php This work is part of my thesis, so feedback is much appreciated. If you have another git repo and mailing lists that you'd like linked, please let us know. Cheers, Murtuza How do we do the linking? We take each email patch, eliminate white space and hash each line. We then compare the lines with those in commits to the same files. The commit that changes the same files and has the largest number of matching lines is considered to be the reviewed commit. On Tue, Jan 28, 2014 at 2:10 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/27/2014 11:36 PM, Murtuza Mukadam wrote: Hello All, We have linked peer review discussions on 'pgsql-hackers' to their respective commits within the main postgresql.git repository. You can view the linked reviews from 2012 until present in the GitHub repo at https://github.com/mmukadam/postgres/tree/review If you want to work with these reviews locally, you can use our git-review tool. It allows you to create reviews and attach them to commits in git. We didn't modify git, instead we added some scripts that use standard git commands. git-review is beta, but since it only adds a detached 'review' branch and modifies the contents of this branch, it has minimal impact and can easily be removed by deleting the 'review' branch and scripts. 
The online man-page is here: http://users.encs.concordia.ca/~m_mukada/git-review.html In order to install git-review, you need to clone the repository: https://github.com/mmukadam/git-review.git The online tutorial is available here: http://users.encs.concordia.ca/~m_mukada/git-review-tutorial.html The clone of postgresql.git with linked review discussion is here (new review discussion are linked nightly) https://github.com/mmukadam/postgres This work is part of my Master's thesis. If you'd like us to change the tool to better suit your review process, have another git repo you'd like us to link commits with review discussion, or have other feedback, please let us know. I don't understand what this does. The repository at https://github.com/mmukadam/postgres looks like just a clone of the main PostgreSQL repository, with no extra links anywhere. And the repository at https://github.com/mmukadam/postgres/tree/review looks like a mailing list archive turned into a git repository, but I don't see any links to the commits in the main repository there. Am I missing something? - Heikki
[HACKERS] Rowtype column and domain subfield with DEFAULT and NOT NULL constraint
Hi, A customer has reported to us a strange behaviour regarding a rowtype column with a domain subfield:

test=# CREATE DOMAIN my_int_not_null_1 AS INTEGER DEFAULT 1 NOT NULL;
CREATE DOMAIN
test=# CREATE TYPE my_int_rowtype AS (
test(#   f1 INTEGER,
test(#   f2 my_int_not_null_1
test(# );
CREATE TYPE
test=# CREATE TABLE test (id SERIAL, col1 my_int_rowtype);
CREATE TABLE
test=# INSERT INTO test (col1.f1) VALUES (1);
INSERT 0 1
test=# INSERT INTO test (id, col1.f1) VALUES (2, 1);
INSERT 0 1
test=# INSERT INTO test (col1) VALUES ((1,NULL));
ERROR:  domain my_int_not_null_1 does not allow null values
test=# SELECT * FROM test;
 id | col1
----+------
  1 | (1,)
  2 | (1,)
(2 rows)

It seems:
- the DEFAULT value (from the domain) is not inserted
- the NOT NULL constraint is not applied, except if we explicitly set the value to NULL

It looks like the same issue as before: when there is no DEFAULT, the parser/rewriter should set the column/subfield to NULL. Is build_column_default() the right place to handle that case? -- Julien Tachoires http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 12 Březen 2014, 0:41, Peter Geoghegan wrote: On Tue, Mar 11, 2014 at 3:58 PM, Tomas Vondra t...@fuzzy.cz wrote: ERROR: index row size 1416 exceeds maximum 1352 for index gin_idx All index AMs have similar restrictions.

Yes, I know, and I have no problem with restrictions in general. You may run into similar issues with btree indexes on text columns with long text, for example. The thing is that people don't generally index long text directly, because it usually does not make much sense, but index it via tsvector etc. But with jsonb it's more likely, because indexing is one of the goodies (at least for me). And the discussions I had recently with several people interested in storing json data often went like this:

me: It seems we'll have a better json datatype in 9.4.
them: Nice!
me: And it will be possible to do searches on arbitrary keys.
them: Yay!
me: And we actually got pretty significant improvements in GIN indexes.
them: Awesome!
me: But the values you may index need to be less than ~1500B.
them: Bummer :-(
me: Well, you can use GiST then.

A good example of such a header is dkim-signature, which basically contains the whole message digitally signed with DKIM. The signature tends to be long and, being a signature, non-compressible. I'm wondering what's the best way around this, because I suspect many new users (especially those attracted by jsonb and the GIN improvements) will run into this. Maybe not immediately, but eventually they'll try to insert a jsonb with a long value, and it will fail ...

The jsonb_hash_ops operator class just stores a 32-bit integer hash value (it always sets the recheck flag, which only some of the other default GIN opclass' strategies do). It only supports containment, and not the full variety of operators that the default opclass supports, which is why it isn't the default. I think that in practice the general recommendation will be that when indexing at the top level, use jsonb_hash_ops. 
When indexing nested items, use the more flexible default GIN opclass. That seems like a pretty smart trade-off to me.

OK, I'll look into jsonb_hash_ops - that sounds more or less like what I was thinking about (and sure, storing hashes makes some operations impossible to support). The other thing I was thinking about is introducing some kind of upper limit on the value length - e.g. index just the first 1kB, or something like that. My experience is that most values are way shorter, or actually differ within the first 1kB, so this should allow most decisions to be made. But I'm not really that familiar with how GIN works, so maybe this is nonsense.

The more I think about it, the more inclined I am to lose GiST support entirely for the time being. It lets us throw out about 700 lines of C code, which is a very significant fraction of the total, removes the one open bug, and removes the least understood part of the code. The GiST opclass is not particularly compelling for this.

I disagree with that. I see GiST as a simple fallback option for the cases I described. I wasn't able to create a GIN index because of exceeding the max item length, but the GiST index was created just fine. It was considerably slower, but it worked.

Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] db_user_namespace a temporary measure
On 2014-03-12 10:03:42 -0300, Alvaro Herrera wrote: Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 03/11/2014 09:37 PM, Tom Lane wrote: In particular, I'd like to see an exclusion that prevents local users from having the same name as any global user, so that we don't have ambiguity in GRANT and similar commands. This doesn't seem simple to enforce (if we supported partial indexes on system catalogs, it would be ...) but surely this representation is more amenable to enforcing it than the existing one.

Should be workable if you're creating a local name - just check against the list of global roles. Concurrent creations won't be safe without some sort of locking scheme.

A unique index would be a lot better way of plugging that hole than a system-wide lock on user creation. But I'm not sure how to define a unique index that allows (joe, db1) to coexist with (joe, db2) but not with (joe, 0).

Isn't this just a case of creating a suitable operator and an exclusion constraint? Defining the constraint in BKI might require extra infrastructure, but it should be possible.

Except that we don't have the infrastructure to perform such checks (neither partial nor expression indexes, nor exclusion constraints) on system tables atm. So it's not an entirely trivial thing to do.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
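For what it's worth, the conflict rule under discussion - a (name, db) pair collides with the same name in the same database, or with a global user of that name (represented here as db = 0) - is easy to state outside SQL. Here is a toy Python model, with invented names, just to pin down the semantics such an exclusion constraint would have to enforce; it is not a proposal for the catalog implementation.

```python
def conflicts(a, b):
    """a, b are (name, db) pairs; db == 0 denotes a global role.
    Two roles conflict when they share a name and either one is
    global, or both live in the same database."""
    name_a, db_a = a
    name_b, db_b = b
    if name_a != name_b:
        return False
    return db_a == 0 or db_b == 0 or db_a == db_b
```

Under this rule (joe, db1) and (joe, db2) coexist, but neither can coexist with a global (joe, 0) - exactly the behavior Tom asked for, and the part that a plain unique index cannot express.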
Re: [HACKERS] jsonb and nested hstore
On 12 Březen 2014, 0:51, Peter Geoghegan wrote: On Tue, Mar 11, 2014 at 4:41 PM, Peter Geoghegan p...@heroku.com wrote: I think that in practice the general recommendation will be that when indexing at the top level, use jsonb_hash_ops. When indexing nested items, use the more flexible default GIN opclass. That seems like a pretty smart trade-off to me. By which I mean: index nested items using an expressional GIN index.

I'm still not sure how that would look. Does that mean I'd have to create multiple GIN indexes - one for each possible key, or something like that? Can you give an example?

regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Postgresql XML parsing
Hi, I am working on adding functionality to PostgreSQL. I need to parse the XML-format query plan (produced by PostgreSQL v9.3) and save it in a simple data structure (say a C structure). I was wondering if PostgreSQL already had any parsing functions implemented that I can use to do the XML parsing. This is getting difficult, as I was not able to find any DTD or XML Schema for the XML files generated by PostgreSQL. I found the files xpath.c/xslt_proc.c that contain parsing-related code, but none of the functions are being called for any xml-related query I issue to the database, and some of the functions in those files are marked as deprecated. It would be greatly helpful if someone could guide me on this. Thank you. -- Regards, Ashoke
Re: [HACKERS] Postgresql XML parsing
On 03/12/2014 09:36 AM, Ashoke wrote: Hi, I am working on adding a functionality to PostgreSQL. I need to parse the XML format query plan (produced by PostgreSQL v9.3) and save it in a simple data structure (say C structure). I was wondering if PostgreSQL already had any parsing functions implemented that I can use to do the XML parsing. This is getting difficult as I was not able to get any DTD or XML Schema for the XML files generated by PostgreSQL. I found the files xpath.c/xslt_proc.c files that contain parsing related code, but none of the functions are being called for any xml related query I issue to the database and some of the functions in those files mention as deprecated. It would be greatly helpful if someone could guide me on this. The only XML parsing we have is where Postgres is built with libxml, in which case we use its parser. But query plan XML is delivered to a client (or a log file, which means more or less the same thing here). If you want to parse it then it should be parsed in the client - that's why we provide it. Inside postgres I don't see a point in parsing the XML rather than handling the query plan directly. The worst possible option would be to make a hand-cut XML parser, either in the client or the server - XML parsing has all sorts of wrinkles that can bite you badly. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
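To illustrate the "parse it in the client" advice: the plan emitted by EXPLAIN (FORMAT XML) is an ordinary XML document in the http://www.postgresql.org/2009/explain namespace, so any stock XML library can load it into a native structure. A minimal Python sketch using only the standard library follows; the embedded sample document is hand-written for illustration (real output comes from the server and contains many more fields), and parse_plan is an invented helper, not a PostgreSQL API.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://www.postgresql.org/2009/explain"}

# Hand-written miniature of what EXPLAIN (FORMAT XML) emits.
plan_xml = """<explain xmlns="http://www.postgresql.org/2009/explain">
  <Query>
    <Plan>
      <Node-Type>Seq Scan</Node-Type>
      <Relation-Name>test</Relation-Name>
      <Total-Cost>458.00</Total-Cost>
    </Plan>
  </Query>
</explain>"""

def parse_plan(xml_text):
    """Pull a few fields of the top plan node into a plain dict."""
    root = ET.fromstring(xml_text)
    plan = root.find("./e:Query/e:Plan", NS)
    return {
        "node_type": plan.findtext("e:Node-Type", namespaces=NS),
        "relation": plan.findtext("e:Relation-Name", namespaces=NS),
        "total_cost": float(plan.findtext("e:Total-Cost", namespaces=NS)),
    }
```

The same approach extends to nested <Plans> elements by recursing into each child <Plan>, which maps naturally onto a C struct with a child-node list.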
[HACKERS] pgstat wait timeout (RE: contrib/cache_scan)
It is another topic from the main thread. I noticed the following messages under the test case that takes a heavy INSERT workload, provided by Haribabu.

[kaigai@iwashi ~]$ createdb mytest
[kaigai@iwashi ~]$ psql -af ~/cache_scan.sql mytest
\timing
Timing is on.
--cache scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
CREATE TABLE
Time: 22.373 ms
truncate table test;
TRUNCATE TABLE
Time: 17.705 ms
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout
 :

Once I got the above messages, write performance is dramatically degraded, even though I didn't make a detailed investigation. I could reproduce it on the latest master branch without my enhancement, so I guess it is not a problem specific to me. One other strangeness is that, right now, this problem only happens in my virtual machine environment - VMware ESXi 5.5.0. I couldn't reproduce the problem in my physical environment (Fedora 20, Core i5-4570S). Any ideas?

Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Wednesday, March 12, 2014 3:26 PM
To: Haribabu Kommi; Kohei KaiGai
Cc: Tom Lane; PgHacker; Robert Haas
Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

Thanks for your efforts!

               Head     patched   Diff
Select - 500K   772ms   2659ms   -200%
Insert - 400K  3429ms   1948ms     43%  (I am not sure how it improved in this case)
delete - 200K  2066ms   3978ms    -92%
update - 200K  3915ms   5899ms    -50%

This patch shows how the custom scan can be used very well, but the patch as it is has some performance problems which need to be investigated. I attached the test script file used for the performance test. 
First of all, it seems to me your test case has too small a data set, which allows all the data to be held in memory - briefly, 500K records of 200 bytes will consume about 100MB. Your configuration allocates 512MB of shared_buffers, and about 3GB of OS-level page cache is available. (Note that Linux uses free memory as disk cache adaptively.) This cache is designed to hide the latency of disk accesses, so this test case does not fit its intention. (Also, the primary purpose of this module is a demonstration of heap_page_prune_hook to hook vacuuming, so simple code was preferred over a complicated implementation with better performance.)

I could reproduce the overall trend: a scan without the cache is faster than the cached scan if the buffers are in memory. Probably, it comes from the cost of walking down the T-tree index using the ctid for each reference. The performance penalty around UPDATE and DELETE likely comes from trigger invocation per row. I could observe a small performance gain on INSERT. It's strange for me, also. :-(

On the other hand, the discussion around the custom-plan interface affects this module, because it uses that API as its foundation. Please wait for a few days to rebase the cache_scan module onto the newer custom-plan interface that I submitted just a moment before. Also, is it really necessary to tune the performance stuff in this example module of the heap_page_prune_hook? Even though I have a few ideas to improve the cache performance, like insertion of multiple rows at once or a local chunk copy instead of the T-tree walk-down, I'm not sure whether it is productive in the current v9.4 timeframe. 
;-( Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Haribabu Kommi
Sent: Wednesday, March 12, 2014 1:14 PM
To: Kohei KaiGai
Cc: Kaigai Kouhei(海外 浩平); Tom Lane; PgHacker; Robert Haas
Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)

On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2014-03-06 18:17 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com: I will update you later regarding the performance test results.

I ran the performance test on the cache scan patch and below are the readings.

Configuration:
Shared_buffers - 512MB
cache_scan.num_blocks - 600
checkpoint_segments - 255

Machine:
OS - centos - 6.4
CPU - 4 core 2.5 GHZ
Memory - 4GB

               Head     patched   Diff
Select - 500K   772ms   2659ms   -200%
Insert - 400K  3429ms   1948ms     43%  (I am not sure how it
Re: [HACKERS] Rowtype column and domain subfield with DEFAULT and NOT NULL constraint
Julien Tachoires julien.tachoi...@dalibo.com writes: A customer has reported us a strange behaviour regarding a rowtype column with a domain subfield:

Rowtypes in general do not support defaults for component fields.

Is build_column_default() the right place to handle that case?

It's unlikely that this is simple to change. As an example, should the default be inserted during a cast to the rowtype? How about plpgsql variable initialization? What are you going to do about scalar-NULL values of the rowtype (note that forbidding them is likely to cause all sorts of collateral damage)? But in any case, none of the examples you showed have anything to do with build_column_default(). That would only get applied if the INSERT's targetlist didn't mention col1 at all.

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgstat wait timeout (RE: contrib/cache_scan)
Kouhei Kaigai kai...@ak.jp.nec.com writes: WARNING: pgstat wait timeout WARNING: pgstat wait timeout WARNING: pgstat wait timeout WARNING: pgstat wait timeout Once I got above messages, write performance is dramatically degraded, even though I didn't take detailed investigation. I could reproduce it on the latest master branch without my enhancement, so I guess it is not a problem something special to me. One other strangeness is, right now, this problem is only happen on my virtual machine environment - VMware ESXi 5.5.0. I couldn't reproduce the problem on my physical environment (Fedora20, core i5-4570S). We've seen sporadic reports of that sort of behavior for years, but no developer has ever been able to reproduce it reliably. Now that you've got a reproducible case, do you want to poke into it and see what's going on? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] db_user_namespace a temporary measure
Jaime Casanova ja...@2ndquadrant.com writes: On Tue, Mar 11, 2014 at 10:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: But not sure how to define a unique index that allows (joe, db1) to coexist with (joe, db2) but not with (joe, 0). And why do you want that restriction?

So that if I say GRANT SELECT ON mytable TO joe, it's unambiguous which user I'm granting to. There should be at most one joe that can access any given database. If we don't have such a restriction, we'll need syntax kluges in GRANT, ALTER OWNER, and probably other commands to disambiguate whether a local or global user is meant. Right now with the db_user_namespace mechanism, you have to say GRANT ... TO joe@db1 if you were granting to a local user in db1. That's unambiguous all right, but it has little else to recommend it.

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] db_user_namespace a temporary measure
Magnus Hagander mag...@hagander.net writes: Yeah, what we really need is encapsulated per-DB users and local superusers. I think everyone agrees that this is the goal, but nobody wants to put in the work to implement a generalized solution. Encapsulated would probably be the doable part. But local superuser? Given that a superuser can load and run binaries, how would you propose you restrict that superuser from doing anything they want? And if you don't need that functionality, then how's it really different from being the database owner?

A local user with the superuser privilege would not be able to log into another database, because superuser doesn't give you any extra privilege until you've logged in. Yeah, as superuser you could still break things as much as you pleased, but not through SQL. I share your doubts as to how useful such a concept actually is, but it'd work if we had real local users.

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] db_user_namespace a temporary measure
On Wed, Mar 12, 2014 at 3:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: Yeah, what we really need is encapsulated per-DB users and local superusers. I think everyone agrees that this is the goal, but nobody wants to put in the work to implement a generalized solution. Encapsulated would probably be the doable part. But local superuser? Given that a superuser can load and run binaries, how would you propose you restrict that superuser from doing anything they want? And if you don't need that functionality, then how's it really different from being the database owner? A local user with the superuser privilege would not be able to log into another database, because superuser doesn't give you any extra privilege until you've logged in. Yeah, as superuser you could still break things as much as you pleased, but not through SQL.

You could COPY over the hba file or something like that :) Or just pg_read_binary_file() on the files in another database, which is accessible through SQL as well.

I share your doubts as to how useful such a concept actually is, but it'd work if we had real local users.

It can also do interesting things like ALTER SYSTEM, replication, backups, etc. All of which could be used to escalate privileges beyond the local database. So you'd have to somehow restrict those, at which point what's the point of the property in the first place? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] db_user_namespace a temporary measure
* Magnus Hagander (mag...@hagander.net) wrote: On Wed, Mar 12, 2014 at 3:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: I share your doubts as to how useful such a concept actually is, but it'd work if we had real local users. It can also do interesting things like ALTER SYSTEM, replication, backups, etc. All of which could be used to escalate privileges beyond the local database.

Probably DROP ROLE for global users too.

So you'd have to somehow restrict those, at which point what's the point of the property in the first place?

We've been asked quite often for a not-quite-superuser, as in, one which can bypass the normal GRANT-based permission system but which can't do things like create untrusted functions or do other particularly bad activities. I can certainly see value in that. Another oft-requested option is a read-only role which pg_dump or an auditor could use. Anyway, this is getting a bit far afield from the original discussion, which looked like it might actually be heading somewhere interesting.

Thanks, Stephen
Re: [HACKERS] db_user_namespace a temporary measure
Magnus Hagander mag...@hagander.net writes: On Wed, Mar 12, 2014 at 3:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: A local user with the superuser privilege would not be able to log into another database, because superuser doesn't give you any extra privilege until you've logged in. Yeah, as superuser you could still break things as much as you pleased, but not through SQL. You could COPY over the hba file or sometihng like that :) Or just pg_read_binary_file() on the files in another database, which is accessible through SQL as well. More directly, he could alter pg_authid to make himself a not-local user. But I don't see that it's our responsibility to prevent that. As long as the combination of features works in a straightforward way, I'm happy with it --- and it would, AFAICS. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] db_user_namespace a temporary measure
* Tom Lane (t...@sss.pgh.pa.us) wrote: Magnus Hagander mag...@hagander.net writes: You could COPY over the hba file or something like that :) Or just pg_read_binary_file() on the files in another database, which is accessible through SQL as well. More directly, he could alter pg_authid to make himself a not-local user. But I don't see that it's our responsibility to prevent that. As long as the combination of features works in a straightforward way, I'm happy with it --- and it would, AFAICS.

That depends on exactly what you mean by 'those features'. There's quite a difference between "you can set the superuser flag on a local user and then that user will be a superuser" and "a local user with the superuser flag will only be able to impact the database they are local to". I agree that there's nothing stopping us from having a local user which is marked as a superuser at a technical level. What Magnus and I are worried about is what the *implication* of such a configuration is and what the user will think it means. Specifically, there will be an assumption that local users can only access or impact the databases which they have access to, which wouldn't be accurate for a local user who is a superuser. Certainly, documenting this would help with that, but with as many warnings as we'd have to put up about that being dangerous, and that it isn't actually going to prevent that superuser from accessing the other databases if they really wanted to, or prevent them from making a global superuser account, etc, I'm just not convinced that it's worth it for the feature of allowing a local account to be set as superuser - I don't see a huge use-case there.

Thanks, Stephen
Re: [HACKERS] pgstat wait timeout (RE: contrib/cache_scan)
On 12 Březen 2014, 14:54, Kouhei Kaigai wrote: It is another topic from the main thread. I noticed the following messages under the test case that takes a heavy INSERT workload, provided by Haribabu.

[kaigai@iwashi ~]$ createdb mytest
[kaigai@iwashi ~]$ psql -af ~/cache_scan.sql mytest
\timing
Timing is on.
--cache scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
CREATE TABLE
Time: 22.373 ms
truncate table test;
TRUNCATE TABLE
Time: 17.705 ms
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout
 :

Once I got the above messages, write performance is dramatically degraded, even though I didn't make a detailed investigation. I could reproduce it on the latest master branch without my enhancement, so I guess it is not a problem specific to me. One other strangeness is that, right now, this problem only happens in my virtual machine environment - VMware ESXi 5.5.0. I couldn't reproduce the problem in my physical environment (Fedora 20, Core i5-4570S). Any ideas?

I've seen this happening in cases when it was impossible to write the stat file for some reason. IIRC there were a few basic causes I've seen in the past:

(1) writing the stat copy failed - for example when the temporary stat directory was placed in tmpfs, but it was too small

(2) writing the stat copy took too long - e.g. with tmpfs and memory pressure, forcing the system to swap to free space for the stat copy

(3) the inquiry (backend -> postmaster) to write the file is sent using UDP, which may be dropped in some cases (e.g. when the system is overloaded), so the postmaster does not even know it should write the file

I'm not familiar with VMware ESXi virtualization, but I suppose it might be relevant to all three causes. 
regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
Rajeev rastogi rajeev.rast...@huawei.com writes: On 11 March 2014 19:52, Tom Lane wrote: After sleeping on it, I'm inclined to think we should continue to not print status for COPY TO STDOUT. Aside from the risk of breaking scripts, there's a decent analogy to be made to SELECT: we don't print a status tag for that either.

It is correct that SELECT does not print a status tag in the conventional way, but it still prints the number of rows selected (e.g. (2 rows)) along with the actual row values, which can very well be considered a kind of status. The user can make out from this result how many rows have been selected. But in case of COPY TO STDOUT, if we don't print anything, then the user does not have any direct way of finding out how many rows were copied from the table to STDOUT, which might have been very useful.

Uh, you mean other than the data rows that were just printed? I fail to see how this is much different from the SELECT case:

regression=# \copy int8_tbl to stdout
123	456
123	4567890123456789
4567890123456789	123
4567890123456789	4567890123456789
4567890123456789	-4567890123456789
regression=#

(Note that I'm defining TO STDOUT from psql's perspective, ie the rows are going to the queryFout file, which is the same place the COPY status would get printed to.)

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On 02/26/2014 11:25 PM, Alexander Korotkov wrote: On Thu, Feb 27, 2014 at 1:07 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/09/2014 12:11 PM, Alexander Korotkov wrote: I've rebased catalog changes with last master. Patch is attached. I've rerun my test suite with both last master ('committed') and attached patch ('ternary-consistent'). Thanks!

 method                 |     sum
------------------------+--------------
 committed              | 143491.71501
 fast-scan-11           | 126916.11199
 fast-scan-light        | 137321.211
 fast-scan-light-heikki | 138168.02801
 master                 | 446976.288
 ternary-consistent     | 125923.514

I explain the regression in the last master by the change of MAX_MAYBE_ENTRIES from 8 to 4. Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make sure we get similar behavior in Tomas' tests that used 6 search terms. But I always felt that it was too large for real queries once we have the catalog changes; that's why I lowered it to 4 when committing. If an opclass benefits greatly from fast scan, it should provide the ternary consistent function, and not rely on the shim implementation.

I'm not sure about the decision to reserve a separate procedure number for the ternary consistent function. Probably, it would be better to add a ginConfig method. It would be useful for post-9.4 improvements.

Hmm, it might be useful for an opclass to provide both a boolean and a ternary consistent function, if the boolean version is significantly more efficient when all the arguments are TRUE/FALSE. OTOH, you could also do a quick check through the array to see if there are any MAYBE arguments, within the consistent function. But I'm inclined to keep the possibility to provide both versions. As long as we support the boolean version at all, there's not much difference in terms of the amount of code to support having them both for the same opclass. A ginConfig could be useful for many other things, but I don't think it's worth adding it now. 
What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck? We discussed that earlier, but didn't reach any conclusion. That needs to be clarified in the docs. One possibility is to document that they're equivalent. Another is to forbid one of them. Yet another is to assign a different meaning to each.

I've been thinking that it might be useful to define them so that a MAYBE result from the tri-consistent function means that it cannot decide if you have a match or not, because some of the inputs were MAYBE. And TRUE+recheck means that even if all the MAYBE inputs were passed as TRUE or FALSE, the result would be the same, TRUE+recheck. The practical difference would be that if the tri-consistent function returns TRUE+recheck, ginget.c wouldn't need to bother fetching the other entries; it could just return the entry with recheck=true immediately. While with a MAYBE result, it would fetch the other entries and call tri-consistent again. ginget.c doesn't currently use the tri-consistent function that way - it always fetches all the entries for a potential match before calling tri-consistent, but it could. I had it do that in some of the patch versions, but Tomas' testing showed that it was a big loss on some queries, because the consistent function was called much more often. Still, something like that might be sensible in the future, so it might be good to distinguish those cases in the API now. Note that ginarrayproc is already using the return values like that: in GinContainedStrategy, it always returns TRUE+recheck regardless of the inputs, but in other cases it uses GIN_MAYBE.

Next revision of the patch is attached. In this version, an opclass should provide at least one consistent function: binary or ternary. It's expected to achieve the best performance when the opclass provides both of them. However, tests show the opposite :( I have to recheck it carefully. However, it's not! This revision of the patch completes my test case in 123330 ms. 
This is slightly faster than the previous revision. Great. Committed, I finally got around to it. Some minor changes: I reworded the docs and also updated the table of support functions in xindex.sgml. I refactored the query in opr_sanity.sql, to make it easier to read, and to check more carefully that the required support functions are present. I also added a runtime check to avoid crashing if neither is present.

A few things we ought to still discuss:

* I just noticed that the dummy trueTriConsistentFn returns GIN_MAYBE, rather than GIN_TRUE. The equivalent boolean version returns 'true' without recheck. Is that a typo, or was there some reason for the discrepancy?

* Come to think of it, I'm not too happy with the name GinLogicValue. It's too vague. It ought to include ternary or tri-value or something like that. How about renaming it to
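For readers following the ternary-logic discussion: a ternary consistent result can be derived from a boolean consistent function by trying every TRUE/FALSE assignment of the MAYBE inputs and seeing whether the answers agree. The following is a toy Python model of that idea only - it is not the actual C shim in the patch, and all names are invented - but it captures why a definite answer can sometimes be returned even when some inputs are MAYBE.

```python
from itertools import product

GIN_FALSE, GIN_TRUE, GIN_MAYBE = 0, 1, 2

def shim_tri_consistent(bool_consistent, args):
    """Derive a ternary answer from a boolean consistent function:
    try every TRUE/FALSE substitution for the MAYBE inputs; if all
    substitutions agree, the answer is definite, otherwise MAYBE."""
    maybe_idx = [i for i, a in enumerate(args) if a == GIN_MAYBE]
    results = set()
    for combo in product((False, True), repeat=len(maybe_idx)):
        trial = [a == GIN_TRUE for a in args]  # MAYBE slots get overwritten
        for i, value in zip(maybe_idx, combo):
            trial[i] = value
        results.add(bool_consistent(trial))
    if results == {True}:
        return GIN_TRUE
    if results == {False}:
        return GIN_FALSE
    return GIN_MAYBE
```

With an AND-style consistent function, [FALSE, MAYBE] collapses to a definite FALSE without fetching the MAYBE entry; with an OR-style one, [TRUE, MAYBE] collapses to TRUE. The cost is exponential in the number of MAYBE inputs, which is why capping them (the MAX_MAYBE_ENTRIES discussion above) matters.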
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
David Johnston pol...@yahoo.com writes: Tom Lane-2 wrote 1. Treat this as a non-backwards-compatible change, and document that people have to use -q if they don't want the COPY tag in the output. I'm not sure this is acceptable. I've mostly used copy to with files and so wouldn't mind if STDOUT had the COPY n sent to it as long as the target file is just the copy contents. I think you're missing the point: the case I'm concerned about is exactly that the target file is psql's stdout, or more specifically the same place that the COPY status would get printed to. 2. Kluge ProcessResult so that it continues to not pass back a PGresult for the COPY TO STDOUT case, or does so only in limited circumstances (perhaps only if isatty(stdout), for instance). The main problem with this is that people will test by sending output to a TTY and see the COPY n. Although if it can be done consistently then you minimize backward incompatibility and encourage people to enforce quiet mode while the command runs... Yeah, the inconsistency of behavior that this solution would cause is not a good thing. My inclination now (see later traffic) is to suppress the status report when the COPY destination is the same as pset.queryFout (ie, a simple test whether the FILE pointers are equal). This would suppress the status report for \copy to stdout and COPY TO STDOUT cases, and also for \copy to pstdout if you'd not redirected queryFout with \o. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The case against multixact GUCs
On Tue, Mar 11, 2014 at 3:14 PM, Josh Berkus j...@agliodbs.com wrote: In the 9.3.3 updates, we added three new GUCs to control multixact freezing. This was an unprecedented move in my memory -- I can't recall ever adding a GUC to a minor release which wasn't backwards compatibility for a security fix. This was a mistake. I disagree. I think it was the right decision. I think it was a mistake not including all of that stuff in the first place, and I think it's good that we've now corrected that oversight. What makes these GUCs worse is that nobody knows how to set them; nobody on this list and nobody in the field. Heck, I doubt 1 in 1000 of our users (or 1 in 10 people on this list) know what a multixact *is*. Yeah, and that's a problem. See, it turns out that we love periodic full-table scans to freeze xmin so much that, in 9.3, we committed to a design that requires us to make periodic full-table scans to freeze xmax, too. That may or may not have been a good decision, but at this point we're stuck with it. People are going to have to come to understand the requirements there just as they do for freezing xmin. Denying the user the ability to adjust the thresholds is not going to accelerate the process of figuring out how they should be set. Further, there's no clear justification why these cannot be set to be the same as our other freeze ages (which our users also don't understand), or a constant calculated portion of them, or just a constant. On most systems, mxid consumption will be much slower than xid consumption because most users won't use tuple locks all that heavily. If we made all the defaults the same, then a full-table scan for xid freezing would likely conclude that many or all of the mxids weren't old enough to be frozen yet. 
To the greatest extent possible, we want full-table vacuums for either XID freezing or MXID freezing to advance both relfrozenxid and relminmxid so that we don't go through and freeze for one reason and then have to come back and freeze for the other reason shortly thereafter. Nobody knows exactly how to set the settings to make that happen just yet, so we need settings at least until people can determine what values work well in practice - and probably permanently, because unfortunately I think the answer is likely workload-dependent. Since nobody anticipated someone adding a GUC in a minor release, there was no discussion of this topic that I can find; the new GUCs were added as a side effect of fixing the multixact vacuum issue. Certainly I would have raised a red flag if the discussion of the new GUCs hadn't been buried deep inside really long emails. Alvaro did explicitly ask if anyone wanted to oppose back-patching. I don't think you can really blame him if you didn't see/read that email. Adding new GUCs which nobody has any idea how to set, or can even explain to new users, is not a service to our users. These should be removed. The need for these GUCs is an outgrowth of the fkey locking stuff. Unless we rip that out again or rewrite it completely, the need for them doesn't seem likely to go away - so we're going to need to learn to live with it, not pretend like it isn't a problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GIN improvements part2: fast scan
On 03/12/2014 12:09 AM, Tomas Vondra wrote: Hi all, a quick question that just occurred to me - do you plan to tweak the cost estimation for GIN indexes, in this patch? IMHO it would be appropriate, given the improvements and gains, but it seems to me gincostestimate() was not touched by this patch. Good point. We have made two major changes to GIN in this release cycle: changed the data page format and made it possible to skip items without fetching all the keys (fast scan). gincostestimate doesn't know about either change. Adjusting gincostestimate for the more compact data page format seems easy. When I hacked on that, I assumed all along that gincostestimate doesn't need to be changed as the index will just be smaller, which will be taken into account automatically. But now that I look at gincostestimate, it assumes that the size of one item on a posting tree page is a constant 6 bytes (SizeOfIptrData), which is no longer true. I'll go fix that. Adjusting for the effects of skipping is harder. gincostestimate needs to do the same preparation steps as startScanKey: sort the query keys by frequency, and call the consistent function to split the keys into required and additional sets. And then model that the additional entries only need to be fetched when the other keys match. That's doable in principle, but requires a bunch of extra code. Alexander, any thoughts on that? It's getting awfully late to add new code for that, but it sure would be nice to somehow take fast scan into account. - Heikki
Re: [HACKERS] pgstat wait timeout (RE: contrib/cache_scan)
On Wed, Mar 12, 2014 at 7:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: Kouhei Kaigai kai...@ak.jp.nec.com writes: WARNING: pgstat wait timeout WARNING: pgstat wait timeout WARNING: pgstat wait timeout WARNING: pgstat wait timeout Once I get the above messages, write performance is dramatically degraded, though I haven't investigated in detail. I could reproduce it on the latest master branch without my enhancement, so I guess it is not a problem specific to my changes. One other oddity: right now, this problem only happens on my virtual machine environment - VMware ESXi 5.5.0. I couldn't reproduce the problem on my physical environment (Fedora20, core i5-4570S). We've seen sporadic reports of that sort of behavior for years, but no developer has ever been able to reproduce it reliably. Now that you've got a reproducible case, do you want to poke into it and see what's going on? I didn't know we were trying to reproduce it, nor that it was a mystery. Do anything that causes serious IO constipation, and you will probably see that message. For example, turn off synchronous_commit and run the default pgbench transaction at a scale that is large but still comfortably fits in RAM, and wait for a checkpoint sync phase to kick in. The pgstat wait timeout is a symptom, not the cause. Cheers, Jeff
Re: [HACKERS] The case against multixact GUCs
On 03/12/2014 06:26 PM, Robert Haas wrote: On Tue, Mar 11, 2014 at 3:14 PM, Josh Berkus j...@agliodbs.com wrote: In the 9.3.3 updates, we added three new GUCs to control multixact freezing. This was an unprecedented move in my memory -- I can't recall ever adding a GUC to a minor release which wasn't backwards compatibility for a security fix. This was a mistake. I disagree. I think it was the right decision. I think it was a mistake not including all of that stuff in the first place, and I think it's good that we've now corrected that oversight. In hindsight, I think permanent multixids in their current form were a mistake. Before 9.3, the thing that made multixids special was that they could just be thrown away at a restart. They didn't need freezing. Now that they do, why not just use regular XIDs for them? We had to duplicate much of the wraparound and freezing logic for multixids that simply would not have been an issue if we had used regular XIDs instead. We could've perhaps kept the old multixids for their original purpose, as transient xids that can be forgotten about after all the old snapshots are gone. But for the permanent ones, it would've been simpler if we handled them more like subxids; make them part of the same XID space as regular XIDs. This is pretty hand-wavy of course, and it's too late now. - Heikki
Re: [HACKERS] pgstat wait timeout (RE: contrib/cache_scan)
Jeff Janes jeff.ja...@gmail.com writes: On Wed, Mar 12, 2014 at 7:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: We've seen sporadic reports of that sort of behavior for years, but no developer has ever been able to reproduce it reliably. Now that you've got a reproducible case, do you want to poke into it and see what's going on? I didn't know we were trying to reproduce it, nor that it was a mystery. Do anything that causes serious IO constipation, and you will probably see that message. The cases that are a mystery to me are where there's no reason to believe that I/O is particularly overloaded. But perhaps Kaigai-san's example is only that ... regards, tom lane
Re: [HACKERS] The case against multixact GUCs
On Wed, Mar 12, 2014 at 12:45 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 03/12/2014 06:26 PM, Robert Haas wrote: On Tue, Mar 11, 2014 at 3:14 PM, Josh Berkus j...@agliodbs.com wrote: In the 9.3.3 updates, we added three new GUCs to control multixact freezing. This was an unprecedented move in my memory -- I can't recall ever adding a GUC to a minor release which wasn't backwards compatibility for a security fix. This was a mistake. I disagree. I think it was the right decision. I think it was a mistake not including all of that stuff in the first place, and I think it's good that we've now corrected that oversight. In hindsight, I think permanent multixids in their current form were a mistake. Before 9.3, the thing that made multixids special was that they could just be thrown away at a restart. They didn't need freezing. Now that they do, why not just use regular XIDs for them? Well, the numbering of MXIDs is closely bound up with their storage format. To do what you're proposing, we'd need to invent some new way of associating an XID-used-as-MXID with the update XID, list of lockers, and lock modes. Which is certainly possible, but it's not obvious that it's a good idea. I *am* concerned that we didn't adequately weigh the costs of adding another thing that has to be frozen before we did it. Clearly, the feature has a lot of benefit, or will once we've flushed out most of the bugs. But it's hard to say at this point how much the cost is going to be, and I do think that's cause for concern. But I'm not convinced that unifying the XID and MXID spaces would have addressed that concern to any measurable degree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GIN improvements part2: fast scan
On Wed, Mar 12, 2014 at 8:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 03/12/2014 12:09 AM, Tomas Vondra wrote: Hi all, a quick question that just occurred to me - do you plan to tweak the cost estimation for GIN indexes, in this patch? IMHO it would be appropriate, given the improvements and gains, but it seems to me gincostestimate() was not touched by this patch. Good point. We have made two major changes to GIN in this release cycle: changed the data page format and made it possible to skip items without fetching all the keys (fast scan). gincostestimate doesn't know about either change. Adjusting gincostestimate for the more compact data page format seems easy. When I hacked on that, I assumed all along that gincostestimate doesn't need to be changed as the index will just be smaller, which will be taken into account automatically. But now that I look at gincostestimate, it assumes that the size of one item on a posting tree page is a constant 6 bytes (SizeOfIptrData), which is no longer true. I'll go fix that. Adjusting for the effects of skipping is harder. gincostestimate needs to do the same preparation steps as startScanKey: sort the query keys by frequency, and call the consistent function to split the keys into required and additional sets. And then model that the additional entries only need to be fetched when the other keys match. That's doable in principle, but requires a bunch of extra code. Alexander, any thoughts on that? It's getting awfully late to add new code for that, but it sure would be nice to somehow take fast scan into account. The preparation we do in startScanKey requires knowledge of the estimated size of the posting lists/trees. We make this estimate by traversing down to the leaf pages. I think gincostestimate is expected to be way cheaper. So, we probably need a rougher estimate there, don't we? -- With best regards, Alexander Korotkov.
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On Wed, Mar 12, 2014 at 12:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: David Johnston pol...@yahoo.com writes: Tom Lane-2 wrote 1. Treat this as a non-backwards-compatible change, and document that people have to use -q if they don't want the COPY tag in the output. I'm not sure this is acceptable. I've mostly used copy to with files and so wouldn't mind if STDOUT had the COPY n sent to it as long as the target file is just the copy contents. I think you're missing the point: the case I'm concerned about is exactly that the target file is psql's stdout, or more specifically the same place that the COPY status would get printed to. 2. Kluge ProcessResult so that it continues to not pass back a PGresult for the COPY TO STDOUT case, or does so only in limited circumstances (perhaps only if isatty(stdout), for instance). The main problem with this is that people will test by sending output to a TTY and see the COPY n. Although if it can be done consistently then you minimize backward incompatibility and encourage people to enforce quiet mode while the command runs... Yeah, the inconsistency of behavior that this solution would cause is not a good thing. My inclination now (see later traffic) is to suppress the status report when the COPY destination is the same as pset.queryFout (ie, a simple test whether the FILE pointers are equal). This would suppress the status report for \copy to stdout and COPY TO STDOUT cases, and also for \copy to pstdout if you'd not redirected queryFout with \o. This is reasonably similar to what we already do for SELECT, isn't it? I mean, the server always sends back a command tag, but psql sometimes opts not to print it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GIN improvements part2: fast scan
On Wed, Mar 12, 2014 at 8:02 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/26/2014 11:25 PM, Alexander Korotkov wrote: On Thu, Feb 27, 2014 at 1:07 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/09/2014 12:11 PM, Alexander Korotkov wrote: I've rebased the catalog changes with the latest master. Patch is attached. I've rerun my test suite with both the last master ('committed') and the attached patch ('ternary-consistent'). Thanks!

method                 | sum
-----------------------+--------------
committed              | 143491.71501
fast-scan-11           | 126916.11199
fast-scan-light        | 137321.211
fast-scan-light-heikki | 138168.02801
master                 | 446976.288
ternary-consistent     | 125923.514

I explain the regression in the last master by the change of MAX_MAYBE_ENTRIES from 8 to 4. Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make sure we get similar behavior in Tomas' tests that used 6 search terms. But I always felt that it was too large for real queries once we have the catalog changes; that's why I lowered it to 4 when committing. If an opclass benefits greatly from fast scan, it should provide the ternary consistent function, and not rely on the shim implementation. I'm not sure about the decision to reserve a separate procedure number for the ternary consistent function. Probably, it would be better to add a ginConfig method. It would be useful for post-9.4 improvements. Hmm, it might be useful for an opclass to provide both a boolean and a ternary consistent function, if the boolean version is significantly more efficient when all the arguments are TRUE/FALSE. OTOH, you could also do a quick check through the array to see if there are any MAYBE arguments, within the consistent function. But I'm inclined to keep the possibility to provide both versions. As long as we support the boolean version at all, there's not much difference in terms of the amount of code to support having them both for the same opclass. 
A ginConfig could be useful for many other things, but I don't think it's worth adding it now. What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck? We discussed that earlier, but didn't reach any conclusion. That needs to be clarified in the docs. One possibility is to document that they're equivalent. Another is to forbid one of them. Yet another is to assign a different meaning to each. I've been thinking that it might be useful to define them so that a MAYBE result from the tri-consistent function means that it cannot decide if you have a match or not, because some of the inputs were MAYBE. And TRUE+recheck means that even if all the MAYBE inputs were passed as TRUE or FALSE, the result would be the same, TRUE+recheck. The practical difference would be that if the tri-consistent function returns TRUE+recheck, ginget.c wouldn't need to bother fetching the other entries, it could just return the entry with recheck=true immediately. While with a MAYBE result, it would fetch the other entries and call tri-consistent again. ginget.c doesn't currently use the tri-consistent function that way - it always fetches all the entries for a potential match before calling tri-consistent, but it could. I had it do that in some of the patch versions, but Tomas' testing showed that it was a big loss on some queries, because the consistent function was called much more often. Still, something like that might be sensible in the future, so it might be good to distinguish those cases in the API now. Note that ginarrayproc is already using the return values like that: in GinContainedStrategy, it always returns TRUE+recheck regardless of the inputs, but in other cases it uses GIN_MAYBE. Next revision of the patch is attached. In this version the opclass should provide at least one consistent function: binary or ternary. It's expected to achieve the best performance when the opclass provides both of them. However, tests showed the opposite :( I had to recheck it carefully. However, it turns out it's not so! 
This revision of the patch completes my test case in 123330 ms. This is slightly faster than the previous revision. Great. Committed, I finally got around to it. Some minor changes: I reworded the docs and also updated the table of support functions in xindex.sgml. I refactored the query in opr_sanity.sql, to make it easier to read, and to check more carefully that the required support functions are present. I also added a runtime check to avoid crashing if neither is present. A few things we still ought to discuss: * I just noticed that the dummy trueTriConsistentFn returns GIN_MAYBE, rather than GIN_TRUE. The equivalent boolean version returns 'true' without recheck. Is that a typo, or was there some reason for the discrepancy? Actually,
Re: [HACKERS] GIN improvements part2: fast scan
On Wed, Mar 12, 2014 at 1:52 PM, Alexander Korotkov aekorot...@gmail.com wrote: * This patch added a triConsistent function for array and tsvector opclasses. Were you planning to submit a patch to do that for the rest of the opclasses, like pg_trgm? (it's getting awfully late for that...) Yes. I can try to get it into 9.4 if it's possible. It seems to me that we'd be wise to push that to 9.5 at this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] logical decoding documentation?
On Tue, Mar 11, 2014 at 4:16 PM, Andres Freund and...@2ndquadrant.com wrote: Could you perhaps commit the attached patch fixing the issues you mentioned? I committed this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] db_user_namespace a temporary measure
On 03/12/2014 12:22 AM, Magnus Hagander wrote: On Mar 12, 2014 1:46 AM, Josh Berkus j...@agliodbs.com wrote: Yeah, what we really need is encapsulated per-DB users and local superusers. I think everyone agrees that this is the goal, but nobody wants to put in the work to implement a generalized solution. Encapsulated would probably be the doable part. But local superuser? Given that a superuser can load and run binaries, how would you propose you restrict that superuser from doing anything they want? And if you don't need that functionality, then how's it really different from being the database owner? Well, if you really want my I want a pony list: Local superusers (maybe this concept needs another name) would be able to do the following things in a *single* database:
1. change permissions for other users on that database and its objects
2. load extensions from a predefined .so directory / list
3. create/modify untrusted language functions
4. create per-database users and change their settings
5. change database settings (SET stuff)
6. NOT change their own user settings
7. NOT change any global users
8. NOT run SET PERSISTENT or other commands with global effect
The above is fairly similar to what Amazon currently offers for Postgres RDS, except that they only have 1 database per instance, so clearly they haven't worked out some of the security issues. Now, obviously permission (3) could be used to escalate a local superuser to global superuser permissions, so local superusers aren't really a secure concept, unless you don't add any untrusted languages to the list of allowed extensions. Alternately, we could drop (3) from the list of features. Hmmm. On the other foot, though: all of 1, 2, 4 and 5 could conceivably be done via a set of Security Definer functions loaded into the database, with a lot less complexity and security risk. 
So if we're sacrificing untrusted languages, then we really don't need a local superuser at all; just a bunch of variadic SD functions which can handle user creation and permissions issues. It would limit the permissions syntax we can express, but not critically so in my opinion. This could all be done as an extension, and would probably be the better for it. Assuming we have db-local users in the first place, of course. On 03/11/2014 09:39 PM, David Johnston wrote: So if dave is already a user in db1 only that specific dave can be made a global user - any other dave would be disallowed. Correct. Well, unless the other dave was promoted first. However, I personally don't see any reason why we should even support promoting users from local to global. It adds complexity to the concept, and the value of it eludes me. Would user - password be a better PK? Even with the obvious issue that the password part of the key can change. User-password to database is properly a many-to-many relationship. Or see next for something simpler. I have no idea where you're going with this. A simple implementation would simply have the global users copied into each database as it is constructed. There would also be a link between each of the database-specific users and the global master so that a password change issued against the global user propagates to all the database-specific versions. That's called massive backwards-compatibility failure, and I don't think we'll go any further with your concept. Be nice if all users could be global and there would be some way to give them permissions on databases. Um, there is: http://www.postgresql.org/docs/9.3/static/sql-grant.html -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] GIN improvements part2: fast scan
On 03/12/2014 07:42 PM, Alexander Korotkov wrote: The preparation we do in startScanKey requires knowledge of the estimated size of the posting lists/trees. We make this estimate by traversing down to the leaf pages. I think gincostestimate is expected to be way cheaper. So, we probably need a rougher estimate there, don't we? Yeah, maybe. We do something similar for b-tree MIN/MAX currently, but with a lot of keys, it could be a lot more expensive to traverse down to all of them. I wonder if we could easily at least catch the common skewed cases, where e.g. the logic of the consistent function is to AND all the keys. The opclass would know that, but we would somehow need to pass down the information to gincostestimate. - Heikki
Re: [HACKERS] db_user_namespace a temporary measure
On 03/12/2014 02:09 PM, Josh Berkus wrote: On 03/12/2014 12:22 AM, Magnus Hagander wrote: On Mar 12, 2014 1:46 AM, Josh Berkus j...@agliodbs.com wrote: Yeah, what we really need is encapsulated per-DB users and local superusers. I think everyone agrees that this is the goal, but nobody wants to put in the work to implement a generalized solution. Encapsulated would probably be the doable part. But local superuser? Given that a superuser can load and run binaries, how would you propose you restrict that superuser from doing anything they want? And if you don't need that functionality, then how's it really different from being the database owner? Well, if you really want my I want a pony list: Local superusers (maybe this concept needs another name) would be able to do the following things in a *single* database: 1 change permissions for other users on that database and its objects 2 load extensions from a predefined .so directory / list 3 create/modify untrusted language functions 4 create per-database users and change their settings 5 change database settings (SET stuff) 6 NOT change their own user settings 7 NOT change any global users 8 NOT run SET PERSISTENT or other commands with global effect Item 3 gives away the store. AFAIK Amazon doesn't load untrusted languages, period. cheers andrew
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
MauMau escribió: The raw link only gave the mail in text format. I hoped to import the mail into Windows Mail on Windows Vista, but I couldn't. You might need to run a conversion process by which you transform the raw file (in mbox format) into EML format or whatever it is that Windows Mail uses. I vaguely recall there are tools for this. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] db_user_namespace a temporary measure
* Josh Berkus (j...@agliodbs.com) wrote: Local superusers (maybe this concept needs another name) would be able to do the following things in a *single* database: 1 change permissions for other users on that database and its objects What about bypass permissions, a la what superuser does today? Or are you saying we'd only need to allow this new kind of role to bypass the checks in the GRANT/REVOKE system? 2 load extensions from a predefined .so directory / list This would obviously have to be a curated list that avoids things like 'adminpack'... 3 create/modify untrusted language functions Uhh, I don't believe RDS allows you to do this..? 4 create per-database users and change their settings Presumably just for the 'local' DB? 5 change database settings (SET stuff) This can be done by the database owner already, no? 6 NOT change their own user settings Don't think this is quite that simple (passwords?). 7 NOT change any global users What about role membership, wrt local vs. global roles? 8 NOT run SET PERSISTENT or other commands with global effect Indeed, or use 'COPY'.. Now, obviously permission (3) could be used to escalate a local superuser to global superuser permissions, so local superusers aren't really a secure concept, unless you don't add any untrusted languages to the list of allowed extensions. Alternately, we could drop (3) from the list of features. That'd certainly be the main issue that I see with this proposal. Doing the rest but allowing untrusted languages would just get the naive in trouble and not help those of us who want this, as we wouldn't be able to use it. Hmmm. On the other foot, though: all of 1, 2, 4 and 5 could conceivably be done via a set of Security Definer functions loaded into the database, with a lot less complexity and security risk. For my part, I don't see having everyone write their own set of SECURITY DEFINER functions as being either less complex or less risk. They're also a lot less convenient to use. 
That's not what RDS did, is it? No, and I agree with them on that part. On 03/11/2014 09:39 PM, David Johnston wrote: So if dave is already a user in db1 only that specific dave can be made a global user - any other dave would be disallowed. Correct. Well, unless the other dave was promoted first. However, I personally don't see any reason why we should even support promoting users from local to global. It adds complexity to the concept, and the value of it eludes me. Agreed. Thanks, Stephen
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On Tue, Feb 25, 2014 at 1:03 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote: On 04 February 2014 14:38, Myself wrote: On 4th February 2014, Christian Kruse wrote: On 04/02/14 12:38, Fujii Masao wrote: ISTM that the phrase Request queue is not used much around the lock. Using the phrase wait queue or Simon's suggestion sounds better to at least me. Thought? Sounds reasonable to me. Attached patch changes messages to the following: Process holding the lock: A. Wait queue: B. Processes holding the lock: A, B. Wait queue: C. This looks good to me also. I have tested the revised patch and found it ready to be committed. I am marking this as Ready for Committer. Committed! Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] db_user_namespace a temporary measure
On 03/12/2014 11:25 AM, Stephen Frost wrote: * Josh Berkus (j...@agliodbs.com) wrote: Local superusers (maybe this concept needs another name) would be able to do the following things in a *single* database: 1 change permissions for other users on that database and its objects What about bypass permissions, ala what superuser does today? Or are you saying we'd only need to allow this new kind of role to bypass the checks in the GRANT/REVOKE system? More like what we have for the database owner role today. 2 load extensions from a predefined .so directory / list This would obviously have to be a curated list that avoids things like 'adminpack'... It would need to be a list created by the global superuser. By default, nothing would be on it. 4 create per-database users and change their settings Presumably just for the 'local' DB? Right. 5 change database settings (SET stuff) This can be done by the database-owner already, no? Oh, point. 6 NOT change their own user settings Don't think this is quite that simple (passwords?). Well, we already limit what things users can change about themselves; they can't promote themselves to superuser, for example. 7 NOT change any global users What about role membership, wrt local vs. global roles? Allowing global users to join local ROLEs is its own can'o'worms that would merit an entire other thread. Assuming, of course, that we had local users in the first place. 8 NOT run SET PERSISTENT or other commands with global effect Indeed, or use 'COPY'.. Yeah. In theory, we should allow the local superuser to use COPY; in practice, nobody will care because they'll be using client-side COPY since the entire use-case for this is cloud-hosted DBs anyway. For my part- I don't see having everyone write their own set of SECURITY DEFINER functions as being either less complex or less risk. They're also a lot less convenient to use. That's not what RDS did, is it? No, and I agree with them on that part. 
I was thinking of having an extension in contrib, actually. That is, a canonical set of security definer functions. But if you think it's easier to actually implement the permissions restrictions in the actual utility functions, I wouldn't argue. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
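A minimal sketch of what one function in such a canonical contrib extension might look like (all names here are hypothetical; a production version would need more hardening, and CREATE ROLE remains cluster-wide under the covers):

```sql
-- Hypothetical SECURITY DEFINER wrapper, installed by a real superuser,
-- with EXECUTE granted to the delegated "local admin" role.  It creates
-- a login role and confines it to the current database via GRANT CONNECT
-- (assuming CONNECT was revoked from PUBLIC on that database).
CREATE FUNCTION local_create_user(username text, pwd text)
RETURNS void AS $$
BEGIN
    -- format() with %I/%L quotes identifiers and literals, guarding
    -- the dynamically built DDL against SQL injection
    EXECUTE format('CREATE ROLE %I LOGIN PASSWORD %L', username, pwd);
    EXECUTE format('GRANT CONNECT ON DATABASE %I TO %I',
                   current_database(), username);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER SET search_path = pg_catalog;
```

Pinning search_path on the function is the usual defense against a caller hijacking unqualified names inside a SECURITY DEFINER body.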
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
Robert Haas robertmh...@gmail.com writes: On Wed, Mar 12, 2014 at 12:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: My inclination now (see later traffic) is to suppress the status report when the COPY destination is the same as pset.queryFout (ie, a simple test whether the FILE pointers are equal). This would suppress the status report for \copy to stdout and COPY TO STDOUT cases, and also for \copy to pstdout if you'd not redirected queryFout with \o. This is reasonably similar to what we already do for SELECT, isn't it? I mean, the server always sends back a command tag, but psql sometimes opts not to print it. Right, the analogy to SELECT gives some comfort that this is reasonable. regards, tom lane
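A sketch of how the proposed rule plays out in a psql session (hedged; exact behavior depends on the patch as finally committed):

```sql
-- Cases where the copy destination equals pset.queryFout, so the
-- "COPY n" status report would be suppressed under the proposal
-- (keeping piped usage like  psql -c 'copy t to stdout' db | prog  clean):
COPY mytable TO STDOUT;
\copy mytable to stdout

-- \copy ... to pstdout writes to psql's own stdout; once queryFout has
-- been redirected with \o, the two FILE pointers differ, so the
-- "COPY n" tag would be printed again:
\o /tmp/query_output.txt
\copy mytable to pstdout
\o
```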
Re: [HACKERS] db_user_namespace a temporary measure
Andrew Dunstan and...@dunslane.net writes: On 03/12/2014 02:09 PM, Josh Berkus wrote: Well, if you really want my I want a pony list: Local superusers (maybe this concept needs another name) would be able to do the following things in a *single* database: 1 change permissions for other users on that database and its objects 2 load extensions from a predefined .so directory / list 3 create/modify untrusted language functions 4 create per-database users and change their settings 5 change database settings (SET stuff) 6 NOT change their own user settings 7 NOT change any global users 8 NOT run SET PERSISTENT or other commands with global effect Item 3 gives away the store. Indeed. If you can do (3), you can break out of any of the other constraints. I suspect even (1) and/or (5) would be enough to mount trojan-horse attacks against real superusers who visit your database. I do not put any stock in the notion of constrained superuser. regards, tom lane
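To make concrete why item 3 "gives away the store" (illustrative sketch; plpythonu is just one example of an untrusted language, and the file path is hypothetical): an untrusted-language function body runs with the privileges of the server's OS user, entirely outside any SQL-level fence.

```sql
-- Any untrusted-language function escapes SQL-level restrictions,
-- because its body executes as the postgres OS user:
CREATE FUNCTION read_server_file(path text) RETURNS text AS $$
    return open(path).read()
$$ LANGUAGE plpythonu;

-- e.g. read (or, with a little more code, rewrite) cluster-wide files:
SELECT read_server_file('/var/lib/postgresql/data/pg_hba.conf');
```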
[HACKERS] GSoC 2014
Hello all, I am Ashutosh Dhundhara from Thapar University, Patiala, India, presently pursuing a Bachelor's degree in Computer Science and Engineering. This year I wish to work for PostgreSQL under the flagship of GSoC 2014, so please help me get started. I have a few questions: 1) Do I have to choose all ideas from the GSoC wiki page, or any one of them? 2) What is the deadline for fixing bugs, which will count toward the selection procedure? Please guide me on how to proceed. Regards, Ashutosh Dhundhara
Re: [HACKERS] GSoC 2014
On Wed, Mar 12, 2014 at 8:05 PM, Ashutosh Dhundhara ashutoshdhundh...@yahoo.com wrote: Hello all, I am Ashutosh Dhundhara from Thapar University, Patiala, India, presently pursuing a Bachelor's degree in Computer Science and Engineering. This year I wish to work for PostgreSQL under the flagship of GSoC 2014, so please help me get started. I have a few questions: 1) Do I have to choose all ideas from the GSoC wiki page, or any one of them? 2) What is the deadline for fixing bugs, which will count toward the selection procedure? Please guide me on how to proceed. You can propose your own ideas as well. You can pick any number of ideas from the GSoC 2014 wiki page and send proposals for them. The deadline for proposals is next Friday, I believe. Regards, Atri -- Atri *l'apprenant*
Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire
On Tue, Mar 11, 2014 at 3:53 AM, Amit Kapila amit.kapil...@gmail.com wrote: Places where tuple info not available LOG: process 5788 still waiting for ShareLock on transaction 679 after 1014.000 ms CONTEXT: while attempting to operate in relation public.idx_t1 of database postgres The way the context message is assembled piecemeal in XactLockTableWaitErrorContextCallback violates translation guidelines. You need to have completely separate strings for each variant. While attempting to operate in? That seems like unhelpful weasel-wording. I wonder if we ought to have separate messages for each possibility, like delete tuple (X,Y) when called from heap_delete(), update tuple (X,Y), check exclusion constraint on tuple (X,Y) when called from check_exclusion_constraint, etc. That seems like it would be handy information to have. Why can't check_exclusion_constraint, for example, pass the TID, so that at least that much information is available? I'm not very happy with the idea of including the tuple details only when the level is less than ERROR. For one thing, to do that in a way that respects translatability guidelines will require two versions of every string that would otherwise require only one. For another thing, it seems like it's punting a pretty important case. If we're gonna add context detail to lots of cases (instead only the still waiting case that people probably mostly care about) then we should actually print the details more-or-less consistently in all of those cases, not pretend like a solution that only works in the narrow case is more general than it really is. I think we should really try hard to make the amount of detail provided as uniform as possible across all the cases, even if that means removing information from some cases where it might have been available. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] jsonb and nested hstore
Also, GiST index is faster for create/update operations. I really hope we will improve jsonb indexing in the next one-two releases. For now I'd suggest people use expression indexes to index just the interesting keys, or use GiST. On Wed, Mar 12, 2014 at 5:15 PM, Tomas Vondra t...@fuzzy.cz wrote: On 12 March 2014, 0:41, Peter Geoghegan wrote: On Tue, Mar 11, 2014 at 3:58 PM, Tomas Vondra t...@fuzzy.cz wrote: ERROR: index row size 1416 exceeds maximum 1352 for index gin_idx All index AMs have similar restrictions. Yes, I know and I have no problem with restrictions in general. You may run into similar issues with btree indexes on text columns with long text, for example. The thing is that people don't generally index text directly, because it usually does not make much sense, but using tsvector etc. But with jsonb it's more likely because indexing is one of the goodies (at least for me). And the discussions with several people interested in storing json data I had recently went often like this: me: It seems we'll have a better json datatype in 9.4. them: Nice! me: And it will be possible to do searches on arbitrary keys. them: Yay! me: And we actually got pretty significant improvements in GIN indexes. them: Awesome! me: But the values you may index need to be less than ~1500B. them: Bummer :-( me: Well, you can use GIST then. A good example of such header is dkim-signature, which basically contains the whole message digitally signed with DKIM. The signature tends to be long and non-compressible. I'm wondering what's the best way around this, because I suspect many new users (especially those attracted by jsonb and GIN improvements) will run into this. Maybe not immediately, but eventually they'll try to insert a jsonb with a long value, and it will fail ... The jsonb_hash_ops operator class just stores a 32-bit integer hash value (it always sets the recheck flag, which only some of the other default GIN opclass' strategies do).
It only supports containment, and not the full variety of operators that the default opclass supports, which is why it isn't the default. I think that in practice the general recommendation will be that when indexing at the top level, use jsonb_hash_ops. When indexing nested items, use the more flexible default GIN opclass. That seems like a pretty smart trade-off to me. OK, I'll look into the jsonb_hash_ops - that sounds more or less like what I was thinking about (and sure, storing hashes makes some operations impossible to support). The other thing I was thinking about is introducing some kind of upper limit for the value length - e.g. index just the first 1kB, or something like that. My experience is most values are way shorter, or actually differ in the first 1kB, so this should allow most decisions to be made. But I'm not really that familiar with how GIN works, so maybe this is nonsense. The more I think about it, the more inclined I am to lose GiST support entirely for the time being. It lets us throw out about 700 lines of C code, which is a very significant fraction of the total, removes the one open bug, and removes the least understood part of the code. The GiST opclass is not particularly compelling for this. I disagree with that. I see GiST as a simple fallback option for the cases I described. I wasn't able to create a GIN index because of exceeding the max item length, but GiST created just fine. It was considerably slower, but it worked. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
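The indexing options weighed above, side by side (sketch; table/index names are hypothetical, and the jsonb_hash_ops and GiST opclasses are as proposed in the patch under discussion):

```sql
-- Default jsonb GIN opclass: full operator support, but an individual
-- indexed value can exceed the maximum index row size (1352 bytes in
-- the error shown above).
CREATE INDEX docs_gin ON docs USING gin (data);

-- jsonb_hash_ops: each entry is a 32-bit hash, so value length is a
-- non-issue, but only containment is supported and a heap recheck is
-- always required.
CREATE INDEX docs_gin_hash ON docs USING gin (data jsonb_hash_ops);

-- GiST (per the patch): slower to query but faster to build/update,
-- and it did not hit the item-size limit in the test above.
CREATE INDEX docs_gist ON docs USING gist (data);
```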
[HACKERS] Replication slots and footguns
All: I was just reading Michael's explanation of replication slots (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/) and realized there was something which had completely escaped me in the pre-commit discussion: select pg_drop_replication_slot('slot_1'); ERROR: 55006: replication slot slot_1 is already active LOCATION: ReplicationSlotAcquire, slot.c:339 What defines an active slot? It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? If I'm not, that seems like something to fix before 9.4 release. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] db_user_namespace a temporary measure
On 03/12/2014 11:40 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 03/12/2014 02:09 PM, Josh Berkus wrote: Well, if you really want my I want a pony list: Local superusers (maybe this concept needs another name) would be able to do the following things in a *single* database: 1 change permissions for other users on that database and its objects 2 load extensions from a predefined .so directory / list 3 create/modify untrusted language functions 4 create per-database users and change their settings 5 change database settings (SET stuff) 6 NOT change their own user settings 7 NOT change any global users 8 NOT run SET PERSISTENT or other commands with global effect Item 3 gives away the store. Indeed. If you can do (3), you can break out of any of the other constraints. I suspect even (1) and/or (5) would be enough to mount trojan-horse attacks against real superusers who visit your database. ... nobody reads my whole post, except Stephen. :-( -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Replication slots and footguns
Hi, On 2014-03-12 12:00:25 -0700, Josh Berkus wrote: I was just reading Michael's explanation of replication slots (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/) and realized there was something which had completely escaped me in the pre-commit discussion: select pg_drop_replication_slot('slot_1'); ERROR: 55006: replication slot slot_1 is already active LOCATION: ReplicationSlotAcquire, slot.c:339 What defines an active slot? One with a connected walsender. It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? It's sufficient to terminate the walsender and then drop the slot. That seems ok for now? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] db_user_namespace a temporary measure
* Josh Berkus (j...@agliodbs.com) wrote: On 03/12/2014 11:25 AM, Stephen Frost wrote: * Josh Berkus (j...@agliodbs.com) wrote: Local superusers (maybe this concept needs another name) would be able to do the following things in a *single* database: 1 change permissions for other users on that database and its objects What about bypass permissions, ala what superuser does today? Or are you saying we'd only need to allow this new kind of role to bypass the checks in the GRANT/REVOKE system? More like what we have for the database owner role today. eh? Can you clarify what you think the database owner role can do today that actually matches what you're asking for above, or even what I'm suggesting? 2 load extensions from a predefined .so directory / list This would obviously have to be a curated list that avoids things like 'adminpack'... It would need to be a list created by the global superuser. By default, nothing would be on it. Hmm, so this would be independent from what's installed on the system at a file level. That's an interesting idea. 4 create per-database users and change their settings Presumably just for the 'local' DB? Right. 5 change database settings (SET stuff) This can be done by the database-owner already, no? Oh, point. 6 NOT change their own user settings Don't think this is quite that simple (passwords?). Well, we already limit what things users can change about themselves; they can't promote themselves to superuser, for example. Just saying that we need to be more specific on this point. 7 NOT change any global users What about role membership, wrt local vs. global roles? Allowing global users to join local ROLEs is its own can'o'worms that would merit an entire other thread. What would the caveats here be then..? Local roles can join global ones, but global roles can't join local ones? I'm not sure how much of an issue this will really be if they're all sharing one catalog with different namespaces (eg: the '@db' stuff). 
Assuming, of course, that we had local users in the first place. That's what this thread is about, isn't it..? 8 NOT run SET PERSISTENT or other commands with global effect Indeed, or use 'COPY'.. Yeah. In theory, we should allow the local superuser to use COPY; in practice, nobody will care because they'll be using client-side COPY since the entire use-case for this is cloud-hosted DBs anyway. How could we allow the local superuser to use COPY? They'd be able to overwrite files in other databases, or possibly modify postgresql.conf, etc, depending on the overall system environment. For my part- I don't see having everyone write their own set of SECURITY DEFINER functions as being either less complex or less risk. They're also a lot less convenient to use. That's not what RDS did, is it? No, and I agree with them on that part. I was thinking of having an extension in contrib, actually. That is, a canonical set of security definer functions. But if you think it's easier to actually implement the permissions restrictions in the actual utility functions, I wouldn't argue. Having it in contrib would be better than nothing, but I'd advocate putting this in as part of our actual permissions model, along with the read-only/auditor-type option. Having this not-quite-superuser which is able to read all tables, possibly modify any of them, without having to change the permissions system could be extremely useful and would be rather painful to implement through SD functions. Thanks, Stephen
Re: [HACKERS] Replication slots and footguns
On Wed, Mar 12, 2014 at 3:03 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2014-03-12 12:00:25 -0700, Josh Berkus wrote: I was just reading Michael's explanation of replication slots (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/) and realized there was something which had completely escaped me in the pre-commit discussion: select pg_drop_replication_slot('slot_1'); ERROR: 55006: replication slot slot_1 is already active LOCATION: ReplicationSlotAcquire, slot.c:339 What defines an active slot? One with a connected walsender. It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? It's sufficient to terminate the walsender and then drop the slot. That seems ok for now? Urgh. That error message looks susceptible to improvement. How about: replication slot %s cannot be dropped because it is currently in use -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication slots and footguns
On 03/12/2014 12:03 PM, Andres Freund wrote: Hi, On 2014-03-12 12:00:25 -0700, Josh Berkus wrote: I was just reading Michael's explanation of replication slots (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/) and realized there was something which had completely escaped me in the pre-commit discussion: select pg_drop_replication_slot('slot_1'); ERROR: 55006: replication slot slot_1 is already active LOCATION: ReplicationSlotAcquire, slot.c:339 What defines an active slot? One with a connected walsender. In a world of network proxies, a walsender could be connected for hours after the replica has ceased to exist. Fortunately, wal_sender_timeout is changeable on a reload. We check for actual standby feedback for the timeout, yes? It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? It's sufficient to terminate the walsender and then drop the slot. That seems ok for now? We have no safe way to terminate the walsender that I know of; pg_terminate_backend() doesn't include walsenders last I checked. So the procedure for this would be: 1) set wal_sender_timeout to some low value (1); 2) reload 3) call pg_drop_replication_slot('slotname') Clumsy, but it will do for a first pass; we can make it better (for example, by adding a force boolean to pg_drop_replication_slot) in 9.5. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication slots and footguns
On 2014-03-12 15:18:04 -0400, Robert Haas wrote: On Wed, Mar 12, 2014 at 3:03 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2014-03-12 12:00:25 -0700, Josh Berkus wrote: I was just reading Michael's explanation of replication slots (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/) and realized there was something which had completely escaped me in the pre-commit discussion: select pg_drop_replication_slot('slot_1'); ERROR: 55006: replication slot slot_1 is already active LOCATION: ReplicationSlotAcquire, slot.c:339 What defines an active slot? One with a connected walsender. It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? It's sufficient to terminate the walsender and then drop the slot. That seems ok for now? Urgh. That error message looks susceptible to improvement. How about: replication slot %s cannot be dropped because it is currently in use I think that'd require duplicating some code between acquire and drop, but how about replication slot %s is in use by another backend? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication slots and footguns
On 2014-03-12 12:23:01 -0700, Josh Berkus wrote: On 03/12/2014 12:03 PM, Andres Freund wrote: Hi, On 2014-03-12 12:00:25 -0700, Josh Berkus wrote: I was just reading Michael's explanation of replication slots (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/) and realized there was something which had completely escaped me in the pre-commit discussion: select pg_drop_replication_slot('slot_1'); ERROR: 55006: replication slot slot_1 is already active LOCATION: ReplicationSlotAcquire, slot.c:339 What defines an active slot? One with a connected walsender. In a world of network proxies, a walsender could be connected for hours after the replica has ceased to exist. Fortunately, wal_sender_timeout is changeable on a reload. We check for actual standby feedback for the timeout, yes? Yep. It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? It's sufficient to terminate the walsender and then drop the slot. That seems ok for now? We have no safe way to terminate the walsender that I know of; pg_terminate_backend() doesn't include walsenders last I checked. SELECT pg_terminate_backend(pid) FROM pg_stat_replication; works. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
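Putting Andres's answer together with Josh's earlier steps, the whole workaround might look like this (sketch; the slot name and timeout value are hypothetical):

```sql
-- Optionally lower wal_sender_timeout (reload-able) so walsenders
-- stuck behind dead proxy connections get reaped quickly:
--   wal_sender_timeout = '5s' in postgresql.conf, then
SELECT pg_reload_conf();

-- Terminate the walsender(s) keeping the slot "active":
SELECT pg_terminate_backend(pid) FROM pg_stat_replication;

-- The slot is no longer active and can now be dropped:
SELECT pg_drop_replication_slot('slot_1');
```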
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 2014-03-07 17:54:32 +0200, Heikki Linnakangas wrote: So there are some unexplained differences there, but based on these results, I'm still OK with committing the patch. So, I am looking at this right now. I think there are some minor things I'd like to see addressed: 1) I think there needs to be a good-sized comment explaining why WaitXLogInsertionsToFinish() isn't racy due to the unlocked read at the beginning of LWLockWait(). I think it's safe because we're reading Insert-CurrBytePos inside a spinlock, and it will only ever increment. As SpinLockAcquire() has to be a read barrier we can assume that every skewed read in LWLockWait() will be for a lock protecting a newer insertingAt? 2) I am not particularly happy about the LWLockWait() LWLockWakeup() function names. They sound too much like a part of the normal lwlock implementation to me. But admittedly I don't have a great idea for a better naming scheme. Maybe LWLockWaitForVar(), LWLockWakeupVarWaiter()? 3) I am the wrong one to complain, I know, but the comments above struct WALInsertLock are pretty hard to read from the sentence structure. 4) WALInsertLockAcquire() needs to comment on acquiring/waking all but the last slot. Generally the trick of exclusive xlog insertion lock acquisition only really using the last lock could use a bit more docs. 5) WALInsertLockRelease() comments on the reset of insertingAt being optional, but I am not convinced that that's true anymore. If an exclusive acquisition isn't seen as 0 or INT64CONST(0x) by another backend we're in trouble, right? Absolutely not sure without thinking on it for longer than I can concentrate right now. 6) Pretty minor, but from a style POV it seems nicer to separate exclusive/nonexclusive out of WALInsertLockAcquire(). The cases don't share any code now. A patch containing some trivial changes is attached...
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 484b9c5..8a55c6b 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -1628,8 +1628,6 @@ WALInsertLockRelease(void) static void WALInsertLockWakeup(XLogRecPtr insertingAt) { - int i; - if (holdingAllLocks) { /* diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index f88bf76..2695128 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -873,6 +873,9 @@ LWLockWait(LWLock *l, uint64 *valptr, uint64 oldval, uint64 *newval) int extraWaits = 0; bool result = false; + /* can't be used with shared locks for now */ + Assert(lock-shared == 0); + /* * Quick test first to see if it the slot is free right now. * @@ -905,6 +908,8 @@ LWLockWait(LWLock *l, uint64 *valptr, uint64 oldval, uint64 *newval) SpinLockAcquire(lock-mutex); #endif + Assert(lock-shared == 0); + /* Is the lock now free, and if not, does the value match? */ if (lock-exclusive == 0) { @@ -1022,6 +1027,7 @@ LWLockWakeup(LWLock *l, uint64 *valptr, uint64 val) SpinLockAcquire(lock-mutex); /* we should hold the lock */ + LWLockHeldByMe(l); Assert(lock-exclusive == 1); /* Update the lock's value */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication slots and footguns
On Wed, Mar 12, 2014 at 3:25 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-12 15:18:04 -0400, Robert Haas wrote: On Wed, Mar 12, 2014 at 3:03 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2014-03-12 12:00:25 -0700, Josh Berkus wrote: I was just reading Michael's explanation of replication slots (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/) and realized there was something which had completely escaped me in the pre-commit discussion: select pg_drop_replication_slot('slot_1'); ERROR: 55006: replication slot slot_1 is already active LOCATION: ReplicationSlotAcquire, slot.c:339 What defines an active slot? One with a connected walsender. It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? It's sufficient to terminate the walsender and then drop the slot. That seems ok for now? Urgh. That error message looks susceptible to improvement. How about: replication slot %s cannot be dropped because it is currently in use I think that'd require duplicating some code between acquire and drop, but how about replication slot %s is in use by another backend? Sold. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 12, 2014 at 6:20 AM, Tomas Vondra t...@fuzzy.cz wrote: I'm still not sure how would that look. Does that mean I'd have to create multiple GIN indexes - one for each possible key or something like that? Can you give an example? It could mean that you're obliged to create multiple indexes, yes. For an example, and to get a better sense of what I mean, look at the documentation in the patch. The idea that you're going to create one index on a jsonb, and it's going to be able to usefully index a lot of different queries doesn't seem practical for most use-cases. Mostly, people will have fairly homogeneous json documents, and they'll want to index certain nested fields common to all or at least a large majority of those documents. By indexing entire jsonb datums, do you hope to get much benefit out of the indexed values (as opposed to keys) being stored (in serialized form) in the GIN index? Because you *are* indexing a large nested structure as a value. Is that large nested structure going to appear in your query predicate, or are you just going to subscript the jsonb to get to the level that's of interest to query that? I'm pretty sure that people want the latter. Are you sure that your complaint isn't just that the default GIN opclass indexes values (as distinct from keys) that are large and unwieldy, and not terribly useful? I don't think expressional indexes are some kind of unfortunate work around for a jsonb limitation. I think that they're the natural way to approach indexing a nested structure in Postgres. MongoDB, for example, does not magically index everything. You're still required to make choices about indexing that consider the access patterns. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
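An example of the pattern recommended above (hedged sketch; the table and key names are hypothetical, and the operator syntax follows the jsonb patch): index the one nested field your queries actually touch, rather than whole documents.

```sql
-- Expression index on a single nested field common to most documents:
CREATE INDEX docs_tags_idx ON docs USING gin ((data -> 'tags'));

-- A query predicate that subscripts down to that level can use it:
SELECT * FROM docs WHERE data -> 'tags' @> '["postgresql"]';
```

Documents without that key simply contribute nothing useful to the index, which is why homogeneous document sets fit this approach well.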
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 12, 2014 at 11:57 AM, Oleg Bartunov obartu...@gmail.com wrote: Also, GiST index is faster for create/update operations. I really hope we will improve jsonb indexing in the next one-two releases. For now I'd suggest people index expressional indexes to index just interesting keys or use GiST. When do you ever want to index non-interesting keys? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication slots and footguns
On 03/12/2014 12:26 PM, Andres Freund wrote: On 2014-03-12 12:23:01 -0700, Josh Berkus wrote: On 03/12/2014 12:03 PM, Andres Freund wrote: It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? It's sufficient to terminate the walsender and then drop the slot. That seems ok for now? We have no safe way to terminate the walsender that I know of; pg_terminate_backend() doesn't include walsenders last I checked. SELECT pg_terminate_backend(pid) FROM pg_stat_replication; Aha! Ok, I'll work on some documentation. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
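[Editor's note: putting the two steps together, the recipe Andres and Josh converge on looks like this, using the slot name from Josh's example; in practice you would add a WHERE clause to terminate only the walsender in question.]

```sql
-- Terminate the walsender holding the slot, making the slot inactive...
SELECT pg_terminate_backend(pid) FROM pg_stat_replication;
-- ...after which the drop succeeds.
SELECT pg_drop_replication_slot('slot_1');
```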
Re: [HACKERS] Replication slots and footguns
On 03/12/2014 12:34 PM, Robert Haas wrote: Urgh. That error message looks susceptible to improvement. How about: replication slot %s cannot be dropped because it is currently in use I think that'd require duplicating some code between acquire and drop, but how about replication slot %s is in use by another backend? Sold. Wait ... before you go further ... I object to dropping the word active from the error message. The column is called active, and that's where a DBA should look; that word needs to stay in the error message. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Mar 13, 2014 at 12:10 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Mar 12, 2014 at 11:57 AM, Oleg Bartunov obartu...@gmail.com wrote: Also, GiST index is faster for create/update operations. I really hope we will improve jsonb indexing in the next one-two releases. For now I'd suggest people index expressional indexes to index just interesting keys or use GiST. When do you ever want to index non-interesting keys? A regular user may just index all keys. I mean, the json can contain keys which are never searched, so there is no need to index them, and skipping them saves index size. We could probably provide an option in CREATE INDEX to specify what to index and what not to index, but that would require the planner to know that information. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 03/12/2014 04:10 PM, Peter Geoghegan wrote: On Wed, Mar 12, 2014 at 11:57 AM, Oleg Bartunov obartu...@gmail.com wrote: Also, GiST index is faster for create/update operations. I really hope we will improve jsonb indexing in the next one-two releases. For now I'd suggest people index expressional indexes to index just interesting keys or use GiST. When do you ever want to index non-interesting keys? The problem is when do you know they are interesting? One major use case for using treeish data types in the first place is that you don't know when you're designing the database exactly what shape the data will be. If you don't know that, then how are you supposed to know what in it will be interesting? It's somewhat analogous to full text indexing, where we don't know in advance what phrases or words will be interesting. Here, a key is the equivalent of a word and a key path or subpath is the equivalent of a phrase. Maybe I'm dreaming, since I have no idea how to go about this sort of indexing, but it's where I'd like to see lots of effort. I agree with Oleg that we need to be very creative about jsonb indexing. One of my hopes is that by going down the road we are on, we'll get much wider interest in this, and that both ideas and money might flow towards addressing it in a way that we probably wouldn't have seen otherwise. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication slots and footguns
On 12 March 2014 19:00, Josh Berkus j...@agliodbs.com wrote: All: I was just reading Michael's explanation of replication slots (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/) and realized there was something which had completely escaped me in the pre-commit discussion: select pg_drop_replication_slot('slot_1'); ERROR: 55006: replication slot slot_1 is already active LOCATION: ReplicationSlotAcquire, slot.c:339 What defines an active slot? It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? I'm not clear on why dropping an active replication slot would solve disk space problems related to WAL. I thought it was inactive slots that were the problem in this regard? -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
Andrew, Peter: Just so I'm clear on the limits here, lemme make sure I understand this: a) GIN indexing is limited to ~1500 chars b) The value, which includes everything other than the top level set of keys, is one item as far as GIN is concerned. Therefore: we are limited to indexing JSON where nothing below a top-level key is more than 1500 bytes? I'm asking for documentation purposes. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 12, 2014 at 1:37 PM, Andrew Dunstan and...@dunslane.net wrote: One major use case for using treeish data types in the first place is that you don't know when you're designing the database exactly what shape the data will be. If you don't know that, then how are you supposed to know what in it will be interesting? It's somewhat analogous to full text indexing, where we don't know in advance what phrases or words will be interesting. Here, a key is the equivalent of a word and a key path or subpath is the equivalent of a phrase. You don't know exactly how, but you have some idea. The major benefit is that you can add new things to new documents as the need arises, and that's not a big deal, nor does it require a migration with DDL. If we continue to take MongoDB as representative of how people will use jsonb, they pretty strongly encourage the idea that you have to have some structure or design. Google mongodb schema design to see what I mean - you'll find plenty. It has more to do with making querying the data possible than anything else. There is a limited amount you can do with a bunch of documents that share little in common in terms of their structure - what does a query (that can use an index just in principle) even look like there? The use case you describe here doesn't sound like something similar to full text search. It sounds like something identical. In any case, let's focus on what we have right now. I think that the indexing facilities proposed here are solid. In any case they do not preclude working on better indexing strategies as the need emerges. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 03/12/2014 04:58 PM, Peter Geoghegan wrote: In any case, let's focus on what we have right now. I think that the indexing facilities proposed here are solid. In any case they do not preclude working on better indexing strategies as the need emerges. I quite agree, didn't mean to suggest otherwise. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..
Hi, Here's an updated patch. I had to push the LIMIT processing into ModifyTable to make the behaviour sane in parallel scenarios. As usual, please ignore if you're busy with 9.4. I will work on better docs and more tests from now on and am preparing to make a solid case for adding this. update_delete_order_by_limit_v1.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 12.3.2014 20:40, Peter Geoghegan wrote: On Wed, Mar 12, 2014 at 6:20 AM, Tomas Vondra t...@fuzzy.cz wrote: I'm still not sure how would that look. Does that mean I'd have to create multiple GIN indexes - one for each possible key or something like that? Can you give an example? It could mean that you're obliged to create multiple indexes, yes. For an example, and to get a better sense of what I mean, look at the documentation in the patch. OK, will do. The idea that you're going to create one index on a jsonb, and it's going to be able to usefully index a lot of different queries doesn't seem practical for most use-cases. Mostly, people will have fairly homogeneous json documents, and they'll want to index certain nested fields common to all or at least a large majority of those documents. I think that's an unfounded assumption. Many users actually have very little control over the documents or queries - a nice example may be the mail archive, with headers stored in a hstore/jsonb. I have absolutely no control over the headers or queries. But I think this is a feedback loop too - what if many users actually want that functionality, but realize that expression indexes are not sufficient for their needs and thus don't even try (and so we don't hear about them)? And my experience is that this is actually one of the very cool hstore features - being able to index the whole structure and then do arbitrary queries over that. The only reason I'm looking at jsonb is its improved support for data types (especially arrays). So I have my doubts about the claims that users have homogeneous documents and only want to index some fields with expression indexes. By indexing entire jsonb datums, do you hope to get much benefit out of the indexed values (as opposed to keys) being stored (in serialized form) in the GIN index? Because you *are* indexing a large nested structure as a value. 
Is that large nested structure going to appear in your query predicate, or are you just going to subscript the jsonb to get to the level that's of interest to query that? I'm pretty sure that people want the latter. Are you sure that your complaint isn't just that the default GIN opclass indexes values (as distinct from keys) that are large and unwieldy, and not terribly useful? No, I don't expect a large nested structure to appear in the query. And I expect most people won't need that, although I can imagine queries using @> doing that (not sure if that checks for equality or 'subset'). But I'm not sure I understand how this is related to my original post? All I was asking was whether it wouldn't be enough to store a hash instead of the original value, i.e. instead of this: {"from": "j...@example.com", "to": "j...@example.com", "content-type": "text/plain; charset=us-ascii", "dkim-signature": "very long value"} this: {129812: 29382, 459821: 1029381, 21083: 102941, 111390: 129010292} which would solve the issues with long values and might still support the queries (with recheck, of course). I don't know if that's what jsonb_hash_ops does or if it's even possible / compatible with GIN. I don't think expressional indexes are some kind of unfortunate work around for a jsonb limitation. I think that they're the natural way to approach indexing a nested structure in Postgres. MongoDB, for example, does not magically index everything. You're still required to make choices about indexing that consider the access patterns. For many use cases, expressional indexes are the right tool. But not for all of them, and I see no reason to just throw some tools away. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 03/04/2014 01:58 PM, Amit Kapila wrote: On Mon, Mar 3, 2014 at 7:57 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/16/2014 01:51 PM, Amit Kapila wrote: On Wed, Feb 5, 2014 at 5:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Thanks. I have to agree with Robert though that using the pglz encoding when we're just checking for a common prefix/suffix is a pretty crappy way of going about it [1]. As the patch stands, it includes the NULL bitmap when checking for a common prefix. That's probably not a good idea, because it defeats the prefix detection in the common case that you update a field from NULL to not-NULL or vice versa. Attached is a rewritten version, which does the prefix/suffix tests directly in heapam.c, and adds the prefix/suffix lengths directly as fields in the WAL record. If you could take one more look at this version, to check if I've missed anything. I verified the patch and found a few minor points: ... Fixed those. One question: + rdata[1].data = (char *) xlrec; Earlier it seems to store the record header as the first segment rdata[0]; what's the reason for changing it? I found the code easier to read that way. The order of rdata entries used to be: 0: xl_heap_update struct 1: full-page reference to oldbuf (no data) 2: xl_heap_header_len struct for the new tuple 3-7: logical decoding stuff The prefix/suffix fields made that order a bit awkward, IMHO. They are logically part of the header, even though they're not part of the struct (they are documented in comments inside the struct). So they ought to stay together with the xl_heap_update struct. Another option would've been to move it after the xl_heap_header_len struct. Note that this doesn't affect the on-disk format of the WAL record, because the moved rdata entry is just a full-page reference, with no payload of its own. I have verified the patch by doing crash recovery for the below scenarios and it worked fine: a. no change in old and new tuple b. all changed in new tuple c. 
half changed (update half of the values to NULLS) in new tuple d. only prefix same in new tuple e. only suffix same in new tuple f. prefix-suffix same, other columns values changed in new tuple. Thanks! Conclusion is that patch shows good WAL reduction and performance improvement for favourable cases without CPU overhead for non-favourable cases. Ok, great. Committed! I left out the regression tests. It was good to have them while developing this, but I don't think there's a lot of value in including them permanently in the regression suite. Low-level things like the alignment-sensitive test are fragile, and can easily stop testing the thing it's supposed to test, depending on the platform and future changes in the code. And the current algorithm doesn't care much about alignment anyway. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 12.3.2014 21:58, Peter Geoghegan wrote: The use case you describe here doesn't sound like something similar to full text search. It sounds like something identical. I think this very much depends on the definition of full text search. In any case, let's focus on what we have right now. I think that the indexing facilities proposed here are solid. In any case they do not preclude working on better indexing strategies as the need emerges. +1 Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 12, 2014 at 2:30 PM, Tomas Vondra t...@fuzzy.cz wrote: I think that's unfounded assumption. Many users actually have very little control over the documents or queries - a nice example may be the mail archive, with headers stored in a hstore/jsonb. I have absolutely no control over the headers or queries. Maybe, but what do you want me to do to help them? Indexing a typical jsonb field is a bad idea, unless you really do want something essentially equivalent to full text search (which could be justified), or unless you know ahead of time that your documents are not going to be heavily nested. The whole basis of your complaints seems to be that people won't know that at all. For many usecases, expressional indexes are the right tool. But not for all and I see no reason to just throw some tools away. If the tool you're talking about throwing away is the GiST opclass, I do not propose to throw that away. I don't think it's important enough to justify inclusion in our first cut at this, especially given the fact that the code has bugs, and is quite a bit more complex than GIN. What's wrong with those reasons? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 12.3.2014 21:55, Josh Berkus wrote: Andrew, Peter: Just so I'm clear on the limits here, lemme make sure I understand this: a) GIN indexing is limited to ~~1500chars The exact message I get is this: ERROR: index row size 1944 exceeds maximum 1352 for index tmp_idx so it's 1352B. But IIRC this is closely related to block size, so with larger block sizes you'll get different limits. Also, this is a limit on the compressed value, which makes it less user-friendly as it's difficult to predict whether a row is OK or not :-( And I just discovered this: create table tmp (val jsonb); create index tmp_gin_idx on tmp using gin (val); insert into tmp select ('{"z": "' || repeat('z', 1000000) || '"}')::jsonb; which tries to insert a well-compressible string ('z' repeated 1e6-times), and fails with this: ERROR: index row requires 11472 bytes, maximum size is 8191 So I think it's quite difficult to give a simple and exact explanation in the docs, other than there are limits, but it's difficult to say when you hit them. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
* Tomas Vondra (t...@fuzzy.cz) wrote: So I think it's quite difficult to give simple and exact explanation in the docs, other than there are limits, but it's difficult to say when you hit them. Arrays have more-or-less the same issue... Thanks, Stephen
Re: [HACKERS] jsonb and nested hstore
On 12.3.2014 22:43, Peter Geoghegan wrote: On Wed, Mar 12, 2014 at 2:30 PM, Tomas Vondra t...@fuzzy.cz wrote: I think that's unfounded assumption. Many users actually have very little control over the documents or queries - a nice example may be the mail archive, with headers stored in a hstore/jsonb. I have absolutely no control over the headers or queries. Maybe, but what do you want me to do to help them? Indexing a typical jsonb field is a bad idea, unless you really do want something essentially equivalent to full text search (which could be justified), or unless you know ahead of time that your documents are not going to be heavily nested. The whole basis of your complaints seems to be that people won't know that at all. Well, I would be quite happy with the GIN indexing without the limit I ran into. I don't think we need to invent something entirely new. You're right that the index is pretty futile with a condition matching field/value combination. But what if I'm doing a query with multiple such conditions, and the combination matches just a small fraction of rows? GIN index works with that (and the patches from Alexander improve this case tremendously, IIRC). I still don't understand how's this similar to fulltext - that seems pretty unsuitable for a treeish structure, assuming you can't flatten it. Which you can't, if the queries use paths to access just parts of the json value. For many usecases, expressional indexes are the right tool. But not for all and I see no reason to just throw some tools away. If the tool you're talking about throwing away is the GiST opclass, I do not propose to throw that away. I don't think it's important enough to justify inclusion in our first cut at this, especially given the fact that the code has bugs, and is quite a bit more complex than GIN. What's wrong with those reasons? Meh, I accidentally mixed two responses :-/ I have no problem with expression indexes, but it's not a good solution to all problems. 
I certainly can't use them to achieve what I'd like, and I disagree with your assumptions that it doesn't make sense to index everything / non-interesting keys, or that the documents have a well-defined structure. I can live with larger / less efficient indexes on all fields. Regarding GiST - I understand your concerns about complexity, and you may be right that not shipping it now is preferable to shipping it with bugs. The thing is, it doesn't have the issues with value lengths that prevent me from using GIN, and although GiST is slower, it's at least some indexing. But maybe jsonb_hash_ops will work, I haven't tried yet. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
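[Editor's note: for reference, trying the opclass Tomas mentions would look roughly like this; jsonb_hash_ops was the opclass name in the patch at the time. It stores hashes rather than the values themselves, so long values should not run into the GIN index-row size limit, at the price of an index recheck.]

```sql
-- Same table as in Tomas's earlier example.
CREATE INDEX tmp_hash_idx ON tmp USING gin (val jsonb_hash_ops);
-- Containment queries can use the index; values are rechecked on heap fetch.
SELECT * FROM tmp WHERE val @> '{"z": "zzz"}';
```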
Re: [HACKERS] 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..
Oops. Of course I shouldn't try to change how INSERT works. Latest version attached. update_delete_order_by_limit_v2.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Store Extension Options
On Mon, Mar 10, 2014 at 9:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I haven't touched pg_dump yet, but if this proposed design sits well with everyone, my intention is that the dump output will contain the pg_register_option_namespace() calls necessary so that a table definition will be able to do the SET calls to set the values the original table has, and succeed. In other words, restoring a dump will preserve the values you had, without a need of having the module loaded in the new server. I think this is what was discussed. Robert, do you agree? No, I wasn't imagining anything like pg_register_option_namespace(). My thought was that you'd need to have any relevant modules loaded at restore time. In essence, patching in a new option via an extension module would work about like adding one by patching the core code: you need a server version that supports that option in order to set it. I don't like the idea of using reloptions to let people attach arbitrary unvalidated settings to tables. I consider the way things work with GUCs to be a bug, not a feature, and definitely not something I want to propagate into every other area of the system where the underlying storage format happens to allow it. I also kind of think that what you're going to find if you try to press forward with the pg_register_option_namespace() idea is that what you really want is CREATE RELOPTION NAMESPACE, ALTER RELOPTION NAMESPACE, DROP RELOPTION NAMESPACE. Short of that, you're going to end up with a bunch of kludges, I suspect. And some kind of real DDL syntax (with better naming) is OK with me, but as you observed elsewhere on the thread, now you're looking at a new catalog and a bunch more complexity. I kind of think that this is too half-baked for 9.4 and we ought to punt it to 9.5. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Mar 12, 2014 at 5:30 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Ok, great. Committed! Awesome. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication slots and footguns
On Thu, Mar 13, 2014 at 5:45 AM, Thom Brown t...@linux.com wrote: On 12 March 2014 19:00, Josh Berkus j...@agliodbs.com wrote: All: I was just reading Michael's explanation of replication slots (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/) and realized there was something which had completely escaped me in the pre-commit discussion: select pg_drop_replication_slot('slot_1'); ERROR: 55006: replication slot slot_1 is already active LOCATION: ReplicationSlotAcquire, slot.c:339 What defines an active slot? It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? I'm not clear on why would dropping an active replication slot would solve disk space problems related to WAL. I thought it was inactive slots that were the problem in this regard? You could still have an active slot with a standby that is not able to catch up AFAIK. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
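[Editor's note: to see which slots exist, whether they are active, and how far back they are pinning WAL, the new catalog view can be queried directly; column names as in the 9.4 patch.]

```sql
SELECT slot_name, active, restart_lsn
FROM pg_replication_slots;
```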
Re: [HACKERS] Rowtype column and domain subfield with DEFAULT and NOT NULL constraint
On Wed, Mar 12, 2014 at 11:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: Julien Tachoires julien.tachoi...@dalibo.com writes: A customer has reported us a strange behaviour regarding a rowtype column with a domain subfield: Rowtypes in general do not support defaults for component fields. And what about adding a TODO item? Support default values for component fields of rowtypes In the meantime, we could also improve the documentation to mention this with some examples, for instance some more content on the INSERT page. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
Hi, On 13/03/14 03:27, Fujii Masao wrote: Committed! Thank you very much! Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] db_user_namespace a temporary measure
On Wed, Mar 12, 2014 at 9:19 AM, Andres Freund and...@2ndquadrant.com wrote: Isn't this just a case of creating a suitable operator and an exclusion constraint? Defining the constraint in BKI might require extra infrastructure, but it should be possible. Except that we don't have the infrastructure to perform such checks (neither partial, nor expression indexes, no exclusion constraints) on system tables atm. So it's not a entirely trivial thing to do. I'm probably woefully underinformed here, but it seems like getting exclusion constraints working might be simpler than partial indexes or expression indexes, because both of those involve being able to evaluate arbitrary predicates, whereas exclusion constraints just involve invoking index access methods to look for conflicting rows via smarts built into your index AM. The latter seems to involve less risk of circularity (but I might be wrong). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
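[Editor's note: for readers following along, the user-level form of the exclusion constraints being discussed looks like this textbook example, not system-catalog code; btree_gist is needed for the = operator under GiST.]

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE reservation (
    room   int,
    during tsrange,
    -- no two rows may share a room with overlapping time ranges
    EXCLUDE USING gist (room WITH =, during WITH &&)
);
```

The question in the thread is whether this machinery could ever run against system catalogs, which currently support none of it.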
Re: [HACKERS] Replication slots and footguns
On 12 March 2014 23:17, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Mar 13, 2014 at 5:45 AM, Thom Brown t...@linux.com wrote: On 12 March 2014 19:00, Josh Berkus j...@agliodbs.com wrote: All: I was just reading Michael's explanation of replication slots (http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-replication-slots/) and realized there was something which had completely escaped me in the pre-commit discussion: select pg_drop_replication_slot('slot_1'); ERROR: 55006: replication slot slot_1 is already active LOCATION: ReplicationSlotAcquire, slot.c:339 What defines an active slot? It seems like there's no way for a DBA to drop slots from the master if it's rapidly running out of disk WAL space without doing a restart, and there's no way to drop the slot for a replica which the DBA knows is permanently offline but was connected earlier. Am I missing something? I'm not clear on why would dropping an active replication slot would solve disk space problems related to WAL. I thought it was inactive slots that were the problem in this regard? You could still have an active slot with a standby that is not able to catch up AFAIK. In that scenario, why would one wish to drop the replication slot? If it can't keep up, dropping the replication slot would likely mean you'd orphan the standby due to the primary no longer holding on to the necessary WAL, and the standby is then useless. In which case, if the standby is causing such problems, why not shut down that standby, thereby effectively decommissioning it, then delete the slot? -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9a57858f1103b89a5674f0d50c5fe1f756411df6
On the pgsql-packagers list, there has been some (OT for that list) discussion of whether commit 9a57858f1103b89a5674f0d50c5fe1f756411df6 is sufficiently serious to justify yet another immediate minor release of 9.3.x. The relevant questions seem to be: 1. Is it really bad? 2. Does it affect a lot of people or only a few? 3. Are there more, equally bad bugs that are unfixed, or perhaps even unreported, yet? Obviously, we don't want to leave serious bugs unpatched. On the other hand, as Tom pointed out in that discussion, releases are a lot of work, and we can't do them for every commit. Discuss. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgresql XML parsing
Hello,

On 03/12/2014 09:36 AM, Ashoke wrote:

Hi, I am working on adding a functionality to PostgreSQL. I need to parse the XML format query plan (produced by PostgreSQL v9.3) and save it in a simple data structure (say a C structure). I was wondering if ...

The only XML parsing we have is where Postgres is built with libxml, in which case we use its parser. But query plan XML is delivered to a client (or a log file, which means more or less the same thing here).

For what it's worth, on the backend side the explain output can be obtained in any format via ExplainPrintPlan(), though I don't know whether that applies to this case.

If you want to parse it then it should be parsed in the client - that's why we provide it. Inside postgres I don't see a point in parsing the XML rather than handling the query plan directly. The worst possible option would be to make a hand-cut XML parser, either in the client or the server - XML parsing has all sorts of wrinkles that can bite you badly.

I agree. If XML input is not essential, the JSON format would be easier to parse than XML, and 9.3 already has a JSON parser infrastructure available for the purpose.

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
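As context for the JSON suggestion: 9.3 can already emit plans in JSON, which nearly every client language can parse with its standard library. A sketch (the exact node tree depends on the query, version, and plan chosen):

```sql
-- Ask the server for a machine-readable plan instead of XML
EXPLAIN (FORMAT JSON) SELECT 1;

-- The result is a JSON array of plan objects, roughly of the shape
--   [ { "Plan": { "Node Type": "...", ... } } ]
-- which a client can walk with any off-the-shelf JSON parser,
-- avoiding the hand-cut XML parser warned against above.
```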
Re: [HACKERS] [PATCH] Store Extension Options
On 03/12/2014 03:58 PM, Robert Haas wrote:

I don't like the idea of using reloptions to let people attach arbitrary unvalidated settings to tables. I consider the way things work with GUCs to be a bug, not a feature, and definitely not something I want to propagate into every other area of the system where the underlying storage format happens to allow it.

+1. Relopts are one of the uglier warts we have.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Replication slots and footguns
On 03/12/2014 04:52 PM, Thom Brown wrote:

On 12 March 2014 23:17, Michael Paquier michael.paqu...@gmail.com wrote:

On Thu, Mar 13, 2014 at 5:45 AM, Thom Brown t...@linux.com wrote:

I'm not clear on why dropping an active replication slot would solve disk space problems related to WAL. I thought it was inactive slots that were the problem in this regard?

You could still have an active slot with a standby that is not able to catch up, AFAIK.

In that scenario, why would one wish to drop the replication slot? If it can't keep up, dropping the replication slot would likely mean you'd orphan the standby due to the primary no longer holding on to the necessary WAL, and the standby is then useless. In which case, if the standby is causing such problems, why not shut down that standby, thereby effectively decommissioning it, then delete the slot?

The problem I'm anticipating is that the replica server is actually offline, but the master doesn't know it yet. So here's the situation:

1. The replica with a slot dies.
2. WAL segments start piling up and the master is running low on disk space.
3. The replica is still marked active because we're waiting for the default TCP timeout (3+ hours), or for the proxy to kill the connection (forever).

But as Andres has shown, there are two ways to fix the above. So we're in good shape.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
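One way to defuse step 3 above, independent of the slot machinery itself: the master can be made to notice a vanished standby long before the kernel's multi-hour default TCP timeout by tightening the walsender timeout and TCP keepalive settings. A postgresql.conf sketch with purely illustrative values (these are real 9.3+ GUCs, but the right numbers depend on the environment):

```
# Walsenders terminate connections idle longer than this (default 60s),
# after which the slot is released and can be dropped
wal_sender_timeout = 30s

# Have the kernel probe idle connections aggressively so a dead peer
# is detected rather than waited out
tcp_keepalives_idle = 30      # seconds before the first keepalive probe
tcp_keepalives_interval = 10  # seconds between probes
tcp_keepalives_count = 3      # lost probes before declaring the peer dead
```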
Re: [HACKERS] db_user_namespace a temporary measure
Robert Haas robertmh...@gmail.com writes:

On Wed, Mar 12, 2014 at 9:19 AM, Andres Freund and...@2ndquadrant.com wrote:

Except that we don't have the infrastructure to perform such checks (neither partial indexes, nor expression indexes, nor exclusion constraints) on system tables atm. So it's not an entirely trivial thing to do.

I'm probably woefully underinformed here, but it seems like getting exclusion constraints working might be simpler than partial indexes or expression indexes, because both of those involve being able to evaluate arbitrary predicates, whereas exclusion constraints just involve invoking index access methods to look for conflicting rows via smarts built into your index AM. The latter seems to involve less risk of circularity (but I might be wrong).

You might be right. I don't think anyone's ever looked at what it would take to support that particular case. We have looked at the other cases and run away screaming ... but I think that was before exclusion constraints existed.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
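For readers unfamiliar with the feature under discussion: an exclusion constraint forbids any two rows from conflicting under a set of operators, enforced through the index access method rather than by evaluating arbitrary predicates. The standard user-table example (the thread is about the much harder problem of supporting this on system catalogs, which is not possible today):

```sql
-- btree_gist supplies GiST support for plain equality on scalar types
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- No two reservations for the same room may overlap in time:
-- conflict means equal room AND overlapping (&&) time range
CREATE TABLE reservation (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);
```

A second insert for the same room with an overlapping `during` range is rejected by the GiST index itself, which is the "smarts built into your index AM" point above.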