Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2014-02-05 12:36:42 -0500, Robert Haas wrote: It may well be that your proposal is spot on. But I'd like to see some data-structure-by-data-structure measurements, rather than assuming that alignment must be a good thing. I am fine with just aligning BufferDescriptors properly. That has clearly shown massive improvements. I thought your previous idea of increasing BUFFERALIGN to 64 bytes had a lot to recommend it. Good. I wonder if we shouldn't move that bit of logic: if (size >= BUFSIZ) newStart = BUFFERALIGN(newStart); out of ShmemAlloc() and instead have a ShmemAllocAligned() and ShmemInitStructAligned() that does it. So we can sensibly control it per struct. But that doesn't mean it doesn't need testing. I feel the need here, to say that I never said it doesn't need testing and never thought it didn't... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
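The alignment arithmetic behind the ShmemAllocAligned() idea can be sketched as below. This is only an illustration of how an explicit, per-struct alignment request would work; the names CACHELINE_SIZE and shmem_align_offset are hypothetical stand-ins, not the proposal's actual code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: align an offset unconditionally, the way a
 * ShmemAllocAligned() could, instead of ShmemAlloc() deciding for the
 * caller based on "size >= BUFSIZ".  64 stands in for a cache line. */
#define CACHELINE_SIZE 64
#define CACHELINEALIGN(LEN) \
    (((uintptr_t) (LEN) + (CACHELINE_SIZE - 1)) & ~((uintptr_t) (CACHELINE_SIZE - 1)))

size_t
shmem_align_offset(size_t offset)
{
    /* caller asked for alignment explicitly, so always round up */
    return (size_t) CACHELINEALIGN(offset);
}
```

A real version would presumably take the desired alignment as a parameter, so the BufferDescriptors allocation can request cache-line alignment while other structures keep the default.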
Re: [HACKERS] PoC: Partial sort
On Thu, Feb 6, 2014 at 5:31 AM, Robert Haas robertmh...@gmail.com wrote: Hmm, sounds a little steep. Why is it so expensive? I'm probably missing something here, because I would have thought that planner support for partial sorts would consist mostly of considering the same sorts we consider today, but with the costs reduced by the batching. I guess it's because the patch undoes some optimizations in the mergejoin planner wrt caching merge clauses and adds a whole lot of code to find_mergeclauses_for_pathkeys. In other code paths the overhead does seem to be negligible. Notice the removal of: /* Select the right mergeclauses, if we didn't already */ /* * Avoid rebuilding clause list if we already made one; * saves memory in big join trees... */ Regards, Marti
Re: [HACKERS] Row-security on updatable s.b. views
On 2014-02-06 05:43, Craig Ringer wrote: Based on Tom's objections, another approach is presented in rls-9.4-upd-sb-views-v5 on g...@github.com:ringerc/postgres.git . The Query node is used to record the recursive expansion parent list instead, and copying is avoided. Cannot fetch or clone. github on web says This repository is temporarily unavailable. The backend storage is temporarily offline. Usually this means the storage server is undergoing maintenance. Please contact support if the problem persists. regards, Yeb
Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication
On 2014-02-05 13:26:15 -0500, Robert Haas wrote: On Wed, Feb 5, 2014 at 1:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: It feels weird to me that the new columns are called transactionid and xmin. Why not xid and xmin? Actually the part of that that bothers me is xmin, which conflicts with a reserved system column name. While you can legally pick such conflicting names for view columns, it's not going to be so much fun when you try to join that view against some regular table. That's a fair point, too. So maybe we should go with something like backend_xid and backend_xmin or some other prefix that we come up with. My concern is more that I think they should be consistent somehow. Those work for me. We have a bit of a confusing situation atm, pg_prepared_xact calls its xid transaction, pg_locks transactionid... So if we add a new spelling, we should like it sufficiently to use it in the future. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [doc patch] extra_float_digits and casting from real to numeric
Re: Robert Haas 2014-02-05 ca+tgmozy4kkaptkf2pxcoqnt-jb0ysp8w2php_qge62nilv...@mail.gmail.com <literal>0</literal>, the output is the same on every platform supported by PostgreSQL. Increasing it will produce output that more accurately represents the stored value, but may be unportable. + Casts to other numeric datatypes and the <literal>to_char</literal> + function are not affected by this setting, it affects only the text + representation. </para> </note> Anyone for that patch? Well, the new text kinda recapitulates what the existing text already says. If we're going to clarify, I'd do it like this: The <xref linkend="guc-extra-float-digits"> setting controls the number of extra significant digits included when a floating point value is converted to text for output. It does not affect the results when a floating point number is converted to some other data type or formatted using <literal>to_char</literal>. But frankly I'm inclined to just leave it alone. It says that it affects what happens when the value is converted to text for output. That's specific and accurate. Granted, someone could misunderstand, but that's true of almost anything we might write, and being too long-winded has costs of its own. Yes, the original text is correct. The point is that the original text doesn't really make the reader (who might think the numbers he's seeing on the screen will also be used for computation/inserts) aware that he's looking at something very different from what will actually be used internally. I'm sure we were not the only one to stumble over that problem, and I even knew what extra_float_digits is for, yet I didn't connect the dots in the beginning. Including some explicit heads-up sentence in there solves that problem, either your version, or mine. Kind regards, Christoph Berg -- Tel.: +49 (0)21 61 / 46 43-187 credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209 Hohenzollernstr. 133, 41061 Mönchengladbach Geschäftsführung: Dr. 
Michael Meskes, Jörg Folz, Sascha Heuer signature.asc Description: Digital signature
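The behavior being documented can be illustrated outside the server: extra_float_digits simply widens the precision used when a float4 is rendered as text, so a value that prints as a "round" number at the default precision shows its true stored value with three extra digits. A hypothetical helper in plain C (not PostgreSQL source) that mimics this:

```c
#include <stdio.h>
#include <float.h>

/* Illustration only: render a float4 the way the server roughly does,
 * with FLT_DIG significant digits by default, plus extra_digits more
 * when extra_float_digits is raised.  Returns the formatted length. */
int
format_float(char *buf, size_t buflen, float f, int extra_digits)
{
    return snprintf(buf, buflen, "%.*g", FLT_DIG + extra_digits, (double) f);
}
```

With this sketch, 0.1f formats as "0.1" at the default precision but as "0.100000001" with three extra digits, which is exactly the "something very different from what will actually be used internally" point made above.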
Re: [HACKERS] Minor performance improvement in transition to external sort
On 05/02/14 21:56, Robert Haas wrote: On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris j...@wizmail.org wrote: The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random integers it does 18% fewer compares and takes 10% less time for the heapify, over the work_mem range 1024 to 1048576. Both algorithms appear to be O(n) (contradicting Wikipedia's claim that a siftup heapify is O(n log n)). I think Wikipedia is right. Inserting a tuple into a heap is O(lg n); doing that n times is therefore O(n lg n). Your patch likewise implements an outer loop which runs O(n) times and an inner loop which runs at most O(lg n) times, so a naive analysis of that algorithm also comes out to O(n lg n). The patch implements a siftdown. Measurements attached. -- Cheers, Jeremy

work_mem   baseline   baseline  siftdown   siftdown  baseline  siftdown  cmps   baseline  siftdown  time
           cmps       time      cmps       time      K cmps    K cmps    ratio  K time    K time    ratio
1024       27271      0.001     22426      0.001     26.6      21.9      82%    6.76E-07  4.93E-07  73%
2048       52153      0.001     44557      0.001     25.5      21.8      85%    6.28E-07  4.47E-07  71%
4096       108660     0.003     89591      0.002     26.5      21.9      82%    6.47E-07  4.63E-07  72%
8192       217341     0.005     179191     0.004     26.5      21.9      82%    6.57E-07  4.88E-07  74%
16384      434101     0.011     358680     0.009     26.5      21.9      83%    6.55E-07  5.42E-07  83%
32768      871110     0.022     717542     0.019     26.6      21.9      82%    6.60E-07  5.74E-07  87%
65536      1740355    0.043     1434967    0.039     26.6      21.9      82%    6.59E-07  5.96E-07  90%
131072     3478497    0.087     2869190    0.078     26.5      21.9      82%    6.62E-07  5.98E-07  90%
262144     6962693    0.173     5740518    0.154     26.6      21.9      82%    6.58E-07  5.89E-07  89%
524288     13912236   0.345     11476660   0.311     26.5      21.9      82%    6.59E-07  5.93E-07  90%
1048576    27839260   0.690     22957368   0.622     26.5      21.9      82%    6.58E-07  5.93E-07  90%
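The siftup-vs-siftdown distinction at issue is the classic one: building a heap by repeated insertion (sift-up) is O(n lg n), while Floyd's bottom-up construction, which sifts each internal node down starting from the last parent, is O(n). A minimal sketch of the latter on plain ints (the patch itself operates on tuplesort's SortTuple array; this is only illustrative):

```c
#include <stddef.h>

/* Restore the min-heap property below node i by sifting it down. */
static void
sift_down(int *a, size_t n, size_t i)
{
    for (;;)
    {
        size_t smallest = i;
        size_t l = 2 * i + 1;
        size_t r = 2 * i + 2;

        if (l < n && a[l] < a[smallest])
            smallest = l;
        if (r < n && a[r] < a[smallest])
            smallest = r;
        if (smallest == i)
            break;
        /* swap and continue from the child we descended into */
        int tmp = a[i]; a[i] = a[smallest]; a[smallest] = tmp;
        i = smallest;
    }
}

/* Floyd's bottom-up heapify: O(n) overall, because most nodes sit
 * near the leaves and sift down only a short distance. */
void
heapify(int *a, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = n / 2; i-- > 0;)
        sift_down(a, n, i);
}
```

The O(n) bound comes from summing the sift distances: n/2 leaves move 0 levels, n/4 nodes at most 1, n/8 at most 2, and so on, which converges to O(n) even though each individual sift is O(lg n) in the worst case.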
Re: [HACKERS] GiST support for inet datatypes
2014-01-19 12:10, Emre Hasegeli e...@hasegeli.com: 2014-01-19 Andreas Karlsson andr...@proxel.se: I am a bit suspicious about your memcmp based optimization in bitncommon, but it could be good. Have you benchmarked it compared to doing the same thing with a loop? I did, when I was writing that part. I will be happy to do it again. I will post the results. I was testing it by creating GiST indexes. I realized that these tests are inconsistent when BUFFERING = AUTO. I repeated them with BUFFERING = ON. The function without memcmp was faster in this case. I will change the function in the next version of the patch. The test case:

Create table Network as select (a || '.' || b || '.' || c || '/24')::cidr from generate_series(0, 255) as a, generate_series(0, 255) as b, generate_series(0, 255) as c;
Drop index if exists N;
Create index N on Network using gist(cidr) with (buffering = on);
Create table Network6 as select ('::' || to_hex(a) || ':' || to_hex(b))::inet from generate_series(0, 255) as a, generate_series(0, 65535) as b;
Drop index if exists N6;
Create index N6 on Network6 using gist(inet) with (buffering = on);

What I could not understand is that the tests with IP version 6 were much faster. The row count is the same, the index size is bigger.
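For context, the computation being benchmarked — the number of leading bits two addresses have in common — can be written as a plain byte loop like the sketch below. The signature is illustrative, not the patch's actual bitncommon; a memcmp-based variant would locate the first differing byte with memcmp before falling back to bit inspection:

```c
/* Sketch: count the leading bits common to a and b, considering at
 * most nbits bits.  Whole bytes are compared first; the first
 * differing (or trailing partial) byte is then examined bit by bit. */
int
bitncommon(const unsigned char *a, const unsigned char *b, int nbits)
{
    int nbytes = nbits / 8;
    int i;

    for (i = 0; i < nbytes; i++)
        if (a[i] != b[i])
            break;

    if (i == nbytes && nbits % 8 == 0)
        return nbits;           /* all compared bytes matched */

    {
        int bits = i * 8;
        int limit = (i == nbytes) ? nbits % 8 : 8;
        unsigned char diff = a[i] ^ b[i];
        int k;

        /* count matching high-order bits of the first differing byte */
        for (k = 0; k < limit; k++)
        {
            if (diff & (0x80 >> k))
                break;
            bits++;
        }
        return bits;
    }
}
```

Whether memcmp wins depends on how long the common prefix typically is: for short network prefixes the call overhead can outweigh the vectorized comparison, which would be consistent with the plain loop testing faster here.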
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
Hello, I've understood how this works and it seems to work as expected. Anyway this is just a test module, so if things work for you by changing the above way, it's fine. However I wonder why it's not generating a .def file for you. Surely. Getting back on topic, using dsm_keep_segment, I saw the postmaster keeping the section handle for the dsm segment, and any session even after the creating session had gone could recall the segment, whereas dsm_keep_mapping could not retain it after the end of the creating session. And this exactly resembles the Linux version in behavior, including the attach failure. The orphan section handles on the postmaster have become a matter of documentation. Besides all the above, I'd like to see a comment for the win32 code about the 'DuplicateHandle hack', specifically, a description that DuplicateHandle pushes the copy of the section handle to the postmaster so the section is retained for the postmaster's lifetime. By the way I have one additional comment. All postgres processes already keep a section handle for 'Global/PostgreSQL:pgdata file path' aside from dsm. It tells that it is for the file. On the other hand the names for the dsm sections are 'Global/PostgreSQL.%d'. This seems a bit unkind, as they don't tell what they are for. Do you mind changing it to, say, 'Global/PostgreSQL.dsm.%d' or something like that? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] GIN improvements part2: fast scan
On 02/05/2014 12:42 PM, Alexander Korotkov wrote: Attached patch is light version of fast scan. It does extra consistent function calls only on startScanKey, no extra calls during scan of the index. It finds subset of rarest entries absence of which guarantee false consistent function result. Sounds good. Since the extra consistent calls are only performed at startup, it's unlikely to cause any noticeable performance regression, even when it's not helping. I've run real-life tests based on postgresql.org logs (33318 queries). Here is a table with summary time of running whole test case.

=# select method, sum(time) from test_result group by method order by method;
       method        |     sum
---------------------+--------------
 fast-scan-11        | 126916.11199
 fast-scan-light     | 137321.211
 heikki              | 466751.693
 heikki-true-ternary | 454113.62397
 master              | 446976.288
(6 rows)

where 'heikki' is gin-ternary-logic binary-heap preconsistent-only-on-new-page.patch and 'heikki-true-ternary' is version with my catalog changes promoting ternary consistent function to opclass. Looks good. We can see fast-scan-light gives almost same performance benefit on queries. However, I test only CPU effect, not IO effect. Any thoughts? This test doesn't handle lossy-pages correctly:

/*
 * Check if all items marked as scanEntryUseForSkip is not present in
 * minItem. If so, we can correct advancePast.
 */
if (ginCompareItemPointers(&minItem, &minItemSkip) < 0)
{
    advancePast = minItemSkip;
    advancePast.ip_posid--;
    continue;
}

If minItemSkip is a lossy page, we skip all exact items on the same page. That's wrong, here's a test case to demonstrate:

CREATE TABLE foo (ts tsvector);
INSERT INTO foo SELECT to_tsvector('foo bar'||g) from generate_series(1, 100) g;
CREATE INDEX i_foo ON foo USING gin (ts);
set work_mem='64'; -- to force lossy pages
-- should return 11
select count(*) from foo where ts @@ to_tsquery('foo & bar4:*');

After some fiddling (including fixing the above), I ended up with the attached version of your patch. 
I still left out the catalog changes, again not because I don't think we should have them, but to split this into smaller patches that can be reviewed separately. I extended the "both ways" shim function to work with more than one "maybe" input. Of course, that is O(n^2), where n is the number of "maybe" arguments, so that won't scale, but it's OK for testing purposes. And many if not most real world queries, too. I had a very hard time understanding what it really means for an entry to be in the scanEntryUseForSkip array. What does it mean to "use an entry for skipping"? So I refactored that logic, to hopefully make it more clear. In essence, the entries are divided into two groups, required and other items. If none of the entries in the required set are true, then we know that the overall qual cannot match. See the comments for a more detailed explanation. I'm not wedded to the term "required" here; one might think that *all* the entries in the required set must be TRUE for a match, while it's actually that at least one of them must be TRUE. In keyGetItem, we fetch the minimum item among all the entries in the requiredEntries set. That's the next item we're going to return, regardless of any items in otherEntries set. But before calling the consistent function, we advance the other entries up to that point, so that we know whether to pass TRUE or FALSE to the consistent function. IOW, otherEntries only provide extra information for the consistent function to decide if we have a match or not - they don't affect which items we return to the caller. I think this is pretty close to committable state now. I'm slightly disappointed that we can't do the skipping in more scenarios. For example, in "foo & bar", we can currently skip "bar" entries up to the next "foo", but we cannot skip "foo" up to the next "bar". But this is fairly simple, and since we sort the entries by estimated frequency, should already cover all the pathological "rare & frequent" type queries. So that's OK. 
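The required/other split described above can be modeled in miniature: only the required entries determine the next candidate item, while the other entries are merely advanced up to that item to tell the consistent check TRUE or FALSE. A toy model on sorted int streams (types and names are illustrative, not the patch's ItemPointer-based code):

```c
#include <stddef.h>
#include <stdbool.h>

/* Toy stand-in for a GinScanEntry: a sorted stream of item numbers. */
typedef struct
{
    const int *items;   /* sorted, ascending */
    int        nitems;
    int        pos;     /* current position in the stream */
} ToyEntry;

/* Smallest current item among the required entries, or -1 if all are
 * exhausted.  This is the next candidate the scan would return. */
int
min_required_item(ToyEntry *required, int nrequired)
{
    int min = -1;

    for (int i = 0; i < nrequired; i++)
    {
        ToyEntry *e = &required[i];

        if (e->pos >= e->nitems)
            continue;
        if (min < 0 || e->items[e->pos] < min)
            min = e->items[e->pos];
    }
    return min;
}

/* Advance an "other" entry up to item; report whether it contains it.
 * The result feeds the consistent check but never picks candidates. */
bool
advance_other(ToyEntry *e, int item)
{
    while (e->pos < e->nitems && e->items[e->pos] < item)
        e->pos++;
    return e->pos < e->nitems && e->items[e->pos] == item;
}
```

In this model, items present only in an "other" stream (like 1 and 7 below) are never candidates, matching the description that otherEntries don't affect which items are returned.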
- Heikki diff --git a/src/backend/access/gin/Makefile b/src/backend/access/gin/Makefile index aabc62f..db4f496 100644 --- a/src/backend/access/gin/Makefile +++ b/src/backend/access/gin/Makefile @@ -14,6 +14,6 @@ include $(top_builddir)/src/Makefile.global OBJS = ginutil.o gininsert.o ginxlog.o ginentrypage.o gindatapage.o \ ginbtree.o ginscan.o ginget.o ginvacuum.o ginarrayproc.o \ - ginbulk.o ginfast.o ginpostinglist.o + ginbulk.o ginfast.o ginpostinglist.o ginlogic.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c index a45d722..b6f521c 100644 ---
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
Hello, let me say in passing, However I wonder why its not generating .def file for you. Is the 'it' Visual Studio IDE or CL? Mmm, as far as I know a .def file is something that programmers should write by hand as a matter of course. I've found no way to automatically generate .def files.. (However, that itself would be no matter.) regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Feb 6, 2014 at 2:21 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/05/2014 12:42 PM, Alexander Korotkov wrote: Attached patch is light version of fast scan. It does extra consistent function calls only on startScanKey, no extra calls during scan of the index. It finds subset of rarest entries absence of which guarantee false consistent function result. Sounds good. Since the extra consistent calls are only performed at startup, it's unlikely to cause any noticeable performance regression, even when it's not helping. I've run real-life tests based on postgresql.org logs (33318 queries). Here is a table with summary time of running whole test case.

=# select method, sum(time) from test_result group by method order by method;
       method        |     sum
---------------------+--------------
 fast-scan-11        | 126916.11199
 fast-scan-light     | 137321.211
 heikki              | 466751.693
 heikki-true-ternary | 454113.62397
 master              | 446976.288
(6 rows)

where 'heikki' is gin-ternary-logic binary-heap preconsistent-only-on-new-page.patch and 'heikki-true-ternary' is version with my catalog changes promoting ternary consistent function to opclass. Looks good. We can see fast-scan-light gives almost same performance benefit on queries. However, I test only CPU effect, not IO effect. Any thoughts? This test doesn't handle lossy-pages correctly:

/*
 * Check if all items marked as scanEntryUseForSkip is not present in
 * minItem. If so, we can correct advancePast.
 */
if (ginCompareItemPointers(&minItem, &minItemSkip) < 0)
{
    advancePast = minItemSkip;
    advancePast.ip_posid--;
    continue;
}

If minItemSkip is a lossy page, we skip all exact items on the same page. 
That's wrong, here's a test case to demonstrate:

CREATE TABLE foo (ts tsvector);
INSERT INTO foo SELECT to_tsvector('foo bar'||g) from generate_series(1, 100) g;
CREATE INDEX i_foo ON foo USING gin (ts);
set work_mem='64'; -- to force lossy pages
-- should return 11
select count(*) from foo where ts @@ to_tsquery('foo & bar4:*');

After some fiddling (including fixing the above), I ended up with the attached version of your patch. I still left out the catalog changes, again not because I don't think we should have them, but to split this into smaller patches that can be reviewed separately. I extended the "both ways" shim function to work with more than one "maybe" input. Of course, that is O(n^2), where n is the number of "maybe" arguments, so that won't scale, but it's OK for testing purposes. And many if not most real world queries, too. I had a very hard time understanding what it really means for an entry to be in the scanEntryUseForSkip array. What does it mean to "use an entry for skipping"? So I refactored that logic, to hopefully make it more clear. In essence, the entries are divided into two groups, required and other items. If none of the entries in the required set are true, then we know that the overall qual cannot match. See the comments for a more detailed explanation. I'm not wedded to the term "required" here; one might think that *all* the entries in the required set must be TRUE for a match, while it's actually that at least one of them must be TRUE. In keyGetItem, we fetch the minimum item among all the entries in the requiredEntries set. That's the next item we're going to return, regardless of any items in otherEntries set. But before calling the consistent function, we advance the other entries up to that point, so that we know whether to pass TRUE or FALSE to the consistent function. IOW, otherEntries only provide extra information for the consistent function to decide if we have a match or not - they don't affect which items we return to the caller. 
I think this is pretty close to committable state now. I'm slightly disappointed that we can't do the skipping in more scenarios. For example, in "foo & bar", we can currently skip "bar" entries up to the next "foo", but we cannot skip "foo" up to the next "bar". But this is fairly simple, and since we sort the entries by estimated frequency, should already cover all the pathological "rare & frequent" type queries. So that's OK. Sorry for my scanty explanation. Your version of patch looks good for me. I've rerun my testcase:

=# select method, sum(time) from test_result group by method order by method;
         method         |     sum
------------------------+--------------
 fast-scan-11           | 126916.11199
 fast-scan-light        | 137321.211
 fast-scan-light-heikki | 138168.02801
 heikki                 | 466751.693
 heikki-tru-ternary     | 454113.62397
 master                 | 446976.288
 tri-state
[HACKERS] Small GIN optimizations (after 9.4)
While hacking on the GIN patches, I've come up with a few different ideas for improving performance. It's too late for 9.4, but I'll list them here if someone wants to work on them later:

* Represent ItemPointers as uint64's, to speed up comparisons. ginCompareItemPointers is inlined into only a few instructions, but it's still more expensive than a single-instruction 64-bit comparison. ginCompareItemPointers is called very heavily in a GIN scan, so even a small improvement there would make for a noticeable speedup. It might be an improvement in code clarity, too.

* Keep the entry streams of a GinScanKey in a binary heap, to quickly find the minimum curItem among them. I did this in various versions of the fast scan patch, but then I realized that the straightforward way of doing it is wrong, because a single GinScanEntry can be part of multiple GinScanKeys. If an entry's curItem is updated as part of advancing one key, and the entry is in a heap of another key, updating the curItem can violate the heap property of the other entry's heap.

* Build a truth table (or cache) of consistent-function's results, and use that instead of calling consistent for every item.

* Deduce AND or OR logic from the consistent function. Or have the opclass provide a tree of AND/OR/NOT nodes directly, instead of a consistent function. For example, if the query is "foo & bar", we could avoid calling consistent function altogether, and only return items that match both.

* Delay decoding segments during a scan. Currently, we decode all segments of a posting tree page into a single array at once. But with fast scan, we might be able to skip over all entries in some of the segments. So it would be better to copy the segments into backend-private memory in compressed format, and decode them one segment at a time (or maybe even one item at a time), when needed. That would avoid the unnecessary decoding of segments that can be skipped over, and would also reduce memory usage of a scan. 
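The first idea amounts to packing the block number and offset into one integer so that comparing two ItemPointers becomes a single 64-bit compare. A sketch with illustrative names (the real ItemPointerData stores the 32-bit block number as two 16-bit halves, which is exactly why the packed form compares more cheaply):

```c
#include <stdint.h>

/* Pack a (BlockNumber, OffsetNumber) pair into one uint64 so that
 * ordinary integer ordering matches ItemPointer ordering: block
 * number in the high 48 bits, offset in the low 16. */
uint64_t
itemptr_pack(uint32_t blkno, uint16_t offnum)
{
    return ((uint64_t) blkno << 16) | offnum;
}

/* Single-comparison replacement for ginCompareItemPointers:
 * returns <0, 0, or >0, like the original. */
int
itemptr_cmp(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}
```

Because the block number occupies the more significant bits, the packed comparison orders first by block and then by offset, matching the field-by-field comparison it would replace.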
I'll add these to the TODO. - Heikki
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 6, 2014 at 9:13 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Feb 5, 2014 at 8:50 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/05/2014 04:48 PM, Amit Kapila wrote: I have done one test where there is a large suffix match, but not large enough that it can compress more than 75% of string, the CPU overhead with wal-update-prefix-suffix-encode-1.patch is not much, but there is no I/O reduction as well. Hmm, it's supposed to compress if you save at least 25%, not 75%. Apparently I got that backwards in the patch... So If I understand the code correctly, the new check should be

if (prefixlen + suffixlen < (slen * need_rate) / 100)
    return false;

rather than

if (slen - prefixlen - suffixlen < (slen * need_rate) / 100)
    return false;

Assuming the above change is correct, I have tried to see the worst case overhead for this patch by having a new tuple such that after 25% or so of suffix/prefix match, there is a small change in tuple and kept rest of tuple same as old tuple, and it shows overhead for this patch as well. Updated test script is attached.

Unpatched
             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348843824 | 5.56866788864136
 ten long fields, 8 bytes changed |     348844800 | 5.84434294700623
 ten long fields, 8 bytes changed |     35050     | 5.92329406738281
(3 rows)

wal-update-prefix-suffix-encode-1.patch
             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348845624 | 6.92243480682373
 ten long fields, 8 bytes changed |     348847000 | 8.35828399658203
 ten long fields, 8 bytes changed |     350204752 | 7.61826491355896
(3 rows)

One minor point, can we avoid having prefix tag if prefixlen is 0. + /* output prefix as a tag */ + pgrb_out_tag(ctrlp, ctrlb, ctrl, bp, prefixlen, hlen); With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com wal-update-testsuite.sh Description: Bourne shell script
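The corrected need_rate check from this thread, together with the prefix/suffix match it guards, can be sketched as follows. The function name is a hypothetical stand-in; the condition is rearranged into multiplication form to avoid integer division, but it is the same "save at least need_rate percent" test:

```c
#include <stddef.h>

/* Sketch: find the longest shared prefix and suffix of the old and
 * new tuple bytes, and report whether encoding them would save at
 * least need_rate percent of the new tuple's length (the corrected
 * direction of the check, not the inverted original). */
int
worth_prefix_suffix_encoding(const char *oldp, size_t oldlen,
                             const char *newp, size_t newlen,
                             int need_rate)
{
    size_t prefixlen = 0, suffixlen = 0;
    size_t maxcommon = oldlen < newlen ? oldlen : newlen;

    while (prefixlen < maxcommon && oldp[prefixlen] == newp[prefixlen])
        prefixlen++;
    /* don't let the suffix overlap the prefix */
    while (suffixlen < maxcommon - prefixlen &&
           oldp[oldlen - 1 - suffixlen] == newp[newlen - 1 - suffixlen])
        suffixlen++;

    /* equivalent to: prefixlen + suffixlen >= (newlen * need_rate) / 100 */
    return (prefixlen + suffixlen) * 100 >= newlen * (size_t) need_rate;
}
```

With need_rate = 25, a 12-byte tuple sharing a 4-byte prefix and 4-byte suffix saves 8/12 of its bytes and qualifies, while a tuple differing at both ends saves nothing and is rejected, matching Heikki's "compress if you save at least 25%" intent.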
Re: [HACKERS] Add CREATE support to event triggers
On Thu, Feb 6, 2014 at 12:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Feb 5, 2014 at 2:26 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Then again, why is the behavior of schema-qualifying absolutely everything even desirable? Well, someone could create a collation in another schema with the same name as a system collation and the command would become ambiguous. Hmm, good point. I guess we don't worry much about this with pg_dump because we assume that we're restoring into an empty database (and if not, the user gets to keep both pieces). You're applying a higher standard here. Robert, that's just horsepucky. pg_dump is very careful about schemas. It's also careful to not schema-qualify names unnecessarily, which is an intentional tradeoff to improve readability of the dump --- at the cost that the dump might break if restored into a nonempty database with conflicting objects. In the case of data passed to event triggers, there's a different tradeoff to be made: people will probably value consistency over readability, so always-qualify is probably the right choice here. But in neither case are we being sloppy. I didn't mean to imply otherwise. I know the pg_dump tries to avoid excess schema-qualification for readability among other reasons; what I meant was that Alvaro is applying a higher standard specifically in regards to replayability. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Wed, Feb 5, 2014 at 8:56 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Feb 5, 2014 at 5:13 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/05/2014 07:54 AM, Amit Kapila wrote: That's not the worst case, by far. First, note that the skipping while scanning new tuple is only performed in the first loop. That means that as soon as you have a single match, you fall back to hashing every byte. So for the worst case, put one 4-byte field as the first column, and don't update it. Also, I suspect the runtimes in your test were dominated by I/O. When I scale down the number of rows involved so that the whole test fits in RAM, I get much bigger differences with and without the patch. You might also want to turn off full_page_writes, to make the effect clear with less data. So with this test, the overhead is very significant. With the skipping logic, another kind of worst case is that you have a lot of similarity between the old and new tuple, but you miss it because you skip. This is exactly the reason why I have not kept the skipping logic in the second pass (loop), but I think maybe it would have been better to keep it, just not as aggressive as in the first pass. I have tried to merge pass-1 and pass-2 and kept the skipping logic the same, and it has reduced the overhead to a good extent, but not completely for the new case you have added. This change is to check if it can reduce overhead; if we want to proceed, maybe we can limit the skip factor, so that the chance of skipping some match data is reduced. 
New version of patch is attached with mail

Unpatched
            testname             | wal_generated |     duration
---------------------------------+---------------+------------------
 ten long fields, all changed    |     348842856 | 6.93688106536865
 ten long fields, all changed    |     348843672 | 7.53063702583313
 ten long fields, all changed    |     352662344 | 7.76640701293945
(3 rows)

pgrb_delta_encoding_v8.patch
            testname             | wal_generated |     duration
---------------------------------+---------------+------------------
 ten long fields, but one changed |    348848144 | 9.22694897651672
 ten long fields, but one changed |    348841376 | 9.11818099021912
 ten long fields, but one changed |    352963488 | 8.37875485420227
(3 rows)

pgrb_delta_encoding_v9.patch
            testname             | wal_generated |     duration
---------------------------------+---------------+------------------
 ten long fields, but one changed |    350166320 | 8.84561610221863
 ten long fields, but one changed |    348840728 | 8.45299792289734
 ten long fields, but one changed |    348846656 | 8.34846496582031
(3 rows)

It appears to me that it can be a good idea to merge both the patches (prefix-suffix encoding + delta-encoding) in a way such that if we get reasonable compression (50% or so) with prefix-suffix, then we can return without doing delta encoding, and if compression is lesser then we can do delta encoding for the rest of the tuple. The reason I think it will be good is that by just doing prefix-suffix we might leave many cases where good compression is possible. If you think it is a viable way, then I can merge both the patches and check the results. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com pgrb_delta_encoding_v9.patch Description: Binary data
Re: [HACKERS] Row-security on updatable s.b. views
On 02/06/2014 04:54 PM, Yeb Havinga wrote: On 2014-02-06 05:43, Craig Ringer wrote: Based on Tom's objections, another approach is presented in rls-9.4-upd-sb-views-v5 on g...@github.com:ringerc/postgres.git . The Query node is used to record the recursive expansion parent list instead, and copying is avoided. Cannot fetch or clone. github on web says This repository is temporarily unavailable. The backend storage is temporarily offline. Usually this means the storage server is undergoing maintenance. Please contact support if the problem persists. If this persists, or someone else has the same issue, try the HTTPS url: https://github.com/ringerc/postgres.git per: http://github.com/ringerc/postgres/ (I need to chase up my git.postgresql.org account request, it seems to have got lost). -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Thu, Feb 6, 2014 at 4:10 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, let me say in passing, However I wonder why its not generating .def file for you. Is the 'it' is Visual Studio IDE or CL? Mmm, as far as I know .def file is a stuff that programmers should write by their hands as a matter of course. Using MSVC. We have gendef.pl which can do it. For example, in Postgres project properties, in Configuration Properties->Build Events->Pre-Link Event, there is a Command Line like below which can do the required work. perl src\tools\msvc\gendef.pl Debug\postgres x64 With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] updated emacs configuration
On 7/20/13, 9:06 PM, Greg Stark wrote: In theory this tool is promising though since it works by looking at the llvm bytecode to determine what the real syntax is. It should be able to handle the typedef issues we have with most of the tools. Since clang-format has no options to specify include file paths, I doubt that it actually analyzes the file to the extent that it knows about typedefs. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
On 2013-07-21 02:06:02 +0100, Greg Stark wrote: On Thu, Jun 27, 2013 at 10:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: AFAIR, no one has ever done a serious comparison to anything except GNU indent, and (at least at the time) it seemed to have bugs as bad as pgindent's, just different ones. I'm certainly open to another choice as long as we don't lose on portability of the tool. But who will do the legwork to test something else? Fwiw I played with clang-format a bit the other day. But I couldn't get it to resemble our coding style. It was strongly of the opinion that spacing within lines should be completely consistent and that meant it eliminated all spaces that lined up initializers and such. FWIW, there's some work going on to add more options (like getting the differences between function definitions and declarations right, the way pg wants them). So it might develop into something usable for pg. What I haven't seen in any tool yet is something to resemble our indentation style for variable declarations. Do we feel that's a dealbreaker for any tool? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-security on updatable s.b. views
On 02/06/2014 12:43 PM, Craig Ringer wrote: 1. Try (again) to do row-security in the rewriter. This was previously impossible because of the definition of row-security behaviour around inheritance, but with the simplified inheritance model now proposed I think it's possible.

Thanks to the simplified requirements for inheritance, this turns out to be fairly easy. There's a version rewritten to use the rewriter in the tag: rls-9.4-upd-sb-views-v6 on https://github.com/ringerc/postgres.git

The trickiest bit remaining is how to register the PlanInvalItem to force plan invalidation when the user-id changes. This was easy in the optimizer, but it's not obvious how to do it cleanly in the rewriter. I've got a couple of ideas but don't much like either of them. Recommendations from the experienced welcomed.

Other more minor open items, each of which looks quite quick:

- I haven't plugged in rewriter-based recursion detection yet, but that looks fairly easy (famous last words?). It's 10pm and I have an early start, so that won't happen tonight.

- I haven't removed some cruft from the previous approach from the Query node either; I need to trim the diff.

- The regression test expected file needs adjustment to match the new inheritance rules, and to cover a couple of new tests I've added.

- Needs a better error when it gets an unacceptable rewrite product during security qual rewriting (modeled on the errors for CTEs). Easy, just some cookie-cutter code.

- Need to cherry-pick the docs patch back on top of the patch tree now that things are settling.

Amazingly, I think this has a hope. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor performance improvement in transition to external sort
On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris j...@wizmail.org wrote: On 05/02/14 21:56, Robert Haas wrote: On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris j...@wizmail.org wrote: The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random integers it does 18% fewer compares and takes 10% less time for the heapify, over the work_mem range 1024 to 1048576. Both algorithms appear to be O(n) (contradicting Wikipedia's claim that a siftup heapify is O(n log n)). I think Wikipedia is right. Inserting a tuple into a heap is O(lg n); doing that n times is therefore O(n lg n). Your patch likewise implements an outer loop which runs O(n) times and an inner loop which runs at most O(lg n) times, so a naive analysis of that algorithm also comes out to O(n lg n). The patch implements a siftdown. Measurements attached. Uh, this spreadsheet is useless. You haven't explained how these numbers were generated, or what the columns in the spreadsheet actually mean. Ideally, you should include enough information that someone else can try to reproduce your results. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
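[Editor's note] The two heapify strategies being debated can be sketched as follows; Python for illustration only (the tuplesort code itself is C), with comparison counters added so the difference is observable. The bottom-up version is Floyd's classic heap construction, which matches the "siftdown" the patch implements:

```python
def heapify_siftdown(a):
    """Floyd's bottom-up heapify: sift each internal node down,
    starting from the last parent. O(n) comparisons in total."""
    comparisons = 0
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while True:
            child = 2 * i + 1
            if child >= n:
                break
            if child + 1 < n:          # pick the smaller of the two children
                comparisons += 1
                if a[child + 1] < a[child]:
                    child += 1
            comparisons += 1
            if a[i] <= a[child]:       # heap property restored at this node
                break
            a[i], a[child] = a[child], a[i]
            i = child
    return comparisons

def heapify_siftup(a):
    """Insertion-based heapify: treat the prefix as a heap and sift each
    new element up. O(n log n) comparisons in the worst case."""
    comparisons = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            parent = (j - 1) // 2
            comparisons += 1
            if a[parent] <= a[j]:
                break
            a[parent], a[j] = a[j], a[parent]
            j = parent
    return comparisons
```

On a reverse-sorted input (the worst case for siftup, since every element climbs to the root) the siftdown version needs markedly fewer comparisons, which is consistent with the asymptotic argument above even though typical random inputs show a smaller gap.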
Re: [HACKERS] Row-security on updatable s.b. views
On 02/06/2014 10:19 PM, Craig Ringer wrote: On 02/06/2014 12:43 PM, Craig Ringer wrote: 1. Try (again) to do row-security in the rewriter. This was previously impossible because of the definition of row-security behaviour around inheritance, but with the simplified inheritance model now proposed I think it's possible. Thanks to the simplified requirements for inheritance, this turns out to be fairly easy. There's a version rewritten to use the rewriter in the tag: rls-9.4-upd-sb-views-v6 on https://github.com/ringerc/postgres.git The trickiest bit remaining is how to register the PlanInvalItem to force plan invalidation when the user-id changes. This was easy in the optimizer, but it's not obvious how to do it cleanly in the rewriter. I've got a couple of ideas but don't much like either of them. Recommendations from the experienced welcomed. Or, after thinking about it for a second with my tired brain, not so much. We don't rerun rewrite on plan invalidation. So that means the superuser exemption won't work properly with this patch. So much for having a hope, that's not a small thing to fix. So: either I invoke the rewriter from within the optimizer on the security quals, or I make the rewriter re-run on plan invalidation. Neither is small or simple. Blast. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] open and close columns in the NEW record not allowed
Hello

One of our users is having a problem with a trigger in a system running postgresql 9.3. The problem is that pl/pgsql does not accept open and close as column names when used in the NEW record in a trigger function.

This page: http://www.postgresql.org/docs/9.3/static/sql-keywords-appendix.html does not say that they are reserved words in postgresql (although they are reserved words in the sql standard). On the other hand, postgres allows creating and updating tables with columns named open/close without problems. We think the behavior should be consistent: either it is allowed to use them or not, but not like it is today.

- Test case:
-

CREATE TABLE test_open(id integer, open timestamp);
CREATE TABLE test_close(id integer, close timestamp);
CREATE TABLE test_close_trigger(id integer, close timestamp);
CREATE TABLE test_open_trigger(id integer, open timestamp);

CREATE OR REPLACE FUNCTION test_open()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
  INSERT INTO test_open_trigger (id, open)
  VALUES (NEW.id, NEW.open);
  RETURN NEW;
END;
$function$;

CREATE OR REPLACE FUNCTION test_close()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
  INSERT INTO test_close_trigger (id, close)
  VALUES (NEW.id, NEW.close);
  RETURN NEW;
END;
$function$;

# INSERT INTO test_open (id,open) VALUES (1,now());
INSERT 0 1
# INSERT INTO test_close (id,close) VALUES (1,now());
INSERT 0 1

# SELECT * FROM test_open;
 id |            open
----+----------------------------
  1 | 2014-02-06 15:17:52.654977
(1 row)

# SELECT * FROM test_close;
 id |           close
----+----------------------------
  1 | 2014-02-06 15:17:53.893911
(1 row)

CREATE TRIGGER test_open AFTER INSERT ON test_open FOR EACH ROW EXECUTE PROCEDURE test_open();
CREATE TRIGGER test_close AFTER INSERT ON test_close FOR EACH ROW EXECUTE PROCEDURE test_close();

# INSERT INTO test_open (id,open) VALUES (1,now());
ERROR:  record "new" has no field "open"
LINE 3: VALUES (NEW.id, NEW.open)
                            ^
QUERY:  INSERT INTO public.test_open_trigger (id, open)
VALUES (NEW.id, NEW.open)
CONTEXT:  PL/pgSQL function test_open() line 3 at SQL statement
# INSERT INTO test_close (id,close) VALUES (1,now());
ERROR:  record "new" has no field "close"
LINE 3: VALUES (NEW.id, NEW.close)
                            ^
QUERY:  INSERT INTO public.test_close_trigger (id, close)
VALUES (NEW.id, NEW.close)
CONTEXT:  PL/pgSQL function test_close() line 3 at SQL statement
-

Thanks in advance. regards, -- Rafael Martinez Guerrero Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
On Tue, Feb 4, 2014 at 6:07 PM, David E. Wheeler da...@justatheory.com wrote: The install failed, of course, because extensions want to install in $PGROOT/share/extensions. Homebrew sounds kind of confused. Having a non-root user have access to make global system changes sounds like privilege escalation vulnerability by design. However putting that aside, it is fairly standard for software to provide two directories for extensions/modules/plugins/etc. One for distribution-built software such as /usr/share/emacs/site-lisp/ and another for sysadmin customizations such as /usr/local/share/emacs/site-lisp. The same idea as /usr/share/perl and /usr/local/share/perl or with Python or anything else. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
Andres Freund wrote: What I haven't seen in any tool yet is something to resemble our indentation style for variable declarations. Do we feel that's a dealbreaker for any tool? I find our style more aesthetically pleasing than any other I've seen; but anyway the changes if we used a tool that formatted them differently would cause way too much havoc to even be considered, IMO. You know what I wouldn't mind losing? The indentation of our switch/case blocks. It wastes too much valuable space at the left. But then there are ~1000 uses of it in the tree so perhaps it'd be too much trouble as well. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
Andres Freund and...@2ndquadrant.com writes: What I haven't seen in any tool yet is something to resemble our indentation style for variable declarations. Do we feel that's a dealbreaker for any tool? Well, we're unlikely to change pgindent's rules to conform to the behavior of some random emacs mode, if that's what you mean ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-security on updatable s.b. views
Craig Ringer cr...@2ndquadrant.com writes: We don't rerun rewrite on plan invalidation. Don't we? plancache.c certainly does, in fact it starts from the raw grammar output. Skipping the rewriter would mean failing to respond to CREATE OR REPLACE VIEW, for example. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] open and close columns in the NEW record not allowed
On 02/06/2014 06:35 AM, Rafael Martinez Guerrero wrote: Hello One of our users is having a problem with a trigger in a system running postgresql 9.3. The problem is that pl/pgsql does not accept open and close as column names when used in the NEW record in a trigger function. This page: http://www.postgresql.org/docs/9.3/static/sql-keywords-appendix.html does not say that they are reserved words in postgresql (although they are reserved words in the sql standard) On the other hand, postgres allows creating and updating tables with columns named open/close without problems. We think the behavior should be consistent: either it is allowed to use them or not, but not like it is today.

The catch-all from here: http://www.postgresql.org/docs/9.3/interactive/sql-keywords-appendix.html is: "As a general rule, if you get spurious parser errors for commands that contain any of the listed key words as an identifier you should try to quote the identifier to see if the problem goes away." Which indeed solves the problem on my end at least:

test=> CREATE OR REPLACE FUNCTION public.test_open()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
  INSERT INTO test_open_trigger (id, "open")
  VALUES (NEW.id, NEW."open");
  RETURN NEW;
END;
$function$;

test=> \d test_open
           Table "public.test_open"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 id     | integer                     |
 open   | timestamp without time zone |
Triggers:
    test_open AFTER INSERT ON test_open FOR EACH ROW EXECUTE PROCEDURE test_open()

test=> INSERT INTO test_open (id,open) VALUES (1,now());
INSERT 0 1

-- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] open and close columns in the NEW record not allowed
Rafael Martinez Guerrero r.m.guerr...@usit.uio.no writes: The problem is that pl/pgsql does not accept open and close as column names when used in the NEW record in a trigger function. Yup. Those words (and other words that can start a plpgsql statement) are reserved so far as plpgsql is concerned. This page: http://www.postgresql.org/docs/9.3/static/sql-keywords-appendix.html does not say that they are reserved words in postgresql (although they are reserved words in the sql standard) It is not the business of that page to document the behavior of plpgsql. Perhaps the plpgsql chapter should document what it considers to be reserved words, but for now, you could look at the list in http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/pl/plpgsql/src/pl_scanner.c We think the behavior should be consistent, either it is allow to use them or not, but not like it is today. That would require giving plpgsql a privileged position over all other PLs, which isn't going to happen ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
On 2014-02-06 10:04:54 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: What I haven't seen in any tool yet is something to resemble our indentation style for variable declarations. Do we feel that's a dealbreaker for any tool? Well, we're unlikely to change pgindent's rules to conform to the behavior of some random emacs mode, if that's what you mean ... Nah, not for that ;). I mean I haven't seen any replacement for pgindent that generates equivalent variable indentations. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
Andres Freund and...@2ndquadrant.com writes: Nah, not for that ;). I mean I haven't seen any replacement for pgindent that generates equivalent variable indentations. Seems odd. I thought netbsd's indent was pretty much the granddaddy of them all --- so you'd expect newer tools to be able to replicate that style. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] open and close columns in the NEW record not allowed
On Thu, 2014-02-06 at 07:11 -0800, Adrian Klaver wrote: On 02/06/2014 06:35 AM, Rafael Martinez Guerrero wrote: We think the behavior should be consistent, either it is allow to use them or not, but not like it is today. As a general rule, if you get spurious parser errors for commands that contain any of the listed key words as an identifier you should try to quote the identifier to see if the problem goes away. Which indeed solves the problem on my end at least: Hello Thanks for the feedback. Our problem is that an application decides the name of the columns in the tables and XDB replication from EnterpriseDB decides the triggers. We have no control over the code :-( regards, -- Rafael Martinez Guerrero Center for Information Technology Services University of Oslo, Norway -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] open and close columns in the NEW record not allowed
On Thu, Feb 06, 2014 at 04:21:41PM +0100, Rafael Martinez Guerrero wrote: On Thu, 2014-02-06 at 07:11 -0800, Adrian Klaver wrote: On 02/06/2014 06:35 AM, Rafael Martinez Guerrero wrote: We think the behavior should be consistent, either it is allow to use them or not, but not like it is today. As a general rule, if you get spurious parser errors for commands that contain any of the listed key words as an identifier you should try to quote the identifier to see if the problem goes away. Which indeed solves the problem on my end at least: Hello Thanks for the feedback. Our problem is that an application decides the name of the columns in the tables and XDB replication from EnterpriseDB decides the triggers. We have no control over the code :-( regards, -- Rafael Martinez Guerrero Center for Information Technology Services University of Oslo, Norway Hi Rafael, It sounds like a bug in the XDB trigger generation code. Maybe file a bug report. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
On 2014-02-06 10:24:34 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Nah, not for that ;). I mean I haven't seen any replacement for pgindent that generates equivalent variable indentations. Seems odd. I thought netbsd's indent was pretty much the granddaddy of them all --- so you'd expect newer tools to be able to replicate that style. I don't know, I guess not many projects use the indented variable names in declarations style these days. But I guess it's not too hard to add. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
* Greg Stark (st...@mit.edu) wrote: On Tue, Feb 4, 2014 at 6:07 PM, David E. Wheeler da...@justatheory.com wrote: The install failed, of course, because extensions want to install in $PGROOT/share/extensions. Homebrew sounds kind of confused. Having a non-root user have access to make global system changes sounds like privilege escalation vulnerability by design. I've not played w/ Homebrew myself, but it's installing into /usr/local and presumably that includes installing things into /usr/local/bin, so the notion that installing something from Homebrew isn't already (and intended to be) making global system changes doesn't quite line up. The end-admin would have to modify the system-installed postgresql.conf anyway to enable this other directory. David wasn't suggesting that Homebrew *should* be able to do so, he was pointing out that it *can't*, which all makes sense imv. However putting that aside, it is fairly standard for software to provide two directories for extensions/modules/plugins/etc. One for distribution-built software such as /usr/share/emacs/site-lisp/ and another for sysadmin customizations such as /usr/local/share/emacs/site-lisp. The same idea as /usr/share/perl and /usr/local/share/perl or with Python or anything else. Agreed. Thanks, Stephen signature.asc Description: Digital signature
adt Makefile, was Re: [HACKERS] jsonb and nested hstore
On 02/01/2014 05:20 PM, Andres Freund wrote:

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 1ae9fa0..fd93d9b 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \
 	tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
 	tsvector.o tsvector_op.o tsvector_parser.o \
 	txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \
-	rangetypes_typanalyze.o rangetypes_selfuncs.o
+	rangetypes_typanalyze.o rangetypes_selfuncs.o \
+	jsonb.o jsonb_support.o

Odd, most OBJS lines are kept in alphabetical order, but that doesn't seem to be the case here. This whole list is a mess, and we don't even have all the range_types files following each other. Worth cleaning up? I'm actually wondering if it might be worth having some subgroups of object files and then combining them into $OBJS. Or it could just be left more or less as is - it's hardly a breakthrough advance. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small GIN optimizations (after 9.4)
i think there is one more thing which would be really good in GIN and which would solve a ton of issues. atm GIN entries are sorted by item pointer. if we could sort them by a column it would fix a couple of real-world issues such as ...

SELECT ... FROM foo WHERE tsearch_query ORDER BY price DESC LIMIT 10

... or so. in many cases you want to search for a, say, product and find the cheapest / most expensive one. if the tsearch_query yields a high number of rows (which it often does) the subsequent sort will kill you.

many thanks, hans

On Feb 6, 2014, at 12:36 PM, Heikki Linnakangas wrote: While hacking on the GIN patches, I've come up with a few different ideas for improving performance. It's too late for 9.4, but I'll list them here if someone wants to work on them later:

* Represent ItemPointers as uint64's, to speed up comparisons. ginCompareItemPointers is inlined into only a few instructions, but it's still more expensive than a single-instruction 64-bit comparison. ginCompareItemPointers is called very heavily in a GIN scan, so even a small improvement there would make for a noticeable speedup. It might be an improvement in code clarity, too.

* Keep the entry streams of a GinScanKey in a binary heap, to quickly find the minimum curItem among them. I did this in various versions of the fast scan patch, but then I realized that the straightforward way of doing it is wrong, because a single GinScanEntry can be part of multiple GinScanKeys. If an entry's curItem is updated as part of advancing one key, and the entry is in a heap of another key, updating the curItem can violate the heap property of the other entry's heap.

* Build a truth table (or cache) of consistent-function's results, and use that instead of calling consistent for every item.

* Deduce AND or OR logic from the consistent function. Or have the opclass provide a tree of AND/OR/NOT nodes directly, instead of a consistent function.
For example, if the query is foo & bar, we could avoid calling the consistent function altogether, and only return items that match both.

* Delay decoding segments during a scan. Currently, we decode all segments of a posting tree page into a single array at once. But with fast scan, we might be able to skip over all entries in some of the segments. So it would be better to copy the segments into backend-private memory in compressed format, and decode them one segment at a time (or maybe even one item at a time), when needed. That would avoid the unnecessary decoding of segments that can be skipped over, and would also reduce memory usage of a scan.

I'll add these to the TODO. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
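[Editor's note] The "binary heap of entry streams" idea Heikki mentions can be sketched in Python, for illustration only. This shows the simple single-key case (pull the minimum curItem in O(log k) per step instead of scanning all k streams); it deliberately ignores the complication Heikki points out, namely that a GinScanEntry shared between several GinScanKeys breaks this straightforward approach:

```python
import heapq

def merge_entry_streams(streams):
    """Keep k ascending item-pointer streams in a binary heap keyed by
    their current item, yielding the distinct pointers in order (OR
    semantics across entries). Each stream is any iterable of ints."""
    heap = []
    for idx, it in enumerate(map(iter, streams)):
        first = next(it, None)
        if first is not None:
            # idx breaks ties so iterators are never compared directly
            heapq.heappush(heap, (first, idx, it))
    last = None
    while heap:
        item, idx, it = heapq.heappop(heap)
        if item != last:        # deduplicate pointers present in many streams
            yield item
            last = item
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx, it))
```

For example, merging the posting streams [1, 4, 7], [2, 4, 9] and [4, 5] yields 1, 2, 4, 5, 7, 9 with only one heap adjustment per advanced stream.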
Re: adt Makefile, was Re: [HACKERS] jsonb and nested hstore
Andrew Dunstan and...@dunslane.net writes: On 02/01/2014 05:20 PM, Andres Freund wrote: Odd, most OBJS lines are kept in alphabetical order, but that doesn't seem to be the case here. This whole list is a mess, and we don't even have all the range_types files following each other. Worth cleaning up? +1. It's just neatnik-ism, but isn't compulsive neatnik-ism pretty much a job requirement for programmers? It's hard enough dealing with necessary complexities without having to wonder if some seemingly arbitrary choice has hidden meanings. I'm actually wondering if it might be worth having some subgroups of object files and then combining them into $OBJS. Nah, let's just alphabetize them and be done. The Makefile has no reason to care about subgroups of those files. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: adt Makefile, was Re: [HACKERS] jsonb and nested hstore
Andrew Dunstan wrote: This whole list is a mess, and we don't even have all the range_types files following each other. Worth cleaning up? I'm actually wondering if it might be worth having some subgroups of object files and then combining them into $OBJS. Doesn't the MSVC build stuff parse OBJS definitions? -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
On Feb 6, 2014, at 6:51 AM, Greg Stark st...@mit.edu wrote: Homebrew sounds kind of confused. Having a non-root user have access to make global system changes sounds like privilege escalation vulnerability by design. Well, the point is that it *doesn’t* make global system changes. I got an error on OS X Server with my original formula, because there was no permission to install in $PGROOT/share/extensions. However putting that aside, it is fairly standard for software to provide two directories for extensions/modules/plugins/etc. One for distribution-built software such as /usr/share/emacs/site-lisp/ and another for sysadmin customizations such as /usr/local/share/emacs/site-lisp. The same idea as /usr/share/perl and /usr/local/share/perl or with Python or anything else. Right. And you can also add additional paths for those applications to search. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
On Feb 6, 2014, at 7:32 AM, Stephen Frost sfr...@snowman.net wrote: The end-admin would have to modify the system-installed postgresql.conf anyway to enable this other directory. David wasn't suggesting that Homebrew *should* be able to do so, he was pointing out that it *can't*, which all makes sense imv. Yeah, or be able to add a directory as a Postgres super user at runtime. David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: adt Makefile, was Re: [HACKERS] jsonb and nested hstore
On 02/06/2014 11:38 AM, Alvaro Herrera wrote: Andrew Dunstan wrote: This whole list is a mess, and we don't even have all the range_types files following each other. Worth cleaning up? I'm actually wondering if it might be worth having some subgroups of object files and then combining them into $OBJS. Doesn't the MSVC build stuff parse OBJS definitions? Good point. At least in some cases it does. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Feb 5, 2014, at 3:59 PM, Andrew Dunstan and...@dunslane.net wrote: I got a slightly earlier start ;-) For people wanting to play along, here's what this change looks like: https://github.com/feodor/postgres/commit/3fe899b3d7e8f806b14878da4a4e2331b0eb58e8 Man I love seeing all that red. :-) D -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
On Thu, Feb 6, 2014 at 5:49 PM, David E. Wheeler da...@justatheory.com wrote: On Feb 6, 2014, at 6:51 AM, Greg Stark st...@mit.edu wrote: Homebrew sounds kind of confused. Having a non-root user have access to make global system changes sounds like privilege escalation vulnerability by design. Well, the point is that it *doesn't* make global system changes. I got an error on OS X Server with my original formula, because there was no permission to install in $PGROOT/share/extensions. Installing into /usr/local is a global system change. Only root should be able to do that and any user that can do that can easily acquire root privileges. However putting that aside, it is fairly standard for software to provide two directories for extensions/modules/plugins/etc. One for distribution-built software such as /usr/share/emacs/site-lisp/ and another for sysadmin customizations such as /usr/local/share/emacs/site-lisp. The same idea as /usr/share/perl and /usr/local/share/perl or with Python or anything else. Right. And you can also add additional paths for those applications to search. Well, users can do whatever they want at run-time but there are blessed paths that are the correct place to install things that these systems are configured to search automatically. My point was just that there are generally two such blessed paths, one for the distribution and one for the local sysadmin. What you do not want is to have a different path for each piece of software. That way lies the /usr/local/kde/bin:/usr/local/gnome/bin:/usr/local/myfavouritehack/bin:... madness. You can do this with Python or Perl but they won't do it automatically and everyone who does this with environment variables or command line flags eventually realizes what a mess it is. (Except Java programmers) -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
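[Editor's note] The two-blessed-directories convention discussed in this thread amounts to PATH-like resolution: an ordered list of directories, first match wins. A minimal sketch in Python, with directory names that are purely illustrative (not PostgreSQL's actual layout) and an injectable `exists` predicate so the sketch can be exercised without touching the filesystem:

```python
import os

def find_extension_control(name, search_path, exists=os.path.exists):
    """PATH-like lookup for an extension control file: scan the ordered
    directory list and return the first `<name>.control` found, or None.
    Earlier entries shadow later ones, as with $PATH."""
    for d in search_path:
        candidate = os.path.join(d, name + ".control")
        if exists(candidate):
            return candidate
    return None
```

Whether the distribution directory or the sysadmin directory comes first in the list is exactly the precedence policy the thread is debating; the mechanism itself stays the same either way.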
Re: [HACKERS] Minor performance improvement in transition to external sort
On Tue, Feb 4, 2014 at 2:22 PM, Jeremy Harris j...@wizmail.org wrote: The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random integers it does 18% fewer compares and takes 10% less time for the heapify, over the work_mem range 1024 to 1048576. Thanks for working on this. Several times I've wondered why we didn't use siftdown for heapifying, as it is a well known optimization. How big of sets were you sorting in each case? Was it always just slightly bigger than would fit in work_mem? Did you try sorting already-sorted, reverse sorted, or pipe-organ shaped data sets? We will also need to test it on strings. I usually use md5(random()::text) to generate strings for such purposes, at least for a first pass. The name of the attached patch is a bit unfortunate. And I'm not sure what you are doing with tupindex, the change there doesn't seem to be necessary to the rest of your work, so some explanation on that would be helpful. The project coding style usually has comments in blocks before loops on which they comment, rather than interspersed throughout the clauses of the for statement. Both algorithms appear to be O(n) (contradicting Wikipedia's claim that a siftup heapify is O(n log n)). Siftdown is linear in all cases. Siftup may be linear in the typical case, but is n log n in the worst case. Another optimization that is possible is to do only one comparison at each level (between the two children) all the way down the heap, and then compare the parent to the chain of smallest children in reverse order. This can substantially reduce the number of comparisons on average, because most tuples sift most of the way down the heap (because most of the tuples are at the bottom of the heap). Robert submitted a patch to do this in the main siftdown routine (which for some reason is named tuplesort_heap_siftup, rather than siftdown), and I found it a substantial improvement but it never got pushed forward. 
I think Robert was concerned it might make some obscure cases worse rather than better, and no one put it through enough serious testing to overcome that concern. (Also, I think you should make your siftdown code look more like the existing siftdown code.) Cheers, Jeff
[HACKERS] 'dml' value for log_statement
Hi all, The attached patch provides a new value 'dml' for log_statement. Currently the server logs modification statements AND data definition statements if log_statement is set to 'mod', so we need to set log_statement to 'all' and filter out the unnecessary information if we would like to log only DML. This patch lets the user configure this more precisely: the server logs a statement only when an INSERT, UPDATE, DELETE or TRUNCATE statement is executed (the same as the 'mod' value, but without logging the DDL). Comments? Regards, --- Sawada Masahiko log_statement_dml.patch Description: Binary data
Re: [HACKERS] extension_control_path
On Feb 6, 2014, at 9:14 AM, Greg Stark st...@mit.edu wrote: Installing into /usr/local is a global system change. Only root should be able to do that and any user that can do that can easily acquire root privileges. I agree with you, but I don’t think the Homebrew folks do. Or at least their current implementation doesn’t. OT though. Well, users can do whatever they want at run-time but there are blessed paths that are the correct place to install things that these systems are configured to search automatically. My point was just that there are generally two such blessed paths, one for the distribution and one for the local sysadmin. Yeah, two blessed would be very useful, but I think the ability to add any number of paths would be even better. What you do not want is to have a different path for each piece of software. That way lies the /usr/local/kde/bin:/usr/local/gnome/bin:/usr/local/myfavouritehack/bin:... madness. You can do this with Python or Perl but they won't do it automatically and everyone who does this with environment variables or command line flags eventually realizes what a mess it is. (Except Java programmers) Agreed. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
On Thu, Feb 6, 2014 at 3:39 AM, Marti Raudsepp ma...@juffo.org wrote: On Thu, Feb 6, 2014 at 5:31 AM, Robert Haas robertmh...@gmail.com wrote: Hmm, sounds a little steep. Why is it so expensive? I'm probably missing something here, because I would have thought that planner support for partial sorts would consist mostly of considering the same sorts we consider today, but with the costs reduced by the batching. I guess it's because the patch undoes some optimizations in the mergejoin planner wrt caching merge clauses and adds a whole lot of code to find_mergeclauses_for_pathkeys. In other code paths the overhead does seem to be negligible. Notice the removal of: /* Select the right mergeclauses, if we didn't already */ /* * Avoid rebuilding clause list if we already made one; * saves memory in big join trees... */ Yeah, I noticed that. My feeling is that those optimizations got put in there because someone found them to be important, so I'm skeptical about removing them. It may be that having the capability to do a partial sort makes it seem worth spending more CPU looking for merge joins, but I'd vote for making any such change a separate patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Fwd: [HACKERS] Proposal: variant of regclass
On Tue, Jan 28, 2014 at 10:38 AM, Yugo Nagata nag...@sraoss.co.jp wrote: I revised the patch. Could you please review this? I didn't test the patch due to the duplicate OID compilation error. But a few things stuck out from the diffs:
* You added some unnecessary spaces at the beginning of the line in OpernameGetCandidates.
* regclass_guts and regtype_guts can be simplified further by moving the ereport() code into regclassin, thereby saving the if (missing_ok) check.
* I would rephrase the documentation paragraph as: to_regclass, to_regproc, to_regoper and to_regtype are functions similar to the regclass, regproc, regoper and regtype casts, except that they return InvalidOid (0) when the object is not found, instead of raising an error.
On Wed, Jan 22, 2014 at 1:04 PM, Tatsuo Ishii is...@postgresql.org wrote: I thought the consensus was that returning NULL is better than InvalidOid? From an earlier message: Not sure. There's already at least one counterexample: pg_my_temp_schema() oid OID of session's temporary schema, or 0 if none And there are pg_relation_filenode, pg_stat_get_backend_dbid, pg_stat_get_backend_userid which return NULL::oid. In general I don't like magic values like 0 in SQL. For example, this query would give unexpected results because InvalidOid has some other meaning: select * from pg_aggregate where aggfinalfn=to_regproc('typo'); Regards, Marti
Re: [HACKERS] PoC: Partial sort
On Thu, Feb 6, 2014 at 9:15 PM, Robert Haas robertmh...@gmail.com wrote: It may be that having the capability to do a partial sort makes it seem worth spending more CPU looking for merge joins, but I'd vote for making any such change a separate patch. Agreed. Alexander, should I work on splitting up the patch in two, or do you want to do it yourself? Should I merge my coding style and enable_partialsort patches while at it, or do you still have reservations about those? Regards, Marti
Re: [HACKERS] PoC: Partial sort
Robert Haas robertmh...@gmail.com writes: On Thu, Feb 6, 2014 at 3:39 AM, Marti Raudsepp ma...@juffo.org wrote: I guess it's because the patch undoes some optimizations in the mergejoin planner wrt caching merge clauses and adds a whole lot of code to find_mergeclauses_for_pathkeys. In other code paths the overhead does seem to be negligible. Yeah, I noticed that. My feeling is that those optimizations got put in there because someone found them to be important, so I'm skeptical about removing them. I put them in, and yeah they are important. Even with those, and even with the rather arbitrary heuristic restrictions that joinpath.c puts on what mergeclause lists to consider, the existing planner spends a whole lot of effort on mergejoins --- possibly disproportionate to their actual value. I think that any patch that removes those optimizations is not going to fly. If anything, it'd be better to reduce the number of mergejoins considered even further, because a lot of the possible plans are not usefully different. It's already the case that we expect indxpath.c to predict the useful orderings (by reference to query_pathkeys and available mergejoin clauses) and generate suitable paths, rather than trying to identify the orderings at join time. Can't that approach be extended to cover this technique? In any case, the bottom line is that we don't want this patch to cause the planner to consider large numbers of new but useless sort orderings. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [DOCS] Re: Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)
On 02/05/2014 07:27 PM, Robert Haas wrote: On Tue, Feb 4, 2014 at 11:43 PM, Noah Misch n...@leadboat.com wrote: Right. I mean, a lot of the links say things like Section 26.2 which obviously makes no sense in a standalone text file. For xrefs normally displayed that way, text output could emit a URL, either inline or in the form of a footnote. For link targets (e.g. SQL commands) having a friendly text fragment for xref sites, use the normal fragment. True. If we're going to keep these things around, something like that would avoid some annoyances for documentation authors. But I still think we ought to just nuke them, because who cares? I don't care about HISTORY and even less so about regress_README but I would prefer to keep INSTALL because I know that people do look at that one... Stefan
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Mon, Feb 3, 2014 at 12:02 AM, Tom Lane t...@sss.pgh.pa.us wrote: What version were you running before 9.1.11 exactly? I took a look through all the diffs from 9.1.9 up to 9.1.11, and couldn't find any changes that seemed even vaguely related to this. There are some changes in known-transaction tracking, but it's hard to see a connection there. Most of the other diffs are in code that wouldn't execute during WAL replay at all. Both the primary and the standby were 9.1.11 from the get-go. The database the primary was forked off of was 9.1.10 but as far as I can tell the primary in the current pair has no problems. What's worse is we created a new standby from the same base backup and replayed the same records and it didn't reproduce the problem. This means either it's a hardware problem -- but we've seen it on multiple standbys on this database and at least one other database which is in a different data centre -- or it's a race condition --but that's hard to credit in the recovery code which is basically single-threaded. And these records are from before the standby reaches a consistency so it's hard to see how a connection from a hot standby client could cause any kind of race condition. The only other thread that could conceivably cause a heisenbug is the bgwriter. It's hard to imagine how a race condition in there could be so easy to hit that it would happen four times on one restore but otherwise go mostly unnoticed. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mvcc catalo gsnapshots and TopTransactionContext
Andres Freund and...@2ndquadrant.com writes: On 2014-02-02 15:16:45 -0500, Tom Lane wrote: I've been thinking about this for the past little while, and I believe that it's probably okay to have RelationClearRelation leave the relcache entry un-rebuilt, but with rd_isvalid = false so it will be rebuilt when next opened. The rationale is explained in the comments in the attached patch. I've checked that this fixes Noah's test case and still passes the existing regression tests. Hm, a bit scary, but I don't see an immediate problem. The following comment now is violated for nailed relations *We assume that at the time we are called, we have at least AccessShareLock *on the target index. (Note: in the calls from RelationClearRelation, *this is legitimate because we know the rel has positive refcount.) but that should be easy to fix. Ah, good point. So we should keep the refcnt test in the nailed-rel case. I wonder though, if we couldn't just stop doing the RelationReloadIndexInfo() for nailed indexes. No; you're confusing nailed indexes with mapped indexes. There are nailed indexes that aren't on mapped catalogs, see the load_critical_index calls in RelationCacheInitializePhase3. The corresponding comment says: * If it's a nailed index, then we need to re-read the pg_class row to see * if its relfilenode changed. This comment could use some work I guess; needs to say nailed but not mapped index. Do you plan to backpatch this? If so, even to 8.4? I'm of two minds about that. I think this is definitely a bug, but we do not currently have any evidence that there's an observable problem in practice. On the other hand, we certainly get reports of irreproducible issues from time to time, so I don't care to rule out the theory that some of them might be caused by faulty cache reloads. That possibility has to be balanced against the risk of introducing new issues with this patch. Thoughts? 
(BTW, I see that the CLOBBER_CACHE_ALWAYS members of the buildfarm are unhappy with the IsTransactionState check in RelationIdGetRelation. Will look at that ... but it seems to be in initdb which breaks a lot of these rules anyway, so I think it's probably not significant.) regards, tom lane
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: Both the primary and the standby were 9.1.11 from the get-go. The database the primary was forked off of was 9.1.10 but as far as I can tell the primary in the current pair has no problems. What's worse is we created a new standby from the same base backup and replayed the same records and it didn't reproduce the problem. This means either it's a hardware problem -- but we've seen it on multiple standbys on this database and at least one other database which is in a different data centre -- or it's a race condition --but that's hard to credit in the recovery code which is basically single-threaded. And these records are from before the standby reaches a consistency so it's hard to see how a connection from a hot standby client could cause any kind of race condition. The only other thread that could conceivably cause a heisenbug is the bgwriter. It's hard to imagine how a race condition in there could be so easy to hit that it would happen four times on one restore but otherwise go mostly unnoticed. I had noticed that the WAL records that were mis-replayed seemed to be bunched pretty close together (two of them even adjacent). Could you confirm that? If so, it seems like we're looking for some condition that makes mis-replay fairly probable for a period of time, but in itself might be quite improbable. Not that that helps much at nailing it down. You might well be on to something with the bgwriter idea, considering that none of the WAL replay code was originally written with any concurrent execution in mind. We might've missed some place where additional locking is needed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mvcc catalo gsnapshots and TopTransactionContext
On 2014-02-06 16:35:07 -0500, Tom Lane wrote: I wonder though, if we couldn't just stop doing the RelationReloadIndexInfo() for nailed indexes. No; you're confusing nailed indexes with mapped indexes. There are nailed indexes that aren't on mapped catalogs, see the load_critical_index calls in RelationCacheInitializePhase3. Oh. I'd somehow remembered nailed catalogs would be a subset of mapped ones. But as you say, they are not. Not sure I like that, but it's certainly set for now. Do you plan to backpatch this? If so, even to 8.4? I'm of two minds about that. I think this is definitely a bug, but we do not currently have any evidence that there's an observable problem in practice. On the other hand, we certainly get reports of irreproducible issues from time to time, so I don't care to rule out the theory that some of them might be caused by faulty cache reloads. That possibility has to be balanced against the risk of introducing new issues with this patch. Thoughts? Let's let it stew a while in master, there certainly are enough subtleties around this that I'd hesitate to add it to a point release in the not too far away future. Doesn't seem that urgent to me. But after that I'd say lets backpatch it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Release schedule for 9.3.3?
Are there any tentative plans for the 9.3.3 release date? 9.3.2 was released in December and it's getting close to the two month mark for another micro release, or at least it seems like one should be right around the corner. It's a little early, but I haven't seen any movement or discussion so I thought I'd prod and ask. Thanks in advance. -sc -- Sean Chittenden s...@chittenden.org
Re: [HACKERS] Release schedule for 9.3.3?
On Thu, Feb 6, 2014 at 10:55 PM, Sean Chittenden s...@chittenden.org wrote: Are there any tentative plans for the 9.3.3 release date? 9.3.2 was released in December and it's getting close to the two month mark for another micro release, or at least one seems like one should be right around the corner. Bug fix releases are released when we find bugs, not on a schedule. That said, there were some bugs fixed in the last few weeks including one fairly serious one so I would expect one in not too long. -- greg
Re: [HACKERS] Minor performance improvement in transition to external sort
On 06/02/14 18:21, Jeff Janes wrote: How big of sets were you sorting in each case? Big enough to go external. The timings and compare-counts given are purely for the heapify stage not the total for the sort, so are constrained by the work_mem not by the sort size per se. I'm limited to working on a small machine, so the work_mem value of 1e+6 is approaching my top limit of sort-time pain unless I rip the heapify out into a test harness. What range of work_mem is it useful to test over? Was it always just slightly bigger than would fit in work_mem? I didn't check. Did you try sorting already-sorted, reverse sorted, or pipe-organ shaped data sets? Not yet, but I can. What variety of pipe-organ shape is of interest (I can think of several that might be called that)? We will also need to test it on strings. I usually use md5(random()::text) to generate strings for such purposes, at least for a first pass. OK. The name of the attached patch is a bit unfortunate. Is there a project standard for this? And I'm not sure what you are doing with tupindex, the change there doesn't seem to be necessary to the rest of your work, so some explanation on that would be helpful. I'll add commentary. The project coding style usually has comments in blocks before loops on which they comment, rather than interspersed throughout the clauses of the for statement. I'll adjust that. Another optimization that is possible is to do only one comparison at each level (between the two children) all the way down the heap, and then compare the parent to the chain of smallest children in reverse order. This can substantially reduce the number of comparisons on average, because most tuples sift most of the way down the heap (because most of the tuples are at the bottom of the heap). Sounds interesting; I'll see if I can get that going. (Also, I think you should make your siftdown code look more like the existing siftdown code.) A function called by the outer loop? 
Can do; the existing does that because the function is called from multiple places but this will only be used the once. Thanks for the review. -- Jeremy
Re: [HACKERS] 'dml' value for log_statement
Sawada Masahiko wrote Hi all, Attaching patch provides new value 'dml' for log_statement. Currently, The server logs modification statements AND data definition statements if log_statement is set 'mod'. So we need to set the 'all' value for log_statement and remove unnecessary information if we would like to log only DML. This patch allows the user to set in detail the setting. The server logs statement only when INSERT, UPDATE, DELETE and TRUNCATE statement are executed. ( same as 'mod' value which does not log the DDL) Comments? Regards, -1 I'm not fully versed on what log levels provide what data but if you care about changes to data then ALTER TABLE table is just as important an activity as UPDATE table so mod exhibits the recommended behavior and dml provides something that should be of minimal value. DDL by itself has value since monitoring schema changes without worrying about transient data updates is understandable. But a schema-change, by definition, alters data. Maybe further description of why you find mod unsatisfactory would be helpful. Though it is simple enough to just let people use their own judgement. David J.
Re: [HACKERS] Minor performance improvement in transition to external sort
On 06/02/14 14:22, Robert Haas wrote: On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris j...@wizmail.org wrote: On 05/02/14 21:56, Robert Haas wrote: On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris j...@wizmail.org wrote: The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random integers it does 18% fewer compares and takes 10% less time for the heapify, over the work_mem range 1024 to 1048576. Both algorithms appear to be O(n) (contradicting Wikipedia's claim that a siftup heapify is O(n log n)). I think Wikipedia is right. Inserting a tuple into a heap is O(lg n); doing that n times is therefore O(n lg n). Your patch likewise implements an outer loop which runs O(n) times and an inner loop which runs at most O(lg n) times, so a naive analysis of that algorithm also comes out to O(n lg n). The patch implements a siftdown. Measurements attached. Uh, this spreadsheet is useless. You haven't explained how these numbers were generated, or what the columns in the spreadsheet actually mean. Ideally, you should include enough information that someone else can try to reproduce your results. I'm sorry I wasn't clear. Test code was groups of the form:

    set work_mem to 1024;
    CREATE TABLE jgh (i integer);
    INSERT INTO jgh (i) SELECT 2 * random() FROM generate_series(1, 2);
    VACUUM ANALYZE jgh;
    explain analyze SELECT * FROM jgh ORDER BY i;
    set enable_siftdown_heapify to off;
    explain analyze SELECT * FROM jgh ORDER BY i;
    set enable_siftdown_heapify to on;
    DROP TABLE jgh;

.. for the tabulated work_mem, and scaled values for the INSERT. Compares were counted for the heapify stage (only), and timings taken using pg_rusage_init() before and pg_rusage_show() after it. Times are in seconds. Baseline value for compares and time used 858ec11858. Sift-down compares and time are for the patch. Baseline K cmps is compares divided by work_mem (which should be a scaled version of compares-per-tuple being heapified) pre-patch.
Siftdown K cmps is likewise with the patch. K ratio cmps is the ratio (patched compares)/(unpatched compares). Repeat for time measurements. -- Cheers, Jeremy
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Thu, Feb 6, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: I had noticed that the WAL records that were mis-replayed seemed to be bunched pretty close together (two of them even adjacent). Could you confirm that? If so, it seems like we're looking for some condition that makes mis-replay fairly probable for a period of time, but in itself might be quite improbable. Not that that helps much at nailing it down. Well one thing that argues for is hardware problems. It could be that the right memory mapping just happened to line up with that variable on the stack for that short time and then it was mapped to something else entirely. Or the machine was overheating for a short time and then the temperature became more reasonable. Or the person with the x-ray source walked by in that short time window. That doesn't explain the other instance or the other copies of this database. I think the most productive thing I can do is switch my attention to the other database to see if it really looks like the same problem. You might well be on to something with the bgwriter idea, considering that none of the WAL replay code was originally written with any concurrent execution in mind. We might've missed some place where additional locking is needed. Except that the bgwriter has been in there for a few years already. Unless there's been some other change, possibly involving copying some code that was safe in some context but not where it was copied to. The problem with the bgwriter being at fault is that from what I can see the bgwriter will never extend a file. That means the xlog recovery code must have done it. That means even if the bgwriter came along and looked at the buffer we just read in it would already be too late to cause mischief. The xlog code extends the file *first* then reads in the backup block into a buffer. 
I can't see how it could corrupt the stack or the wal recovery buffer in the window between reading in the wal buffer and deciding to extend the relation. -- greg
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On 2014-02-06 23:41:19 +0100, Greg Stark wrote: The problem with the bgwriter being at fault is that from what I can see the bgwriter will never extend a file. That means the xlog recovery code must have done it. That means even if the bgwriter came along and looked at the buffer we just read in it would already be too late to cause mischief. The xlog code extends the file *first* then reads in the backup block into a buffer. I can't see how it could corrupt the stack or the wal recovery buffer in the window between reading in the wal buffer and deciding to extend the relation. That's not necessarily true. If e.g. the buffer mapping would change racily, the result write from the bgwriter could very well end up increasing the file size, leaving a hole inbetween its write and the original size. Are there any other relations that are as big as the corrupted relations got extended to? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] mvcc catalo gsnapshots and TopTransactionContext
I wrote: (BTW, I see that the CLOBBER_CACHE_ALWAYS members of the buildfarm are unhappy with the IsTransactionState check in RelationIdGetRelation. Will look at that ... but it seems to be in initdb which breaks a lot of these rules anyway, so I think it's probably not significant.) So what's happening is that in bootstrap mode, we run each BKI command in a separate transaction. (Since these transactions all use the same XID, it's not clear what's the point of doing that rather than running the whole BKI script as one transaction. But it's been like that for twenty years so I'm disinclined to mess with it.) Bootstrap mode is peculiar in that it expects relcache entries to stay open across these transactions, which we implement by keeping their refcnts positive. Throwing CLOBBER_CACHE_ALWAYS into the mix means we do a relcache flush during transaction start, during which RelationClearRelation tries to rebuild the cache entries right away because they have positive refcnt, triggering the Assert because it has to access system relations such as pg_class and we're not in TRANS_INPROGRESS state yet. So in short, the patch I proposed upthread fixes this, since it postpones the actual cache entry reload into the main part of the transaction. So rather than lobotomize that Assert with an IsBootstrapMode exception, I think I'll just go ahead and commit this patch, with the cleanups we discussed earlier. regards, tom lane
Re: [HACKERS] mvcc catalo gsnapshots and TopTransactionContext
Andres Freund and...@2ndquadrant.com writes: On 2014-02-06 16:35:07 -0500, Tom Lane wrote: Thoughts? Let's let it stew a while in master, there certainly are enough subtleties around this that I'd hesitate to add it to a point release in the not too far away future. Doesn't seem that urgent to me. But after that I'd say lets backpatch it. Seems reasonable to me. The urgency would go up if we could positively trace any field trouble reports to this. regards, tom lane
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote: That's not necessarily true. If e.g. the buffer mapping would change racily, the result write from the bgwriter could very well end up increasing the file size, leaving a hole inbetween its write and the original size. a) the segment isn't sparse and b) there were whole segments full of nuls between the end of the tables and the final blocks. So the file was definitely extended by Postgres, not the OS and the bgwriter passes EXTENSION_FAIL which means it wouldn't create those intervening segments. Are there any other relations that are as big as the corrupted relations got extended to? I was wondering the same thing. But no, the extended file is 39GB larger than the next largest relation. Also, the final block in the first relation is clearly a version of the same block from the correct location. Look at the ctids and the xids on the rows for lp 3 in the attached file for example. That second copy is from the location in the xlog record. 
-- greg

=# select * from (select relfilenode,
                         (pg_relation_size(oid)/8192)::integer-1 as blockno,
                         ((pg_relation_size(oid)/8192)::integer-1)/1024/128 as segnum,
                         ((pg_relation_size(oid)/8192)::integer-1)%(1024*1024*128) as offset,
                         (page_header(get_raw_page(oid::regclass::text,'main',(pg_relation_size(oid)/8192)::integer-1))).*
                    from pg_class
                   where pg_relation_size(oid) > 4*8192
                     and (page_header(get_raw_page(oid::regclass::text,'main',(pg_relation_size(oid)/8192)::integer-2))).upper=0
                   order by lsn) as x;

 relfilenode | blockno  | segnum |  offset  |    lsn     | tli | flags | lower | upper | special | pagesize | version | prune_xid
-------------+----------+--------+----------+------------+-----+-------+-------+-------+---------+----------+---------+------------
      473158 | 18090059 |    138 | 18090059 | EA1/625E28 |   6 |     0 |   144 |   896 |    8192 |     8192 |       4 | 1401029863
     1366221 |  2208159 |     16 |  2208159 | EA1/62DDF0 |   6 |     0 |  1180 |  3552 |    8176 |     8192 |       4 |          0
     1261982 |  7141472 |     54 |  7141472 | EA1/638988 |   6 |     0 |  1240 |  3312 |    8176 |     8192 |       4 |          0
     1364767 |  3631161 |     27 |  3631161 | EA1/63A0A8 |   6 |     0 |  1180 |  3552 |    8176 |     8192 |       4 |          0
     1519286 |   311808 |      2 |   311808 | EA1/708B08 |   6 |     1 |   312 |  3040 |    8192 |     8192 |       4 |          0
(5 rows)

=# select (heap_page_items(get_raw_page(16531::regclass::text,'main',18090059))).*;

 lp | lp_off | lp_flags | lp_len |   t_xmin   |   t_xmax   | t_field3 |    t_ctid    | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+------------+------------+----------+--------------+-------------+------------+--------+--------+-------
  1 |   7024 |        1 |    328 | 1358232426 | 1401029863 |        1 | (9386399,3)  |       49163 |       9475 |     32 | 1010   |
  2 |      1 |        2 |      0 |            |            |          |              |             |            |        |        |
  3 |   6560 |        1 |    464 | 1401029863 |          0 |        1 | (9386399,3)  |       32779 |      10499 |     32 | 1010   |
  4 |      5 |        2 |      0 |            |            |          |              |             |            |        |        |
  5 |   8008 |        1 |    184 |          2 |          0 |        0 | (9386399,5)  |       32779 |      10499 |     32 | 1010   |
  6 |   6416 |        1 |    144 | 1418089052 | 1418089053 |        0 | (9386399,7)  |       16395 |        259 |     32 | 1010   |
  7 |   6272 |        1 |    144 | 1418089053 |          0 |        0 | (9386399,7)  |       32779 |      10243 |     32 | 1010   |
  8 |   6128 |        1 |    144 | 1418089056 | 1418089060 |        0 | (9386399,11) |       16395 |       1283 |     32 | 1010   |
  9 |     10 |        2 |      0 |            |            |          |              |             |            |        |        |
 10 |   7864 |        1 |    144 |          2 |          0 |        0 | (9386399,10) |       32779 |      10499 |     32 |
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Greg Stark st...@mit.edu writes: On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote: That's not necessarily true. If e.g. the buffer mapping would change racily, the resulting write from the bgwriter could very well end up increasing the file size, leaving a hole in between its write and the original size. a) the segment isn't sparse and b) there were whole segments full of nulls between the end of the tables and the final blocks. So the file was definitely extended by Postgres, not the OS and the bgwriter passes EXTENSION_FAIL which means it wouldn't create those intervening segments.

But ... when InRecovery, md.c will create such segments too. We had dismissed that on the grounds that the files would be sparse because of the way md.c creates them.

However, it is real damn hard to see how the loop in XLogReadBufferExtended could've accessed a bogus block, other than hardware misfeasance, which I don't believe any more than you do. The blkno that's passed to that function came directly out of a WAL record that's in the private memory of the startup process and recently passed a CRC check. You'd have to believe some sort of asynchronous memory clobber inside the startup process. On the other hand, if _mdfd_getseg did the deed, there's a whole lot more space for something funny to have happened, because now we're talking about a buffer being written in preparation for eviction from shared buffers, long after WAL replay filled it.

So I'm wondering if there's something wrong with our deduction from file non-sparseness. In this connection, Google quickly found me a report of XFS losing the sparse state of a file across multiple writes: http://oss.sgi.com/archives/xfs/2011-06/msg00225.html I wonder whether that bug or a similar one exists in your production kernel.

regards, tom lane
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
On 2014-02-06 18:42:05 -0500, Tom Lane wrote: Greg Stark st...@mit.edu writes: On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote: That's not necessarily true. If e.g. the buffer mapping would change racily, the resulting write from the bgwriter could very well end up increasing the file size, leaving a hole in between its write and the original size. a) the segment isn't sparse and b) there were whole segments full of nulls between the end of the tables and the final blocks. So the file was definitely extended by Postgres, not the OS and the bgwriter passes EXTENSION_FAIL which means it wouldn't create those intervening segments. But ... when InRecovery, md.c will create such segments too. We had dismissed that on the grounds that the files would be sparse because of the way md.c creates them. However, it is real damn hard to see how the loop in XLogReadBufferExtended could've accessed a bogus block, other than hardware misfeasance which I don't believe any more than you do. The blkno that's passed to that function came directly out of a WAL record that's in the private memory of the startup process and recently passed a CRC check. You'd have to believe some sort of asynchronous memory clobber inside the startup process.

That reminds me, not that I directly see how it could be responsible, there's still 20131029011623.gj20...@awork2.anarazel.de ff. around. I don't think we came to an agreement in that thread on how to fix the problem.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: adt Makefile, was Re: [HACKERS] jsonb and nested hstore
On Fri, Feb 7, 2014 at 1:18 AM, Andrew Dunstan and...@dunslane.net wrote: On 02/01/2014 05:20 PM, Andres Freund wrote:

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 1ae9fa0..fd93d9b 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \
 	tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
 	tsvector.o tsvector_op.o tsvector_parser.o \
 	txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \
-	rangetypes_typanalyze.o rangetypes_selfuncs.o
+	rangetypes_typanalyze.o rangetypes_selfuncs.o \
+	jsonb.o jsonb_support.o

Odd, most OBJS lines are kept in alphabetical order, but that doesn't seem to be the case here. This whole list is a mess, and we don't even have all the range_types files following each other. Worth cleaning up? +1. Yes please. -- Michael
Re: [HACKERS] Recovery inconsistencies, standby much larger than primary
Andres Freund and...@2ndquadrant.com writes: That reminds me, not that I directly see how it could be responsible, there's still 20131029011623.gj20...@awork2.anarazel.de ff. around. I don't think we came to an agreement in that thread on how to fix the problem. Hm, yeah. I'm not sure I believe Heikki's argument that we need to avoid closing the smgr relation. If that stuff is being used in any performance-critical paths then we've got trouble already. regards, tom lane
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Thu, Feb 6, 2014 at 3:42 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, I've understood how this works and it seems to be working as expected.

Anyway, this is just a test module, so if things work for you by changing it the above way, it's fine. However, I wonder why it's not generating a .def file for you. Surely.

Getting back on topic: using dsm_keep_segment, I saw the postmaster keeping the section handle for the dsm segment, and any session even after the creating session had gone could recall the segment, unlike dsm_keep_mapping, which couldn't retain them after the end of the creating session. And this exactly resembles the Linux version in behavior, including the attach failure. The orphan section handles on the postmaster have become a matter of documentation. Besides all the above, I'd like to see a comment for the win32 code about the 'DuplicateHandle hack', specifically a description that DuplicateHandle pushes a copy of the section handle to the postmaster so the section can be retained for the postmaster's lifetime.

Thanks. I shall handle all the comments raised by you and send an updated patch tomorrow.

By the way, I have one additional comment. All postgres processes already keep a section handle for 'Global/PostgreSQL:pgdata file path' aside from dsm. Its name tells what it is for. On the other hand, the names for the dsm sections are 'Global/PostgreSQL.%d'. This seems a bit unkind, as they don't tell what they are for. Do you mind changing it to, say, 'Global/PostgreSQL.dsm.%d' or something like that?

I am not sure if there is any benefit in changing the name, as it is used for identification of the segment internally and it is uniquely identified by the segment identifier (segment handle); also, I think something similar is used for the posix implementation in dsm_impl_posix(), so we might need to change that as well.
Also, as this name is not introduced by this patch, I think it will be better to keep this as a note to the committer for this patch, and if there is an agreement on changing it, I will update the patch. What's your opinion? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Row-security on updatable s.b. views
On 02/06/2014 11:11 PM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: We don't rerun rewrite on plan invalidation. Don't we? plancache.c certainly does, in fact it starts from the raw grammar output. Skipping the rewriter would mean failing to respond to CREATE OR REPLACE VIEW, for example. I was thinking about exactly that case as I went to sleep - especially as it's things like CREATE OR REPLACE VIEW that prevent me from just rewriting the security qual when it's initially added, before storage in the catalogs. I could've sworn discussion around row security in the past concluded that it couldn't be properly done in the rewriter because of an inability to correctly invalidate plans. Searching the archives, though, I don't find anything to support that. I'll just say I'm glad to be wrong, and proceed from there. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] narwhal and PGDLLIMPORT
(2014/02/05 14:52), Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: On 02/05/2014 06:29 AM, Tom Lane wrote: I had been okay with the manual PGDLLIMPORT-sprinkling approach (not happy with it, of course, but prepared to tolerate it) as long as I believed the buildfarm would reliably tell us of the need for it. That assumption has now been conclusively disproven, though. I'm kind of horrified that the dynamic linker doesn't throw its toys when it sees this. Indeed :-(. The truly strange part of this is that it seems that the one Windows buildfarm member that's telling the truth (or most nearly so, anyway) is narwhal, which appears to have the oldest and cruftiest toolchain of the lot. I'd really like to come out the other end of this investigation with a clear understanding of why the newer toolchains are failing to report a link problem, and yet not building working executables. Is it a linkage error? Could you please show me the error message concretely? regards, Hiroshi Inoue
Re: [HACKERS] PostgreSQL Failback without rebuild
I've just noticed that on PostgreSQL 9.3 I can do the following with a master node A and a slave node B (as long as I have set recovery_target_timeline = 'latest'):

1. Stop Node A
2. Promote Node B
3. Attach Node A as slave

This is sufficient for my needs (I know it doesn't cover a crash), can anyone see any potential problems with this approach?

Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect __ Level 2, 50 Queen St, Melbourne VIC 3000 *P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099

On Wed, Feb 5, 2014 at 6:03 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Feb 5, 2014 at 3:14 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Feb 5, 2014 at 10:30 AM, James Sewell james.sew...@lisasoft.com wrote: I've seen some proposals and a tool (pg_rewind), but all seem to have drawbacks.

As far as I remember, one of the main drawbacks for pg_rewind was related to hint bits, which can be avoided by wal_log_hints. pg_rewind is not part of core PostgreSQL code; however, if you wish, you can try that tool to see if it can solve your purpose.

For 9.3, pg_rewind is only safe with page checksums enabled. For 9.4, yes, wal_log_hints or checksums is mandatory. The code contains some safety checks as well to ensure that a node not using those parameters cannot be rewinded. Regards, -- Michael

-- The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
Re: [HACKERS] Inconsistency between pg_stat_activity and log_duration
On Tue, Feb 4, 2014 at 9:36 AM, Tatsuo Ishii is...@postgresql.org wrote: As you can see, at 2014-02-04 12:47:27.210981+09 the query SELECT count(*) FROM pg_catalog.pg_class... is active and it seems still running. On the other side, here is an excerpt from the PostgreSQL log: 21850 2014-02-04 12:47:11.241 JST LOG: execute pgpool21805/pgpool21805: SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pgpool_regclass('pgbench_accounts') AND c.relpersistence = 'u' 21850 2014-02-04 12:47:11.241 JST LOG: duration: 0.078 ms The duration was shown as 0.078 ms, thus it seems the query has already finished. The reason why pg_stat_activity thinks that the query is still in progress is that the sync message has not been sent to the backend yet (at least from what I read from postgres.c).

I think that is the probable reason for the above mentioned behaviour. As I understand it, the problem is that the 'state' of the backend is shown as active along with 'query', which according to the docs ("If state is active this field shows the currently executing query.") means that the query is executing. This statement holds true for a simple query, but for a prepared statement (using messages 'P', 'B', 'D', 'E', 'S') it might not be completely right, as we update the state only after the sync message, which can confuse some users as you have stated. However, I don't think it is a good idea to change state in between different messages, at least with the current set of states.

I think this inconsistency is not very intuitive to users... Do you think we can fix it in any easy way, or might updating the docs make users understand the current situation better? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Inconsistency between pg_stat_activity and log_duration
On Fri, Feb 7, 2014 at 10:43 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Feb 4, 2014 at 9:36 AM, Tatsuo Ishii is...@postgresql.org wrote: As you can see, at 2014-02-04 12:47:27.210981+09 the query SELECT count(*) FROM pg_catalog.pg_class... is active and it seems still running. On the other side, here is an excerpt from the PostgreSQL log: 21850 2014-02-04 12:47:11.241 JST LOG: execute pgpool21805/pgpool21805: SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pgpool_regclass('pgbench_accounts') AND c.relpersistence = 'u' 21850 2014-02-04 12:47:11.241 JST LOG: duration: 0.078 ms The duration was shown as 0.078 ms, thus it seems the query has already finished. The reason why pg_stat_activity thinks that the query is still in progress is that the sync message has not been sent to the backend yet (at least from what I read from postgres.c). I think that is the probable reason for the above mentioned behaviour.

One more catch is that I think it can happen for simple query statements as well, because we first log the query-end duration and then update the state, and the two are not tied together in the code, so there is always a small window during which this can happen. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] narwhal and PGDLLIMPORT
OK, I can see the error message at the PostgreSQL Build Farm Log. I see a similar problem in a simple test case.

gcc -DCALLPG=\"import1\" -c export.c -o export1.o
dlltool --export-all --output-def export1.def export1.o
dlltool --dllname export1.exe --def export1.def --output-lib export1.a
dlltool --dllname export1.exe --output-exp export1.exp --def export1.def
gcc -o export1.exe -Wl,--base-file,export1.base export1.exp export1.o
dlltool --dllname export1.exe --base-file export1.base --output-exp export1.exp --def export1.def
gcc -Wl,--stack=65536 -o export1.exe export1.exp export1.o
rm -f export1.exp export1.base
gcc -DCALLPG=\"import1\" -c import.c -o import1.o
dlltool --export-all --output-def import1.def import1.o
dllwrap -o import1.dll --dllname import1.dll import1.o export1.a --def import1.def
Info: resolving _intnum by linking to __imp__intnum (auto-import)
fu01.o:(.idata$2+0xc): undefined reference to `export1_a_iname'
nmth00.o:(.idata$4+0x0): undefined reference to `_nm__intnum'
collect2: ld returned 1 exit status

Adding __declspec(dllimport) fixes the problem. Using a gcc-only build also fixes the problem.

gcc -DCALLPG=\"import2\" -c export.c -o export2.o
gcc -Wl,--export-all-symbols -Wl,--out-implib=export2.a -o export2 export2.o
Creating library file: export2.a
gcc -DCALLPG=\"import2\" -c import.c -o import2.o
gcc -shared -o import2.dll import2.o export2.a -Wl,--out-implib=import2.dll.a
Info: resolving _intnum by linking to __imp__intnum (auto-import)
Creating library file: import2.dll.a

Though I'm not a MINGW expert at all, I know dllwrap is a deprecated tool and dlltool is almost a deprecated tool. The Cygwin port is removing the use of dllwrap and dlltool now. Isn't it better for the MINGW port to follow it? Changing the machine from the one using gcc 3.4.5 to another one using 4.6.1 also fixes the problem.
regards, Hiroshi Inoue (2014/02/07 12:25), Hiroshi Inoue wrote: (2014/02/05 14:52), Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: On 02/05/2014 06:29 AM, Tom Lane wrote: I had been okay with the manual PGDLLIMPORT-sprinkling approach (not happy with it, of course, but prepared to tolerate it) as long as I believed the buildfarm would reliably tell us of the need for it. That assumption has now been conclusively disproven, though. I'm kind of horrified that the dynamic linker doesn't throw its toys when it sees this. Indeed :-(. The truly strange part of this is that it seems that the one Windows buildfarm member that's telling the truth (or most nearly so, anyway) is narwhal, which appears to have the oldest and cruftiest toolchain of the lot. I'd really like to come out the other end of this investigation with a clear understanding of why the newer toolchains are failing to report a link problem, and yet not building working executables. Is it a linkage error? Could you please show me the error message concretely? regards, Hiroshi Inoue
Re: [HACKERS] PostgreSQL Failback without rebuild
On Fri, Feb 7, 2014 at 1:57 PM, James Sewell james.sew...@lisasoft.com wrote: I've just noticed that on PostgreSQL 9.3 I can do the following with a master node A and a slave node B (as long as I have set recovery_target_timeline = 'latest'): 1. Stop Node A 2. Promote Node B 3. Attach Node A as slave This is sufficient for my needs (I know it doesn't cover a crash), can anyone see any potential problems with this approach?

Yes, node A could get ahead of the point where WAL forked when promoting B. In this case you cannot reconnect A to B, and need to actually recreate a node from a fresh base backup, or rewind it. pg_rewind targets the latter, postgres core is able to do the former, and depending on things like your environment and/or the size of your server, you might prefer one or the other. Regards, -- Michael
Re: [HACKERS] PostgreSQL Failback without rebuild
Node A could get ahead even if it has been shut down cleanly BEFORE the promotion? I'd always assumed if I shut down the master the slave would be at the same point after shutdown - is this incorrect?

Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect

On Fri, Feb 7, 2014 at 4:58 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Feb 7, 2014 at 1:57 PM, James Sewell james.sew...@lisasoft.com wrote: I've just noticed that on PostgreSQL 9.3 I can do the following with a master node A and a slave node B (as long as I have set recovery_target_timeline = 'latest'): 1. Stop Node A 2. Promote Node B 3. Attach Node A as slave This is sufficient for my needs (I know it doesn't cover a crash), can anyone see any potential problems with this approach? Yes, node A could get ahead of the point where WAL forked when promoting B. In this case you cannot reconnect A to B, and need to actually recreate a node from a fresh base backup, or rewind it. pg_rewind targets the latter, postgres core is able to do the former, and depending on things like your environment and/or the size of your server, you might prefer one or the other. Regards, -- Michael
[HACKERS] Typo in a comment?
In src/backend/storage/freespace/freespace.c:

 * MaxFSMRequestSize depends on the architecture and BLCKSZ, but assuming
 * default 8k BLCKSZ, and that MaxFSMRequestSize is 24 bytes, the categories
 * look like this

Is 24 bytes a typo, considering that #define MaxFSMRequestSize MaxHeapTupleSize ? -- Amit
Re: [HACKERS] Inconsistency between pg_stat_activity and log_duration
I think that is the probable reason for the above mentioned behaviour. As I understand here, the problem is that the 'state' of the backend is shown as active along with 'query', which according to the docs ("If state is active this field shows the currently executing query.") means that the query is executing. This statement holds true for a simple query, but for a prepared statement (using messages 'P', 'B', 'D', 'E', 'S') it might not be completely right, as we update the state only after the sync message, which can confuse some users as you have stated. However, I don't think it is a good idea to change state in between different messages, at least with the current set of states. I think this inconsistency is not very intuitive to users... Do you think we can fix it in any easy way, or might updating the docs make users understand the current situation better?

One idea is calling pgstat_report_activity(STATE_IDLE) in exec_execute_message() of postgres.c. The function already calls pgstat_report_activity(STATE_RUNNING), which shows the active state in the pg_stat_activity view. So why can't we call pgstat_report_activity(STATE_IDLE) here? Somebody might claim that idle is a transaction-state term. In that case, I propose to add a new state name, say finished. The above proposal would then call pgstat_report_activity(STATE_FINISHED) instead.

Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Inconsistency between pg_stat_activity and log_duration
On Fri, Feb 7, 2014 at 11:46 AM, Tatsuo Ishii is...@postgresql.org wrote: I think that is the probable reason for the above mentioned behaviour. As I understand here, the problem is that 'state' of backend is shown as active along with 'query' which according to docs (If state is active this field shows the currently executing query.) means that query is executing. This statement holds true for simple query but for prepared statement (using message 'P', 'B', 'D', 'E', 'S') it might not be completely right as we update the state only after sync message which can confuse some users as you have stated. However I don't think it is good idea to change state in between different messages or at least with the current set of states. I think this inconsistency is not very intuitive to users... Do you think we can fix it in any easy way, or might be updating docs can make users understand the current situation better? One idea is, calling pgstat_report_activity(STATE_IDLE) in exec_execute_message() of postgres.c. The function has already called pgstat_report_activity(STATE_RUNNING) which shows active state in pg_stat_actviity view. So why can't we call pgstat_report_activity(STATE_IDLE) here. Somebody might claim that idle is a transaction state term.

Idle means "The backend is waiting for a new client command.", which is certainly not true, especially in the case of the 'E' message, as sync message processing still remains.

In the case, I propose to add new state name, say finished. So above proposal would calling pgstat_report_activity(STATE_FINISHED) instead.

Okay, so by state finished, it would mean "The backend has finished execution of a query.". In that case I think this might need to be called at the end of exec_simple_query() as well, but then there will be very little difference between idle and finished for a simple query.
The argument here could be: do we really need a new state for such a short window between completion of the 'E' message and processing of the 'S' sync message, considering that updating the state is not an entirely free call to make between the processing of two messages? It might make sense for cases where sync message processing can take a longer time. Would it not be sufficient if we just explain this in the docs? Do users really face any inconvenience, or is it a matter of clearer understanding for users? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] PostgreSQL Failback without rebuild
On Fri, Feb 7, 2014 at 3:02 PM, James Sewell james.sew...@lisasoft.comwrote: Node A could get ahead even if it has been shut down cleanly BEFORE the promotion? I'd always assumed if I shut down the master the slave would be at the same point after shutdown - is this incorrect? Yes and no. A node will wait at shutdown until all the *connected* slaves have flushed necessary WALs to disk. But if the slave is not connected at the moment of shutdown, it might not be the case. -- Michael
Re: [HACKERS] PostgreSQL Failback without rebuild
Michael Paquier wrote: On Fri, Feb 7, 2014 at 3:02 PM, James Sewell james.sew...@lisasoft.com wrote: Node A could get ahead even if it has been shut down cleanly BEFORE the promotion? I'd always assumed if I shut down the master the slave would be at the same point after shutdown - is this incorrect? Yes and no. A node will wait at shutdown until all the *connected* slaves have flushed necessary WALs to disk. But if the slave is not connected at the moment of shutdown, it might not be the case. Even if the slave is connected, there was a bug: http://www.postgresql.org/message-id/e1urwwa-0004lr...@gemulon.postgresql.org So you need at least 9.3.0, 9.2.5 or 9.1.10 for this to work properly. Yours, Laurenz Albe