Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Feb 9, 2018 at 6:53 AM, Peter Geoghegan wrote:
> On Wed, Feb 7, 2018 at 7:51 PM, Pavan Deolasee wrote:
> > I understand getting EPQ semantics right is very important. Can you
> > please (once again) summarise your thoughts on what you think is the
> > *most* appropriate behaviour? I can then think how much effort might
> > be involved in that. If the effort is disproportionately high, we can
> > discuss if settling for some not-so-nice semantics, like we apparently
> > did for partition key updates.
>
> I personally believe that the existing EPQ semantics are already
> not-so-nice. They're what we know, though, and we haven't actually had
> any real world complaints, AFAIK.

I agree.

> My concern is mostly just that MERGE manages to behave in a way that
> actually "provides a single SQL statement that can conditionally
> INSERT, UPDATE or DELETE rows, a task that would otherwise require
> multiple procedural language statements", as the docs put it. As long
> as MERGE manages to do something as close to that high level
> description as possible in READ COMMITTED mode (with our current
> semantics for multiple statements in RC taken as the baseline), then
> I'll probably be happy.

IMO it will be quite hard, if not impossible, to guarantee the same semantics for a single-statement MERGE and a multi-statement UPDATE/DELETE/INSERT in RC mode. For example, the multi-statement model will execute each statement with a new MVCC snapshot and hence the rows visible to individual statements may vary, whereas in MERGE everything runs with a single snapshot. There could be other such subtle differences.

> Some novel new behavior -- "EPQ with a twist" -- is clearly necessary.
> I feel a bit uneasy about it because anything that anybody suggests is
> likely to be at least a bit arbitrary (EPQ itself is kind of
> arbitrary). We only get to make a decision on how "EPQ with a twist"
> will work once, and that should be a decision that is made following
> careful deliberation. Ambiguity is much more likely to kill a patch
> than a specific technical defect, at least in my experience. Somebody
> can usually just fix a technical defect.

While I agree, I think we need to make these decisions in a time-bound fashion. If there is too much ambiguity, then it's not a bad idea to settle for throwing appropriate errors instead of providing semantically wrong answers, even in some remote corner case.

> > TBH I did not consider partitioning any less complex and it was
> > indeed very complex, requiring at least 3 reworks by me. And from
> > what I understood, it would have been a blocker too. So is subquery
> > handling and RLS. That's why I focused on addressing those items
> > while you and Simon were still debating EPQ semantics.
>
> Sorry if I came across as dismissive of that effort. That was
> certainly not my intention. I am pleasantly surprised that you've
> managed to move a number of things forward rather quickly.
>
> I'll rephrase: while it would probably have been a blocker in theory
> (I didn't actually weigh in on that), I doubted that it would actually
> end up doing so in practice (and it now looks like I was right to
> doubt that, since you got it done). It was a theoretical blocker, as
> opposed to an open item that could drag on indefinitely despite
> everyone's best efforts. Obviously details matter, and obviously there
> are a lot of details to get right outside of RC semantics, but it
> seems wise to focus on the big risk that is EPQ/RC conflict handling.

Ok. I am now back from holidays and I too will start thinking about this. I've also requested a colleague to help us with comparing it against Oracle's behaviour. That's not a gold standard for us, but knowing how other major databases handle RC conflicts is not a bad idea.

I see the following important areas, and as long as we have a consistent and coherent handling of these cases, we should not have difficulty agreeing on an outcome.

1. Concurrent UPDATE does not affect the MATCHED case. The WHEN conditions may or may not be affected.

2. A concurrently UPDATEd tuple fails the join qual and the current source tuple no longer matches the updated target tuple that the EPQ is set for. It matches no other target tuple either. So a MATCHED case is turned into a NOT MATCHED case.

3. A concurrently UPDATEd tuple fails the join qual and the current source tuple no longer matches the updated target tuple that the EPQ is set for. But it matches some other target tuple. So it's still a MATCHED case, but with different target tuple(s).

4. A concurrent UPDATE/INSERT creates a matching target tuple for a source tuple, thus turning a NOT MATCHED case into a MATCHED case.

5. A concurrent DELETE turns a MATCHED case into a NOT MATCHED case.

Any other case that I am missing? Assuming all cases are covered, what should we do in each of these cases, so that there is no or very little ambiguity?
Re: A space-efficient, user-friendly way to store categorical data
On Mon, Feb 12, 2018 at 12:24 PM, Andrew Dunstan wrote:
> On Mon, Feb 12, 2018 at 9:10 AM, Tom Lane wrote:
>> Andrew Kane writes:
>>> A better option could be a new "dynamic enum" type, which would have
>>> similar storage requirements as an enum, but instead of labels being
>>> declared ahead of time, they would be added as data is inserted.
>>
>> You realize, of course, that it's possible to add labels to an enum type
>> today. (Removing them is another story.)
>>
>> You haven't explained exactly what you have in mind that is going to be
>> able to duplicate the advantages of the current enum implementation
>> without its disadvantages, so it's hard to evaluate this proposal.
>
> This sounds rather like the idea I have been tossing around in my head
> for a while, and in sporadic discussions with a few people, for a
> dictionary object. The idea is to have an append-only list of labels
> which would not obey transactional semantics, and would thus help us
> avoid the pitfalls of enums -- there wouldn't be any rollback of an
> addition. The use case would be for a jsonb representation which
> would replace object keys with the oid value of the corresponding
> dictionary entry, rather like enums now. We could have a per-table
> dictionary which in most typical json use cases would be very small,
> and we know from some experimental data that the savings in space
> from such a change would often be substantial.
>
> This would have to be modifiable dynamically rather than requiring
> explicit additions to the dictionary, to be of practical use for the
> jsonb case, I believe.
>
> I hadn't thought about this as a sort of super enum that was usable
> directly by users, but it makes sense.
>
> I have no idea how hard or even possible it would be to implement.

I have had thoughts over the years about something similar, but going the other way and hiding it from the end user. If you could declare a column to have a special compressed property (independently of the type), then it could either automatically maintain a dictionary, or at least build a new dictionary for you when you next run some kind of COMPRESS operation. There would be no user-visible difference except footprint. Ancient DB2 had a column property along those lines called "VALUE COMPRESSION" (they also have a row-level version, and now they have much more advanced kinds of adaptive compression that I haven't kept up with). In some ways it'd be a bit like toast with shared entries, but I haven't seriously looked into how such a thing might be implemented.

-- Thomas Munro http://www.enterprisedb.com
Re: Minor version upgrades and extension packaging
On 2018-02-11 22:19:30 -0500, Tom Lane wrote:
> Not sure what to do about it at this point. We could move that field to
> the end for 10.3, leaving 10.2 as the only ABI-incompatible minor release,
> but I don't know that that really makes things any better than leaving it
> as-is. Somewhere around the dot-two minor release is where uptake of a
> new PG branch starts to become significant, IME, so preserving ABI
> compatibility going forward from 10.2 might be more useful than preserving
> it against 10.0/10.1.

Yea, I think the damage is done in this case, and we shouldn't make things even more complicated.

Greetings,

Andres Freund
Re: Minor version upgrades and extension packaging
Mat Arye writes:
> We recently found that people who had compiled the TimescaleDB extension
> against 10.1 (or installed our binary versions via yum, apt, etc.) had
> their extension break when they upgraded to 10.2 due to changes of some
> underlying structs between the two minor versions.
> In particular, in the commit
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1597948c962a1407c01fc492c44917c097efa92e
> the structure of the ColumnDef struct changed.

BTW, while there was surely not a huge amount of daylight between that commit on 2 Feb and the 10.2 wrap on 5 Feb, there was enough time to have fixed the problem given prompt feedback. I suggest that Timescale would be well advised to set up a buildfarm animal that has an additional module to run basic binary-compatibility testing against your extension in the back branches. I don't know a lot about writing additional test modules for the buildfarm script, but it's definitely possible. Andrew Dunstan might be able to offer more specific advice.

regards, tom lane
Re: Minor version upgrades and extension packaging
Andres Freund writes:
> On 2018-02-11 21:50:32 -0500, Mat Arye wrote:
>> In particular, in the commit
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1597948c962a1407c01fc492c44917c097efa92e
>> the structure of the ColumnDef struct changed.

> Ugh. I personally would say that's because that commit did stuff that we
> normally try hard not to do. While ColumnDef at least isn't serialized
> into catalogs, we normally try hard not to break struct layout. Peter,
> shouldn't that field at the very least have been added at the end?

Yeah. The position of the field makes sense for HEAD, but it would have been good practice to add it at the end in released branches. That's what we normally do when we have to make struct changes in back branches. That isn't a 100% fix for ABI compatibility problems --- if you're making new instances of the node type in an extension, you still lose --- but it avoids problems in many cases.

Not sure what to do about it at this point. We could move that field to the end for 10.3, leaving 10.2 as the only ABI-incompatible minor release, but I don't know that that really makes things any better than leaving it as-is. Somewhere around the dot-two minor release is where uptake of a new PG branch starts to become significant, IME, so preserving ABI compatibility going forward from 10.2 might be more useful than preserving it against 10.0/10.1.

regards, tom lane
Re: Minor version upgrades and extension packaging
Hi,

On 2018-02-11 21:50:32 -0500, Mat Arye wrote:
> We recently found that people who had compiled the TimescaleDB extension
> against 10.1 (or installed our binary versions via yum, apt, etc.) had
> their extension break when they upgraded to 10.2 due to changes of some
> underlying structs between the two minor versions.
>
> In particular, in the commit
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1597948c962a1407c01fc492c44917c097efa92e
> the structure of the ColumnDef struct changed. Since TimescaleDB uses an
> event hook that makes use of the ColumnDef structure, the TimescaleDB
> shared library compiled under 10.1 expected a different struct packing for
> ColumnDef than what was available in 10.2.

Ugh. I personally would say that's because that commit did stuff that we normally try hard not to do. While ColumnDef at least isn't serialized into catalogs, we normally try hard not to break struct layout. Peter, shouldn't that field at the very least have been added at the end?

> I had three questions:
>
> 1) Are similar changes to struct packing expected under minor postgres
> releases (we haven't encountered this issue before)?

Normally not.

> 2) Is it expected to have to recompile extensions for minor version
> upgrades of Postgres?

Normally only in extraordinary cases. There have, e.g., been security issues that required changes in PL handlers.

Greetings,

Andres Freund
Minor version upgrades and extension packaging
Hi All,

I am writing to get some advice on extension packaging for minor version upgrades in Postgres. We recently found that people who had compiled the TimescaleDB extension against 10.1 (or installed our binary versions via yum, apt, etc.) had their extension break when they upgraded to 10.2 due to changes of some underlying structs between the two minor versions.

In particular, in the commit https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1597948c962a1407c01fc492c44917c097efa92e the structure of the ColumnDef struct changed. Since TimescaleDB uses an event hook that makes use of the ColumnDef structure, the TimescaleDB shared library compiled under 10.1 expected a different struct packing for ColumnDef than what was available in 10.2.

I had three questions:

1) Are similar changes to struct packing expected under minor postgres releases (we haven't encountered this issue before)?
2) Is it expected to have to recompile extensions for minor version upgrades of Postgres?
3) How do other extensions deal with this issue?

Any other suggestions to handle these types of issues would be appreciated. One obvious solution with binary releases is to have one for every minor version, although this leads to both release and deployment complexity.

Thanks,
Mat
TimescaleDB
OCD questions about LWTRANCHE_REPLICATION_ORIGIN
Hi,

Just curious about a harmless inconsistency, really: why does src/backend/replication/logical/origin.c bother to copy LWTRANCHE_REPLICATION_ORIGIN into shm and then LWLockRegisterTranche() in every process from the shm copy? I guess because it used to allocate the tranche ID dynamically with LWLockNewTrancheId(), but it doesn't anymore. So my questions are:

1. Why shouldn't we just call LWLockRegisterTranche(LWTRANCHE_REPLICATION_ORIGIN, "replication_origin") and get rid of "tranche_id" in shm?

2. What is our basis for registering some built-in LWLock tranches in XXXShmemInit() and the rest in RegisterLWLockTranches()? Why shouldn't we just move this one over to RegisterLWLockTranches()? Then it'd be easier to verify that we've registered them all.

Come to think of it, I suppose someone might even argue that we should have lwtranchenames.txt, much like lwlocknames.txt, that would produce the enumerators and register the display name for all built-in tranches. And then a hypothetical cross-check script that runs under check-world, as previously mooted, could verify that every display name mentioned in there is also mentioned in monitoring.sgml. I say that as someone who recently screwed up and forgot to register the display name of a new built-in tranche[1]...

[1] https://www.postgresql.org/message-id/CAEepm%3D1WM%3D02cKT7hinnx8xRhN2QdF%2BWfE6gk5oMPorwLi%2BXiA%40mail.gmail.com

-- Thomas Munro http://www.enterprisedb.com
Removing shm_mq.c's volatile qualifiers
Hi, As far as I can see, all the volatile qualifiers in shm_mq.c have been redundant since ec9037df263. Here's a patch to remove them (like several similar patches -- see commit message). Does this make sense? Is there something special about that pointer to volatile pointer to PGPROC? If so I don't see it. -- Thomas Munro http://www.enterprisedb.com 0001-Remove-volatile-qualifiers-from-shm_mq.c.patch Description: Binary data
Re: [HACKERS] A misconception about the meaning of 'volatile' in GetNewTransactionId?
On Sun, Apr 30, 2017 at 1:19 PM, Tom Lane wrote:
> Thomas Munro writes:
>> I was reading xact.c and noticed this block:
>> ...
>> Isn't this insufficient on non-TSO systems like POWER and Arm?
>
> Yeah, I think you're right. That code probably predates our support
> for memory barriers, so "volatile" was the best we could do at the
> time --- but as you say, it doesn't fix hardware-level rearrangements.

Here is an experimental patch, for discussion only, to drop some apparently useless volatile qualifiers and introduce a write barrier when extending the array and a corresponding read barrier when scanning or copying the array from other processes.

I wonder about this code that shrinks the array:

#define XidCacheRemove(i) \
do { \
    MyProc->subxids.xids[i] = MyProc->subxids.xids[MyPgXact->nxids - 1]; \
    MyPgXact->nxids--; \
} while (0)

If a concurrent process saw the decremented nxids value before seeing the effect of xids[i] = xids[final], then it would miss an arbitrary running subtransaction (not the aborting subtransaction being removed from the array, but whichever xid had the bad luck to be in the final position). In the patch I added pg_write_barrier(), but I suspect that this might not really be a problem because of higher-level interlocking that I'm missing: this code makes no mention of the problem, and it has neither a compiler barrier/volatile nor a memory barrier, so if my guess were right it could be broken even on TSO assumptions, at the whim of the compiler.

-- Thomas Munro http://www.enterprisedb.com

0001-Use-explicit-memory-barriers-when-manipulating-MyPro.patch Description: Binary data
Re: A space-efficient, user-friendly way to store categorical data
On Mon, Feb 12, 2018 at 9:10 AM, Tom Lane wrote:
> Andrew Kane writes:
>> A better option could be a new "dynamic enum" type, which would have
>> similar storage requirements as an enum, but instead of labels being
>> declared ahead of time, they would be added as data is inserted.
>
> You realize, of course, that it's possible to add labels to an enum type
> today. (Removing them is another story.)
>
> You haven't explained exactly what you have in mind that is going to be
> able to duplicate the advantages of the current enum implementation
> without its disadvantages, so it's hard to evaluate this proposal.

This sounds rather like the idea I have been tossing around in my head for a while, and in sporadic discussions with a few people, for a dictionary object. The idea is to have an append-only list of labels which would not obey transactional semantics, and would thus help us avoid the pitfalls of enums -- there wouldn't be any rollback of an addition. The use case would be for a jsonb representation which would replace object keys with the oid value of the corresponding dictionary entry, rather like enums now. We could have a per-table dictionary which in most typical json use cases would be very small, and we know from some experimental data that the savings in space from such a change would often be substantial.

This would have to be modifiable dynamically rather than requiring explicit additions to the dictionary, to be of practical use for the jsonb case, I believe.

I hadn't thought about this as a sort of super enum that was usable directly by users, but it makes sense.

I have no idea how hard or even possible it would be to implement.

cheers

andrew

-- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: A space-efficient, user-friendly way to store categorical data
Andrew Kane writes:
> A better option could be a new "dynamic enum" type, which would have
> similar storage requirements as an enum, but instead of labels being
> declared ahead of time, they would be added as data is inserted.

You realize, of course, that it's possible to add labels to an enum type today. (Removing them is another story.)

You haven't explained exactly what you have in mind that is going to be able to duplicate the advantages of the current enum implementation without its disadvantages, so it's hard to evaluate this proposal.

regards, tom lane
Re: ALTER TABLE ADD COLUMN fast default
On Sun, Feb 11, 2018 at 2:50 PM, Petr Jelinek wrote:
>> Here's a version that fixes the above issue and also the issue with
>> VACUUM that Tomas Vondra reported. I'm still working on the issue with
>> aggregates that Tomas also reported.
>
> I see the patch does not update the ALTER TABLE docs section which
> discusses table rewrites and it seems like it should.

Umm, it changes the second and third paras of the Notes section, which refer to rewrites. Not sure what else it should change.

cheers

andrew

-- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
A space-efficient, user-friendly way to store categorical data
Hi,

I'm hoping to get feedback on an idea for a new data type to allow for efficient storage of text values while keeping reads and writes user-friendly. Suppose you want to store categorical data like current city for users. There will be a long list of cities, and many users will have the same city. Some options are:

- Use a text column
- Use an enum column - saves space, but labels must be set ahead of time
- Create another table for cities (normalize) - saves space, but complicates reads and writes

A better option could be a new "dynamic enum" type, which would have similar storage requirements as an enum, but instead of labels being declared ahead of time, they would be added as data is inserted. It'd be great to hear what others think of this (or if I'm missing something). Another direction could be to deduplicate values for TOAST-able data types.

Thanks,
Andrew
Re: [HACKERS] Fix warnings and typo in dshash
On Mon, Sep 4, 2017 at 2:18 PM, Amit Kapila wrote:
> On Sun, Sep 3, 2017 at 2:56 PM, Thomas Munro wrote:
>> I think it should be (size_t) 1, not UINT64CONST(1). See attached.
>
> Okay, that makes sense. Do you think we should also change the type
> casting in BUCKETS_PER_PARTITION so that we are consistent?

+1

Here's a patch to fix that. I was reminded to come back and tidy this up when I spotted a silly mistake in a nearby comment, also fixed in this patch.

-- Thomas Munro http://www.enterprisedb.com

0001-Minor-clean-up-in-dshash.-c-h.patch Description: Binary data
persistent read cache
Hi. I wonder if there is such a thing or extension in the PG world. Here is my use case. I am using PG (PG10, to be more specific) in a cloud VM environment. The tables are stored in RAID0-managed, SSD-backed attached storage. Depending on the VM I am using, I usually have 256GB of local SSD unused. I wonder if PG could leverage this local SSD as a read (page/block) cache, to complement/extend the DRAM used by shared_buffers today. Thanks.
Re: JIT compiling with LLVM v9.0
On Thu, Jan 25, 2018 at 9:40 AM, Konstantin Knizhnik wrote:
> As far as I understand, generation of native code is now always done for
> all supported expressions and individually by each backend.
> I wonder if it would be useful to put more effort into understanding when
> compilation to native code should be done and when interpretation is
> better. For example, many JIT-able languages like Lua use traces, i.e. a
> query is first interpreted and a trace is generated. If the same trace is
> followed more than N times, then native code is generated for it.
>
> In the context of a DBMS executor it is obvious that only frequently
> executed or expensive queries have to be compiled. So we can use estimated
> plan cost and number of query executions as simple criteria for JIT-ing
> the query. Maybe compilation of simple queries (with small cost) should
> be done only for prepared statements...
>
> Another question is whether it is sensible to redundantly do expensive
> work (llvm compilation) in all backends. This question refers to a shared
> prepared statement cache. But even without such a cache, it seems to be
> possible to use some signature of the compiled expression as the library
> name and allow these libraries to be shared between backends. So before
> starting code generation, ExecReadyCompiledExpr can first build the
> signature and check if the corresponding library is already present. Also
> it will be easier to control the space used by compiled libraries in this
> case.

Totally agree; these considerations are very important. I tested several queries in my application that had >30 second compile times against a one second run time. Not being able to manage when compilation happens is making it difficult to get a sense of llvm performance in the general case. Having explain analyze print compile time and being able to prepare llvm compiled queries ought to help measurement and tuning. There may be utility here beyond large analytical queries as the ability to optimize spreads through the executor with the right trade-off management.

This work is very exciting... thank you.

merlin
Re: [HACKERS] generated columns
On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote:
> That would be nice. I'm going to study this some more to see what can
> be done.

By the way, can't we consider just doing stored generated columns as a first cut? Both virtual and stored columns have their use cases, but stored values have less complication and actually support a larger set of features, including rowtypes, index and constraint support. So it seems to me that if something goes into v11 then stored columns would be a better choice at this stage of the development cycle. Other DBMSs support stored values by default as well, and your v1 patch had a large portion of the work done, if I recall correctly.

-- Michael
Re: [PATCH][PROPOSAL] Add enum releation option type
In a message of February 9, 2018 18:45:29, Alvaro Herrera wrote:
> If this patch gets in, I wonder if there are any external modules that
> use actual strings. A hypothetical example would be something like an
> SSL cipher list; it needs to be somewhat free-form, so an enum would
> not cut it. If there are such modules, then even if we remove all
> existing in-core use cases we should keep the support code for strings.

I did not remove string option support from the code. It might be needed later.

> Maybe we need some in-core user to verify the string case still works.
> A new module in src/test/modules perhaps?

This sounds like a good idea. I too do not feel really comfortable with the fact that this string-option support exists but is not tested. I'll have a look there; if it does not require a great deal of work, I will add tests for string options there.

> On the other hand, if we can find no use for these string reloptions,
> maybe we should just remove the support, since as I recall it's messy
> enough.

No, the implementation of string options is quite good. And it may be needed later.

>> Possible flaws:
>>
>> 1. I've changed the error message from 'Valid values are "XXX", "YYY" and
>> "ZZZ".' to 'Valid values are "XXX", "YYY", "ZZZ".' to make the code a bit
>> simpler. If it is not acceptable, please let me know, and I will add "and"
>> to the string.

> I don't think we care about this, but is this still the case if you use
> a stringinfo?

Maybe not. I'll try to do it better.

>> 2. Also about the string with the list of acceptable values: the code that
>> creates this string is inside the parse_one_reloption function now.

> I think you could save most of that mess by using appendStringInfo and
> friends.

Thanks. I will rewrite this part using these functions. That was really helpful.

> I don't much like the way you've represented the list of possible values
> for each enum. I think it'd be better to have a struct that represents
> everything about each value (string name and C symbol).

I actually do not like it this way either. I would prefer one structure, not two lists. But I did not find a way to do it in one struct: how can we have the string value and the C symbol in one structure, without defining the C symbols elsewhere? Otherwise it will be two lists again. I do not have a lot of hardcore C development experience, so I may be missing something. Can you provide an example of the structure you are talking about?

> Maybe the numerical value too if that's needed, but is it? I suppose all
> code should use the C symbol only, so why do we care about the numerical
> value?

It is convenient to have numerical values when debugging. I like to write something like

    elog(WARNING, "buffering_mode=%i", opt.buffering_mode);

to check that everything works as expected. Such cases are the only reason to keep the numerical value.

-- Do code for fun.
Re: Is there a cache consistent interface to tables ?
On 11 February 2018 at 04:44, Gary M wrote:
> Thanks Craig,
>
> As I'm back in pg code after many years, I'm feeling much better that
> there's one (1) or two (2) items causing the hiccup. Rereading your
> comments, I'm agreeing with you. I'm considering bumping up the ram to
> 512gb as a RAM disk just for consistent profiling (an old bad habit).
>
> I'm reassessing my profiling plan. I'm getting ahead of myself thinking
> the OS is not having a significant effect.

Definitely. Especially since PostgreSQL uses buffered I/O.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services