Re: Fwd: [HACKERS] Syncing sql extension versions with shared library versions
On Mon, Jul 24, 2017 at 6:16 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Mat Arye <m...@timescaledb.com> writes:
> > On Mon, Jul 24, 2017 at 1:38 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> I'm not really sure why planner hooks would have anything to do with your
> >> exposed SQL API?
>
> > Sorry, what I meant was I'd like to package different versions of my
> > extension -- both .sql and .c -- and have the extension act consistently
> > for any version until I do an ALTER EXTENSION UPDATE. So, I'd prefer a DB
> > with an older extension to have the logic/code in the hook not change
> > even if I install a newer version's .so for use in another database
> > (but don't update the extension to the newer version). Does that make
> > any sense?
>
> The newer version's .so simply is not going to load into the older
> version; we intentionally prevent that from happening. It's not necessary
> anyway because versions do not share library directories. Therefore,
> you can have foo.so for 9.5 in your 9.5 version's library directory,
> and foo.so for 9.6 in your 9.6 version's library directory, and the
> filesystem will keep them straight for you. It is not necessary to
> call them foo-9.5.so and foo-9.6.so.

I meant the extension version, not the PG version. Let me try to explain:
if version 0.1.0 has optimization A in the planner hook, and 0.2.0 has
optimization B, I'd like the property that even if I install foo-0.2.0.so
(and also have foo-0.1.0.so) in the cluster, any database that has not done
an ALTER EXTENSION UPDATE will still do A, while any database that has
updated the extension will do B. I'd also like to avoid doing a bunch of
if/else statements to make this happen. But that's the ideal; not sure if
I can make this happen.
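To illustrate the "avoid a bunch of if/else statements" point: one way to dispatch per-version hook behavior is a table keyed by the version string. This is a minimal plain-C sketch, not real PostgreSQL extension code -- in a real hook the installed version would have to be looked up from the current database's catalog, and the names (optimize_a, run_hook, etc.) are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-version planner behaviors: A for 0.1.0, B for 0.2.0. */
static int optimize_a(int cost) { return cost - 1; }
static int optimize_b(int cost) { return cost - 2; }

typedef struct VersionedHook
{
	const char *version;		/* extension version installed in the db */
	int			(*optimize) (int);
} VersionedHook;

static const VersionedHook hook_table[] = {
	{"0.1.0", optimize_a},
	{"0.2.0", optimize_b},
};

/*
 * Run the hook implementation matching the version installed in the
 * current database; fall back to the oldest behavior if unknown.
 */
static int
run_hook(const char *installed_version, int cost)
{
	size_t		i;

	for (i = 0; i < sizeof(hook_table) / sizeof(hook_table[0]); i++)
	{
		if (strcmp(hook_table[i].version, installed_version) == 0)
			return hook_table[i].optimize(cost);
	}
	return optimize_a(cost);
}
```

The appeal is that adding version 0.3.0 means adding one table entry rather than another branch in every hook.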
> As for the other point, the usual idea is that if you have a
> SQL-accessible C function xyz() that needs to behave differently after an
> extension version update, then you make the extension update script point
> the SQL function to a different library entry point. If your 1.0
> extension version originally had
>
> CREATE FUNCTION xyz(...) RETURNS ...
> LANGUAGE C AS 'MODULE_PATHNAME', 'xyz';
>
> (note that the second part of the AS clause might have been implicit;
> no matter), then your update script for version 1.1 could do
>
> CREATE OR REPLACE FUNCTION xyz(...) RETURNS ...
> LANGUAGE C AS 'MODULE_PATHNAME', 'xyz_1_1';
>
> Then in the 1.1 version of the C code, the xyz_1_1() C function provides
> the new behavior, while the xyz() C function provides the old behavior,
> or maybe just throws an error if you conclude it's impractical to emulate
> the old behavior exactly. As I mentioned earlier, you can usually set
> things up so that you can share much of the code between two such
> functions.

Thanks for that explanation. It's clear now.

> The pgstattuple C function in contrib/pgstattuple is one example of
> having changed a C function's behavior in this way over multiple versions.
>
> regards, tom lane
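The "share much of the code between two such functions" idea above can be sketched in plain C. This is only an illustration of the shape -- real entry points would use the fmgr V1 calling convention (PG_FUNCTION_ARGS), and the old/new behaviors here are made up:

```c
#include <assert.h>

/* Shared guts used by both versions of the SQL-visible function. */
static int
xyz_internal(int x, int new_behavior)
{
	/* hypothetical: old behavior adds one, new behavior doubles */
	return new_behavior ? x * 2 : x + 1;
}

/* Entry point the 1.0 SQL definition points at: old behavior. */
int
xyz(int x)
{
	return xyz_internal(x, 0);
}

/* Entry point the 1.1 update script repoints the SQL function at. */
int
xyz_1_1(int x)
{
	return xyz_internal(x, 1);
}
```

A database still on extension version 1.0 keeps calling xyz(), so its behavior is stable even after the 1.1 library is installed; only ALTER EXTENSION UPDATE switches it to xyz_1_1().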
Fwd: [HACKERS] Syncing sql extension versions with shared library versions
(adding -hackers back into thread, got dropped by my email client, sorry)

On Mon, Jul 24, 2017 at 1:38 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Mat Arye <m...@timescaledb.com> writes:
> > I tried looking in the contrib modules and didn't find many with lots of
> > planner hook usage.
>
> I'm not really sure why planner hooks would have anything to do with your
> exposed SQL API?

Sorry, what I meant was I'd like to package different versions of my
extension -- both .sql and .c -- and have the extension act consistently
for any version until I do an ALTER EXTENSION UPDATE. So, I'd prefer a DB
with an older extension to have the logic/code in the hook not change even
if I install a newer version's .so for use in another database (but don't
update the extension to the newer version). Does that make any sense?

> You will need to have separate builds of your extension for each PG
> release branch you work with; we force that through PG_MODULE_MAGIC
> whether you like it or not. But that doesn't translate to needing
> different names for the library .so files. Generally people either
> maintain separate source code per branch (just as the core code does)
> or put in a lot of #ifs testing CATALOG_VERSION_NO to see which
> generation of PG they're compiling against.

Yeah, we plan to use different branches for different PG versions.

> regards, tom lane
Re: [HACKERS] Syncing sql extension versions with shared library versions
On Sat, Jul 22, 2017 at 10:50 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jul 21, 2017 at 4:17 PM, Mat Arye <m...@timescaledb.com> wrote:
> > (I want to avoid having to keep backwards-compatibility for all
> > functions in future shared-libraries).
>
> Are you sure that's a good idea?

No :). But most of our code is not user-called functions (planner and
other hooks, etc.). It seems dangerous to update that code in the .so and
have it possibly affect customers that are using old versions of the
extension. While it's possible to do that kind of _v1 suffix code for
planner functions as well, this seems like a nightmare in terms of code
maintenance (we already have thousands of lines of C code). I think a
dynamic loader might be more work upfront but have major payoffs for speed
of development in the long term for us. It may also have advantages in
terms of update safety.

It's also worth noting that our C code has some SPI upcalls, so keeping
the sql and C code in sync is even more of an issue for us (if we can't
make the dynamic/stub loader approach work, this might be an anti-pattern
and we may have to review doing upcalls at all).

> It seems like swimming upstream
> against the design. I mean, instead of creating a dispatcher library
> that loads either v1 or v2, maybe you could just put it all in one
> library, add a "v1" or "v2" suffix to the actual function names where
> appropriate, and then set up the SQL definitions to call the correct
> one. I mean, it's the same thing, but with less chance of the dynamic
> loader ruining your day.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
[HACKERS] Syncing sql extension versions with shared library versions
Hi All,

I am developing the TimescaleDB extension for postgres
(https://github.com/timescale/timescaledb) and have some questions about
versioning.

First of all, I have to say that the versioning system on the sql side is
wonderful. It's really simple to write migrations, etc. However, when
thinking through the implications of having a database cluster with
databases having different extension versions installed, it was not
apparent to me how to synchronize the installed extension version with a
shared library version. For example, if I have timescaledb version 0.1.0
in one db and version 0.2.0 in another db, I'd like timescaledb-0.1.0.so
and timescaledb-0.2.0.so to be used, respectively. (I want to avoid having
to keep backwards compatibility for all functions in future shared
libraries.) In our case, this is further complicated by the fact that we
need to preload the shared library, since we are accessing the planner
hooks, etc.

Below, I'll describe some solutions I have been thinking about, but wanted
to hear if anyone else on this list has already solved this problem and
has some insight. It is also quite possible I am missing something.

Issue 1: Preloading the right shared library.

When preloading libraries (via local_preload_libraries,
session_preload_libraries, or shared_preload_libraries), it would be nice
to preload the shared library according to its version. But I looked
through the code and found no logic for adding version numbers to shared
library names.

Solution 1: Set session_preload_libraries on the database via ALTER
DATABASE ... SET. This can be done in sql, and the sql version-migration
scripts can change this value as you change extension versions. I think
this would work, but it seems very hack-ish and less than ideal.

Solution 2: Create some kind of stub shared library that will, in turn,
load another shared library of the correct version. This seems like the
cleaner approach.
Has anybody seen/implemented something like this already?

Issue 2: MODULE_PATHNAME

I believe that for user-defined functions the MODULE_PATHNAME substitution
will not work, since that setting is set once per extension. Thus, for
example, the migration scripts that include function definitions for older
versions would use the latest .so file if MODULE_PATHNAME was used in the
definition. This would be a problem when upgrading to an intermediate (not
latest) version.

Solution: MODULE_PATHNAME cannot be used, and we should build our own
templating/makefile infrastructure to link functions to the
right-versioned shared library in the CREATE FUNCTION definition.

Thanks a lot in advance,
Mat Arye
http://www.timescale.com/
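For what it's worth, the stub idea in Issue 1 mostly reduces to computing a versioned library name and handing it to the dynamic loader. A minimal sketch of the name-building part, with the "name-version.so" convention assumed (a real stub would then pass the result to dlopen() or PostgreSQL's library-loading machinery, which is omitted here):

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<name>-<version>.so", the file a stub library would load after
 * reading the extension version installed in the current database.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
versioned_library_name(char *buf, size_t buflen,
					   const char *name, const char *version)
{
	int			n = snprintf(buf, buflen, "%s-%s.so", name, version);

	return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

The hard part is not the string building but deciding when to read the version (the stub is preloaded before any particular database's extension catalog is available), which is exactly the wrinkle discussed above.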
Re: [HACKERS] Question about toasting code
On Sun, May 7, 2017 at 3:48 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Mat Arye <m...@timescale.com> writes:
> > This is in arrayfuncs.c:5022 (postgres 9.6.2)
>
> > /*
> >  * Ensure pass-by-ref stuff is copied into mcontext; and detoast it too if
> >  * it's varlena. (You might think that detoasting is not needed here
> >  * because construct_md_array can detoast the array elements later.
> >  * However, we must not let construct_md_array modify the ArrayBuildState
> >  * because that would mean array_agg_finalfn damages its input, which is
> >  * verboten. Also, this way frequently saves one copying step.)
> >  */
>
> > I am a bit confused by the comment.
>
> > Does PG_DETOAST_DATUM_COPY(dvalue) modify dvalue?
>
> No.
>
> What the comment is on about is that construct_md_array does this:
>
>     /* make sure data is not toasted */
>     if (elmlen == -1)
>         elems[i] = PointerGetDatum(PG_DETOAST_DATUM(elems[i]));
>
> that is, it intentionally modifies the passed-in elems array in
> the case that one of the elements is a toasted value. For most
> callers, the elems array is only transient storage anyway. But
> it's problematic for makeMdArrayResult, because it would mean
> that the ArrayBuildState is changed --- and while it's not changed
> in a semantically significant way, that's still a problem, because
> the detoasted value would be allocated in a context that is possibly
> shorter-lived than the ArrayBuildState is. (In a hashed aggregation
> situation, the ArrayBuildState is going to live in storage that is
> persistent across the aggregation, but we are calling makeMdArrayResult
> in the context where we want the result value to be created, which
> is of only per-output-tuple lifespan.)
>
> You could imagine fixing this by having construct_md_array do some
> context switches internally, but that would complicate it.
> And in any case, fixing the problem where it is allows us to combine the
> possible detoasting with copying the value into the ArrayBuildState's
> context, so we'd probably keep doing that even if it was safe not to.

Thanks. That clears it up.

> > The thing I am struggling with is with the serialize/deserialize
> > functions. Can I detoast inside the serialize function if I use _COPY?
> > Or is there a reason I have to detoast during the sfunc?
>
> Should be able to detoast in serialize if you want. Are you having
> trouble with that? (It might be inefficient to do it that way, if
> you have to serialize the same value multiple times. But I'm not
> sure if that can happen.)

I haven't run into any actual problems yet, just wanted to figure out a
clean mental model before implementing. Thanks a lot for the
clarification.

> regards, tom lane
[HACKERS] Question about toasting code
Hi,

I am trying to create a custom aggregate and have run across some puzzling
code while trying to figure out how to implement it. This is in
arrayfuncs.c:5022 (postgres 9.6.2):

/*
 * Ensure pass-by-ref stuff is copied into mcontext; and detoast it too if
 * it's varlena. (You might think that detoasting is not needed here
 * because construct_md_array can detoast the array elements later.
 * However, we must not let construct_md_array modify the ArrayBuildState
 * because that would mean array_agg_finalfn damages its input, which is
 * verboten. Also, this way frequently saves one copying step.)
 */
if (!disnull && !astate->typbyval)
{
    if (astate->typlen == -1)
        dvalue = PointerGetDatum(PG_DETOAST_DATUM_COPY(dvalue));
    else
        dvalue = datumCopy(dvalue, astate->typbyval, astate->typlen);
}

I am a bit confused by the comment.

Does PG_DETOAST_DATUM_COPY(dvalue) modify dvalue? Shouldn't that not
modify the value (implied by _COPY)? If so, then why does the detoasting
have to happen ahead of time and cannot happen at a later stage (in the
finalfunc or serializefunc, etc.)? I understand that at those later stages
you cannot modify the input, but why would you have to in order to
DETOAST?

The thing I am struggling with is with the serialize/deserialize
functions. Can I detoast inside the serialize function if I use _COPY? Or
is there a reason I have to detoast during the sfunc?

Thanks,
Mat
TimescaleDB
[HACKERS] Order-preserving function transforms and EquivalenceClass
Hi,

I am on a team developing an open-source extension for time-series data
storage in PostgreSQL (https://github.com/timescaledb/timescaledb).

We are trying to extend/hook into the planner so that it understands that
date_trunc('minute', time) has the same ordering as time (or rather that a
sort ordering on the latter is always a valid sort ordering on the
former). But this question really applies to any order-preserving
transform, such as (time+1) vs (time). Let's assume we can detect such
cases. How do we tell the planner about it? I see two ways of doing this:

(1) Have an EquivalenceClass with two members -- the "time" member and the
"date_trunc" member. Then the pathkey for the sort would reference this
class and understand the equivalence. I think this is a pretty clean way
of doing things. But this equivalence between "time" and "date_trunc" only
applies to sorts. It should not apply to joins (for which EquivalenceClass
is also used), because these values are not actually equivalent; they are
only equivalent for sorting purposes. I know an EquivalenceClass has an
ec_has_volatile field which marks the EquivalenceClass as only for sorts
and not for joins. But is it an incorrect hack to set this for cases where
the EquivalenceMembers are not volatile? Is there some other way of
marking an EquivalenceClass as sort-only that I am missing? Another
wrinkle is that while a date_trunc sort can be safely represented by a
time sort, the reverse isn't true. Has there been any discussion of
supporting such cases in EquivalenceClasses? I'd be happy to submit a
patch to core if people think this is worthwhile.

(2) I can create new EquivalenceClass objects and pathkeys and use planner
hooks to substitute the appropriate pathkeys for doing things like finding
indexes, etc. I have a prototype where this works, but I just think
approach 1 is much cleaner.

Any thoughts would be greatly appreciated. I am pretty new to the planner
codebase and just want to avoid any obvious pitfalls.
Thanks, Matvey Arye