Re: Fwd: [HACKERS] Syncing sql extension versions with shared library versions

2017-07-25 Thread Mat Arye
On Mon, Jul 24, 2017 at 6:16 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Mat Arye <m...@timescaledb.com> writes:
> > On Mon, Jul 24, 2017 at 1:38 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> I'm not really sure why planner hooks would have anything to do with
> your
> >> exposed SQL API?
>
> > Sorry, what I meant was I'd like to package different versions of my
> > extension -- both .sql and .c -- and have the extension act consistently
> > for any version until I do an ALTER EXTENSION UPDATE.
> > So, I'd prefer a DB with an older extension to have the logic/code in the
> > hook not change even if I install a newer version's .so for use in
> > another database (but don't update the extension to the newer version).
> > Does that make any sense?
>
> The newer version's .so simply is not going to load into the older
> version; we intentionally prevent that from happening.  It's not necessary
> anyway because versions do not share library directories.  Therefore,
> you can have foo.so for 9.5 in your 9.5 version's library directory,
> and foo.so for 9.6 in your 9.6 version's library directory, and the
> filesystem will keep them straight for you.  It is not necessary to
> call them foo-9.5.so and foo-9.6.so.
>

I meant the extension version, not the PG version. Let me try to explain:
if version 0.1.0 has optimization A in the planner hook and 0.2.0 has
optimization B, I'd like the property that even if I install foo-0.2.0.so
(and also have foo-0.1.0.so) in the cluster, any database that has not done
an ALTER EXTENSION UPDATE will still do A, while any database that has
updated the extension will do B. I'd also like to avoid doing a bunch of
if/else statements to make this happen. But that's the ideal; not sure if I
can make this happen.



>
> As for the other point, the usual idea is that if you have a
> SQL-accessible C function xyz() that needs to behave differently after an
> extension version update, then you make the extension update script point
> the SQL function to a different library entry point.  If your 1.0
> extension version originally had
>
> CREATE FUNCTION xyz(...) RETURNS ...
>   LANGUAGE C AS 'MODULE_PATHNAME', 'xyz';
>
> (note that the second part of the AS clause might have been implicit;
> no matter), then your update script for version 1.1 could do
>
> CREATE OR REPLACE FUNCTION xyz(...) RETURNS ...
>   LANGUAGE C AS 'MODULE_PATHNAME', 'xyz_1_1';
>
> Then in the 1.1 version of the C code, the xyz_1_1() C function provides
> the new behavior, while the xyz() C function provides the old behavior,
> or maybe just throws an error if you conclude it's impractical to emulate
> the old behavior exactly.  As I mentioned earlier, you can usually set
> things up so that you can share much of the code between two such
> functions.
>

Thanks for that explanation. It's clear now.
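For my own notes, the C side of that pattern might be sketched roughly like
this (a hypothetical fragment, not from any real extension; it assumes the
usual PostgreSQL extension boilerplate and elides the actual logic):

```c
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* shared worker; both SQL-visible entry points funnel into it */
static Datum xyz_internal(FunctionCallInfo fcinfo, bool v11_behavior);

PG_FUNCTION_INFO_V1(xyz);
Datum
xyz(PG_FUNCTION_ARGS)
{
    return xyz_internal(fcinfo, false);  /* old (1.0) behavior */
}

PG_FUNCTION_INFO_V1(xyz_1_1);
Datum
xyz_1_1(PG_FUNCTION_ARGS)
{
    return xyz_internal(fcinfo, true);   /* new (1.1) behavior */
}

static Datum
xyz_internal(FunctionCallInfo fcinfo, bool v11_behavior)
{
    /* ... shared implementation; branch on v11_behavior only where
     * the two versions actually differ ... */
}
```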


>
> The pgstattuple C function in contrib/pgstattuple is one example of
> having changed a C function's behavior in this way over multiple versions.
>
> regards, tom lane
>


Fwd: [HACKERS] Syncing sql extension versions with shared library versions

2017-07-24 Thread Mat Arye
(adding -hackers back into thread, got dropped by my email client, sorry)

On Mon, Jul 24, 2017 at 1:38 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Mat Arye <m...@timescaledb.com> writes:
> > I tried looking in the contrib modules and didn't find many with lots of
> > planner hook usage.
>
> I'm not really sure why planner hooks would have anything to do with your
> exposed SQL API?
>

Sorry, what I meant was I'd like to package different versions of my
extension -- both .sql and .c -- and have the extension act consistently
for any version until I do an ALTER EXTENSION UPDATE.
So, I'd prefer a DB with an older extension to have the logic/code in the
hook not change even if I install a newer version's .so for use in another
database (but don't update the extension to the newer version).  Does that
make any sense?


>
> You will need to have separate builds of your extension for each PG
> release branch you work with; we force that through PG_MODULE_MAGIC
> whether you like it or not.  But that doesn't translate to needing
> different names for the library .so files.  Generally people either
> maintain separate source code per-branch (just as the core code does)
> or put in a lot of #ifs testing CATALOG_VERSION_NO to see which
> generation of PG they're compiling against.
>

Yeah, we plan to use different branches for different PG versions.


>
> regards, tom lane
>


Re: [HACKERS] Syncing sql extension versions with shared library versions

2017-07-24 Thread Mat Arye
On Sat, Jul 22, 2017 at 10:50 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Jul 21, 2017 at 4:17 PM, Mat Arye <m...@timescaledb.com> wrote:
> > (I
> > want to avoid having to keep backwards-compatibility for all functions in
> > future shared-libraries).
>
> Are you sure that's a good idea?


No :). But a lot of (most of) our code is not in user-called functions
(planner and other hooks, etc.). It seems dangerous to update that code in
the .so and have it possibly affect customers that are using old versions
of the extension. While it's possible to do that kind of _v1 suffix code
for planner functions as well, this seems like a nightmare in terms of code
maintenance (we already have thousands of lines of C code). I think a
dynamic loader might be more work upfront but have major payoffs for speed
of development in the long term for us. It may also have advantages in
terms of update safety.  It's also worth noting that our C code has some
SPI upcalls, so keeping some sync between the sql and C code is even more
of an issue for us (if we can't make the dynamic/stub loader approach work,
this might be an anti-pattern and we may have to review doing upcalls at
all).


> It seems like swimming upstream
> against the design.  I mean, instead of creating a dispatcher library
> that loads either v1 or v2, maybe you could just put it all in one
> library, add a "v1" or "v2" suffix to the actual function names where
> appropriate, and then set up the SQL definitions to call the correct
> one.  I mean, it's the same thing, but with less chance of the dynamic
> loader ruining your day.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


[HACKERS] Syncing sql extension versions with shared library versions

2017-07-21 Thread Mat Arye
Hi All,

I am developing the TimescaleDB extension for postgres (
https://github.com/timescale/timescaledb) and have some questions about
versioning. First of all, I have to say that the versioning system on the
sql side is wonderful. It's really simple to write migrations etc.

However, when thinking through the implications of having a database
cluster with databases having different extension versions installed, it
was not immediately clear to me how to synchronize the installed extension
version with a shared library version. For example, if I have timescaledb
version 0.1.0 in one db and version 0.2.0 in another db, I'd like
timescaledb-0.1.0.so and timescaledb-0.2.0.so to be used, respectively. (I
want to avoid having to keep backwards compatibility for all functions in
future shared libraries.) In our case, this is further complicated by the
fact that we need to preload the shared library since we are accessing the
planner hooks etc. Below, I'll describe some solutions I have been thinking
about, but wanted to hear if anyone else on this list has already solved
this problem and has some insight. It is also quite possible I am missing
something.

Issue 1: Preloading the right shared library.
When preloading libraries (via local_preload_libraries,
session_preload_libraries, or shared_preload_libraries), it would be nice
to preload the shared library according to its version. But I looked
through the code and found no logic for adding version numbers to shared
library names.
Solution 1: Set session_preload_libraries on the database via ALTER
DATABASE SET. This can be done in the sql, and the sql version-migration
scripts can change this value as you change extension versions. I think
this would work, but it seems very hack-ish and less than ideal.
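Concretely, Solution 1 would look something like this inside a
version-migration script (the library name is hypothetical; the DO block is
needed because the script doesn't know the database name in advance):

```sql
-- In the 0.2.0 migration script: pin the current database to the 0.2.0
-- library for future sessions (already-running backends keep the old one).
DO $$
BEGIN
    EXECUTE format(
        'ALTER DATABASE %I SET session_preload_libraries = %L',
        current_database(), 'timescaledb-0.2.0');
END
$$;
```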
Solution 2: Create some kind of stub shared library that will, in turn,
load another shared library of the correct version. This seems like the
cleaner approach. Has anybody seen/implemented something like this already?
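A very rough sketch of what such a stub might look like
(`ts_find_installed_version` is a made-up helper; also note that catalog
access is not available while shared_preload_libraries is being processed,
so a real stub would have to defer this lookup to first use, e.g. from a
hook):

```c
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

void
_PG_init(void)
{
    char        path[MAXPGPATH];

    /* made-up helper: find the extension version installed in this db */
    const char *version = ts_find_installed_version("timescaledb");

    snprintf(path, sizeof(path), "$libdir/timescaledb-%s", version);

    /* loading the library runs its own _PG_init(), which installs hooks */
    (void) load_external_function(path, "ts_module_init", false, NULL);
}
```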

Issue 2: MODULE_PATHNAME
I believe that for user-defined functions the MODULE_PATHNAME substitution
will not work, since that setting is set once per extension. Thus, for
example, the migration scripts that include function definitions for older
versions would use the latest .so file if MODULE_PATHNAME was used in the
definition. This would be a problem if upgrading to an intermediate (not
latest) version.
Solution: MODULE_PATHNAME cannot be used, and we should build our own
templating/makefile infrastructure to link files to the right-versioned
shared library in the CREATE FUNCTION definition.
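In other words, the generated scripts would pin the library path explicitly
instead of relying on MODULE_PATHNAME; something like this (the function
signature here is made up for illustration):

```sql
-- Emitted into the 0.2.0 script by our own templating, instead of
-- AS 'MODULE_PATHNAME': the versioned library is named explicitly.
CREATE OR REPLACE FUNCTION create_hypertable(table_name regclass)
RETURNS void
LANGUAGE C AS '$libdir/timescaledb-0.2.0', 'create_hypertable';
```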

Thanks a lot in advance,
Mat Arye
http://www.timescale.com/


Re: [HACKERS] Question about toasting code

2017-05-07 Thread Mat Arye
On Sun, May 7, 2017 at 3:48 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Mat Arye <m...@timescale.com> writes:
> > This is in arrayfuncs.c:5022 (Postgres 9.6.2)
>
> > /*
> > * Ensure pass-by-ref stuff is copied into mcontext; and detoast it too if
> > * it's varlena.  (You might think that detoasting is not needed here
> > * because construct_md_array can detoast the array elements later.
> > * However, we must not let construct_md_array modify the ArrayBuildState
> > * because that would mean array_agg_finalfn damages its input, which is
> > * verboten.  Also, this way frequently saves one copying step.)
> > */
>
> > I am a bit confused by the comment.
>
> > Does PG_DETOAST_DATUM_COPY(dvalue) modify dvalue?
>
> No.
>
> What the comment is on about is that construct_md_array does this:
>
> /* make sure data is not toasted */
> if (elmlen == -1)
>     elems[i] = PointerGetDatum(PG_DETOAST_DATUM(elems[i]));
>
> that is, it intentionally modifies the passed-in elems array in
> the case that one of the elements is a toasted value.  For most
> callers, the elems array is only transient storage anyway.  But
> it's problematic for makeMdArrayResult, because it would mean
> that the ArrayBuildState is changed --- and while it's not changed
> in a semantically significant way, that's still a problem, because
> the detoasted value would be allocated in a context that is possibly
> shorter-lived than the ArrayBuildState is.  (In a hashed aggregation
> situation, the ArrayBuildState is going to live in storage that is
> persistent across the aggregation, but we are calling makeMdArrayResult
> in the context where we want the result value to be created, which
> is of only per-output-tuple lifespan.)
>
> You could imagine fixing this by having construct_md_array do some
> context switches internally, but that would complicate it.  And in
> any case, fixing the problem where it is allows us to combine the
> possible detoasting with copying the value into the ArrayBuildState's
> context, so we'd probably keep doing that even if it was safe not to.
>
>
Thanks. That clears it up.


> > The thing I am struggling with is the serialize/deserialize functions.
> > Can I detoast inside the serialize function if I use _COPY? Or is there a
> > reason I have to detoast during the sfunc?
>
> Should be able to detoast in serialize if you want.  Are you having
> trouble with that?  (It might be inefficient to do it that way, if
> you have to serialize the same value multiple times.  But I'm not
> sure if that can happen.)
>
>
I haven't run into any actual problems yet, just wanted to figure out a
clean mental model before implementing. Thanks a lot for the clarification.


> regards, tom lane
>


[HACKERS] Question about toasting code

2017-05-07 Thread Mat Arye
Hi,

I am trying to create a custom aggregate and have run across some puzzling
code while trying to figure out how to implement it.

This is in arrayfuncs.c:5022 (Postgres 9.6.2)

/*
* Ensure pass-by-ref stuff is copied into mcontext; and detoast it too if
* it's varlena.  (You might think that detoasting is not needed here
* because construct_md_array can detoast the array elements later.
* However, we must not let construct_md_array modify the ArrayBuildState
* because that would mean array_agg_finalfn damages its input, which is
* verboten.  Also, this way frequently saves one copying step.)
*/
if (!disnull && !astate->typbyval)
{
    if (astate->typlen == -1)
        dvalue = PointerGetDatum(PG_DETOAST_DATUM_COPY(dvalue));
    else
        dvalue = datumCopy(dvalue, astate->typbyval, astate->typlen);
}

I am a bit confused by the comment.

Does PG_DETOAST_DATUM_COPY(dvalue) modify dvalue? Shouldn't it not modify
the value (as implied by _COPY)? If so, then why does the detoasting have
to happen ahead of time, and why can't it happen at a later stage (in the
finalfunc or serializefunc etc.)? I understand that at those later stages
you cannot modify the input, but why would you have to in order to DETOAST?

The thing I am struggling with is the serialize/deserialize functions.
Can I detoast inside the serialize function if I use _COPY? Or is there a
reason I have to detoast during the sfunc?


Thanks,
Mat
TimescaleDB


[HACKERS] Order-preserving function transforms and EquivalenceClass

2017-03-23 Thread Mat Arye
Hi,

I am on a team developing an open-source extension for time-series data
storage in PostgreSQL (https://github.com/timescaledb/timescaledb).

We are trying to extend/hook into the planner so that it understands that
date_trunc('minute', time) has the same ordering as time (or rather that a
sort ordering on the latter is always a valid sort ordering on the former).
But this question really applies to any order-preserving transform such as
(time+1) vs (time).
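To make the goal concrete, the kind of plan we're after looks roughly like
this (table and index names are made up):

```sql
CREATE TABLE readings (time timestamptz NOT NULL, value float8);
CREATE INDEX ON readings (time DESC);

-- Today the planner adds an explicit Sort node here; we'd like it to
-- recognize that scanning the index already satisfies this ordering,
-- because date_trunc('minute', time) is monotonic in time.
SELECT * FROM readings
ORDER BY date_trunc('minute', time) DESC;
```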

Let's assume we can detect such cases. How do we tell the planner about it?

I see two ways of doing this:

(1) Having an EquivalenceClass with two members -- the "time" and
"date_trunc" members. Then the pathkey for the sort would reference this
class and understand the equivalence. I think this is a pretty clean way
of doing things. But this equivalence between "time" and "date_trunc" only
applies to sorts. It should not apply to joins (for which EquivalenceClass
is also used) because these values are not actually equivalent; they are
only equivalent for sorting purposes. I know an EquivalenceClass has an
ec_has_volatile field which marks the EquivalenceClass as only for sorts
and not for joins. But is it an incorrect hack to set this for cases where
the EquivalenceMembers are not volatile? Is there some other way of marking
an EquivalenceClass as only for sorts that I am missing? Another wrinkle is
that while a date_trunc sort can be safely represented by a time sort, the
reverse isn't true. Has there been any discussion of supporting such cases
in EquivalenceClasses? I'd be happy to submit a patch to the core if people
think this is worthwhile.

(2) I can create new EquivalenceClass objects and pathkeys and use planner
hooks to substitute the appropriate pathkeys for doing things like finding
indexes etc. I have a prototype where this works, but I just think approach
1 is much cleaner.

Any thoughts would be greatly appreciated. I am pretty new to the planner
codebase and just want to avoid any obvious pitfalls.

Thanks,
Matvey Arye