Re: [HACKERS] Add Roman numeral conversion to to_number

2017-09-19 Thread Doug Doole
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:        tested, passed

Applies cleanly and runs fine. Previous comments were addressed.

There's been some discussion in the thread about whether this is worth doing, 
but given that it is done it seems a bit of a moot point. So passing this off 
to the committers to make a final call.

The new status of this patch is: Ready for Committer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add Roman numeral conversion to to_number

2017-09-14 Thread Doug Doole
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:        not tested

Code looks fine, but one niggling complaint at line 146 of the patch file ("while 
(*cp) {"): a K&R-style brace slipped in, which doesn't match the formatting of 
the file.

Given that this is providing new formatting options, there should be new tests 
added that validate the options and error handling.

There's also the "do we want this?" debate from the discussion thread that 
still needs to be resolved. (I don't have an opinion either way.)

I'm sending this back to the author to address the first two issues.

The new status of this patch is: Waiting on Author



Re: [HACKERS] Small code improvement for btree

2017-09-14 Thread Doug Doole
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Looks good to me.

The new status of this patch is: Ready for Committer



Re: [HACKERS] Cached plans and statement generalization

2017-04-26 Thread Doug Doole
>
> A naive option would be to invalidate anything that depends on table or
> view *.FOOBAR. You could probably make it a bit smarter by also requiring
> that schema A appear in the path.
>

This has been rumbling around in my head. I wonder if you could solve this
problem by registering dependencies on objects which don't yet exist.
Consider:

CREATE TABLE C.T1(...);
CREATE TABLE C.T2(...);
SET search_path='A,B,C,D';
SELECT * FROM C.T1, T2;

For T1, you'd register a hard dependency on C.T1 and no virtual
dependencies since the table is explicitly qualified.

For T2, you'd register a hard dependency on C.T2 since that is the table
that was selected for the query. You'd also register virtual dependencies
on A.T2 and B.T2 since if either of those tables (or views) are created you
need to recompile the statement. (Note that no virtual dependency is
created on D.T2 since that table would never be selected by the compiler.)

The catch is that virtual dependencies would have to be recorded and
searched as strings, not OIDs since the objects don't exist. Virtual
dependencies only have to be checked during CREATE processing though, so
that might not be too bad.
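As a rough illustration (in Python, with invented names — nothing like PostgreSQL's actual dependency machinery), resolution against the search path could record both kinds of dependency like this:

```python
# Sketch of string-keyed "virtual" dependencies.  All names here
# (plan_dependencies, etc.) are hypothetical, for illustration only.

def plan_dependencies(search_path, table_name, existing_tables):
    """For an unqualified table name, return the resolved hard dependency
    plus virtual dependencies on every earlier schema in the path."""
    hard = None
    virtual = []
    for schema in search_path:
        qualified = f"{schema}.{table_name}"
        if qualified in existing_tables:
            hard = qualified
            break
        # Table doesn't exist yet in this schema: if it is ever created,
        # it would shadow the one we resolve to, so record a virtual
        # dependency by name (no OID exists to record).
        virtual.append(qualified)
    return hard, virtual

# With tables C.T1 and C.T2 and search_path = A, B, C, D:
hard, virtual = plan_dependencies(["A", "B", "C", "D"], "T2", {"C.T1", "C.T2"})
# hard == "C.T2"; virtual == ["A.T2", "B.T2"]  (D.T2 is never recorded)
```

CREATE processing would then only need to probe the virtual-dependency strings for the newly created object's qualified name.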

But this is getting off topic - I just wanted to capture the idea while it
was rumbling around.


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Doug Doole
>
> (FWIW, on this list we don't do top-quotes)
>

I know. Forgot and just did "reply all". My bad.

> It's not always that simple, at least in postgres, unless you disregard
> search_path.  Consider e.g. cases like
>
> CREATE SCHEMA a;
> CREATE SCHEMA b;
> CREATE TABLE a.foobar(somecol int);
> SET search_path = 'b,a';
> SELECT * FROM foobar;
> CREATE TABLE b.foobar(anothercol int);
> SELECT * FROM foobar; -- may not be cached plan from before!
>
> it sounds - my memory of DB2 is very faint, and I never used it much -
> like similar issues could arise in DB2 too?
>

DB2 does handle this case, although unfortunately I don't remember the
details of how it worked.

A naive option would be to invalidate anything that depends on table or
view *.FOOBAR. You could probably make it a bit smarter by also requiring
that schema A appear in the path.

- Doug


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Doug Doole
Plan invalidation was no different than for any SQL statement. DB2 keeps a
list of the objects the statement depends on. If any of the objects changes
in an incompatible way the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called
the "compilation environment". This is a collection of everything that
impacts how a statement is compiled (SQL path, optimization level, etc.).
Plan lookup is done using both the statement text and the compilation
environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
path is ANDRES, MYTEAM, SYSIBM we will have different compilation
environments. If we both issue "SELECT * FROM T" we'll end up with
different cache entries even if T in both of our statements resolves to
MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
execute "SELECT * FROM T" again, I have a new compilation environment so
the second invocation of the statement will create a new entry in the
cache. The first entry is not kicked out - it will still be there for
re-use if I change my SQL path back to my original value (modulo LRU for
cache memory management of course).

With literal replacement, the cache entry is on the modified statement
text. Given the modified statement text and the compilation environment,
you're guaranteed to get the right plan entry.
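A toy sketch of that lookup scheme (hypothetical names; DB2's real cache is of course far more involved):

```python
# Plan lookup keyed on (statement text, compilation environment).
# Different SQL paths produce different cache entries for the same text;
# switching the path back re-uses the earlier entry instead of recompiling.

plan_cache = {}

def cache_key(stmt_text, sql_path, opt_level):
    # The compilation environment is everything that affects how a
    # statement compiles: SQL path, optimization level, etc.
    comp_env = (tuple(sql_path), opt_level)
    return (stmt_text, comp_env)

def get_plan(stmt_text, sql_path, opt_level, compile_fn):
    key = cache_key(stmt_text, sql_path, opt_level)
    if key not in plan_cache:
        plan_cache[key] = compile_fn(stmt_text, sql_path, opt_level)
    return plan_cache[key]
```

So "SELECT * FROM T" compiled under path DOUG, MYTEAM, SYSIBM and under path ANDRES, MYTEAM, SYSIBM yields two distinct entries, even when T resolves to the same table in both.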

On Tue, Apr 25, 2017 at 2:47 PM Andres Freund <and...@anarazel.de> wrote:

> On 2017-04-25 21:11:08 +, Doug Doole wrote:
> > When I did this in DB2, I didn't use the parser - it was too expensive. I
> > just tokenized the statement and used some simple rules to bypass the
> > invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> > disallow replacement until I hit the end of the current
> > subquery or statement.
>
> How did you manage plan invalidation and such?
>
> - Andres
>


Re: [HACKERS] Cached plans and statement generalization

2017-04-25 Thread Doug Doole
When I did this in DB2, I didn't use the parser - it was too expensive. I
just tokenized the statement and used some simple rules to bypass the
invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
disallow replacement until I hit the end of the current
subquery or statement.

There are a few limitations to this approach. For example, DB2 allowed you
to cast using function notation like VARCHAR(foo, 10). This meant I would
never replace the second parameter of any VARCHAR function. Now it's
possible that when the statement was fully compiled we'd find that
VARCHAR(foo,10) actually resolved to BOB.VARCHAR() instead of the built-in
cast function. Our thinking was that these cases were rare enough that we
wouldn't worry about them. (Of course, PostgreSQL's ::VARCHAR(10) syntax
avoids this problem completely.)

Because SQL is so structured, the implementation ended up being quite
simple (a few hundred lines of code) with no significant maintenance issues.
(Other developers had no problem adding in new cases where constants had to
be preserved.)

The simple tokenizer was also fairly extensible. I'd prototyped using the
same code to also normalize statements (uppercase all keywords, collapse
whitespace to a single blank, etc.) but that feature was never added to the
product.
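A toy sketch of the token-based replacement idea (Python, with a deliberately naive tokenizer and only the ORDER BY rule; the real implementation tokenized SQL properly and tracked subquery boundaries):

```python
# Replace string/number literals with $n parameters, but stop replacing
# once ORDER BY is seen, since constants there change the statement's
# meaning.  The tokenizer here is intentionally simplistic.

import re

def replace_literals(sql):
    tokens = re.findall(r"'[^']*'|\w+|\S", sql)
    out, params, frozen = [], [], False
    for i, tok in enumerate(tokens):
        # Rule: after the tokens ORDER BY, disallow replacement.
        if tok.upper() == "BY" and i > 0 and tokens[i - 1].upper() == "ORDER":
            frozen = True
        if not frozen and (tok.startswith("'") or tok.isdigit()):
            params.append(tok)
            out.append(f"${len(params)}")
        else:
            out.append(tok)
    return " ".join(out), params

# replace_literals("SELECT 'hello', 5 FROM t ORDER BY 2")
#   -> ("SELECT $1 , $2 FROM t ORDER BY 2", ["'hello'", "5"])
```

The normalization variant mentioned above would reuse the same token stream, uppercasing keywords and collapsing whitespace as it re-emits tokens.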

- Doug

On Tue, Apr 25, 2017 at 1:47 PM Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> On 04/25/2017 11:40 PM, Serge Rielau wrote:
>
>
> On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>
>
> SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
>
> You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
>
>
> I am substituting only string literals. So the query above will be
> transformed to
>
> SELECT $1::CHAR(10) || $2, 5 + 6;
>
> What's wrong with it?
>
>
> Oh, well that leaves a lot of opportunities on the table, doesn’t it?
>
>
> Well, actually my primary intention was not to make badly designed
> programs (not using prepared statements) work faster.
> I wanted to address cases when it is not possible to use prepared
> statements.
> If we want to substitute with parameters as many literals as possible,
> then parse+deparse tree seems to be the only reasonable approach.
> I will try to implement it also, just to estimate parsing overhead.
>
>
>
>
> Cheers
> Serge
>
>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: [HACKERS] Improving executor performance

2016-11-07 Thread Doug Doole

On 07/13/2016 06:18 PM, Andres Freund wrote:
> Attached (in patch 0003) is a proof-of-concept implementing an
> expression evaluation framework that doesn't use recursion. Instead
> ExecInitExpr2 computes a number of 'steps' necessary to compute an
> expression. These steps are stored in a linear array, and executed one
> after another (save boolean expressions, which can jump to later steps).
> E.g. to compute colname = 1 the steps are 1) fetch var, 2) evaluate
> const, 3) call function.

We've been having trouble with the performance of simple expressions in 
PLpgSQL so I started playing with this patch. (No sense re-inventing the 
wheel after all.) It was straightforward to extend to simple expressions 
and showed an immediate improvement (~10% faster on a simple test). 
Running in our full environment highlighted a few areas that I think are 
worth more investigation.


However, before I tackle that, is the posted proof-of-concept still the 
latest and greatest? If not, any chance of getting the latest?


Going forward, I'd be happy to collaborate on our efforts.

- Doug Doole
Salesforce




Re: [HACKERS] Improving executor performance

2016-11-04 Thread Doug Doole

On 07/13/2016 06:18 PM, Andres Freund wrote:

> Attached (in patch 0003) is a proof-of-concept implementing an
> expression evaluation framework that doesn't use recursion. Instead
> ExecInitExpr2 computes a number of 'steps' necessary to compute an
> expression. These steps are stored in a linear array, and executed one
> after another (save boolean expressions, which can jump to later steps).
> E.g. to compute colname = 1 the steps are 1) fetch var, 2) evaluate
> const, 3) call function.


We've been having trouble with the performance of simple expressions in 
PLpgSQL so I started playing with this patch. (No sense re-inventing the 
wheel after all.) It was straightforward to extend to simple expressions 
and showed an immediate improvement (~10% faster on a simple test). 
Running in our full environment highlighted a few areas that I think are 
worth more investigation.
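For reference, the quoted step-based design can be sketched as a flat array of steps run in a loop (illustrative names only, not the patch's actual structures):

```python
# Non-recursive expression evaluation: the expression "colname = 1" is
# compiled once into three steps, then executed by a linear dispatch
# loop rather than a recursive tree walk.

def make_steps(row, colname):
    stack = []
    steps = [
        lambda: stack.append(row[colname]),                 # 1) fetch var
        lambda: stack.append(1),                            # 2) evaluate const
        lambda: stack.append(stack.pop() == stack.pop()),   # 3) call function
    ]
    return steps, stack

def run(steps, stack):
    for step in steps:   # linear dispatch, no recursion
        step()
    return stack[-1]
```

Avoiding the recursive walk is exactly what makes this attractive for PL/pgSQL simple expressions, which are evaluated over and over.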


However, before I tackle that, is the posted proof-of-concept still the 
latest and greatest? If not, any chance of getting the latest?


Going forward, I'd like to collaborate on our efforts if you're interested.

- Doug Doole
Salesforce




Re: [HACKERS] ICU integration

2016-09-07 Thread Doug Doole
>
> This isn't a problem for Postgres, or at least wouldn't be right now,
> because we don't have case insensitive collations.


I was wondering if Postgres might be that way. It does avoid the RI
constraint problem, but there are still troubles with range based
predicates. (My previous project wanted case/accent insensitive collations,
so we got to deal with it all.)


> So, we use a strcmp()/memcmp() tie-breaker when strcoll() indicates
> equality, while also making the general notion of text equality actually
> mean binary equality.


We used a similar tie breaker in places. (e.g. Index keys needed to be
identical, not just equal. We also broke ties in sort to make its behaviour
more deterministic.)
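The tie-breaker idea can be sketched as follows (a toy case-insensitive collation stands in for strcoll(); nothing here is actual ICU or Postgres code):

```python
# Comparison with a binary tie-breaker: when the collation reports
# "equal", fall back to a byte comparison so that "equal" always
# implies "identical" for index keys and sorts.

def collate(a, b):
    # Toy case-insensitive collation, analogous to strcoll().
    return (a.lower() > b.lower()) - (a.lower() < b.lower())

def compare_with_tiebreak(a, b):
    c = collate(a, b)
    if c != 0:
        return c
    ab, bb = a.encode(), b.encode()
    return (ab > bb) - (ab < bb)   # analogous to memcmp()
```

With the tie-breaker, "ABC" and "abc" compare as non-equal even though the collation considers them equal, which is what keeps index keys deterministic.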

> I would like to get case insensitive collations some day, and was
> really hoping that ICU would help. That being said, the need for a
> strcmp() tie-breaker makes that hard. Oh well.
>

Prior to adding ICU to my previous project, it had the assumption that
equal meant identical as well. It turned out to be a lot easier to break
this assumption than I expected, but that code base had religiously used
its own string comparison function for user data - strcmp()/memcmp() was
never called for user data. (I don't know if the same can be said for
Postgres.) We found that very few places needed to be aware of values that
were equal but not identical. (Index and sort were the big two.)

Hopefully Postgres will be the same.

--
Doug Doole


Re: [HACKERS] ICU integration

2016-09-07 Thread Doug Doole
> I understand that in principle, but I don't see operating system
> providers shipping a bunch of ICU versions to facilitate that.  They
> will usually ship one.

Yep. If you want the protection I've described, you can't depend on the OS
copy of ICU. You need to have multiple ICU libraries that are
named/installed such that you can load the specific version you want. It
also means that you can have dependencies on versions of ICU that are no
longer supported. (In my previous project, we were shipping 3 copies of the
ICU library, going back to 2.x. Needless to say, we didn't add support for
every drop of ICU.)

On Wed, Sep 7, 2016 at 5:53 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/6/16 1:40 PM, Doug Doole wrote:
> > We carried the ICU version numbers around on our collation and locale
> > IDs (such as fr_FR%icu36) . The database would load multiple versions of
> > the ICU library so that something created with ICU 3.6 would always be
> > processed with ICU 3.6. This avoided the problems of trying to change
> > the rules on the user. (We'd always intended to provide tooling to allow
> > the user to move an existing object up to a newer version of ICU, but we
> > never got around to doing it.) In the code, this meant we were
> > explicitly calling the versioned API so that we could keep the calls
> > straight.
>
> I understand that in principle, but I don't see operating system
> providers shipping a bunch of ICU versions to facilitate that.  They
> will usually ship one.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] ICU integration

2016-09-06 Thread Doug Doole
> The ICU ABI (not API) is also versioned.  The way that this is done is
> that all functions are actually macros to a versioned symbol.  So
> ucol_open() is actually a macro that expands to, say, ucol_open_57() in
> ICU version 57.  (They also got rid of a dot in their versions a while
> ago.)  It's basically hand-crafted symbol versioning.  That way, you can
> link with multiple versions of ICU at the same time.  However, the
> purpose of that, as I understand it, is so that plugins can have a
> different version of ICU loaded than the main process or another plugin.
> In terms of postgres using the right version of ICU, it doesn't buy
> anything beyond what the soname mechanism does.

You can access the versioned API as well, it's just not documented. (The
ICU team does support this - we worked very closely with them when doing
all this.) We exploited the versioned API when we learned that there is no
guarantee of backwards compatibility in collations. You can't just change a
collation under a user (at least that was our opinion) since it can cause
all sorts of problems, and refreshing a collation (especially on the fly) is
a lot more work than we were prepared to take on.

We carried the ICU version numbers around on our collation and locale IDs
(such as fr_FR%icu36) . The database would load multiple versions of the
ICU library so that something created with ICU 3.6 would always be
processed with ICU 3.6. This avoided the problems of trying to change the
rules on the user. (We'd always intended to provide tooling to allow the
user to move an existing object up to a newer version of ICU, but we never
got around to doing it.) In the code, this meant we were explicitly calling
the versioned API so that we could keep the calls straight. (Of course this
was abstracted in a set of our own locale functions so that the rest of the
engine was ignorant of the ICU library fun that was going on.)
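The versioned-ID scheme could be sketched like this (a hypothetical helper; the real system went on to load the matching versioned ICU shared library and call its version-suffixed entry points):

```python
# Parse version-pinned collation IDs like "fr_FR%icu36" so each object
# is always processed by the ICU version it was created with.

def parse_collation_id(coll_id, current_version="57"):
    """Split 'fr_FR%icu36' into (locale, icu_version).  A bare '%icu'
    means "latest", which would be pinned at creation time."""
    locale, _, tag = coll_id.partition("%")
    if not tag.startswith("icu"):
        raise ValueError(f"unknown collation provider in {coll_id!r}")
    version = tag[3:] or current_version
    return locale, version

# parse_collation_id("fr_FR%icu36") -> ("fr_FR", "36")
# parse_collation_id("de_DE%icu")   -> ("de_DE", "57")  # pinned to current
```

The parsed version would then select which of the concurrently loaded ICU libraries services the comparison.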

> We can refine the guidance.  But indexes are the most important issue, I
> think, because changing the sorting rules in the background makes data
> silently disappear.

I'd say that collation is the most important issue, but collation impacts a
lot more than indexes.

Unfortunately as part of changing companies I had to leave my "screwy stuff
that has happened in collations" presentation behind so I don't have
concrete examples to point to, but I can cook up illustrative examples:

- Suppose in ICU X.X, AA = Å but in ICU Y.Y AA != Å. Further suppose there
was an RI constraint where the primary key used AA and the foreign key
used Å. If ICU was updated, the RI constraint between the rows would break,
leaving an orphaned foreign key.

- I can't remember the specific language but they had the collation rule
that "CH" was treated as a distinct entity between C and D. This gave the
order C < CG < CI < CZ < CH < D. Then they removed CH as special which gave
C < CG < CH < CI < CZ < D. Suppose there was the constraint CHECK (COL
BETWEEN 'C' AND 'CH'). Originally it would allow (almost) all strings that
started with C. After the change, the constraint would reject everything
that started with CI through CZ, leaving many rows in the database that no
longer qualify.
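The second example can be made concrete with a toy sort key (purely illustrative; this is not real ICU behavior for any locale):

```python
# Old rules: "CH" is a distinct letter sorting after "CZ".
# New rules: "CH" sorts in plain codepoint order, between "CG" and "CI".
# A constraint CHECK (col BETWEEN 'C' AND 'CH') then rejects rows it
# used to accept.

def sort_key_old(s):
    # Map the digraph to 'C~'; '~' sorts after 'Z' in ASCII, so
    # C < ... < CZ < CH under the old rules.
    return s.replace("CH", "C~")

def in_check_old(s):
    return sort_key_old("C") <= sort_key_old(s) <= sort_key_old("CH")

def in_check_new(s):
    return "C" <= s <= "CH"   # plain codepoint order

# "CI" and "CZ" pass the old constraint but fail the new one.
```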

It could be argued that these are edge cases and not likely to be hit.
That's likely true for a lot of users. But for a user who hits this, their
database is going to be left in a mess.

--
Doug Doole

On Tue, Sep 6, 2016 at 8:37 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 8/31/16 4:24 PM, Doug Doole wrote:
> > ICU explicitly does not provide stability in their locales and
> collations. We pushed them hard to provide this, but between changes to the
> CLDR data and changes to the ICU code it just wasn’t feasible for them to
> provide version to version stability.
> >
> > What they do offer is a compile option when building ICU to version all
> their APIs. So instead of calling icu_foo() you’d call icu_foo46(). (Or
> something like this - it’s been a few years since I actually worked with
> the ICU code.) This ultimately allows you to load multiple versions of the
> ICU library into a single program and provide stability by calling the
> appropriate version of the library. (Unfortunately, the OS - at least my
> Linux box - only provides the generic version of ICU and not the version
> annotated APIs, which means a separate compile of ICU is needed.)
> >
> > The catch with this is that it means you likely want to expose the
> version information. In another note it was suggested to use something like
> fr_FR%icu. If you want to pin it to a specific version of ICU, you’ll
> likely need something like fr_FR%icu46. (There’s nothing wrong with
> supporting fr_FR%icu to give users an easy way of saying “give me the
> latest and greatest”, but you’d probably want to harden it to a specific
> ICU version internally.)

Re: [HACKERS] ICU integration

2016-08-31 Thread Doug Doole
Hi all. I’m new to the PostgreSQL code and the mailing list, but I’ve had a lot 
of experience with using ICU in a different database product. So while I’m not 
up to speed on the code yet, I can offer some insights on using ICU.

> On Aug 30, 2016, at 9:12 PM, Peter Eisentraut 
> <peter.eisentr...@2ndquadrant.com> wrote:
>> How stable are the ICU locales? Most importantly, does ICU offer any
>> way to "pin" a locale version, so we can say "we want de_DE as it was
>> in ICU 4.6" and get consistent behaviour when the user sets up a
>> replica on some other system with ICU 4.8? Even if the German
>> government has changed its mind (again) about some details of the
>> language and 4.8 knows about the changes but 4.4 doesn’t?

ICU explicitly does not provide stability in their locales and collations. We 
pushed them hard to provide this, but between changes to the CLDR data and 
changes to the ICU code it just wasn’t feasible for them to provide version to 
version stability.

What they do offer is a compile option when building ICU to version all their 
APIs. So instead of calling icu_foo() you’d call icu_foo46(). (Or something 
like this - it’s been a few years since I actually worked with the ICU code.) 
This ultimately allows you to load multiple versions of the ICU library into a 
single program and provide stability by calling the appropriate version of the 
library. (Unfortunately, the OS - at least my Linux box - only provides the 
generic version of ICU and not the version annotated APIs, which means a 
separate compile of ICU is needed.)

The catch with this is that it means you likely want to expose the version 
information. In another note it was suggested to use something like fr_FR%icu. 
If you want to pin it to a specific version of ICU, you’ll likely need 
something like fr_FR%icu46. (There’s nothing wrong with supporting fr_FR%icu to 
give users an easy way of saying “give me the latest and greatest”, but you’d 
probably want to harden it to a specific ICU version internally.)

> I forgot to mention this, but the patch adds a collversion column that
> stores the collation version (provided by ICU).  And then when you
> upgrade ICU to something incompatible you get
> 
> +   if (numversion != collform->collversion)
> +   ereport(WARNING,
> +   (errmsg("ICU collator version mismatch"),
> +errdetail("The database was created using
> version 0x%08X, the library provides version 0x%08X.",
> +  (uint32) collform->collversion,
> (uint32) numversion),
> +errhint("Rebuild affected indexes, or build
> PostgreSQL with the right version of ICU.")));
> 
> So you still need to manage this carefully, but at least you have a
> chance to learn about it.

Indexes are the obvious place where collation comes into play, and are 
relatively easy to address. But consider all the places where string 
comparisons can be done. For example, check constraints and referential 
constraints can depend on string comparisons. If the collation rules change 
because of a new version of ICU, the database can become inconsistent and will 
need a lot more work than an index rebuild.

> Suggestions for refining this are welcome.
> 
> -- 
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Doug Doole
Salesforce


