Re: Partial aggregates pushdown

2023-08-01 Thread Finnerty, Jim
When it is valid to filter based on a HAVING clause predicate, it should 
already have been converted into a WHERE clause predicate, except in the 
special case of a LIMIT k ... ORDER BY query, where the HAVING clause 
predicate can be determined approximately after k fully qualified tuples 
have been found, and then that predicate is successively tightened as more 
qualified records are found.

*that*, by the way, is a very powerful optimization.
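
To make the shape of that special case concrete, here is a sketch (the table and
column names are illustrative, not taken from this thread) of the kind of top-k
query it applies to:

    -- Once k = 10 fully qualified groups have been produced, the 10th-best
    -- sum(amount) seen so far acts as an implicit filter predicate on the
    -- remaining groups, and it tightens as better groups are found.
    SELECT region, sum(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
    LIMIT 10;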

/Jim F

On 7/20/23, 6:24 AM, "Alexander Pyhalov" <a.pyha...@postgrespro.ru> wrote:


fujii.y...@df.mitsubishielectric.co.jp wrote on 2023-07-19 03:43:
> Hi Mr. Pyhalov, hackers.


> 3)
> I modified the patch to safely do a partial aggregate pushdown for
> queries which contain having clauses.
>


Hi.
Sorry, but I don't see how it could work.
For example, the attached test returns a wrong result:


CREATE FUNCTION f() RETURNS INT AS $$
begin
return 10;
end
$$ LANGUAGE PLPGSQL;


SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 1;
  b | sum
----+-----
  0 |   0
 10 |   0
 20 |   0
 30 |   0
 40 |   0
(5 rows)


In fact the above query should have returned 0 rows, as


SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1;
  b | sum
----+------
  0 |  600
  1 |  660
  2 |  720
  3 |  780
  4 |  840
  5 |  900
  6 |  960
  7 | 1020
  8 | 1080
  9 | 1140
 10 |  600
 11 |  660
 12 |  720

shows no such rows.


Or, on the same data


SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) > 660 ORDER BY 1;

You'll get 0 rows.


But

SELECT b, sum(a) FROM pagg_tab GROUP BY b;

gives:

  b | sum
----+------
 42 |  720
 29 | 1140
  4 |  840
 34 |  840
 41 |  660
  0 |  600
 40 |  600


The issue is that you can't evaluate a "partial" HAVING clause. The filter
has to compare the full aggregate value, and that's not possible at the level
of a single partition.
And you have this in the plans:


Finalize GroupAggregate
  Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*)
  Group Key: pagg_tab.b
  Filter: (sum(pagg_tab.a) < 700)
  ->  Sort
        Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
        Sort Key: pagg_tab.b
        ->  Append
              ->  Foreign Scan
                    Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
                    Filter: ((PARTIAL sum(pagg_tab.a)) < 700)   <--- here we can't compare anything yet, sum is incomplete
                    Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab)
                    Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p1 GROUP BY 1
              ->  Foreign Scan
                    Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)), (PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab_1.a))
                    Filter: ((PARTIAL sum(pagg_tab_1.a)) < 700)
                    Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab_1)
                    Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p2 GROUP BY 1
              ->  Foreign Scan
                    Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)), (PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab_2.a))
                    Filter: ((PARTIAL sum(pagg_tab_2.a)) < 700)
                    Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab_2)
                    Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p3 GROUP BY 1


In foreign_grouping_ok(), around line 6586:

    if (IsA(expr, Aggref))
    {
        if (partial)
        {
            mark_partial_aggref((Aggref *) expr, AGGSPLIT_INITIAL_SERIAL);
            continue;
        }
        else if (!is_foreign_expr(root, grouped_rel, expr))
            return false;

        tlist = add_to_flat_tlist(tlist, list_make1(expr));
    }


at least you shouldn't do anything with expr, if is_foreign_expr()
returned false. If we restrict pushing down queries with havingQuals,
I'm not quite sure how Aggref can appear in local_conds.


As for changes in planner.c (setGroupClausePartial()) I have several
questions.


1) Why don't we add non_group_exprs to pathtarget->exprs when
partial_target->exprs is not set?


2) We replace extra->partial_target->exprs with partial_target->exprs
after processing. Why are we sure that after this tleSortGroupRef is
correct?


--
Best regards,
Alexander Pyhalov,
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2023-08-01 Thread Finnerty, Jim
Peter, I'm very glad to hear that you're researching this!

Will this include skip-scan optimizations for OR or IN predicates, or for cases where the 
number of distinct values in the leading non-constant index column(s) is 
sufficiently small? For example, suppose there is an ORDER BY b, and the WHERE clause 
contains the predicates (a = 1 AND b = 5) OR (c > 12 AND b BETWEEN 100 AND 200).  Then a 
single index scan on an index with leading column b could visit b = 5, then 
the range b from 100 to 200, and deliver the rows in the order requested.  Or if 
the predicate is (a = 1 AND b = 5) OR (c LIKE 'XYZ' AND b < 12), then you can 
scan just b < 12.  Or if the index is defined on (a, b) and you know that b = 
100, and that there are only 4 distinct values of column a, then you could skip 
to each distinct value of a where b = 100, and so on.
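
A sketch of those cases, with an illustrative table and indexes (none of these names 
come from the thread):

    CREATE TABLE t (a int, b int, c int);
    CREATE INDEX t_b_idx   ON t (b);      -- index with leading column b
    CREATE INDEX t_a_b_idx ON t (a, b);   -- composite index for the skip case

    -- A single scan of t_b_idx could visit b = 5 and then the range
    -- b BETWEEN 100 AND 200, returning rows already in the requested order:
    SELECT * FROM t
    WHERE (a = 1 AND b = 5) OR (c > 12 AND b BETWEEN 100 AND 200)
    ORDER BY b;

    -- With the (a, b) index and only a few distinct values of a, a scan could
    -- skip to each distinct a and probe (a, 100) instead of reading the whole index:
    SELECT * FROM t WHERE b = 100;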

If you have an ORDER BY clause and a lower and upper bound on the first column 
of the ORDER BY list, you have a potential to reduce search effort versus a 
full index scan, even when that upper and lower bound needs to be derived from 
a complex predicate.

Of course, if you have an IN list you can either skip to the distinct values 
listed or scan the entire index, depending on estimated cost.

/Jim F

On 8/1/23, 3:43 PM, "Peter Geoghegan" <p...@bowt.ie> wrote:


On Mon, Jul 31, 2023 at 9:38 AM Alena Rybakina <lena.riback...@yandex.ru> wrote:
> I noticed only one thing there: when we have unsorted array values in
> SAOP, the query takes longer than
> when it has a sorted array. I'll double-check it just in case and write
> about the results later.


I would expect the B-Tree preprocessing by _bt_preprocess_array_keys()
to be very slightly faster when the query is written with presorted,
duplicate-free constants. Sorting is faster when you don't really have
to sort. However, I would not expect the effect to be significant
enough to matter, except perhaps in very extreme cases.
Although...some of the cases you care about are very extreme cases.


> I am also running some experiments with multi-column indexes using SAOPs.


Have you thought about a similar transformation for when the row
constructor syntax happens to have been used?


Consider a query like the following, against a table with a composite
index on (a, b):


select * from multi_test where ( a, b ) in (( 1, 1 ), ( 2, 1 ));


This query will get a BitmapOr based plan that's similar to the plans
that OR-based queries affected by your transformation patch get today,
on HEAD. However, this equivalent spelling has the potential to be
significantly faster:


select * from multi_test where a = any('{1,2}') and b = 1;


(Of course, this is more likely to be true with my nbtree SAOP patch in place.)


Note that we currently won't use RowCompareExpr in many simple cases
where the row constructor syntax has been used. For example, a query
like this:


select * from multi_test where ( a, b ) = (( 2, 1 ));


This case already involves a transformation that is roughly comparable
to the one you're working on now. We'll remove the RowCompareExpr
during parsing. It'll be as if my example row constructor equality
query was written this way instead:


select * from multi_test where a = 2 and b = 1;


This can be surprisingly important, when combined with other things,
in more realistic examples.


The nbtree code has special knowledge of RowCompareExpr that makes the
rules for comparing index tuples different from those used by other kinds
of index scans. However, due to the RowCompareExpr transformation
process I just described, we don't need to rely on that specialized
nbtree code when the row constructor syntax is used with a simple
equality clause -- which is what makes the normalization process have
real value. If the nbtree code cannot see RowCompareExpr index quals
then it cannot have this problem in the first place. In general it is
useful to "normalize to conjunctive normal form" when it might allow
scan key preprocessing in the nbtree code to come up with a much
faster approach to executing the scan.


It's easier to understand what I mean by showing a simple example. The
nbtree preprocessing code is smart enough to recognize that the
following query doesn't really need to do any work, due to having
quals that it recognizes as contradictory (it can set so->qual_ok to
false for unsatisfiable quals):


select * from multi_test where ( a, b ) = (( 2, 1 )) and a = -1;


However, it is not smart enough to perform the same trick if we change
one small detail with the query:


select * from multi_test where ( a, b ) >= (( 2, 1 )) and a = -1;


Ideally, the optimizer would canonicalize/normalize everything in a
way that made all of the nbtree preprocessing optimizations work just
as well, without introducing any new special cases. Obviously, there
is no reason why we 

Re: parse partition strategy string in gram.y

2022-10-25 Thread Finnerty, Jim
Or if you know the frequencies of the highly frequent values of the 
partitioning key at the time the partition bounds are defined, you could define 
hash ranges that contain approximately the same number of rows in each 
partition.  A parallel sequential scan of all partitions would then perform 
better because data skew is minimized. 
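
As a sketch of the idea with what exists today (the table name and bound values are 
hypothetical; real syntax for this would presumably be nicer), the effect can be 
approximated by range-partitioning on the hash of the key and choosing bounds that 
equalize the estimated row counts:

    -- hashint4() is the built-in 32-bit hash of an int4 key; the bound values
    -- here are arbitrary and would be chosen from the known key frequencies.
    CREATE TABLE orders (
        customer_id int NOT NULL,
        total       numeric
    ) PARTITION BY RANGE ((hashint4(customer_id)));

    CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES FROM (MINVALUE) TO (-1000000000);
    CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES FROM (-1000000000) TO (500000000);
    CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES FROM (500000000) TO (MAXVALUE);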



Re: parse partition strategy string in gram.y

2022-10-25 Thread Finnerty, Jim
It will often happen that some hash keys are more frequently referenced than 
others.  Consider a scenario where customer_id is the hash key, and one 
customer is very large in terms of their activity, like IBM, and other keys 
have much less activity.  This asymmetry creates a noisy neighbor problem.  
Some partitions may have more than one noisy neighbor, and in general it would 
be more flexible to be able to divide the work evenly in terms of activity 
instead of evenly with respect to the encoding of the keys.

On 10/24/22, 8:50 PM, "Tom Lane"  wrote:

Alvaro Herrera  writes:
> On 2022-Oct-24, Finnerty, Jim wrote:
>> The advantage of hash partition bounds is that they are not
>> domain-specific, as they are for ordinary RANGE partitions, but they
>> are more flexible than MODULUS/REMAINDER partition bounds.

I'm more than a bit skeptical of that claim.  Under what
circumstances (other than a really awful hash function,
perhaps) would it make sense to not use equi-sized hash
partitions?  



regards, tom lane



Re: parse partition strategy string in gram.y

2022-10-24 Thread Finnerty, Jim
Is there a reason why HASH partitioning does not currently support range 
partition bounds, where the values in the partition bounds would refer to the 
hashed value?

The advantage of hash  partition bounds is that they are not domain-specific, 
as they are for ordinary RANGE partitions, but they are more flexible than 
MODULUS/REMAINDER partition bounds.

On 10/21/22, 9:48 AM, "Japin Li"  wrote:

On Fri, 21 Oct 2022 at 20:34, Justin Pryzby  wrote:
> On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote:
>> Is there any way to get the regression tests diffs from Cirrus CI?
>> I did not find the diffs in [1].
>>
>> [1] https://cirrus-ci.com/build/4721735111540736
>
> They're called "main".
> I'm planning on submitting a patch to rename it to "regress", someday.
> See also: 
https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com

Oh, thank you very much!  I find it in testrun/build/testrun/main/regress 
[1].

[1] 
https://api.cirrus-ci.com/v1/artifact/task/6215926717612032/testrun/build/testrun/main/regress/regression.diffs

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.





Re: Making Vars outer-join aware

2022-07-01 Thread Finnerty, Jim
Tom, two quick questions before attempting to read the patch:

Given that views are represented in a parsed representation, does anything 
need to happen to the Vars inside a view when that view is outer-joined to?  

If an outer join is converted to an inner join, must this information get 
propagated to all the affected Vars, potentially across query block levels?




Re: Retrieving unused tuple attributes in ExecScan

2022-06-27 Thread Finnerty, Jim
Re: So I'm actually using the columns during merge join, basically I'm building 
a bloom filter on the outer relation and filtering out data on the inner 
relation of the join. I'm building the filter on the join keys

We had a whole implementation of Bloom filtering for hash inner joins, complete 
with costing and pushdown of the Bloom filter from the build side to the 
execution tree on the probe side (i.e. building a Bloom filter on the inner 
side of the join at the conclusion of the build phase of the hash join, then 
pushing it down as a semi-join filter to the probe side of the join, where it 
could potentially be applied to multiple scans).  After a large change to that 
same area of the code by the community, it got commented out and has been in 
that state ever since.  It's a good example of the sort of change that really 
ought to be made with the community, because otherwise the merge burden is 
too great.

It was a pretty effective optimization in some cases, though.  Most commercial 
systems have an optimization like this, sometimes with special optimizations 
when the number of distinct join keys is very small. If there is interest in 
reviving this functionality, we could probably extract some patches and work 
with the community to try to get it running again.  

   /Jim




Re: Collation version tracking for macOS

2022-06-09 Thread Finnerty, Jim
Specifying the library name before the language-country code with a new 
separator  (":") as you suggested below has some benefits. Did you consider 
making the collation version just another collation attribute, such as 
colStrength, colCaseLevel, etc.?  
For example, an alternate syntax might be:  

create collation icu63."en-US-x-icu" (provider = icu, locale = 
'en-US@colVersion=63');

Was the concern that ICU might redefine a new collation property with the same 
name in a different and incompatible way (we might work with the ICU developers 
to agree on what it should be), or that a version is just not the same kind of 
collation property as the other collation properties?

(in the example above, I'm assuming that for provider = icu, we could translate 
'63' into  'libicui18n.so.63' automatically.)


On 6/8/22, 6:22 AM, "Thomas Munro"  wrote:


postgres=# create collation icu63."en-US-x-icu" (provider = icu,
locale = 'libicui18n.so.63:en-US');
CREATE COLLATION




Re: ICU for global collation

2022-03-15 Thread Finnerty, Jim
Can we get some more consistent terminology around the term "locale"?

In ICU, the "locale" is just the first part of what we can pass to the "locale" 
parameter in CREATE COLLATION - the part before the optional '@' delimiter.  
The ICU locale does not include the secondary or tertiary properties, so it is 
usually just the country and the language, e.g. en_US (or en-US), but it can 
also be something like es_TRADITIONAL for traditional Spanish.  

I think it would be an improvement in clarity if we consistently use the term 
'locale' to mean the same thing that ICU means by that term, and not to have 
the thing that we call the "locale" also include collation modifiers, or to 
imply that a locale is the same thing as a collation.
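
For example (the collation name below is illustrative), in

    -- 'en-US' is the locale in the ICU sense (language + country);
    -- colStrength=secondary is a collation modifier, not part of the locale.
    CREATE COLLATION ci_example (
        provider      = icu,
        locale        = 'en-US@colStrength=secondary',
        deterministic = false
    );

only the 'en-US' part is the locale; the keyword after the '@' is a collation attribute.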

/Jim






Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-02-17 Thread Finnerty, Jim


So I think knowing what is bad about having this feature is the key point for 
discussion now.


> While I've only read your description of the patch not the patch itself,

This comment applies to me also.

Is the join selectivity properly calculated in all cases, e.g. in the n:m join 
case in particular, or in general when you’re not joining to a unique key? 
(this would be the usual situation here, since it adds a range qual to a join 
qual)
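
For reference, a sketch (illustrative table names) of the shape of query being 
discussed, where the planner can derive t2.x > 10 from the equijoin plus the range 
qual on t1.x:

    SELECT *
    FROM t1
    JOIN t2 ON t1.x = t2.x
    WHERE t1.x > 10;   -- the derived qual t2.x > 10 must not distort the join selectivity estimate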

>> Furthermore, there are some cases involving parameterized paths where
>> enforcing the inequality multiple times is definitely bad


  *   This has been done by committing 4.

What remaining cases are there where the qual is evaluated redundantly?



  *   Anyone still have interest in this?  Or is a better solution really 
possible?
Or is the current method  too bad to rescue?

As you’ve shown, this can potentially be very important, though I don’t think 
you’ll often see equijoins with an additional range restriction on the join 
keys.  When it happens, though, it could be especially important for joins to 
partitioned tables with many remote fdw partitions when the join can’t be 
pushed down to the remote server.


Re: ICU for global collation

2022-01-17 Thread Finnerty, Jim
On 10.01.22 12:49, Daniel Verite wrote:

> I think some users would want their db-wide ICU collation to be
> case/accent-insensitive. 
...
> IIRC, that was the context for some questions where people were
> enquiring about db-wide ICU collations.

+1.  There is the DEFAULT_COLLATION_OID, which is the cluster-level default 
collation, a.k.a. the "global collation", as distinct from the "db-wide" 
database-level default collation, which controls the default collation of the 
collatable types within that database.

> With the current patch, it's not possible, AFAICS, because the user
> can't tell that the collation is non-deterministic. Presumably this
> would require another option to CREATE DATABASE and another
> column to store that bit of information.

On 1/11/22, 6:24 AM, "Peter Eisentraut"  
wrote:
   
>  Adding this would be easy, but since pattern matching currently does not
>  support nondeterministic collations, if you make a global collation
>  nondeterministic, a lot of system views, psql, pg_dump queries etc.
>  would break, so it's not practical.  I view this is an orthogonal
>  project.  Once we can support this without breaking system views etc.,
>  then it's easy to enable with a new column in pg_database.

So this patch only enables the default cluster collation 
(DEFAULT_COLLATION_OID) to be a deterministic ICU collation, but doesn't extend 
the metadata in a way that would enable database collations to be ICU 
collations?

Waiting for the pattern matching problem to be solved in general before 
creating the metadata to support ICU collations everywhere will make it more 
difficult for members of the community to help solve the pattern matching 
problem.  

What additional metadata changes would be required to enable an ICU collation 
to be specified at either the cluster-level or the database-level, even if new 
checks need to be added to disallow a nondeterministic collation to be 
specified at the cluster level for now?




Re: ICU for global collation

2022-01-17 Thread Finnerty, Jim
Re: 
>> After this patch goes in, the big next thing would be to support
>> nondeterministic collations for LIKE, ILIKE and pattern matching operators in
>> general.  Is anyone interested in working on that?

> As far as I know you're the last person that seemed to be working on that 
> topic
> back in March :)

I have a solution for LIKE and ILIKE for case-insensitive, accent-sensitive 
ICU collations and the UTF8 server encoding, but didn't attempt to address the 
general case until ICU collations were supported at the database and cluster 
levels.  I may have another look at that after this patch goes in.
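
For reference, the limitation referred to above is what you hit today when a 
pattern-matching operator sees a nondeterministic collation, e.g. (sketch, with an 
illustrative collation name):

    CREATE COLLATION ci_as (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
    CREATE TABLE names (n text COLLATE ci_as);
    -- Currently rejected: nondeterministic collations are not supported for LIKE
    SELECT * FROM names WHERE n LIKE 'a%';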




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-12 Thread Finnerty, Jim
Re: patch that uses XID_FMT everywhere ... to make the main patch much smaller

That's exactly what my previous patch did, plus the patch to support 64-bit 
GUCs.

Maxim, maybe it's still a good idea to isolate those two patches and submit 
them separately first, to reduce the size of the rest of the patch?

On 1/12/22, 8:28 AM, "Simon Riggs"  wrote:

On Sat, 8 Jan 2022 at 08:21, Maxim Orlov  wrote:
>>
>>Perhaps we can merge some of the code cleanup that it contained, such 
as using XID_FMT everywhere and creating a type for the kind of page returned 
by TransactionIdToPage() to make the code cleaner.
>
>
> Agree, I think this is a good idea.

Looks to me like the best next actions would be:

1. Submit a patch that uses XID_FMT everywhere, as a cosmetic change.
This looks like it will reduce the main patch size considerably and
make it much less scary. That can be cleaned up and committed while we
discuss the main approach.

2. Write up the approach in a detailed README, so people can
understand the proposal and assess if there are problems. A few short
notes and a link back to old conversations isn't enough to allow wide
review and give confidence on such a major patch.

--
Simon Riggs    http://www.EnterpriseDB.com/





Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Finnerty, Jim
Re: The "prepare" approach was the first tried.
https://github.com/postgrespro/pg_pageprep
But it appears to be very difficult and unreliable.  After investing
many months into pg_pageprep, "double xmax" approach appears to be
very fast to implement and reliable.

I'd still like a plan to retire the "double xmax" representation eventually.  
Previously I suggested that this could be done as a post-process, before 
upgrade is complete, but that could potentially make upgrade very slow. 

Another way to retire the "double xmax" representation eventually could be to 
disallow "double xmax" pages in subsequent major version upgrades (e.g. to 
PG16, if "double xmax" pages are introduced in PG15).  This gives the luxury of 
time after a fast upgrade to convert all pages to contain the epochs, while 
still providing a path to more maintainable code in the future.



Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Finnerty, Jim
Re: clog page numbers, as returned by TransactionIdToPage

-   int         pageno = TransactionIdToPage(xid);  /* get page of parent */
+   int64       pageno = TransactionIdToPage(xid);  /* get page of parent */

...

-   int         pageno = TransactionIdToPage(subxids[0]);
+   int64       pageno = TransactionIdToPage(subxids[0]);
    int         offset = 0;
    int         i = 0;

...

-   int         nextpageno;
+   int64       nextpageno;

Etc.

In all those places where you are replacing int with int64 for the kind of 
values returned by TransactionIdToPage(), would you mind replacing the int64's 
with a type name, such as ClogPageNumber, for improved code maintainability?




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-06 Thread Finnerty, Jim
Re:Most software has a one-stage upgrade model. What you propose would
have us install 2 things, with a step in-between, which makes it
harder to manage.

The intended benefit would be that the code doesn't need to handle the 
possibility of 2 different XID representations for the indefinite future.  

I agree that VACUUM would be the preferred tool to make room for the special 
data area so that there is no need to install a separate tool, though, whether 
this work happens before or after the upgrade. 

Re: 1. Upgrade, with important aspect not-enabled-yet, but everything else 
working - all required software is delivered in one shot, with fast upgrade

Let's clarify what happens during upgrade.  What format are the pages in 
immediately after the upgrade? 

2. As each table is VACUUMed, we confirm/clean/groom data blocks so
each table is individually confirmed as being ready. The pace that
this happens at is under user control.

What are VACUUM's new responsibilities in this phase?  VACUUM needs a new task 
that confirms, for each table, that no heap page remains that is not ready.

If upgrade put all the pages into either double-xmax or double-epoch 
representation, then VACUUM's responsibility could be to split the double-xmax 
pages into the double-epoch representation and verify when there exists no 
double-xmax pages.

3. When all tables have been prepared, then restart to allow xid64 format 
usage

Let's also clarify what happens at restart time.

If we were to do the upgrade before preparing space in advance, is there a way 
to ever remove the code that knows about the double-xmax XID representation?





Re: ICU for global collation

2022-01-06 Thread Finnerty, Jim

I didn't notice anything version-specific about the patch.  Would any 
modifications be needed to backport it to pg13 and pg14?

After this patch goes in, the big next thing would be to support 
nondeterministic collations for LIKE, ILIKE and pattern matching operators in 
general.  Is anyone interested in working on that?

On 1/5/22, 10:36 PM, "Julien Rouhaud"  wrote:

On Tue, Jan 04, 2022 at 05:03:10PM +0100, Peter Eisentraut wrote:
> On 04.01.22 03:21, Julien Rouhaud wrote:
>
> > > - if (!lc_collate_is_c(collid) && collid != DEFAULT_COLLATION_OID)
> > > - mylocale = pg_newlocale_from_collation(collid);
> > > + if (!lc_collate_is_c(collid))
> > > + {
> > > + if (collid != DEFAULT_COLLATION_OID)
> > > + mylocale = pg_newlocale_from_collation(collid);
> > > + else if (default_locale.provider == COLLPROVIDER_ICU)
> > > + mylocale = _locale;
> > > + }
> >
> > There are really a lot of places with this new code.  Maybe it could be 
some
> > new function/macro to wrap that for the normal case (e.g. not 
formatting.c)?
>
> Right, we could just put this into pg_newlocale_from_collation(), but the
> comment there says
>
>  * In fact, they shouldn't call this function at all when they are dealing
>  * with the default locale.  That can save quite a bit in hotspots.
>
> I don't know how to assess that.
>
> We could pack this into a macro or inline function if we are concerned 
about
> this.

Yes that was my idea, just have a new function (inline function or a macro
then since pg_newlocale_from_collation() clearly warns about performance
concerns) that has the whole
is-not-c-collation-and-is-default-collation-or-icu-collation logic and calls
pg_newlocale_from_collation() only when needed.





Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-06 Thread Finnerty, Jim
(Maxim) Re: -- If after upgrade page has no free space for special data, tuples 
are
   converted to "double xmax" format: xmin became virtual
   FrozenTransactionId, xmax occupies the whole 64bit.
   Page converted to new format when vacuum frees enough space.

A better way would be to prepare the database for conversion to the 64-bit XID 
format before the upgrade so that it ensures that every page has enough room 
for the two new epochs (E bits).

1. Enforce the rule that no INSERT or UPDATE to an existing page will leave 
less than E bits of free space on a heap page

2. Run an online and restartable task, analogous to pg_repack, that rewrites 
and splits any page that has less than E bits of free space. This needs to be 
run on all non-temp tables in all schemas in all databases.  DDL operations are 
not allowed on a target table while this operation runs, which is enforced by 
taking an ACCESS SHARE lock on each table while the process is running. To 
mitigate the effects of this restriction, the restartable task can be 
restricted to run only in certain hours.  This could be implemented as a 
background maintenance task that runs for X hours as of a certain time of day 
and then kicks itself off again in 24-X hours, logging its progress.

When this task completes, the database is ready for upgrade to 64-bit XIDs, and 
there is no possibility that any page has insufficient free space for the 
special data.

Would you agree that this approach would completely eliminate the need for a 
"double xmax" representation? 

/Jim




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-04 Thread Finnerty, Jim


On 1/4/22, 2:35 PM, "Stephen Frost"  wrote:

>>
>>Not saying that I've got any idea how to fix that case offhand, and we
>>don't really support such a thing today as the server would just stop
>>instead, ...
>>   Perhaps that's a
>>   worthwhile tradeoff for being able to generally avoid having to vacuum
>>   and deal with transaction wrap-around, but I have to wonder if there
>>   might be a better answer.  
>>

For the target use cases that PostgreSQL is designed for, it's a very 
worthwhile tradeoff in my opinion.  Such long-running transactions need to be 
killed.

Re: -- If after upgrade page has no free space for special data, tuples are
   converted to "double xmax" format: xmin became virtual
   FrozenTransactionId, xmax occupies the whole 64bit.
   Page converted to new format when vacuum frees enough space.

I'm concerned about the maintainability impact of having 2 new on-disk page 
formats.  It's already complex enough with XIDs and multixact-XIDs.

If the lack of space for the two epochs in the special data area is a problem 
only in an upgrade scenario, why not resolve the problem before completing the 
upgrade process like a kind of post-process pg_repack operation that converts 
all "double xmax" pages to the "double-epoch" page format?  i.e. maybe the 
"double xmax" representation is needed as an intermediate representation during 
upgrade, but after upgrade completes successfully there are no pages with the 
"double-xmax" representation.  This would eliminate a whole class of coding 
errors and would make the code dealing with 64-bit XIDs simpler and more 
maintainable.

/Jim



Re:disfavoring unparameterized nested loops

2021-06-22 Thread Finnerty, Jim
> But making everything slower will be a hard sell, because vast majority of
> workloads already running on Postgres don't have this issue at all, so
> for them it's not worth the expense. Following the insurance analogy,
> selling tornado insurance in Europe is mostly pointless.
>

Agree. I've been surprised about NOT hearing complaints from PostgreSQL
customers about a particular "bad" plan choice that was common in other
rdbms products where large, complex queries were the norm.  The situation
occurs late in a plan with many joins where a hash join can be used and  
where either side is estimated to fit in memory.  On one side is a base table 
with cardinality that we have statistics for, while the other side has an
estimated cardinality that is the result of many estimates each of which
has error that can compound, and that in some cases amounts to a wild guess.
(e.g. what is the selectivity of SUM(x) < 12 ?).  If the planner's point 
estimate 
of cardinality is such that both sides could fit in memory, then a bad plan can
easily be made.  As Peter said, [ most ] humans have no trouble dealing with 
these kinds of situations. They take the risk of being wrong into account.

So in our world, the useful numbers are 0, 1, measured N, and estimated N,
but we don't distinguish between measured N and estimated N.

But that doesn't mean that OLTP customers would be willing to accept
slightly suboptimal plans to mitigate a risk they don't experience.

> Insurance is also about personal preference / risk tolerance. Maybe I'm
> fine with accepting risk that my house burns down, or whatever ...

Right, and that's why the problem mentioned above is still out there
annoying customers who have complex plans.  To them it looks like
an obviously bad plan choice.

Something that might help is to have the planner cost be a structure instead
of a number.  Costs of plans that are deemed "risky" are accumulated separately
from plans that make no risky choices, and for a given set of join items you
keep the minimum-cost plan of both types.  It may happen that all plans
eventually make a risky choice, in which case you take the plan with the
minimum cost, but if there exists a plan with no risky choices, then the
minimum-cost plan with no risky choices is chosen, with a GUC that enables a
customer to ignore risk when making this choice.  This is not in the spirit of
the hoped-for simple heuristic, and it would be heuristic in its classification
of plans that are risky, but in the NLJ case the cost of an unparameterized NLJ
could be deemed risky if the cardinality of the inner relation is not 0, 1, or
measured N.






Re: Character expansion with ICU collations

2021-06-21 Thread Finnerty, Jim
I have a proposal for how to support tailoring rules in ICU collations: the 
ucol_openRules() function is an alternative to the ucol_open() function that 
PostgreSQL calls today, but it takes the collation strength as one of its 
parameters, so the locale string would need to be parsed before creating the 
collator.  After the collator is created using either ucol_openRules() or 
ucol_open(), the ucol_setAttribute() function may be used to set individual 
attributes from keyword=value pairs in the locale string as it does now, except 
that the strength probably can't be changed after opening the collator with 
ucol_openRules().  So the logic in pg_locale.c would need to be reorganized a 
little bit, but that sounds straightforward.

One simple solution would be to have the tailoring rules be specified as a new 
keyword=value pair, such as colTailoringRules=<rules>.  Since the 
<rules> value may contain single quote characters or PostgreSQL escape 
characters, any single quote characters or escapes would need to be escaped 
using PostgreSQL escape rules.  If colTailoringRules is present, colStrength 
would also be known prior to opening the collator, or would default to 
tertiary, and we would keep a local flag indicating that we should not process 
the colStrength keyword again, if specified. 

Representing the TailoringRules as just another keyword=value in the locale 
string means that we don't need any change to the catalog to store it.  It's 
just part of the locale specification.  I think we wouldn't even need to bump 
the catversion.
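
A sketch of what the proposed syntax might look like (hypothetical: colTailoringRules 
is the new keyword being proposed here, not something ICU or PostgreSQL understands 
today, and the rule shown is only an example):

    -- Hypothetical keyword carrying an ICU tailoring rule in the locale string.
    CREATE COLLATION custom_tailored (
        provider = icu,
        locale   = 'en-US@colStrength=tertiary;colTailoringRules=&n < ñ'
    );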

Are there any tailoring rules, such as expansions and contractions, that we 
should disallow?  I realize that we don't handle nondeterministic collations in 
LIKE or regular expression operations as of PG14, but given expr LIKE 'a%' on a 
database with a UTF-8 encoding and arbitrary tailoring rules that include 
expansions and contractions, is it still guaranteed that expr must sort BETWEEN 
'a' AND ('a' || E'/u') ?



Re: Character expansion with ICU collations

2021-06-12 Thread Finnerty, Jim
Re: 
>> Can a CI collation be ordered upper case first, or is this a limitation 
of ICU?

> I don't know the authoritative answer to that, but to me it doesn't make
> sense, since the effect of a case-insensitive collation is to throw away
> the third-level weights, so there is nothing left for "upper case first"
> to operate on.

It wouldn't make sense for the ICU sort key of a CI collation itself because 
the sort keys need to be binary equal, but what the collation of interest does 
is equivalent to adding a secondary "C"-collated expression to the ORDER BY 
clause.  For example:

SELECT ... ORDER BY expr COLLATE ci_as;

Is ordered as if the query had been written:

SELECT ... ORDER BY expr COLLATE ci_as, expr COLLATE "C";

Re: 
> tailoring rules
>> yes

It looks like the relevant API call is ucol_openRules(), 
Interface documented here: 
https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/ucol_8h.html
example usage from C here: 
https://android.googlesource.com/platform/external/icu/+/db20b09/source/test/cintltst/citertst.c

for example:

/* Test with an expanding character sequence */
u_uastrcpy(rule, " < b < c/abd < d");
c2 = ucol_openRules(rule, u_strlen(rule), UCOL_OFF, UCOL_DEFAULT_STRENGTH,
                    NULL, &status);

and a reordering rule test:

u_uastrcpy(rule, " < AB");
coll = ucol_openRules(rule, u_strlen(rule), UCOL_OFF, UCOL_DEFAULT_STRENGTH,
                      NULL, &status);

That looks encouraging.  It returns a UCollator object, like ucol_open(const 
char *localeString, ...), so it's an alternative to ucol_open().  One of the 
parameters is the equivalent of colStrength, so then the question would be, how 
are the other keyword/value pairs like colCaseFirst, colAlternate, etc. 
specified via the rules argument?  In the same way with the exception of 
colStrength?

e.g. is "colAlternate=shifted; < AB" a valid rules string?

The ICU documentation says simply:

" rules A string describing the collation rules. For the syntax of the rules 
please see users guide."

Transform rules are documented here: 
http://userguide.icu-project.org/transforms/general/rules

But there are no examples of using the keyword/value pairs that may appear in a 
locale string with the transform rules, and there's no locale argument on 
ucol_openRules.  How can the keyword/value pairs that may appear in the locale 
string be applied in combination with tailoring rules (with the exception of 
colStrength)?







Re: Character expansion with ICU collations

2021-06-11 Thread Finnerty, Jim
>>
You can have these queries return both rows if you use an
accent-ignoring collation, like this example in the documentation:

CREATE COLLATION ignore_accents (provider = icu, locale =
'und-u-ks-level1-kc-true', deterministic = false);
<<

Indeed.  Is the dependency between the character expansion capability and 
accent-insensitive collations documented anywhere?

Another unexpected dependency appears to be @colCaseFirst=upper.  If specified 
in combination with colStrength=secondary, it appears that the upper/lower case 
ordering is random within a group of characters that are secondary equal, e.g. 
'A' < 'a', but 'b' < 'B', 'c' < 'C', ...  , but then 'L' < 'l'.  It is not even 
consistently ordered with respect to case.  If I make it a nondeterministic 
CS_AI collation, then it sorts upper before lower consistently.  The rule seems 
to be that you can't sort by case within a group that is case-insensitive. 
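
For reference, a sketch (illustrative name, using the same '@' keyword style as 
above) of the kind of collation definition being described:

    CREATE COLLATION ci_upper_first (
        provider      = icu,
        locale        = 'en-US@colStrength=secondary;colCaseFirst=upper',
        deterministic = false
    );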

Can a CI collation be ordered upper case first, or is this a limitation of ICU?

For example, this is part of the sort order that I'd like to achieve with ICU, 
with the code point in column 1 and dense_rank() shown in the rightmost column 
indicating that 'b' = 'B', for example:

66   B  B  138  151
98   b  b  138  151   <- so within a group that is CI_AS equal, the sort order needs to be upper case first
67   C  C  139  152
99   c  c  139  152
199  Ç  Ç  140  153
231  ç  ç  140  153
68   D  D  141  154
100  d  d  141  154
208  Ð  Ð  142  199
240  ð  ð  142  199
69   E  E  143  155
101  e  e  143  155

Can this sort order be achieved with ICU?

More generally, is there any interest in leveraging the full power of ICU 
tailoring rules to get whatever order someone may need, subject to the 
limitations of ICU itself?  What would be required to extend CREATE COLLATION 
to accept an optional sequence of tailoring rules that we would store in the 
pg_collation catalog and apply along with the modifiers in the locale string?

/Jim





Re: PostgreSQL <-> Babelfish integration

2021-02-15 Thread Finnerty, Jim
We are applying the Babelfish commits to the REL_12_STABLE branch now, and the 
plan is to merge them into the REL_13_STABLE and master branch ASAP after that. 
 There should be a publicly downloadable git repository before very long.

On 2/12/21, 2:35 PM, "Peter Geoghegan"  wrote:

On Fri, Feb 12, 2021 at 11:13 AM Matthias van de Meent
 wrote:
> I agree. I believe that Babelfish's efforts can be compared with the
> zedstore and zheap efforts: they require work in core before they can
> be integrated or added as an extension that could replace the normal
> heap tableam, and while core is being prepared we can discover what
> can and cannot be prepared in core for this new feature.

I see what you mean, but even that seems generous to me, since, as I
said, we don't have any Babelfish code to evaluate today. Whereas
Zedstore and zheap can actually be downloaded and tested.

--
Peter Geoghegan





Re: Read Uncommitted

2019-12-18 Thread Finnerty, Jim
Many will want to use it to do aggregation, e.g. a much more efficient 
COUNT(*), because they want performance and don't care very much about 
transaction consistency.  E.g. they want to compute SUM(sales) by salesperson, 
region for the past 5 years, and don't care very much if some concurrent 
transaction aborted in the middle of computing this result.
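
For example (illustrative table and column names), the kind of query meant here is 
simply a large reporting aggregate where a slightly inconsistent read would be 
acceptable:

    SELECT salesperson, region, sum(sales) AS total_sales
    FROM sales_history
    WHERE sale_date >= now() - interval '5 years'
    GROUP BY salesperson, region;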

On 12/18/19, 2:35 PM, "Stephen Frost"  wrote:

Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Dec 18, 2019 at 1:06 PM Simon Riggs  wrote:
> > Just consider this part of the recovery toolkit.
> 
> I agree that it would be useful to have a recovery toolkit for reading
> uncommitted data, but I think a lot more thought needs to be given to
> how such a thing should be designed. If you just add something called
> READ UNCOMMITTED, people are going to expect it to have *way* saner
> semantics than this will. They'll use it routinely, not just as a
> last-ditch mechanism to recover otherwise-lost data. And I'm
> reasonably confident that will not work out well.

+1.

Thanks,

Stephen




Re: Corruption with duplicate primary key

2019-12-09 Thread Finnerty, Jim
Re: " It appears that the second row was in place originally, then got updated 
by a trigger (and even deleted later on, although it doesn't appear that the 
delete transaction got committed), and then the first row was inserted within 
the same transaction that updated the second row."

If you have BEFORE triggers, and a BEFORE trigger signaled failure with RETURN 
NULL, then this is one known (and documented) issue that I think could cause 
the behavior you're reporting:


https://www.postgresql-archive.org/BEFORE-triggers-that-return-NULL-can-circumvent-referential-integrity-tt6056390.html#none

It's hard to say if this is the cause or not, but if you have any BEFORE 
triggers that RETURN NULL, you might want to review the documentation very 
carefully.
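
For reference, a minimal sketch (illustrative names, not your actual trigger) of the 
documented behavior in question: a row-level BEFORE trigger that returns NULL silently 
skips the operation that fired it.

    CREATE TABLE t (id int PRIMARY KEY, payload jsonb);

    CREATE FUNCTION maybe_skip() RETURNS trigger AS $$
    BEGIN
        IF NEW.payload IS NULL THEN
            RETURN NULL;   -- the INSERT/UPDATE of this row is silently skipped
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER t_maybe_skip
        BEFORE INSERT OR UPDATE ON t
        FOR EACH ROW EXECUTE FUNCTION maybe_skip();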

thanks,

/Jim F

On 12/5/19, 6:45 PM, "Tomas Vondra"  wrote:

On Thu, Dec 05, 2019 at 09:14:12PM +, Alex Adriaanse wrote:
>We have a Postgres 10 database that we recently upgraded to Postgres 12 
using pg_upgrade. We recently discovered that there are rows in one of the 
tables that have duplicate primary keys:
>
>record_loader=# \d loader.sync
>  Table "loader.sync"
>  Column   |   Type   | Collation | Nullable | 
Default

>---+--+---+--+-
> source| text |   | not null |
> natural_key   | text |   | not null |
> payload   | jsonb|   |  |
> dispatched| timestamp with time zone |   | not null | 
now()
> initial_load_id   | text |   |  |
> deleted_load_id   | text |   |  |
> created_timestamp | timestamp with time zone |   |  | 
now()
> updated_timestamp | timestamp with time zone |   |  | 
now()
> deleted_timestamp | timestamp with time zone |   |  |
>Indexes:
>"sync_pkey" PRIMARY KEY, btree (source, natural_key)
>Publications:
>"debezium"
>
>This table is modified via triggers that fire off when a COPY command 
inserts many rows into another table.
>
>Here are two example duplicate rows:
>
># SELECT xmin, xmax, cmin, cmax, source, md5(natural_key) AS 
natural_key_hash, dispatched, created_timestamp, updated_timestamp, 
deleted_timestamp FROM loader.sync WHERE (source, natural_key) = ('ok_lease', 
'...') ORDER BY xmin::text::int, cmin::text::int;
>-[ RECORD 1 ]-+-
>xmin  | 116649
>xmax  | 0
>cmin  | 5304404
>cmax  | 5304404
>source| ok_lease
>natural_key_hash  | de3e9a567b90025c3399c4c63c823fe9
>dispatched| 2019-11-24 05:09:36.099686+00
>created_timestamp | 2019-11-24 05:09:36.099686+00
>updated_timestamp | 2019-11-24 05:09:36.099686+00
>deleted_timestamp |
>-[ RECORD 2 ]-+-
>xmin  | 116649
>xmax  | 118583
>cmin  | 5312208
>cmax  | 5312208
>source| ok_lease
>natural_key_hash  | de3e9a567b90025c3399c4c63c823fe9
>dispatched| 2019-11-10 05:09:24.214964+00
>created_timestamp | 2019-05-17 21:24:19.558219+00
>updated_timestamp | 2019-11-24 05:09:36.099686+00
>deleted_timestamp | 2019-11-24 05:09:36.099686+00
>
>It appears that the second row was in place originally, then got updated 
by a trigger (and even deleted later on, although it doesn't appear that the 
delete transaction got committed), and then the first row was inserted within 
the same transaction that updated the second row.
>
>Another example:
>-[ RECORD 1 ]-+-
>xmin  | 116649
>xmax  | 0
>cmin  | 5304403
>cmax  | 5304403
>source| ok_lease
>natural_key_hash  | 1c8031348701a32cb5fee26839d6b0b4
>dispatched| 2019-11-10 05:09:24.214964+00
>created_timestamp | 2019-05-31 06:00:33.765547+00
>updated_timestamp | 2019-11-24 05:09:36.099686+00
>deleted_timestamp | 2019-11-24 05:09:36.099686+00
>-[ RECORD 2 ]-+-
>xmin  | 116649
>xmax  | 0
>cmin  | 5304404
>cmax  | 5304404
>source| ok_lease
>natural_key_hash  | 1c8031348701a32cb5fee26839d6b0b4
>dispatched| 2019-11-24 05:09:36.099686+00
>created_timestamp | 2019-11-24 05:09:36.099686+00
>updated_timestamp | 2019-11-24 05:09:36.099686+00
>deleted_timestamp |
>
>Both examples have in common that the two duplicate rows were touched 
within the same transaction.
>

Re: Unwanted expression simplification in PG12b2

2019-09-23 Thread Finnerty, Jim
If the function was moved to the FROM clause where it would be executed as a 
lateral cross join instead of a target list expression, how would this affect 
the cost-based positioning of the Gather?
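
A sketch of the rewrite being asked about (expensive_fn, t, and its columns are 
illustrative):

    -- Expensive parallel-safe function evaluated in the target list:
    SELECT expensive_fn(t.x) FROM t ORDER BY t.y;

    -- The same computation moved to the FROM clause as a lateral cross join:
    SELECT f.v
    FROM t
    CROSS JOIN LATERAL (SELECT expensive_fn(t.x) AS v) AS f
    ORDER BY t.y;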

On 9/23/19, 8:59 AM, "Robert Haas"  wrote:

On Sun, Sep 22, 2019 at 7:47 AM Darafei "Komяpa" Praliaskouski
 wrote:
> A heuristic I believe should help my case (and I hardly imagine how it 
can break others) is that in presence of Gather, all the function calls that 
are parallel safe should be pushed into it.

The cost of pushing data through the Sort is not necessarily
insignificant.  Your functions are (IIUC) extremely expensive, so it's
worth going to any length to reduce the time spent evaluating them.
However, if someone has ||(text,text) in the tlist, that might be the
wrong approach, because it's not saving much to compute that earlier
and it might make the sort a lot wider, especially if de-TOASTing is
involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Why could GEQO produce plans with lower costs than the standard_join_search?

2019-05-22 Thread Finnerty, Jim
Fwiw, I had an intern do some testing on the JOB last year, and he reported 
that geqo sometimes produced plans of lower cost than the standard planner (we 
were on PG10 at the time).  I filed it under "unexplained things that we need 
to investigate when we have time", but alas...

In any case, Donald isn't the only one who has noticed this behavior. 

On 5/22/19, 3:54 PM, "Donald Dong"  wrote:

On May 22, 2019, at 11:42 AM, Tom Lane  wrote:
> 
> Donald Dong  writes:
>> I find the cost from cheapest_total_path->total_cost is different
>> from the cost from  queryDesc->planstate->total_cost. What I saw was
>> that GEQO tends to form paths with lower
>> cheapest_total_path->total_cost (aka the fitness of the children).
>> However, standard_join_search is more likely to produce a lower
>> queryDesc->planstate->total_cost, which is the cost we get using
>> explain.
> 
>> I wonder why those two total costs are different? If the total_cost
>> from the planstate is more accurate, could we use that instead as the
>> fitness in geqo_eval?
> 
> You're still asking us to answer hypothetical questions unsupported
> by evidence.  In what case does that really happen?

Hi,

My apologies if this is not the minimal necessary set up. But here's
more information about what I saw using the following query
(JOB/1a.sql):

SELECT MIN(mc.note) AS production_note,
   MIN(t.title) AS movie_title,
   MIN(t.production_year) AS movie_year
FROM company_type AS ct,
 info_type AS it,
 movie_companies AS mc,
 movie_info_idx AS mi_idx,
 title AS t
WHERE ct.kind = 'production companies'
  AND it.info = 'top 250 rank'
  AND mc.note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%'
  AND (mc.note LIKE '%(co-production)%'
   OR mc.note LIKE '%(presents)%')
  AND ct.id = mc.company_type_id
  AND t.id = mc.movie_id
  AND t.id = mi_idx.movie_id
  AND mc.movie_id = mi_idx.movie_id
  AND it.id = mi_idx.info_type_id;

I attached the query plan and debug_print_rel output for GEQO and
standard_join_search.

            planstate->total_cost   cheapest_total_path
    GEQO    54190.13                54239.03
    STD     54179.02                54273.73

Here I observe GEQO  produces a lower
cheapest_total_path->total_cost, but its planstate->total_cost is higher
than what standard_join_search produces.

Regards,
Donald Dong





Re: Removing useless DISTINCT clauses

2018-08-24 Thread Finnerty, Jim
I feel strongly that eliminating the entire DISTINCT or GROUP BY clause (when 
there are no aggs) is an important optimization, especially when the 
incremental cost to test for it is so tiny.  I'm happy to submit that as a 
separate thread.
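
To make that concrete, a sketch of the single-table case meant here (illustrative table):

    CREATE TABLE orders (order_id int PRIMARY KEY, customer_id int, total numeric);

    -- order_id is unique, so the DISTINCT can never remove a row; the entire
    -- DISTINCT (or an aggregate-free GROUP BY over the same columns) could be
    -- dropped by the planner:
    SELECT DISTINCT order_id, customer_id, total FROM orders;
    SELECT order_id, customer_id, total FROM orders GROUP BY order_id, customer_id, total;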

My goal here was to move the original proposal along and contribute a little 
something back to the community in the process.  DISTINCT optimization is 
currently quite poor compared to the leading commercial RDBMS alternatives, and 
doing unnecessary DISTINCT in the single-table case is an example of that.  
There are other missing DISTINCT optimizations.

I'll explore a proper way to test that it's in the single-relation case, and 
will post a separate thread for the 'remove unnecessary DISTINCT' optimization.

Cheers,

/Jim


On 8/23/18, 11:12 PM, "David Rowley"  wrote:

You might be confusing #1 and #2. My concern is with #2.  The existing
GROUP BY clause optimisation is almost identical to #1. I just wanted
to also apply it to the DISTINCT clause.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: New committers announced at PGCon 2018

2018-06-04 Thread Finnerty, Jim
Congratulations, everyone!   (I wonder if we have any female PG committers?)

On 6/4/18, 9:14 AM, "Jesper Pedersen"  wrote:

On 06/01/2018 05:05 PM, Tom Lane wrote:
> The core team is pleased to announce the appointment of seven
> new Postgres committers:
> 
> Etsuro Fujita
> Peter Geoghegan
> Amit Kapila
> Alexander Korotkov
> Thomas Munro
> Michael Paquier
> Tomas Vondra
> 
> Congratulations to all!
> 

Congratulations to all - well deserved !

Best regards,
  Jesper