[HACKERS] GiST support for inet datatypes

2013-12-17 Thread Emre Hasegeli
Hi,

Attached patch adds GiST support to the inet datatypes with two new
operators. Overlaps operator can be used with exclusion constraints.
Is adjacent to operator is just the negator of it. Index uses only
the network bits of the addresses. Except for the new operators and
is contained within, contains; basic comparison operators are also
supported.

Query planner never chooses to use the index for the operators which
the index is particularly useful because selectivity estimation functions
are missing. I am planning to work on them.

I also wanted to add strictly left of and strictly right of operators
but I did not want to introduce new symbols. I think we need style
guidelines for them. Range types use @ and @ for is contained within
and contains operators;  and  for strictly left of and strictly right of
operators. It would be nice if we could change the symbols for contains
and is contained within operators of the inet datatypes. Then we could
use the old ones for strictly left of and strictly right of operators.

I did not touch opr_sanity regression tests as I did not decide
how to solve these problems. I did not add documentation except
the new operators. It would be nice to mention the index and exclusion
constraints for inet datatypes somewhere. I did not know which page
would be more suitable.


inet-gist-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Wrong comment for bitncmp function in network.c

2014-01-04 Thread Emre Hasegeli
I am not sure it worth reporting but it took me a while to find out
what is wrong with comparing two values returned from
the bitncmp function. It does not return -1, 1 or 0 as it is written
on the comment when n % 8 == 0.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-01-05 Thread Emre Hasegeli
2013-12-17 Emre Hasegeli e...@hasegeli.com:
 Query planner never chooses to use the index for the operators which
 the index is particularly useful because selectivity estimation functions
 are missing. I am planning to work on them.

Attached patch adds selectivity estimation functions for the overlap and
adjacent operators. Other operators need a bit more work. I want to send it
before to get some feedback.


inet-selfuncs-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Failed assertion root-hasLateralRTEs on initsplan.c

2014-01-07 Thread Emre Hasegeli
I get assertion failure on initsplan.c line 1325 while executing following query
on HEAD (edc43458d797a5956f4bf39af18cf62abb0077db). It works fine
without --enable-cassert.

update subscriber set properties = hstore(a) from (select firstname,
lastname from player where player.id = subscriber.id) as a;

Backtrace:

* thread #1: tid = 0x2e16ec, 0x7fff85f0b866
libsystem_kernel.dylib`__pthread_kill + 10, queue =
'com.apple.main-thread, stop reason = signal SIGABRT
frame #0: 0x7fff85f0b866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff8450335c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x7fff82ffdbba libsystem_c.dylib`abort + 125
frame #3: 0x00010e2b7510
postgres`ExceptionalCondition(conditionName=unavailable,
errorType=unavailable, fileName=unavailable,
lineNumber=unavailable) + 80 at assert.c:54
frame #4: 0x00010e155ab6
postgres`distribute_qual_to_rels(root=unavailable,
clause=0x7fd5c382e208, is_deduced='\0', below_outer_join='\0',
jointype=JOIN_INNER, qualscope=0x7fd5c3835ee8,
ojscope=unavailable, outerjoin_nonnullable=unavailable,
deduced_nullable_relids=unavailable,
postponed_qual_list=unavailable) + 1254 at initsplan.c:1325
frame #5: 0x00010e154a66
postgres`deconstruct_recurse(root=0x7fd5c382c248,
jtnode=0x7fd5c382cde0, below_outer_join='\0',
qualscope=0x7fff51c723f8, inner_join_rels=unavailable,
postponed_qual_list=0x7fff51c72400) + 870 at initsplan.c:781
frame #6: 0x00010e1548ab
postgres`deconstruct_recurse(root=0x7fd5c382c248,
jtnode=0x7fd5c382bfd8, below_outer_join='\0',
qualscope=0x7fff51c72450, inner_join_rels=0x7fff51c72448,
postponed_qual_list=0x7fff51c72440) + 427 at initsplan.c:732
frame #7: 0x00010e1546a1
postgres`deconstruct_jointree(root=unavailable) + 81 at
initsplan.c:655
frame #8: 0x00010e156a1b
postgres`query_planner(root=0x7fd5c382c248,
tlist=0x7fd5c382e398, qp_callback=0x00010e15a660,
qp_extra=0x7fff51c725f0) + 219 at planmain.c:145
frame #9: 0x00010e1589d8
postgres`grouping_planner(root=0x7fd5c382c248,
tuple_fraction=unavailable) + 2888 at planner.c:1243
frame #10: 0x00010e157adf
postgres`subquery_planner(glob=0x7fd5c4007e68,
parse=0x7fd5c4007a30, parent_root=unavailable,
hasRecursion=unavailable, tuple_fraction=0,
subroot=0x7fff51c72900) + 3119 at planner.c:572
frame #11: 0x00010e156cac
postgres`standard_planner(parse=0x7fd5c4007a30,
cursorOptions=unavailable, boundParams=unavailable) + 236 at
planner.c:210
frame #12: 0x00010e1d6356
postgres`pg_plan_query(querytree=0x7fd5c4007a30, cursorOptions=0,
boundParams=0x) + 118 at postgres.c:759
frame #13: 0x00010e1d979a postgres`PostgresMain [inlined]
pg_plan_queries(cursorOptions=0, querytrees=unavailable,
boundParams=unavailable) + 56 at postgres.c:818
frame #14: 0x00010e1d9762 postgres`PostgresMain [inlined]
exec_simple_query(query_string=0x7fd5c4006038) + 21 at
postgres.c:983
frame #15: 0x00010e1d974d
postgres`PostgresMain(argc=unavailable, argv=unavailable,
dbname=0x7fd5c301ac30, username=unavailable) + 8749 at
postgres.c:4011
frame #16: 0x00010e184c1f postgres`PostmasterMain [inlined]
BackendRun + 7551 at postmaster.c:4085
frame #17: 0x00010e184c00 postgres`PostmasterMain [inlined]
BackendStartup at postmaster.c:3774
frame #18: 0x00010e184c00 postgres`PostmasterMain [inlined]
ServerLoop at postmaster.c:1585
frame #19: 0x00010e184c00
postgres`PostmasterMain(argc=unavailable, argv=unavailable) + 7520
at postmaster.c:1240
frame #20: 0x00010e11924f postgres`main(argc=1,
argv=0x7fd5c2c03ec0) + 783 at main.c:194
frame #21: 0x7fff897165fd libdyld.dylib`start + 1
frame #22: 0x7fff897165fd libdyld.dylib`start + 1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgcrypto: implement gen_random_uuid

2014-01-17 Thread Emre Hasegeli
2014/1/9 Oskari Saarenmaa o...@ohmu.fi:
 The only useful feature of the uuid-ossp module in my opinion is the
 uuid_generate_v4 function and as uuid-ossp is more or less abandonware
 people have had trouble building and installing it.  This patch implements
 an alternative uuid v4 generation function in pgcrypto which could be moved
 to core once there's a core PRNG with large enough internal state.

It is a small but very useful patch. Installing uuid-ossp can be very hard
on some systems. There is not much to review. The patch applies cleanly to
HEAD. The function is generating valid UUID version 4. The code and
the documentation style seems to fit in the pgcrypto extension. I am marking
it as Ready for Commiter.

The problem is users probably would not look pgcrypto extension for
UUID generator, especially when there is another extension with uuid in
it's name. Also, UUID generator does not sound like a cryptographic function.
It would be much better, if this would be in core with the UUID type. There
is a reference on the UUID Type documentation page to the uuid-ossp
extension. We can add a reference to pgcrypro extension in that page and
consider adding a deprecation note to the uuid-ossp extension, if is is not
possible to add the function to the core, for now.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-01-19 Thread Emre Hasegeli
2014-01-19 Andreas Karlsson andr...@proxel.se:
 Hi,

 I will review your two patches (gist support + selectivity). This is part 1
 of my review. I will look more into the actual GiST implementation in a
 couple of days, but thought I could provide you with my initial input right
 away.

Thank you for looking at it.


 inet-gist
 -

 General:

 I like the idea of the patch and think the  operator is useful for
 exclusion constraints, and that indexing the contains operator is useful for
 IP-address lookups. There is an extension, ip4r, which adds a GiST indexed
 type for IP ranges but since we have the inet type in core I think it should
 have GiST indexes.

 I am not convinced an adjacent operator is useful for the inet type, but if
 it is included it should be indexed just like -|- of ranges. We should try
 to keep these lists of indexed operators the same.

I added it just not to leave negotor field empty. It can also be useful with
exclusion constraints but not with GiST support. I did not add GiST support
for it and the not equals operator because they do not fit the index
structure. I can just remove the operator for now.


 Compilation:

 Compiled without errors.

 Regression tests:

 One of the broken regression tests seems unrelated to the selectivity
 functions.

 -- Make a list of all the distinct operator names being used in particular
 -- strategy slots.

 I think it would be fine just to add the newly indexed operators here, but
 the more indexed operators we get in the core the less useful this test
 becomes.

I did not add the new operators to the list because I do not feel right
about supporting , =, , = symbols for these operators.
They should be @, @=, @, @= to be consistent with other types.


 I am a bit suspicious about your memcmp based optimization in bitncommon,
 but it could be good. Have you benchmarked it compared to doing the same
 thing with a loop?

I did, when I was writing that part. I will be happy to do it again. I will
post the results.


 Documentation:

 Please use examples in the operator table which will evaluate to true, and
 for the  case an example where not both sides are the same.

I will change in the next version.


 I have not found a place either in the documentation where it is documented
 which operators are documented. I would suggest just adding a short note
 after the operators table.

I will add in the next version.


 inet-selfuncs
 -

 Compilation:

 There were some new warnings caused by this patch (with gcc 4.8.2). The
 warnings were all about the use of uninitialized variables (right,
 right_min_bits, right_order) in inet_hist_overlap_selectivity. Looking at
 the code I see that they are harmless but you should still rewrite the loop
 to silence the warnings.

I will fix in the next version.


 Regression tests:

 There are two broken

 -- Insist that all built-in pg_proc entries have descriptions

 Here you should just add descriptions for the new functions in pg_proc.h.

I will add in the next version.


 -- Check that all opclass search operators have selectivity estimators.

 Fails due to missing selectivity functions for the operators. I think you
 should at least for now do like the range types and use areajoinsel,
 contjoinsel, etc for the join selectivity estimation.

I did not support them in the first version because I could not decide
the way. I have better options than using the the geo_selfuncs for these
operators. The options are from simple to complex:

0) just use network_overlap_selectivity
1) fix network_overlap_selectivity with a constant between 0 and 1
2) calculate this constant by using masklens of all buckets of the histogram
3) calculate this constant by using masklens of matching buckets of
   the histogram
4) store STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for this
   types, calculate this constant with it
5) create another kind of histogram for masklens, calculate this constant
   with it

I do not know how to do 4 or 5, so I will try 3 for the next version. Do you
think it is a good idea?


 Source code:

 The selectivity estimation functions, network_overlap_selectivity and
 network_adjacent_selectivity, should follow the naming conventions of the
 other selectivity estimation functions. They are named things like
 tsmatchsel, tsmatchjoinsel, and rangesel.

I will rename it to inetoverlapsel, then.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-06 Thread Emre Hasegeli
2014-01-19 12:10, Emre Hasegeli e...@hasegeli.com:
 2014-01-19 Andreas Karlsson andr...@proxel.se:

 I am a bit suspicious about your memcmp based optimization in bitncommon,
 but it could be good. Have you benchmarked it compared to doing the same
 thing with a loop?

 I did, when I was writing that part. I will be happy to do it again. I will
 post the results.

I was testing it by creating GiST indexes. I realized that these test are
inconsistent when BUFFERING = AUTO. I repeated them with BUFFERING = ON.
The function without memcmp was faster in this case. I will change
the function in the next version of the patch.

The test case:

Create table Network as
select (a || '.' || b || '.' || c || '/24')::cidr
from generate_series(0, 255) as a,
generate_series(0, 255) as b,
generate_series(0, 255) as c;

Drop index if exists N;
Create index N on Network using gist(cidr) with (buffering = on);

Create table Network6 as
select ('::' || to_hex(a) || ':' || to_hex(b))::inet
from generate_series(0, 255) as a,
generate_series(0, 65535) as b;

Drop index if exists N6;
Create index N6 on Network6 using gist(inet) with (buffering = on);

What I could not understand is the tests with IP version 6 was much faster.
The row count is same, the index size is bigger.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with displaying wide tables in psql

2014-02-15 Thread Emre Hasegeli
Hi,

This is my review about 3th version of the patch. It is an useful
improvement in my opinion. It worked well on my environment.

2013-12-11 17:43:06, Sergey Muraviov sergey.k.murav...@gmail.com:
 It works in expanded mode when either format option is set to wrapped
 (\pset format wrapped), or we have no pager, or pager doesn't chop long
 lines (so you can still use the trick).

I do not like this logic on the IsWrappingNeeded function. It does not
seems right to check the environment variables for less. It would be hard
to explain this behavior to the users. It is better to make this only
the behavior of the wrapped format in expanded mode, in my opinion.

   {
   if (opt_border  2)
   fprintf(fout, %s\n, 
 dlineptr[line_count].ptr);
   else
   fprintf(fout, %-s%*s %s\n, 
 dlineptr[line_count].ptr,
   dwidth - 
 dlineptr[line_count].width, ,
   
 dformat-rightvrule);
   }

Is it necessary to keep this old print line code? It seems to me the new
code works well on (dlineptr[line_count].width = dwidth) condition.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with displaying wide tables in psql

2014-02-17 Thread Emre Hasegeli
2014-02-16 18:37, Sergey Muraviov sergey.k.murav...@gmail.com:

 New code doesn't work with empty strings but I've done minor optimization
 for this case.

It seems better now. I added some new lines and spaces, removed unnecessary
parentheses and marked it as Ready for Committer.


fix_psql_print_aligned_vertical_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-17 Thread Emre Hasegeli
2014-02-17 22:16, Tom Lane t...@sss.pgh.pa.us:

 More generally, it doesn't look to me like these upgrade scripts are
 complete; shouldn't they be creating some new objects, not just replacing
 old ones?

The actual patches are on the previous mail [1]. I was just trying
to solve the problem that btree_gist cannot be loaded because of
the new operator class.

 In short we probably need to think a bit harder about what this patch is
 proposing to do.  It seems fairly likely to me that some other approach
 would be a better idea.

How about only removing the inet and the cidr operator classes
from btree_gist. btree-gist-drop-inet-v2.patch does that.

[1] 
http://www.postgresql.org/message-id/cae2gyzxc0fxewe59sfduznq24c+frbdmgxwwvbyvmeanate...@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-19 Thread Emre Hasegeli
2014-02-19 16:52, Tom Lane t...@sss.pgh.pa.us:
 Not at all, AFAICS.  If it were okay to decide that some formerly-default
 opclass is no longer default, then having such a command would be better
 than manually manipulating pg_opclass.opcdefault --- but extension upgrade
 scripts could certainly do the latter if they had to.  The problem here
 is whether it's upgrade-safe to make such a change at all; having
 syntactic sugar doesn't make it safer.

It is not hard to support != operator on the new operator class. That would
make it functionally compatible with the ones on btree_gist. That way,
changing the default would not break pg_dump dumps, it would only upgrade
the index to the new one.

pg_upgrade dumps current objects in the extension. It fails to restore them,
if the new operator class exists as the default. I do not really understand
how pg_upgrade handle extension upgrades. I do not have a solution to this.

It would be sad if we cannot make the new operator class default because of
the btree_gist implementation which is not useful for inet data types. You
wrote on 2010-10-11 about it [1]:

 Well, actually the btree_gist implementation for inet is a completely
 broken piece of junk: it thinks that convert_network_to_scalar is 100%
 trustworthy and can be used as a substitute for the real comparison
 functions, which isn't even approximately true.  I'm not sure why
 Teodor implemented it like that instead of using the type's real
 comparison functions, but it's pretty much useless if you want the
 same sort order that the type itself defines.

[1] http://www.postgresql.org/message-id/8973.1286841...@sss.pgh.pa.us


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-20 Thread Emre Hasegeli
2014-02-20 01:37, Tom Lane t...@sss.pgh.pa.us:

 Perhaps it would be acceptable to drop the btree_gist implementation
 and teach pg_upgrade to refuse to upgrade if the old database contains
 any such indexes.  I'm not sure that solves the problem, though, because
 I think pg_upgrade will still fail if the opclass is in the old database
 even though unused (because you'll get the complaint about multiple
 default opclasses anyway).  The only simple way to avoid that is to not
 have btree_gist loaded at all in the old DB, and I doubt that's an
 acceptable upgrade restriction.  It would require dropping indexes of
 the other types supported by btree_gist, which would be a pretty hard
 sell since those aren't broken.

 Really what we'd need here is for btree_gist to be upgraded to a version
 that doesn't define gist_inet_ops (or at least doesn't mark it as default)
 *before* pg_upgrading to a server version containing the proposed new
 implementation.  Not sure how workable that is.  Could we get away with
 having pg_upgrade unset the default flag in the old database?
 (Ick, but there are no very good alternatives here ...)

Upgrading btree_gist on the old installation would be almost impossible
for the majority of the users who use package managers, in my opinion.
I cannot think of a better solution than your suggestion. I can try to
prepare a patch to execute the following query on pg_upgrade before
dumping the old database, if that is the final decision.

UPDATE pg_opclass SET opcdefault = false
WHERE opcname IN ('gist_inet_ops', 'gist_cidr_ops');

 BTW, I'd not been paying any attention to this thread, but now that
 I scan through it, I think the proposal to change the longstanding
 names of the inet operators has zero chance of being accepted.
 Consistency with the names chosen for range operators is a mighty
 weak argument compared to backwards compatibility.

That is why I prepared it as a separate patch on top of the others. It is
not only consistency with the range types: @ and @ symbols used for
containment everywhere except the inet data types, particularly on
the geometric types, arrays; cube, hstore, intaray, ltree extensions.
The patch does not just change the operator names, it leaves the old ones,
adds new operators with GiST support and changes the documentation, like
your commit ba920e1c9182eac55d5f1327ab0d29b721154277 back in 2006. I could
not find why did you leave the inet operators unchanged on that commit,
in the mailing list archives [1]. GiST support will be a promotion for
users to switch to the new operators, if we make this change with it.

This change will also indirectly deprecate the undocumented non-transparent
btree index support that works sometimes for some of the subnet inclusion
operators [2].

[1] http://www.postgresql.org/message-id/14277.1157304...@sss.pgh.pa.us

[2] http://www.postgresql.org/message-id/389.1042992...@sss.pgh.pa.us


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-26 Thread Emre Hasegeli
2014-02-24 02:44, Andreas Karlsson andr...@proxel.se:

 Note: The patches do not apply anymore due to changes to
 src/backend/utils/adt/Makefile.

I will fix it on the next version.

 I see, thanks for the explanation. But I am still not very fond of how that
 code is written since I find it hard to verify the correctness of it, but
 have no better suggestions.

We can split the selectivity estimation patch from the GiST support, add
it to the next commit fest for more review. In the mean time, I can
improve it with join selectivity estimation. Testing with different datasets
would help the most to reveal how true the algorithm is.

 Do you think the new index is useful even if you use the basic geo_selfuncs?
 Or should we wait with committing the patches until all selfuncs are
 implemented?

My motivation was to be able to use the cidr datatype with exclusion
constraints. I think the index would also be useful without proper
selectivity estimation functions. Range types also use geo_selfuncs for
join selectivity estimation.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-27 Thread Emre Hasegeli
2014-02-24 17:55, Bruce Momjian br...@momjian.us:

 pg_upgrade has _never_ modified the old cluster, and I am hesitant to do
 that now.  Can we make the changes in the new cluster, or in pg_dump
 when in binary upgrade mode?

It can be possible to update the new operator class in the new cluster
as not default, before restore. After the restore, pg_upgrade can upgrade
the btree_gist extension and reset the operator class as the default.
pg_upgrade suggests to re-initdb the new cluster if it fails, anyway. Do
you think it is a better solution? I think it will be more complicated
to do in pg_dump when in binary upgrade mode.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-28 Thread Emre Hasegeli
2014-02-27 18:15, Florian Pflug f...@phlo.org:
 It can be possible to update the new operator class in the new cluster
 as not default, before restore. After the restore, pg_upgrade can upgrade
 the btree_gist extension and reset the operator class as the default.
 pg_upgrade suggests to re-initdb the new cluster if it fails, anyway. Do
 you think it is a better solution? I think it will be more complicated
 to do in pg_dump when in binary upgrade mode.

 Maybe I'm missing something, but isn't the gist of the problem here that
 pg_dump won't explicitly state the operator class if it's the default?

No, the problem is pg_upgrade. We can even benefit from this behavior of
pg_dump, if we would like to remove the operator classes on btree_gist.
Because of that behavior; users, who would upgrade by dump and restore,
will upgrade to the better default operator class without noticing. I am
not sure not notifying is the best think to do, though.

The problem is that pg_dump --binary-upgrade dumps objects in the extension
on the old cluster, not just the CREATE EXTENSION statement. pg_upgrade
fails to restore them, if the new operator class already exists on the new
cluster as the default. It effects all users who use the extension, even
if they are not using the inet and cidr operator classes in it.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-28 Thread Emre Hasegeli
2014-02-28 17:30, Florian Pflug f...@phlo.org:
 Hm, what if we put the new opclass into an extension of its own, say 
 inet_gist,
 instead of into core?

It will work but I do not think it is better than adding it in core as
not default.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-04-09 Thread Emre Hasegeli
Tom Lane t...@sss.pgh.pa.us:
 Committed with some additional documentation work.  Thanks for
 submitting this!

Thank you for committing. I had not thought of using different structure
for the index. It works faster with my test case, too.

I am sending rebased version of the consistent operator names patch
in case you would like to include it to the same release. This is what
I wrote about this change before:

 That is why I prepared it as a separate patch on top of the others. It is
 not only consistency with the range types: @ and @ symbols used for
 containment everywhere except the inet data types, particularly on
 the geometric types, arrays; cube, hstore, intaray, ltree extensions.
 The patch does not just change the operator names, it leaves the old ones,
 adds new operators with GiST support and changes the documentation, like
 your commit ba920e1c9182eac55d5f1327ab0d29b721154277 back in 2006. I could
 not find why did you leave the inet operators unchanged on that commit,
 in the mailing list archives [1]. GiST support will be a promotion for
 users to switch to the new operators, if we make this change with it.

 This change will also indirectly deprecate the undocumented non-transparent
 btree index support that works sometimes for some of the subnet inclusion
 operators [2].

 [1] http://www.postgresql.org/message-id/14277.1157304...@sss.pgh.pa.us

 [2] http://www.postgresql.org/message-id/389.1042992...@sss.pgh.pa.us


inet-operator-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wrapping in extended mode doesn't work well with default pager

2014-05-12 Thread Emre Hasegeli
Pavel Stehule pavel.steh...@gmail.com:
 Hello

 I am checking feature
 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6513633b94173fc1d9e2b213c43f9422ddbf5faa

 It works perfect with pager less, but it works badly with default more

 see attached screenshots, pls

 It is expected behave?

I do not think so. It looks broken with or without any pager when
border != 2. Your less configuration might be hiding the problem from you.

I think it is because of miscalculation of the width used by
the separators. Increasing this variable for border = 0 and 1 fixed
the problem, but it might not be the right fix. The patch without
regression test changes attached.

While looking at it, I found another problem. It seems to me, a minus sign
is missing after -[RECORD  ] when border = 1.


psql-wrapped-expanded-fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Selectivity estimation for inet operators

2014-05-15 Thread Emre Hasegeli
New version of the selectivity estimation patch attached. I am adding
it to CommitFest 2014-06. Previous version of it reviewed by
Andreas Karlson on the previous CommitFest with the GiST support patch.
The new version includes join selectivity estimation.

Join selectivity is calculated in 4 steps:

* matching first MCV to second MCV
* searching first MCV in the second histogram
* searching second MCV in the first histogram
* searching boundaries of the first histogram in the second histogram

Comparing the lists with each other slows down the function when
statistics set to higher values. To avoid this problem I only use
log(n) values of the lists. It is the first log(n) value for MCV,
evenly separated values for histograms. In my tests, this optimization
does not affect the planning time when statistics = 100, but does
affect accuracy of the estimation. I can send the version without
this optimization, if slow down with larger statistics is not a problem
which should be solved on the selectivity estimation function.

I also attach the script I was using for testing and I left log statements
in the networkjoinsel() function to make testing easier. These statements
should be removed before commit.


inet-selfuncs-v4.patch
Description: Binary data


inet-selfuncs-test.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for inet operators

2014-06-18 Thread Emre Hasegeli
I wanted to check the patch last time and found a bug effecting
MVC vs MVC part of the join selectivity. Fixed version attached.

Emre Hasegeli e...@hasegeli.com:
 Comparing the lists with each other slows down the function when
 statistics set to higher values. To avoid this problem I only use
 log(n) values of the lists. It is the first log(n) value for MCV,
 evenly separated values for histograms. In my tests, this optimization
 does not affect the planning time when statistics = 100, but does
 affect accuracy of the estimation. I can send the version without
 this optimization, if slow down with larger statistics is not a problem
 which should be solved on the selectivity estimation function.

Also, I changed this from log(n) to sqrt(n). It seems much better
now.

I try to explain the reason to processes some of the values with more
comments. I hope it is understandable.


inet-selfuncs-v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Better partial index-only scans

2014-06-29 Thread Emre Hasegeli
Joshua Yanovski pythones...@gmail.com:
 Proof of concept initial patch for enabling index only scans for
 partial indices even when an attribute is not in the target list, as
 long as it is only used in restriction clauses that can be proved by
 the index predicate.  This also works for index quals, though they
 still can't be used in the target list.  However, this patch may be
 inefficient since it duplicates effort that is currently delayed until
 after the best plan is chosen.

I find the improvement very useful.  I use functional and partial
indexes, a lot.  I think we even need a dummy index to make more use
of these features.

 The patch works by basically repeating the logic from
 create_indexscan_plan in createplan.c that determines which clauses
 can't be discarded, instead of the current approach, which just
 assumes that any attributes referenced anywhere in a restriction
 clause has to be a column in the relevant index.  It should build
 against master and passes make check for me.  It also includes a minor
 fix in the same code in createplan.c to make sure we're explicitly
 comparing an empty list to NIL, but I can take that out if that's not
 considered in scope.  If this were the final patch I'd probably
 coalesce the code used in both places into a single function, but
 since I'm not certain that the implementation in check_index_only
 won't change substantially I held off on that.
 
 Since the original comment suggested that this was not done due to
 planner performance concerns, I assume the performance of this
 approach is unacceptable (though I did a few benchmarks and wasn't
 able to detect a consistent difference--what would be a good test for
 this?).  As such, this is intended as more of a first pass that I can
 build on, but I wanted to get feedback at this stage on where we can
 improve (particularly if there were already ideas on how this might be
 done, as the comment hints).  Index only scans cost less than regular
 index scans so I don't think we can get away with waiting until we've
 chosen the best plan before we do the work described above.  That
 said, as I see it performance could improve in any combination of five
 ways:
 * Improve the performance of determining which clauses can't be
 discarded (e.g. precompute some information about equivalence classes
 for index predicates, mess around with the order in which we check the
 clauses to make it fail faster, switch to real union-find data
 structures for equivalence classes).
 * Find a cleverer way of figuring out whether a partial index can be
 used than just checking which clauses can't be discarded.
 * Use a simpler heuristic (that doesn't match what use to determine
 which clauses can be discarded, but still matches more than we do
 now).
 * Take advantage of work we do here to speed things up elsewhere (e.g.
 if this does get chosen as the best plan we don't need to recompute
 the same information in create_indexscan_plan).
 * Delay determining whether to use an index scan or index only scan
 until after cost analysis somehow.  I'm not sure exactly what this
 would entail.

I do not know much about the internals of the planner.  So, I am not
in a position to decide the performance is acceptable or not.  If it
is not, I think your first solution would minimize the penalty in
almost all scenarios.  Your other options seems harder to implement.

I think, it can also be useful to store predicates implied by the
index clause or the index predicate under the path tree.  We may
make use of them in future improvements.  Especially it would be
nice to avoid calling expensive functions if they are included in
an index.  This approach can also simplify the design.  The predicates
can be stored under IndexPath even index only scan is disabled.
They can be used used unconditionally on create_indexscan_plan().

I will update the patch as returned with feedback, but it would
be nice if someone more experienced give an opinion about these
ideas.  I would be happy to review further developments when they
are ready.  Please let me know if I can help any other way.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for inet operators

2014-07-06 Thread Emre Hasegeli
Thank you for looking at it.

 In inet_his_inclusion_selec function, 
 When the constant matches only the right side of the bucket, and if it’s a 
 last bucket then it's never considered as partial match candidate.
 In my opinion, if it's not a last bucket then for next bucket it will become 
 left boundary and this will be treated as partial match so no problem, but 
 in-case of last bucket it can give wrong selectivity.
 
 Can't we consider it as partial bucket match if it is last bucket ?

Actually, in that case, the ratio for one value in the column is used.
I clarified the comment about it.  I do not think it is common enough
case to make the function more complicated.

 Apart from that there is one spell check you can correct
 -- in inet_his_inclusion_selec comments
 histogram boundies  - histogram boundaries :)

I fixed it.  New version attached.  The debug log statements are also
removed.
diff --git a/src/backend/utils/adt/network_selfuncs.c 
b/src/backend/utils/adt/network_selfuncs.c
index d0d806f..d8aeae9 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -1,32 +1,677 @@
 /*-
  *
  * network_selfuncs.c
  *   Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * Estimates are based on null fraction, distinct value count, most common
+ * values, and histogram of inet/cidr datatypes.
  *
  * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
  *   src/backend/utils/adt/network_selfuncs.c
  *
  *-
  */
 #include postgres.h
 
+#include math.h
+
+#include access/htup_details.h
+#include catalog/pg_collation.h
+#include catalog/pg_operator.h
+#include catalog/pg_statistic.h
+#include utils/lsyscache.h
 #include utils/inet.h
+#include utils/selfuncs.h
 
 
+/* Default selectivity constant for the inet overlap operator */
+#define DEFAULT_OVERLAP_SEL 0.01
+
+/* Default selectivity constant for the other operators */
+#define DEFAULT_INCLUSION_SEL 0.005
+
+/* Default selectivity for given operator */
+#define DEFAULT_SEL(operator) \
+   ((operator) == OID_INET_OVERLAP_OP ? \
+   DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL)
+
+static short int inet_opr_order(Oid operator, bool reversed);
+static Selectivity inet_his_inclusion_selec(Datum *values, int nvalues,
+   int red_nvalues, Datum *constvalue, 
double ndistinct,
+   short int opr_order);
+static Selectivity inet_mcv_join_selec(Datum *values1, float4 *numbers1,
+   int nvalues1, Datum *values2, float4 
*numbers2,
+   int nvalues2, Oid operator, bool 
reversed);
+static Selectivity inet_mcv_his_selec(Datum *mcv_values, float4 *mcv_numbers,
+   int mcv_nvalues, Datum *his_values, int 
his_nvalues,
+   int red_nvalues, short int opr_order);
+static Selectivity inet_his_inclusion_join_selec(Datum *his1_values,
+   int his1_nvalues, int red1_nvalues, 
Datum *his2_values,
+   int his2_nvalues, int red2_nvalues, 
short int opr_order);
+static short int inet_inclusion_cmp(inet *left, inet *right,
+   short 
int opr_order);
+static short int inet_masklen_inclusion_cmp(inet *left, inet *right,
+   
short int opr_order);
+static short int inet_his_match_divider(inet *boundary, inet *query,
+   
short int opr_order);
+
+/*
+ * Selectivity estimation for the subnet inclusion operators
+ */
 Datum
 networksel(PG_FUNCTION_ARGS)
 {
-   PG_RETURN_FLOAT8(0.001);
+   PlannerInfo*root = (PlannerInfo *) PG_GETARG_POINTER(0);
+   Oid operator = PG_GETARG_OID(1);
+   List   *args = (List *) PG_GETARG_POINTER(2);
+   int varRelid = PG_GETARG_INT32(3),
+   his_nvalues;
+   VariableStatData vardata;
+   Node   *other;
+   boolvaronleft;
+   Selectivity selec,
+   max_mcv_selec,
+   max_his_selec;
+   Datum   constvalue,
+  *his_values;
+   Form_pg_statistic stats;
+   FmgrInfoproc;
+
+   /*
+* If expression 

Re: [HACKERS] Selectivity estimation for inet operators

2014-07-12 Thread Emre Hasegeli
 I have one last comment, after clarifying this I can move it to ready for 
 committer.
 1. In networkjoinsel, For avoiding the case of huge statistics, only some of 
 the values from mcv and histograms are used (calculated using SQRT).
 -- But in my opinion, if histograms and mcv both are exist then its fine, but 
 if only mcv's are there in that case, we can match complete MCV, it will give 
 better accuracy.
In other function like eqjoinsel also its matching complete MCV.

I was not sure of reducing statistics, at all.  I could not find any
other selectivity estimation function which does this.  After testing
it some more, I reached the conclusion that it would be better to
only reduce the values of the outer loop on histogram match.  Now it
matches complete MCV lists to each other.  I also switched back to
log2() from sqrt() to make the outer list smaller.

I rethink your previous advice to threat histogram bucket partially
matched when the constant matches the last boundary, and changed it
that way.  It is better than using the selectivity for only one value.
Removing this part also make the function more simple.  The new
version of the patch attached.

While looking at it I find some other small problems and fixed them.
I also realized that I forgot to support other join types than inner
join.  Currently, the default estimation is used for anti joins.
I think the patch will need more than trivial amount of change to
support anti joins.  I can work on it later.  While doing it, outer
join selectivity estimation can also be improved.  I think the patch
is better than nothing in its current state.
diff --git a/src/backend/utils/adt/network_selfuncs.c 
b/src/backend/utils/adt/network_selfuncs.c
index d0d806f..08ec945 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -1,32 +1,626 @@
 /*-
  *
  * network_selfuncs.c
  *   Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * Estimates are based on null fraction, distinct value count, most common
+ * values, and histogram of inet/cidr datatypes.
  *
  * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
  *   src/backend/utils/adt/network_selfuncs.c
  *
  *-
  */
 #include postgres.h
 
+#include math.h
+
+#include access/htup_details.h
+#include catalog/pg_collation.h
+#include catalog/pg_operator.h
+#include catalog/pg_statistic.h
+#include utils/lsyscache.h
 #include utils/inet.h
+#include utils/selfuncs.h
 
 
+/* Default selectivity constant for the inet overlap operator */
+#define DEFAULT_OVERLAP_SEL 0.01
+
+/* Default selectivity constant for the other operators */
+#define DEFAULT_INCLUSION_SEL 0.005
+
+/* Default selectivity for given operator */
+#define DEFAULT_SEL(operator) \
+   ((operator) == OID_INET_OVERLAP_OP ? \
+   DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL)
+
+static short int inet_opr_order(Oid operator, bool reversed);
+static Selectivity inet_his_inclusion_selec(Datum *values, int nvalues,
+   Datum *constvalue, short int opr_order);
+static Selectivity inet_mcv_join_selec(Datum *values1, float4 *numbers1,
+   int nvalues1, Datum *values2, float4 
*numbers2,
+   int nvalues2, Oid operator, bool 
reversed);
+static Selectivity inet_mcv_his_selec(Datum *mcv_values, float4 *mcv_numbers,
+   int mcv_nvalues, Datum *his_values, int 
his_nvalues,
+   int red_nvalues, short int opr_order,
+   Selectivity *max_selec_pointer);
+static Selectivity inet_his_inclusion_join_selec(Datum *his1_values,
+   int his1_nvalues, Datum *his2_values, 
int his2_nvalues,
+   int red_nvalues, short int opr_order);
+static short int inet_inclusion_cmp(inet *left, inet *right,
+   short 
int opr_order);
+static short int inet_masklen_inclusion_cmp(inet *left, inet *right,
+   
short int opr_order);
+static short int inet_his_match_divider(inet *boundary, inet *query,
+   
short int opr_order);
+
+/*
+ * Selectivity estimation for the subnet inclusion operators
+ */
 Datum
 networksel(PG_FUNCTION_ARGS)
 {
-   PG_RETURN_FLOAT8(0.001);
+   PlannerInfo*root = (PlannerInfo *) PG_GETARG_POINTER(0);
+   Oid

[HACKERS] Shapes on the regression test for polygon

2014-07-21 Thread Emre Hasegeli
The first two shapes on src/test/regress/sql/polygon.sql do not make
sense to me.  They look more like polygons with some more tabs,
but still did not match the coordinates.  I changed them to make
consistent with the shapes.  I believe this was the intention of
the original author.  Patch attached.
diff --git a/src/test/regress/expected/polygon.out 
b/src/test/regress/expected/polygon.out
index b252902..66ff51d 100644
--- a/src/test/regress/expected/polygon.out
+++ b/src/test/regress/expected/polygon.out
@@ -1,28 +1,28 @@
 --
 -- POLYGON
 --
 -- polygon logic
 --
 -- 3 o
---   |
+--   |
 -- 2   + |
---/  |
--- 1 # o +
---   /|
+--/  |
+-- 1 #   +
+--   /  o |
 -- 0   #-o-+
 --
--- 0 1 2 3 4
+-- 0 1 2 3 4
 --
 CREATE TABLE POLYGON_TBL(f1 polygon);
-INSERT INTO POLYGON_TBL(f1) VALUES ('(2.0,0.0),(2.0,4.0),(0.0,0.0)');
-INSERT INTO POLYGON_TBL(f1) VALUES ('(3.0,1.0),(3.0,3.0),(1.0,0.0)');
+INSERT INTO POLYGON_TBL(f1) VALUES ('(2.0,2.0),(0.0,0.0),(4.0,0.0)');
+INSERT INTO POLYGON_TBL(f1) VALUES ('(3.0,3.0),(1.0,1.0),(3.0,0.0)');
 -- degenerate polygons
 INSERT INTO POLYGON_TBL(f1) VALUES ('(0.0,0.0)');
 INSERT INTO POLYGON_TBL(f1) VALUES ('(0.0,1.0),(0.0,1.0)');
 -- bad polygon input strings
 INSERT INTO POLYGON_TBL(f1) VALUES ('0.0');
 ERROR:  invalid input syntax for type polygon: 0.0
 LINE 1: INSERT INTO POLYGON_TBL(f1) VALUES ('0.0');
 ^
 INSERT INTO POLYGON_TBL(f1) VALUES ('(0.0 0.0');
 ERROR:  invalid input syntax for type polygon: (0.0 0.0
@@ -36,157 +36,170 @@ INSERT INTO POLYGON_TBL(f1) VALUES ('(0,1,2,3');
 ERROR:  invalid input syntax for type polygon: (0,1,2,3
 LINE 1: INSERT INTO POLYGON_TBL(f1) VALUES ('(0,1,2,3');
 ^
 INSERT INTO POLYGON_TBL(f1) VALUES ('asdf');
 ERROR:  invalid input syntax for type polygon: asdf
 LINE 1: INSERT INTO POLYGON_TBL(f1) VALUES ('asdf');
 ^
 SELECT '' AS four, * FROM POLYGON_TBL;
  four | f1  
 --+-
-  | ((2,0),(2,4),(0,0))
-  | ((3,1),(3,3),(1,0))
+  | ((2,2),(0,0),(4,0))
+  | ((3,3),(1,1),(3,0))
   | ((0,0))
   | ((0,1),(0,1))
 (4 rows)
 
 -- overlap
 SELECT '' AS three, p.*
FROM POLYGON_TBL p
-   WHERE p.f1  '(3.0,1.0),(3.0,3.0),(1.0,0.0)';
+   WHERE p.f1  '(3.0,3.0),(1.0,1.0),(3.0,0.0)';
  three | f1  
 ---+-
-   | ((2,0),(2,4),(0,0))
-   | ((3,1),(3,3),(1,0))
+   | ((2,2),(0,0),(4,0))
+   | ((3,3),(1,1),(3,0))
 (2 rows)
 
 -- left overlap
 SELECT '' AS four, p.*
FROM POLYGON_TBL p
-   WHERE p.f1  '(3.0,1.0),(3.0,3.0),(1.0,0.0)';
+   WHERE p.f1  '(3.0,3.0),(1.0,1.0),(3.0,0.0)';
  four | f1  
 --+-
-  | ((2,0),(2,4),(0,0))
-  | ((3,1),(3,3),(1,0))
+  | ((3,3),(1,1),(3,0))
   | ((0,0))
   | ((0,1),(0,1))
-(4 rows)
+(3 rows)
 
 -- right overlap
 SELECT '' AS two, p.*
FROM POLYGON_TBL p
-   WHERE p.f1  '(3.0,1.0),(3.0,3.0),(1.0,0.0)';
+   WHERE p.f1  '(3.0,3.0),(1.0,1.0),(3.0,0.0)';
  two | f1  
 -+-
- | ((3,1),(3,3),(1,0))
+ | ((3,3),(1,1),(3,0))
 (1 row)
 
 -- left of
 SELECT '' AS one, p.*
FROM POLYGON_TBL p
-   WHERE p.f1  '(3.0,1.0),(3.0,3.0),(1.0,0.0)';
+   WHERE p.f1  '(3.0,3.0),(1.0,1.0),(3.0,0.0)';
  one |  f1   
 -+---
  | ((0,0))
  | ((0,1),(0,1))
 (2 rows)
 
 -- right of
 SELECT '' AS zero, p.*
FROM POLYGON_TBL p
-   WHERE p.f1  '(3.0,1.0),(3.0,3.0),(1.0,0.0)';
+   WHERE p.f1  '(3.0,3.0),(1.0,1.0),(3.0,0.0)';
  zero | f1 
 --+
 (0 rows)
 
 -- contained
 SELECT '' AS one, p.*
FROM POLYGON_TBL p
-   WHERE p.f1 @ polygon '(3.0,1.0),(3.0,3.0),(1.0,0.0)';
+   WHERE p.f1 @ polygon '(3.0,3.0),(1.0,1.0),(3.0,0.0)';
  one | f1  
 -+-
- | ((3,1),(3,3),(1,0))
+ | ((3,3),(1,1),(3,0))
 (1 row)
 
 -- same
 SELECT '' AS one, p.*
FROM POLYGON_TBL p
-   WHERE p.f1 ~= polygon '(3.0,1.0),(3.0,3.0),(1.0,0.0)';
+   WHERE p.f1 ~= polygon '(3.0,3.0),(1.0,1.0),(3.0,0.0)';
  one | f1  
 -+-
- | ((3,1),(3,3),(1,0))
+ | ((3,3),(1,1),(3,0))
 (1 row)
 
 -- contains
 SELECT '' AS one, p.*
FROM POLYGON_TBL p
-   WHERE p.f1 @ polygon '(3.0,1.0),(3.0,3.0),(1.0,0.0)';
+   WHERE p.f1 @ polygon '(3.0,3.0),(1.0,1.0),(3.0,0.0)';
  one | f1  
 -+-
- | ((3,1),(3,3),(1,0))
+ | ((3,3),(1,1),(3,0))
 (1 row)
 
 --
 -- polygon logic
 --
 -- 3 o
---   |
--- 2   + |
---/  |
+--  /|
+-- 2   + |
+--/  |
 -- 1 / o +
 --   /|
 -- 0   +-o-+
 --
--- 0 1 2 3 4
+-- 0 1 2 3 4
 --
 -- left of
-SELECT polygon '(2.0,0.0),(2.0,4.0),(0.0,0.0)'  

Re: [HACKERS] Index-only scans for multicolumn GIST

2014-07-23 Thread Emre Hasegeli
 That seems like a nonstarter :-(.  Index-only scans don't have a license
 to return approximations.  If we drop the behavior for circles, how much
 functionality do we have left?

It should work with exact operator classes, box_ops, point_ops,
range_ops, inet_ops.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shapes on the regression test for polygon

2014-07-25 Thread Emre Hasegeli
 Well, I think the number of tabs that makes them look best depends on
 your tab-stop setting.  At present, I find that with 8-space tabs
 things seem to line up pretty well, whereas with your patch, 4-space
 tabs line up well.  Either way, I have no idea what the picture is
 supposed to mean, because it looks to me like the original picture has
 circles at (3,3) and (2,1) and plus signs at (2,2), (3,1), and (4,0).
 With your patch applied, the circle at (2,1) has moved to (2,0.5).  I
 can't correlate any of this to the test cases you modified (and which
 I see no reason to modify, whether they match the picture or not).

4 space tab-stop is not the project standard?

The circle I moved does not represent a corner of the polygon.  I just
moved it to make the line straight after the tabs.

 And really, if the diagram is confusing, let's just remove it.  We
 don't really need our regression test comments to illustrate what a
 triangle looks like, or how to do bad ASCII art.

 Personally, though, my vote would be to just leave it the way it is.
 I don't think there's really a problem here in need of solving.

I am sorry for taking your time for such a low priority problem, but
as we stumble across this, let me to make the change more clear.
According to git blame the tests with the shapes added by 04688df6.
The coordinates was written in the old format like this:

(3.0,3.0,1.0,1.0,3.0,0.0)

These are changed by 3d9584c9 to the new format like this:

(3.0,1.0),(3.0,3.0),(1.0,0.0)

In my opinion, the real intention of the first commit was to write
them in the new format like this:

(3.0,3.0),(1.0,1.0),(3.0,0.0)

It makes sense because the corners match the shape.  So, I changed it
that way.  If we will not going to change the tests, It would be okay
to just remove the shapes.  I do not think they add more value to
the tests.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] KNN-GiST with recheck

2014-08-03 Thread Emre Hasegeli
  1. This patch introduces a new polygon - point operator. That seems
  useful on its own, with or without this patch.
 
 Yeah, but exact-knn cant come with no one implementation. But it would
 better come in a separate patch.

I tried to split them.  Separated patches are attached.  I changed
the order of the arguments as point - polygon, because point was
the first one on all the others.  Its commutator was required for
the index, so I added it on the second patch.  I also added tests
for the operator.  I think it is ready for committer as a separate
patch.  We can add it to the open CommitFest.

I have made some cosmetic changes on the patches.  I hope they are
useful.

I added support to point - circle operator with the same GiST
distance function you added for polygon.  I can change it, if it is not
the right way.

  2. I wonder how useful it really is to allow mixing exact and non-exact
  return values from the distance function. The distance function included in
  the patch always returns recheck=true. I have a feeling that all other
  distance function will also always return either true or false.
 
 For geometrical datatypes recheck variations in consistent methods are also
 very rare (I can't remember any). But imagine opclass for arrays where keys
 have different representation depending on array length. For such opclass
 and knn on similarity recheck flag could be useful.

I also wonder how useful it is.  Your example is convincing, but maybe
setting it index-wide will make the decisions on the framework easier.
For example, how hard would it be to decide if further sorting is
required or not on the planner?

  4. (as you mentioned in the other thread: ) It's a modularity violation
  that you peek into the heap tuple from gist. I think the proper way to do
  this would be to extend the IndexScan executor node to perform the
  re-shuffling of tuples that come from the index in wrong order, or perhaps
  add a new node type for it.
 
  Of course that's exactly what your partial sort patch does :-). I haven't
  looked at that in detail, but I don't think the approach the partial sort
  patch takes will work here as is. In the KNN-GiST case, the index is
  returning tuples roughly in the right order, but a tuple that it returns
  might in reality belong somewhere later in the ordering. In the partial
  sort patch, the input stream of tuples is divided into non-overlapping
  groups, so that the tuples within the group are not sorted, but the groups
  are. I think the partial sort case is a special case of the KNN-GiST case,
  if you consider the lower bound of each tuple to be the leading keys that
  you don't need to sort.
 
 Yes. But, for instance btree accesses heap for unique checking. Is really
 it so crimilal? :-)
 This is not only question of a new node or extending existing node. We need
 to teach planner/executor access method can return value of some expression
 which is lower bound of another expression. AFICS now access method can
 return only original indexed datums and TIDs. So, I afraid that enormous
 infrastructure changes are required. And I can hardly imagine what they
 should look like.

Unfortunately, I am not experienced enough to judge your implementation.
As far as I understand the problem is partially sorting rows on
the index scan node.  It can lead the planner to choose non-optimal
plans, because of not taking into account the cost of sorting.
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index 54391fd..402ea40 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -2408,36 +2408,42 @@ lseg_interpt(PG_FUNCTION_ARGS)
  **Routines for position comparisons of differently-typed
  **2D objects.
  **
  ***/
 
 /*-
  * dist_
  * Minimum distance from one object to another.
  *---*/
 
+/*
+ * Distance from a point to a line
+ */
 Datum
 dist_pl(PG_FUNCTION_ARGS)
 {
Point  *pt = PG_GETARG_POINT_P(0);
LINE   *line = PG_GETARG_LINE_P(1);
 
PG_RETURN_FLOAT8(dist_pl_internal(pt, line));
 }
 
 static double
 dist_pl_internal(Point *pt, LINE *line)
 {
return fabs((line-A * pt-x + line-B * pt-y + line-C) /
HYPOT(line-A, line-B));
 }
 
+/*
+ * Distance from a point to a lseg
+ */
 Datum
 dist_ps(PG_FUNCTION_ARGS)
 {
Point  *pt = PG_GETARG_POINT_P(0);
LSEG   *lseg = PG_GETARG_LSEG_P(1);
 
PG_RETURN_FLOAT8(dist_ps_internal(pt, lseg));
 }
 
 static double
@@ -2487,21 +2493,21 @@ dist_ps_internal(Point *pt, LSEG *lseg)
result = point_dt(pt, lseg-p[0]);
tmpdist = point_dt(pt, lseg-p[1]);
if (tmpdist  result)
 

Re: [HACKERS] Selectivity estimation for inet operators

2014-08-26 Thread Emre Hasegeli
 I agree with you that we can support other join type and anti join later,
 If others don’t have any objection in doing other parts later I will mark as 
 Ready For Committer.

I updated the patch to cover semi and anti joins with eqjoinsel_semi().
I think it is better than returning a constant.  The new version
attached with the new version of the test script.  Can you please
look at it again and mark it as ready for committer if it seems okay
to you?
diff --git a/src/backend/utils/adt/network_selfuncs.c 
b/src/backend/utils/adt/network_selfuncs.c
index d0d806f..eca9e7c 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -1,32 +1,669 @@
 /*-
  *
  * network_selfuncs.c
  *   Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * Estimates are based on null fraction, distinct value count, most common
+ * values, and histogram of inet/cidr datatypes.
  *
  * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
  *   src/backend/utils/adt/network_selfuncs.c
  *
  *-
  */
 #include postgres.h
 
+#include math.h
+
+#include access/htup_details.h
+#include catalog/pg_collation.h
+#include catalog/pg_operator.h
+#include catalog/pg_statistic.h
+#include utils/lsyscache.h
 #include utils/inet.h
+#include utils/selfuncs.h
 
 
+/* Default selectivity constant for the inet overlap operator */
+#define DEFAULT_OVERLAP_SEL 0.01
+
+/* Default selectivity constant for the other operators */
+#define DEFAULT_INCLUSION_SEL 0.005
+
+/* Default selectivity for given operator */
+#define DEFAULT_SEL(operator) \
+   ((operator) == OID_INET_OVERLAP_OP ? \
+   DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL)
+
+static Selectivity networkjoinsel_inner(Oid operator,
+VariableStatData *vardata1, 
VariableStatData *vardata2);
+extern double eqjoinsel_semi(Oid operator, VariableStatData *vardata1,
+  VariableStatData *vardata2, RelOptInfo *inner_rel);
+extern RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
+static short int inet_opr_order(Oid operator);
+static Selectivity inet_his_inclusion_selec(Datum *values, int nvalues,
+Datum *constvalue, short int 
opr_order);
+static Selectivity inet_mcv_join_selec(Datum *values1, float4 *numbers1,
+   int nvalues1, Datum *values2, float4 
*numbers2,
+   int nvalues2, Oid operator);
+static Selectivity inet_mcv_his_selec(Datum *mcv_values, float4 *mcv_numbers,
+  int mcv_nvalues, Datum *his_values, int 
his_nvalues,
+  int red_nvalues, short int opr_order,
+  Selectivity *max_selec_pointer);
+static Selectivity inet_his_inclusion_join_selec(Datum *his1_values,
+ int his1_nvalues, Datum *his2_values, 
int his2_nvalues,
+ int red_nvalues, 
short int opr_order);
+static short int inet_inclusion_cmp(inet *left, inet *right,
+  short int opr_order);
+static short int inet_masklen_inclusion_cmp(inet *left, inet *right,
+  short int opr_order);
+static short int inet_his_match_divider(inet *boundary, inet *query,
+  short int opr_order);
+
+/*
+ * Selectivity estimation for the subnet inclusion operators
+ */
 Datum
 networksel(PG_FUNCTION_ARGS)
 {
-   PG_RETURN_FLOAT8(0.001);
+   PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+   Oid operator = PG_GETARG_OID(1);
+   List   *args = (List *) PG_GETARG_POINTER(2);
+   int varRelid = PG_GETARG_INT32(3),
+   his_nvalues;
+   VariableStatData vardata;
+   Node   *other;
+   boolvaronleft;
+   Selectivity selec,
+   max_mcv_selec;
+   Datum   constvalue,
+  *his_values;
+   Form_pg_statistic stats;
+   double  nullfrac;
+   FmgrInfoproc;
+
+   /*
+* If expression is not (variable op something) or (something op
+* variable), then punt and return a default estimate.
+*/
+   if (!get_restriction_variable(root, args, varRelid,
+ vardata, 
other, varonleft))
+   PG_RETURN_FLOAT8(DEFAULT_SEL(operator));
+
+   

Re: [HACKERS] Selectivity estimation for inet operators

2014-08-31 Thread Emre Hasegeli
 * Isn't X  Y equivalent to network_scan_first(X)  Y AND 
 network_scan_last(X)  Y? Or at least close enough for selectivity 
 estimation purposes? Pardon my ignorance - I'm not too familiar with the 
 inet datatype - but how about just calling scalarineqsel for both bounds?

Actually, X  Y is equivalent to

network_scan_first(X) = network_host(Y) AND
network_scan_last(X) = network_host(Y) AND
network_masklen(X)  network_masklen(X)

but we do not have statistics for neither network_scan_last(X)
nor network_masklen(X).  I tried to find a solution based on
the implementation of the operators.

 * inet_mcv_join_selec() is O(n^2) where n is the number of entries in the 
 MCV lists. With the max statistics target of 1, a worst case query on 
 my laptop took about 15 seconds to plan. Maybe that's acceptable, but you 
 went through some trouble to make planning of MCV vs histogram faster, by 
 the log2 method to compare only some values, so I wonder why you didn't do 
 the same for the MCV vs MCV case?

It was like that in the previous versions.  It was causing worse
estimation, but I was trying to reduce both sides of the lists.  It
works slightly better when only the left hand side of the list is
reduced.  Attached version works like that.

 * A few typos: lenght - length.

Fixed.

Thank you for looking at it.
diff --git a/src/backend/utils/adt/network_selfuncs.c 
b/src/backend/utils/adt/network_selfuncs.c
index d0d806f..a00706c 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -1,32 +1,671 @@
 /*-
  *
  * network_selfuncs.c
  *   Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * Estimates are based on null fraction, distinct value count, most common
+ * values, and histogram of inet/cidr datatypes.
  *
  * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
  *   src/backend/utils/adt/network_selfuncs.c
  *
  *-
  */
 #include postgres.h
 
+#include math.h
+
+#include access/htup_details.h
+#include catalog/pg_collation.h
+#include catalog/pg_operator.h
+#include catalog/pg_statistic.h
+#include utils/lsyscache.h
 #include utils/inet.h
+#include utils/selfuncs.h
 
 
+/* Default selectivity constant for the inet overlap operator */
+#define DEFAULT_OVERLAP_SEL 0.01
+
+/* Default selectivity constant for the other operators */
+#define DEFAULT_INCLUSION_SEL 0.005
+
+/* Default selectivity for given operator */
+#define DEFAULT_SEL(operator) \
+   ((operator) == OID_INET_OVERLAP_OP ? \
+   DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL)
+
+static Selectivity networkjoinsel_inner(Oid operator,
+VariableStatData *vardata1, 
VariableStatData *vardata2);
+extern double eqjoinsel_semi(Oid operator, VariableStatData *vardata1,
+  VariableStatData *vardata2, RelOptInfo *inner_rel);
+extern RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
+static short int inet_opr_order(Oid operator);
+static Selectivity inet_his_inclusion_selec(Datum *values, int nvalues,
+Datum *constvalue, short int 
opr_order);
+static Selectivity inet_mcv_join_selec(Datum *values1, float4 *numbers1,
+   int nvalues1, Datum *values2, float4 
*numbers2,
+   int nvalues2, int red_nvalues, Oid 
operator);
+static Selectivity inet_mcv_his_selec(Datum *mcv_values, float4 *mcv_numbers,
+  int mcv_nvalues, Datum *his_values, int 
his_nvalues,
+  int red_nvalues, short int opr_order,
+  Selectivity *max_selec_pointer);
+static Selectivity inet_his_inclusion_join_selec(Datum *his1_values,
+ int his1_nvalues, Datum *his2_values, 
int his2_nvalues,
+ int red_nvalues, 
short int opr_order);
+static short int inet_inclusion_cmp(inet *left, inet *right,
+  short int opr_order);
+static short int inet_masklen_inclusion_cmp(inet *left, inet *right,
+  short int opr_order);
+static short int inet_his_match_divider(inet *boundary, inet *query,
+  short int opr_order);
+
+/*
+ * Selectivity estimation for the subnet inclusion operators
+ */
 Datum
 networksel(PG_FUNCTION_ARGS)
 {
-   PG_RETURN_FLOAT8(0.001);
+   PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+   Oid 

Re: [HACKERS] Selectivity estimation for inet operators

2014-08-31 Thread Emre Hasegeli
 Heikki Linnakangas hlinnakan...@vmware.com writes:
  * inet_mcv_join_selec() is O(n^2) where n is the number of entries in 
  the MCV lists. With the max statistics target of 1, a worst case 
  query on my laptop took about 15 seconds to plan. Maybe that's 
  acceptable, but you went through some trouble to make planning of MCV vs 
  histogram faster, by the log2 method to compare only some values, so I 
  wonder why you didn't do the same for the MCV vs MCV case?
 
 Actually, what I think needs to be asked is the opposite question: why is
 the other code ignoring some of the statistical data?  If the user asked
 us to collect a lot of stats detail it seems reasonable that he's
 expecting us to use it to get more accurate estimates.  It's for sure
 not obvious why these estimators should take shortcuts that are not being
 taken in the much-longer-established code for scalar comparison estimates.

It will still use more statistical data, when statistics_target is
higher.  It was not sure that the user wants to spent O(n^2) amount
of time based on statistics_target.  Attached version is without
this optimization.  Estimates are better without it, but planning
takes more time.

 I'm not exactly convinced that the math adds up in this logic, either.
 The way in which it combines results from looking at the MCV lists and
 at the histograms seems pretty arbitrary.

I taught the product of the join will be

(left_mcv + left_histogram) * (right_mcv + right_histogram) * 
selectivity

and tried to calculate it as in the following:

(left_mcv * right_mcv * selectivity) +
(right_mcv * left_histogram * selectivity) +
(left_mcv * right_histogram * selectivity) +
(left_histogram * right_histogram * selectivity)

where left_histogram is

1.0 - left_nullfrac - left_mcv

I fixed calculation for the MCV vs histogram part.  The estimates of
inner join are very close to the actual rows with statistics_target = 1000.
I think the calculation should be right.
diff --git a/src/backend/utils/adt/network_selfuncs.c 
b/src/backend/utils/adt/network_selfuncs.c
index d0d806f..4367f0e 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -1,32 +1,655 @@
 /*-
  *
  * network_selfuncs.c
  *   Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * Estimates are based on null fraction, distinct value count, most common
+ * values, and histogram of inet/cidr datatypes.
  *
  * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
  *   src/backend/utils/adt/network_selfuncs.c
  *
  *-
  */
 #include postgres.h
 
+#include math.h
+
+#include access/htup_details.h
+#include catalog/pg_collation.h
+#include catalog/pg_operator.h
+#include catalog/pg_statistic.h
+#include utils/lsyscache.h
 #include utils/inet.h
+#include utils/selfuncs.h
 
 
+/* Default selectivity constant for the inet overlap operator */
+#define DEFAULT_OVERLAP_SEL 0.01
+
+/* Default selectivity constant for the other operators */
+#define DEFAULT_INCLUSION_SEL 0.005
+
+/* Default selectivity for given operator */
+#define DEFAULT_SEL(operator) \
+   ((operator) == OID_INET_OVERLAP_OP ? \
+   DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL)
+
+static Selectivity networkjoinsel_inner(Oid operator,
+VariableStatData *vardata1, 
VariableStatData *vardata2);
+extern double eqjoinsel_semi(Oid operator, VariableStatData *vardata1,
+  VariableStatData *vardata2, RelOptInfo *inner_rel);
+extern RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
+static short int inet_opr_order(Oid operator);
+static Selectivity inet_his_inclusion_selec(Datum *values, int nvalues,
+Datum *constvalue, short int 
opr_order);
+static Selectivity inet_mcv_join_selec(Datum *values1, float4 *numbers1,
+   int nvalues1, Datum *values2, float4 
*numbers2,
+   int nvalues2, Oid operator, Selectivity 
*max_selec_p);
+static Selectivity inet_mcv_his_selec(Datum *mcv_values, float4 *mcv_numbers,
+  int mcv_nvalues, Datum *his_values, int 
his_nvalues,
+  short int opr_order, Selectivity 
*max_selec_p);
+static Selectivity inet_his_inclusion_join_selec(Datum *his1_values,
+ int his1_nvalues, Datum *his2_values, 
int his2_nvalues,
+ short int opr_order);
+static short int inet_inclusion_cmp(inet *left, inet 

Re: [HACKERS] Selectivity estimation for inet operators

2014-08-31 Thread Emre Hasegeli
 What you did there is utterly unacceptable from a modularity standpoint;
 and considering that the values will be nowhere near right, the argument
 that it's better than returning a constant seems pretty weak.  I think
 you should just take that out again.

I will try to come up with a better, data type specific implementation
in a week.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for inet operators

2014-09-07 Thread Emre Hasegeli
   I updated the patch to cover semi and anti joins with eqjoinsel_semi().
   I think it is better than returning a constant.
 
  What you did there is utterly unacceptable from a modularity standpoint;
  and considering that the values will be nowhere near right, the argument
  that it's better than returning a constant seems pretty weak.  I think
  you should just take that out again.
 
 I will try to come up with a better, data type specific implementation
 in a week.

New version with semi join estimation function attached.
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index d0d806f..f6cb2f6 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -1,32 +1,819 @@
 /*-
  *
  * network_selfuncs.c
  *	  Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * Estimates are based on null fraction, distinct value count, most common
+ * values, and histogram of inet/cidr datatypes.
  *
  * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
  *	  src/backend/utils/adt/network_selfuncs.c
  *
  *-
  */
 #include postgres.h
 
+#include math.h
+
+#include access/htup_details.h
+#include catalog/pg_collation.h
+#include catalog/pg_operator.h
+#include catalog/pg_statistic.h
+#include utils/lsyscache.h
 #include utils/inet.h
+#include utils/selfuncs.h
+
+
+/* Default selectivity constant for the inet overlap operator */
+#define DEFAULT_OVERLAP_SEL 0.01
+
+/* Default selectivity constant for the other operators */
+#define DEFAULT_INCLUSION_SEL 0.005
+
+/* Default selectivity for given operator */
+#define DEFAULT_SEL(operator) \
+	((operator) == OID_INET_OVERLAP_OP ? \
+			DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL)
 
+static Selectivity networkjoinsel_inner(Oid operator,
+	VariableStatData *vardata1, VariableStatData *vardata2);
+static Selectivity networkjoinsel_semi(Oid operator,
+	VariableStatData *vardata1, VariableStatData *vardata2);
+static short int inet_opr_order(Oid operator);
+static Selectivity inet_his_inclusion_selec(Datum *values, int nvalues,
+	Datum *constvalue, short int opr_order);
+static Selectivity inet_mcv_join_selec(Datum *mcv1_values,
+	float4 *mcv1_numbers, int mcv1_nvalues, Datum *mcv2_values,
+	float4 *mcv2_numbers, int mcv2_nvalues, Oid operator,
+	Selectivity *max_selec_p);
+static Selectivity inet_mcv_his_selec(Datum *mcv_values, float4 *mcv_numbers,
+	int mcv_nvalues, Datum *his_values, int his_nvalues,
+	short int opr_order, Selectivity *max_selec_p);
+static Selectivity inet_his_inclusion_join_selec(Datum *his1_values,
+	int his1_nvalues, Datum *his2_values, int his2_nvalues,
+	short int opr_order);
+static Selectivity inet_semi_join_selec(bool mcv2_exists, Datum *mcv2_values,
+	int mcv2_nvalues, bool his2_exists, Datum *his2_values,
+	int his2_nvalues, double his2_weight, Datum *constvalue,
+	FmgrInfo *proc, short int opr_order);
+static short int inet_inclusion_cmp(inet *left, inet *right,
+	short int opr_order);
+static short int inet_masklen_inclusion_cmp(inet *left, inet *right,
+	short int opr_order);
+static short int inet_his_match_divider(inet *boundary, inet *query,
+	short int opr_order);
 
+/*
+ * Selectivity estimation for the subnet inclusion operators
+ */
 Datum
 networksel(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_FLOAT8(0.001);
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	int			varRelid = PG_GETARG_INT32(3),
+his_nvalues;
+	VariableStatData vardata;
+	Node	   *other;
+	bool		varonleft;
+	Selectivity selec,
+max_mcv_selec;
+	Datum		constvalue,
+			   *his_values;
+	Form_pg_statistic stats;
+	double		nullfrac;
+	FmgrInfo	proc;
+
+	/*
+	 * If expression is not (variable op something) or (something op
+	 * variable), then punt and return a default estimate.
+	 */
+	if (!get_restriction_variable(root, args, varRelid,
+  vardata, other, varonleft))
+		PG_RETURN_FLOAT8(DEFAULT_SEL(operator));
+
+	/*
+	 * Can't do anything useful if the something is not a constant, either.
+	 */
+	if (!IsA(other, Const))
+	{
+		ReleaseVariableStats(vardata);
+		PG_RETURN_FLOAT8(DEFAULT_SEL(operator));
+	}
+
+	/* All of the subnet inclusion operators are strict. */
+	if (((Const *) other)-constisnull)
+	{
+		ReleaseVariableStats(vardata);
+		PG_RETURN_FLOAT8(0.0);
+	}
+
+	if (!HeapTupleIsValid(vardata.statsTuple))
+	{
+		ReleaseVariableStats(vardata);
+		PG_RETURN_FLOAT8(DEFAULT_SEL(operator));
+	}
+
+	constvalue = ((Const *) other)-constvalue;
+	stats = (Form_pg_statistic) 

Re: [HACKERS] KNN-GiST with recheck

2014-09-14 Thread Emre Hasegeli
I added the point to polygon distance operator patch to the open
CommitFest as ready for committer and added myself as reviewer to
both of the patches.

 I think that for most use cases just some operators require further sorting
 and some of them not. But it could appear one day that some index gives
 part of its knn answers exact and part of them inexact. Same happen to
 recheck of regular operators. Initially recheck flag was defined in
 opclass. But later recheck became runtime flag.

I cannot think of an use case, but it makes sense to add the flag to
the distance function just like the consistent function if we will go
with this implementation.
 
 Cost estimation of GiST is a big problem anyway. It doesn't care (and
 can't) about amount of recheck for regular operators. In this patch, same
 would be for knn recheck. The problem is that touching heap from access
 method breaks incapsulation. One idea about this is to do sorting in
 another nodes. However, I wonder if it would be an overengineering and
 overhead. In attached patch I propose a different approach: put code
 touching heap into separate index_get_heap_values function. Also new
 version of patch includes regression tests and some cleanup.

While looking it at I found a bug.  It returns the second column
in wrong order when both of the distance functions return recheck = true.
Test script attached to run on the regression database.  I tried to
fix but could not.  searchTreeItemDistanceRecheck function is not
very easy to follow.  I think it deserves more comments.


knn-gist-recheck-test-multicolumn.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collation-aware comparisons in GIN opclasses

2014-09-16 Thread Emre Hasegeli
 No.  And we don't know how to change the default opclass without
 breaking things, either.  See previous discussions about how we
 might fix the totally-broken default gist opclass that btree_gist
 creates for the inet type [1].  The motivation for getting rid of that
 is *way* stronger than it might be slow, but there's no apparent
 way to make something else be the default without creating havoc.

Inet case was not the same.  We tried to replace the default opclass
in contrib with another one in core.  It did not work because
pg_dump --binary-upgrade dumps the objects of the extension which
cannot be restored when there is a default opclass for the same
data type.

Changing the default opclasses should work if we make
pg_dump --binary-upgrade dump the default opclasses with indexes
and exclusion constraints.  I think it makes sense to do so in
--binary-upgrade mode.  I can try to come with a patch for this.

I cannot see a way to rename opclasses in core.  I think we can live
with default opclasses which are not named as type_ops.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collation-aware comparisons in GIN opclasses

2014-09-16 Thread Emre Hasegeli
  Changing the default opclasses should work if we make
  pg_dump --binary-upgrade dump the default opclasses with indexes
  and exclusion constraints.  I think it makes sense to do so in
  --binary-upgrade mode.  I can try to come with a patch for this.
 
 Can you explain it a bit more detail? I didn't get it.

pg_upgrade uses pg_dump --binary-upgrade to dump the schema of
the old database.  Now, it generates CREATE INDEX statements without
explicit opclass if opclass is the default.  We can change pg_dump
to generate the statements with opclass even if opclass is the default
in --binary-upgrade mode.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] KNN-GiST with recheck

2014-09-17 Thread Emre Hasegeli
  While looking it at I found a bug.  It returns the second column
  in wrong order when both of the distance functions return recheck = true.
  Test script attached to run on the regression database.  I tried to
  fix but could not.  searchTreeItemDistanceRecheck function is not
  very easy to follow.  I think it deserves more comments.
 
 Fixed, thanks. It was logical error in comparison function implementation.

I managed to break it again by ordering rows only by the second column
of the index.  Test script attached.


knn-gist-recheck-test-secondcolumn.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] KNN-GiST with recheck

2014-09-17 Thread Emre Hasegeli
 I managed to break it again by ordering rows only by the second column
 of the index.  Test script attached.

I was confused.  It is undefined behavior.  Sorry for the noise.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] KNN-GiST with recheck

2014-09-25 Thread Emre Hasegeli
 Fixed, thanks.

Here are my questions and comments about the code.

doc/src/sgml/gist.sgml:812:
be rechecked from heap tuple before tuple is returned.  If
literalrecheck/ flag isn't set then it's true by default for
compatibility reasons.  The literalrecheck/ flag can be used only

Recheck flag is set to false on gistget.c so I think it should say
false by default.  On the other hand, it is true by default on
the consistent function.  It is written as the safest assumption
on the code comments.  I don't know why the safest is chosen over
the backwards compatible for the consistent function.

src/backend/access/gist/gistget.c:505:
   /* Recheck distance from heap tuple if needed */
   if (GISTSearchItemIsHeap(*item) 
   searchTreeItemNeedDistanceRecheck(scan, 
 so-curTreeItem))
   {
   searchTreeItemDistanceRecheck(scan, 
 so-curTreeItem, item);
   continue;
   }

Why so-curTreeItem is passed to these functions?  They can use
scan-opaque-curTreeItem.

src/backend/access/gist/gistscan.c:49:
   /*
* When all distance values are the same, items without recheck
* can be immediately returned.  So they are placed first.
*/
   if (recheckCmp == 0  distance_a.recheck != distance_b.recheck)
   recheckCmp = distance_a.recheck ? 1 : -1;

I don't understand why items without recheck can be immediately
returned.  Do you think it will work correctly when there is
an operator class which will return recheck true and false for
the items under the same page?

src/backend/access/index/indexam.c:258:
   /* Prepare data structures for getting original indexed values from 
 heap */
   scan-indexInfo = BuildIndexInfo(scan-indexRelation);
   scan-estate = CreateExecutorState();
   scan-slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation));

With the changes in indexam.c, heap access become legal for all index
access methods.  I think it is better than the previous version but
I am leaving the judgement to someone experienced.  I will try to
summarize the pros and cons of sorting the rows in the GiST access
method, as far as I understand.

Pros:

* It does not require another queue.  It should be effective to sort
  the rows inside the queue the GiST access method already has.
* It does not complicate index access method infrastructure.

Cons:

* It could be done without additional heap access.
* Other access methods could make use of the sorting infrastructure
  one day.
* It could be more transparent to the users.  Sorting information
  could be shown on the explain output.
* A more suitable data structure like binary heap could be used
  for the queue to sort the rows.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for inet operators

2014-09-27 Thread Emre Hasegeli
 Thanks. Overall, my impression of this patch is that it works very
 well. But damned if I understood *how* it works :-). There's a lot
 of statistics involved, and it's not easy to see why something is
 multiplied by something else. I'm adding comments as I read through
 it.

Thank you for looking at it.  I tried to add more comments to
the multiplications.  New version attached.  It also fixes a bug
caused by wrong operator order used on histogram to histogram
selectivity estimation for inner join.

 I've gotten to the inet_semi_join_selec function:
 
  [function]
 
 This desperately needs comment at the top of the function explaining
 what it does. Let me try to explain what I think it does:
 
 [explanation]

I used your explanation on the new version.

 Now, I think that last step is wrong. Firstly, the Do not bother if
 histogram weight is smaller than 0.1 rule seems bogus. The
 his2_weight is the total number of rows represented by the
 histogram, so surely it can't be less than 1. It can't really be
 less than the statistics target. Unless maybe if the histogram was
 collected when the table was large, but it has since shrunk to
 contain only a few rows, but that seems like a very bizarre corner
 case. At least it needs more comments explaining what the test is
 all about, but I think we should just always use the histogram (if
 it's available).

It was an unnecessary check.  I put an assert instead of it.

 Secondly, if we estimate that there is on average 1.0 matching row
 in the table, it does not follow that the probability that at least
 one row matches is 1.0. Assuming a gaussian distribution with mean
 1.0, the probability that at least one row matches is 0.5. Assuming
 a gaussian distribution here isn't quite right - I guess a Poisson
 distribution would be more accurate - but it sure doesn't seem right
 as it is.

 The error isn't very big, and perhaps you don't run into that very
 often, so I'm not sure what the best way to fix that would be. My
 statistics skills are a bit rusty, but I think the appropriate way
 would be to apply the Poisson distribution, with the estimated
 number of matched rows as the mean. The probability of at least one
 match would be the cumulative distribution function at k=1. It
 sounds like overkill, if this is case occurs only rarely. But then
 again, perhaps it's not all that rare.

A function of his_weight and his_selec could be a better option
than just multiplying them.  I am not sure about the function or
it worths the trouble.  Join selectivity estimation function for
equality doesn't even bother to look at the histograms.  Others
only return constant values.

 That said, I can't immediately find a test case where that error
 would matter. I tried this:
 
 create table inettbl1 (a inet);
 insert into inettbl1 select '10.0.0.' || (g % 255) from
 generate_series(1, 10) g;
 analyze inettbl1;
 explain analyze select count(*) from inettbl1 where a = ANY
 (SELECT a from inettbl1);
 
 The estimate for that is pretty accurate, 833 rows estimated vs 1000
 actual, with the current patch. I'm afraid if we fixed
 inet_semi_join_selec the way I suggest, the estimate would be
 smaller, i.e. more wrong. Is there something else in the estimates
 that accidentally compensates for this currently?

The partial bucket match on inet_his_inclusion_selec() causes low
estimates.  Which also effects non join estimation but not as much as
it effects join estimations.  If that works more correctly, semi
join estimation can be higher than it should be.

network_selfuncs.c:602:
   /* Partial bucket match. */

   left_divider = inet_his_match_divider(left, query, 
 opr_order);
   right_divider = inet_his_match_divider(right, query, 
 opr_order);

   if (left_divider = 0 || right_divider = 0)
   match += 1.0 / pow(2, Max(left_divider, 
 right_divider));

I think this calculation can benefit from a statistical function
more than the semi join.  Using the different bit count as power
of two is the best I could find.  It works quite well on most of
the cases.
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index d0d806f..e9f9696 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -3,7 +3,8 @@
  * network_selfuncs.c
  *	  Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * Estimates are based on null fraction, most common values, and
+ * histogram of inet/cidr datatypes.
  *
  * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -16,17 +17,864 @@
  */
 #include postgres.h
 
+#include math.h
+
+#include access/htup_details.h
+#include catalog/pg_collation.h
+#include catalog/pg_operator.h
+#include 

Re: [HACKERS] KNN-GiST with recheck

2014-10-06 Thread Emre Hasegeli
 Thanks. The main question now is design of this patch. Currently, it does
 all the work inside access method. We already have some discussion of pro
 and cons of this method. I would like to clarify alternatives now. I can
 see following way:
 
1. Implement new executor node which performs sorting by priority queue.
Let's call it Priority queue. I think it should be separate node from
Sort node. Despite Priority queue and Sort are essentially similar
from user view, they would be completely different in implementation.
2. Implement some interface to transfer distance values from access
method to Priority queue node.

If we assume that all of them need recheck, maybe it can be done
without passing distance values.

3. Somehow tell the planner that it could use Priority queue in
corresponding cases. I see two ways of doing this:
   - Add flag to operator in opclass indicating that index can only
   order by lower bound of col op value, not by col op value itself.
   - Define new relation between operators. Value of one operator could
   be lower bound for value of another operator. So, planner can
 put Priority
   queue node when lower bound ordering is possible from index. Also 
 ALTER
   OPERATOR command would be reasonable, so extensions could upgrade.

I think, it would be better to make it a property of the operator
class.  We can add a column to pg_amop or define another value for
amoppurpose on pg_amop.  Syntax can be something like this:

CREATE OPERATOR CLASS circle_ops DEFAULT
   FOR TYPE circle USING gist AS
   OPERATOR 15  -(circle, point) FOR ORDER BY pg_catalog.float_ops LOWER 
BOUND;

While looking at it, I realize that current version of the patch does
not use the sort operator family defined with the operator class.  It
assumes that the distance function will return values compatible with
the operator.  Operator class definition makes me think that there is
not such an assumption.

 Besides overhead, this way makes significant infrastructural changes. So,
 it may be over-engineering. However, it's probably more clean and beautiful
 solution.
 I would like to get some feedback from people familiar with KNN-GiST like
 Heikki or Tom. What do you think about this? Any other ideas?

I would be happy to test and review the changes.  I think it is nice
to solve the problem in a generalized way improving the access method
infrastructure.  Definitely, we should have a consensus on the design
before working on the infrastructure changes.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shapes on the regression test for polygon

2014-10-14 Thread Emre Hasegeli
 I extracted Emre's diagram adjustments from the patch and applied it,
 and no tabs now.  Emre, I assume your regression changes did not affect
 the diagram contents.

Thank you for looking at it.  I wanted to make the tests consistent
with the diagrams.  Now they look better but they still don't make
sense with the tests.  I looked at it some more, and come to the
conclusion that removing them is better than changing the tests.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BRIN range operator class

2014-10-19 Thread Emre Hasegeli
 Once again, many thanks for the review.  Here's a new version.  I have
 added operator classes for int8, text, and actually everything that btree
 supports except:
 bool
 record
 oidvector
 anyarray
 tsvector
 tsquery
 jsonb
 range
 
 since I'm not sure that it makes sense to have opclasses for any of
 these -- at least not regular minmax opclasses.  There are some
 interesting possibilities, for example for range types, whereby we store
 in the index tuple the union of all the range in the block range.

I thought we can do better than minmax for the inet data type,
and ended up with a generalized opclass supporting both inet and range
types.  Patch based on minmax-v20 attached.  It works well except
a few small problems.  I will improve the patch and add into
a commitfest after BRIN framework is committed.

To support more operators I needed to change amstrategies and
amsupport on the catalog.  It would be nice if amsupport can be set
to 0 like amstrategies.

Inet data types accept IP version 4 and version 6.  It is not possible
to represent union of addresses from different versions with a valid
inet type.  So, I made the union function return NULL in this case.
Then, I tried to store if returned value is NULL or not, in
column-values[] as boolean, but it failed on the pfree() inside
brin_dtuple_initilize().  It doesn't seem right to free the values
based on attr-attbyval.

I think the same opclass can be used for geometric types.  I can
rename it to inclusion_ops instead of range_ops.  The GiST opclasses
for the geometric types use bounding boxes.  It wouldn't be possible
to use a different data type in a generic oplass.  Maybe STORAGE
parameter can be used for that purpose.

 (I had an opclass for anyenum too, but on further thought I removed it
 because it is going to be pointless in nearly all cases.)

It can be useful in some circumstances.  We wouldn't lose anything
by supporting more types.  I think we should even add an operator
class for boolean.
diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index 12ba3f4..7663113 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -249,6 +249,18 @@
  /entry
 /row
 row
+ entryliteralinet_range_ops/literal/entry
+ entrytypeinet/type/entry
+ entry
+  literalamp;amp;/
+  literalgt;gt;/
+  literalgt;gt;=/
+  literallt;lt;/literal
+  literallt;lt;=/literal
+  literal=/
+ /entry
+/row
+row
  entryliteralbpchar_minmax_ops/literal/entry
  entrytypecharacter/type/entry
  entry
@@ -370,6 +382,23 @@
  /entry
 /row
 row
+ entryliteralrange_ops//entry
+ entryany range type/entry
+ entry
+  literalamp;amp;/
+  literalamp;gt;/
+  literalamp;lt;/
+  literalgt;gt;/
+  literallt;lt;/
+  literallt;@/
+  literal=/
+  literal@gt;/
+  literal@gt;/
+ /entry
+ entry
+ /entry
+/row
+row
  entryliteralpg_lsn_minmax_ops/literal/entry
  entrytypepg_lsn/type/entry
  entry
diff --git a/src/backend/access/brin/Makefile b/src/backend/access/brin/Makefile
index ac44fcd..019c582 100644
--- a/src/backend/access/brin/Makefile
+++ b/src/backend/access/brin/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/access/brin
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brin.o brin_pageops.o brin_revmap.o brin_tuple.o brin_xlog.o \
-	   brin_minmax.o
+OBJS = brin.o brin_pageops.o brin_range.o brin_revmap.o brin_tuple.o \
+	   brin_xlog.o brin_minmax.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/brin/brin_range.c b/src/backend/access/brin/brin_range.c
new file mode 100644
index 000..b63b80a
--- /dev/null
+++ b/src/backend/access/brin/brin_range.c
@@ -0,0 +1,323 @@
+/*
+ * brin_range.c
+ *		Implementation of range opclass for BRIN
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/brin/brin_range.c
+ */
+#include postgres.h
+
+#include access/genam.h
+#include access/brin_internal.h
+#include access/brin_tuple.h
+#include access/skey.h
+#include catalog/pg_type.h
+#include utils/datum.h
+#include utils/lsyscache.h
+#include utils/syscache.h
+
+
+/*
+ * Procedure numbers must not collide with BRIN_PROCNUM defines in
+ * brin_internal.h.  Note we only need inequality functions.
+ */
+#define		RANGE_NUM_PROCNUMS		10	/* # support procs */
+#define		PROCNUM_CONTAINS		5
+#define		PROCNUM_UNION			6
+#define		PROCNUM_BEFORE			7	/* required for overright strategy */
+#define		PROCNUM_OVERLEFT		8	/* required for after strategy */
+#define		PROCNUM_OVERLAPS		9	/* required for overlaps strategy */
+#define		PROCNUM_OVERRIGHT		10	/* required for before strategy */
+#define		PROCNUM_AFTER			11	/* required for after strategy */
+#define		PROCNUM_ADJACENT		12	/* required for 

Re: [HACKERS] Selectivity estimation for inet operators

2014-12-02 Thread Emre Hasegeli
 I spent a fair chunk of the weekend hacking on this patch to make
 it more understandable and fix up a lot of what seemed to me pretty
 clear arithmetic errors in the upper layers of the patch.  However,
 I couldn't quite convince myself to commit it, because the business
 around estimation for partial histogram-bucket matches still doesn't
 make any sense to me.  Specifically this:

 /* Partial bucket match. */
 left_divider = inet_hist_match_divider(left, query, opr_codenum);
 right_divider = inet_hist_match_divider(right, query, 
 opr_codenum);

 if (left_divider = 0 || right_divider = 0)
 match += 1.0 / pow(2.0, Max(left_divider, right_divider));

 Now unless I'm missing something pretty basic about the divider
 function, it returns larger numbers for inputs that are further away
 from each other (ie, have more not-in-common significant address bits).
 So the above calculation seems exactly backwards to me: if one endpoint
 of a bucket is close to the query, or even an exact match, and the
 other endpoint is further away, we completely ignore the close/exact
 match and assign a bucket match fraction based only on the further-away
 endpoint.  Isn't that exactly backwards?

You are right that partial bucket match calculation isn't fair on
some circumstances.

 I experimented with logic like this:

 if (left_divider = 0  right_divider = 0)
 match += 1.0 / pow(2.0, Min(left_divider, right_divider));
 else if (left_divider = 0 || right_divider = 0)
 match += 1.0 / pow(2.0, Max(left_divider, right_divider));

 ie, consider the closer endpoint if both are valid.  But that didn't seem
 to work a whole lot better.  I think really we need to consider both
 endpoints not just one to the exclusion of the other.

I have tried many combinations like this.  Including arithmetic,
geometric, logarithmic mean functions.  I could not get good results
with any of them, so I left it in a basic form.

Max() works pretty well most of the time, because if the query matches
the bucket generally it is close to both of the endpoints.  By using
Max(), we are actually crediting the ones which are close to the both
of the endpoints.  

 I'm also not exactly convinced by the divider function itself,
 specifically about the decision to fail and return -1 if the masklen
 comparison comes out wrong.  This effectively causes the masklen to be
 the most significant part of the value (after the IP family), which seems
 totally wrong.  ISTM we ought to consider the number of leading bits in
 common as the primary indicator of how far apart a query and a
 histogram endpoint are.

The partial match calculation with Max() is especially unfair on
the buckets where more significant bits change.  For example 63/8 and
64/8.  Returning -1 instead of a high divider, forces it to use
the divider for the other endpoint.  We consider the number of leading
bits in common as the primary indicator, just for the other endpoint.

I have also experimented with the count of the common bits of
the endpoints of the bucket for better partial match calculation.
I could not find out a meaningful equation with it.

 Even if the above aspects of the code are really completely right, the
 comments fail to explain why.  I spent a lot of time on the comments,
 but so far as these points are concerned they still only explain what
 is being done and not why it's a useful calculation to make.

I couldn't write better comments because I don't have strong arguments
about it.  We can say that we don't try to make use of the both of
the endpoints, because we don't know how to combine them.  We only use
the one with matching family and masklen, and when both of them match
we use the distant one to be on the safer side.

Thank you for looking at it.  Comments look much better now.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for inet operators

2014-12-02 Thread Emre Hasegeli
 Actually, there's a second large problem with this patch: blindly
 iterating through all combinations of MCV and histogram entries makes the
 runtime O(N^2) in the statistics target.  I made up some test data (by
 scanning my mail logs) and observed the following planning times, as
 reported by EXPLAIN ANALYZE:

 explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip;
 explain analyze select * from relays r1, relays r2 where r1.ip  r2.ip;

 stats targeteqjoinsel   networkjoinsel

 100 0.27 ms 1.85 ms
 10002.56 ms 167.2 ms
 1   56.6 ms 13987.1 ms

 I don't think it's necessary for network selectivity to be quite as
 fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning
 time for a query that might need just milliseconds to execute :-(

 It seemed to me that it might be possible to reduce the runtime by
 exploiting knowledge about the ordering of the histograms, but
 I don't have time to pursue that right now.

I make it break the loop when we passed the last possible match. Patch
attached.  It reduces the runtime almost 50% with large histograms.

We can also make it use only some elements of the MCV and histogram
for join estimation.  I have experimented with reducing the right and
the left hand side of the lists on the previous versions.  I remember
it was better to reduce only the left hand side.  I think it would be
enough to use log(n) elements of the right hand side MCV and histogram.
I can make the change, if you think selectivity estimation function
is the right place for this optimization.
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index f854847..16f39db 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -612,20 +612,23 @@ inet_hist_value_sel(Datum *values, int nvalues, Datum constvalue,
 		return 0.0;
 
 	query = DatumGetInetPP(constvalue);
 
 	/* left is the left boundary value of the current bucket ... */
 	left = DatumGetInetPP(values[0]);
 	left_order = inet_inclusion_cmp(left, query, opr_codenum);
 
 	for (i = 1; i  nvalues; i++)
 	{
+		if (left_order == 256)
+			break;
+
 		/* ... and right is the right boundary value */
 		right = DatumGetInetPP(values[i]);
 		right_order = inet_inclusion_cmp(right, query, opr_codenum);
 
 		if (left_order == 0  right_order == 0)
 		{
 			/* The whole bucket matches, since both endpoints do. */
 			match += 1.0;
 		}
 		else if ((left_order = 0  right_order = 0) ||
@@ -854,20 +857,23 @@ inet_opr_codenum(Oid operator)
 static int
 inet_inclusion_cmp(inet *left, inet *right, int opr_codenum)
 {
 	if (ip_family(left) == ip_family(right))
 	{
 		int			order;
 
 		order = bitncmp(ip_addr(left), ip_addr(right),
 		Min(ip_bits(left), ip_bits(right)));
 
+		if (order  0)
+			return 256;
+
 		if (order != 0)
 			return order;
 
 		return inet_masklen_inclusion_cmp(left, right, opr_codenum);
 	}
 
 	return ip_family(left) - ip_family(right);
 }
 
 /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN range operator class

2014-12-14 Thread Emre Hasegeli
 I thought we can do better than minmax for the inet data type,
 and ended up with a generalized opclass supporting both inet and range
 types.  Patch based on minmax-v20 attached.  It works well except
 a few small problems.  I will improve the patch and add into
 a commitfest after BRIN framework is committed.

I wanted to send a new version before the commitfest to get some
feedback, but it is still work in progress.  Patch attached rebased
to the current HEAD.  This version supports more operators and
box from geometric data types.  Opclasses are renamed to inclusion_ops
to be more generic.  The problems I mentioned remain beause I
couldn't solve them without touching the BRIN framework.

 To support more operators I needed to change amstrategies and
 amsupport on the catalog.  It would be nice if amsupport can be set
 to 0 like am strategies.

I think it would be nicer to get the functions from the operators
with using the strategy numbers instead of adding them directly as
support functions.  I looked around a bit but couldn't find
a sensible way to support it.  Is it possible without adding them
to the RelationData struct?

 Inet data types accept IP version 4 and version 6.  It isn't possible
 to represent union of addresses from different versions with a valid
 inet type.  So, I made the union function return NULL in this case.
 Then, I tried to store if returned value is NULL or not, in
 column-values[] as boolean, but it failed on the pfree() inside
 brin_dtuple_initilize().  It doesn't seem right to free the values
 based on attr-attbyval.

This problem remains.  There is also a similar problem with the
range types, namely empty ranges.  There should be special cases
for them on some of the strategies.  I tried to solve the problems
in several different ways, but got a segfault one line or another.
This makes me think that BRIN framework doesn't support to store
different types than the indexed column in the values array.
For example, brin_deform_tuple() iterates over the values array and
copies them using the length of the attr on the index, not the length
of the type defined by OpcInfo function.  If storing another types
aren't supported, why is it required to return oid's on the OpcInfo
function.  I am confused.

I didn't try to support other geometric types than box as I couldn't
managed to store a different type on the values array, but it would
be nice to get some feedback about the overall design.  I was
thinking to add a STORAGE parameter to the index to support other
geometric types.  I am not sure that adding the STORAGE parameter
to be used by the opclass implementation is the right way.  It
wouldn't be the actual thing that is stored by the index, it will be
an element in the values array.  Maybe, data type specific opclasses
is the way to go, not a generic one as I am trying.


brin-inclusion-v02.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for inet operators

2015-02-15 Thread Emre Hasegeli
New version of the patch attached with the optimization to break the
loop before looking at all of the histogram values.  I can reduce
join selectivity estimation runtime by reducing the values of the
left hand side or both of the sides, if there is interest.

  Even if the above aspects of the code are really completely right, the
  comments fail to explain why.  I spent a lot of time on the comments,
  but so far as these points are concerned they still only explain what
  is being done and not why it's a useful calculation to make.

 I couldn't write better comments because I don't have strong arguments
 about it.  We can say that we don't try to make use of the both of
 the endpoints, because we don't know how to combine them.  We only use
 the one with matching family and masklen, and when both of them match
 we use the distant one to be on the safer side.

I added two more sentences to explain the calculation.
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index 73fc1ca..51a33c2 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -1,32 +1,972 @@
 /*-
  *
  * network_selfuncs.c
  *	  Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * This module provides estimators for the subnet inclusion and overlap
+ * operators.  Estimates are based on null fraction, most common values,
+ * and histogram of inet/cidr columns.
  *
  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
  *	  src/backend/utils/adt/network_selfuncs.c
  *
  *-
  */
 #include postgres.h
 
+#include math.h
+
+#include access/htup_details.h
+#include catalog/pg_operator.h
+#include catalog/pg_statistic.h
 #include utils/inet.h
+#include utils/lsyscache.h
+#include utils/selfuncs.h
 
 
+/* Default selectivity for the inet overlap operator */
+#define DEFAULT_OVERLAP_SEL 0.01
+
+/* Default selectivity for the various inclusion operators */
+#define DEFAULT_INCLUSION_SEL 0.005
+
+/* Default selectivity for specified operator */
+#define DEFAULT_SEL(operator) \
+	((operator) == OID_INET_OVERLAP_OP ? \
+	 DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL)
+
+static Selectivity networkjoinsel_inner(Oid operator,
+	 VariableStatData *vardata1, VariableStatData *vardata2);
+static Selectivity networkjoinsel_semi(Oid operator,
+	VariableStatData *vardata1, VariableStatData *vardata2);
+static Selectivity mcv_population(float4 *mcv_numbers, int mcv_nvalues);
+static Selectivity inet_hist_value_sel(Datum *values, int nvalues,
+	Datum constvalue, int opr_codenum);
+static Selectivity inet_mcv_join_sel(Datum *mcv1_values,
+  float4 *mcv1_numbers, int mcv1_nvalues, Datum *mcv2_values,
+  float4 *mcv2_numbers, int mcv2_nvalues, Oid operator);
+static Selectivity inet_mcv_hist_sel(Datum *mcv_values, float4 *mcv_numbers,
+  int mcv_nvalues, Datum *hist_values, int hist_nvalues,
+  int opr_codenum);
+static Selectivity inet_hist_inclusion_join_sel(Datum *hist1_values,
+			 int hist1_nvalues,
+			 Datum *hist2_values, int hist2_nvalues,
+			 int opr_codenum);
+static Selectivity inet_semi_join_sel(Datum lhs_value,
+   bool mcv_exists, Datum *mcv_values, int mcv_nvalues,
+   bool hist_exists, Datum *hist_values, int hist_nvalues,
+   double hist_weight,
+   FmgrInfo *proc, int opr_codenum);
+static int	inet_opr_codenum(Oid operator);
+static int	inet_inclusion_cmp(inet *left, inet *right, int opr_codenum);
+static int inet_masklen_inclusion_cmp(inet *left, inet *right,
+		   int opr_codenum);
+static int inet_hist_match_divider(inet *boundary, inet *query,
+		int opr_codenum);
+
+/*
+ * Selectivity estimation for the subnet inclusion/overlap operators
+ */
 Datum
 networksel(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_FLOAT8(0.001);
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	int			varRelid = PG_GETARG_INT32(3);
+	VariableStatData vardata;
+	Node	   *other;
+	bool		varonleft;
+	Selectivity selec,
+mcv_selec,
+non_mcv_selec;
+	Datum		constvalue,
+			   *hist_values;
+	int			hist_nvalues;
+	Form_pg_statistic stats;
+	double		sumcommon,
+nullfrac;
+	FmgrInfo	proc;
+
+	/*
+	 * If expression is not (variable op something) or (something op
+	 * variable), then punt and return a default estimate.
+	 */
+	if (!get_restriction_variable(root, args, varRelid,
+  vardata, other, varonleft))
+		PG_RETURN_FLOAT8(DEFAULT_SEL(operator));
+
+	/*
+	 * Can't do anything useful if the something is not a constant, either.
+	 */
+	if (!IsA(other, Const))
+	{
+		

Re: [HACKERS] BRIN range operator class

2015-05-06 Thread Emre Hasegeli
 Looking at patch 04, it seems to me that it would be better to have
 the OpcInfo struct carry the typecache struct rather than the type OID,
 so that we can avoid repeated typecache lookups in brin_deform_tuple;

 Here's the patch.

Looks better to me.  I will incorporate with this patch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN range operator class

2015-05-06 Thread Emre Hasegeli
 Can you please explain what is the purpose of patch 07?  I'm not sure I
 understand; are we trying to avoid having to add pg_amproc entries for
 these operators and instead piggy-back on btree opclass definitions?
 Not too much in love with that idea; I see that there is less tedium in
 that the brin opclass definition is simpler.  One disadvantage is a 3x
 increase in the number of syscache lookups to get the function you need,
 unless I'm reading things wrong.  Maybe this is not performance critical.

It doesn't use btree opclass definitions.  It uses brin opclass
pg_amop entries instead of duplicating them in pg_amproc.
The pg_amproc.h header says:

 * The amproc table identifies support procedures associated with index
 * operator families and classes.  These procedures can't be listed in pg_amop
 * since they are not the implementation of any indexable operator.

In our case, these procedures can be listed in pg_amop as they
are implementations of indexable operators.

The more important change on this patch is to request procedures for
the right data types.  Minmax opclasses return wrong results without
this patch.  You can reproduce it with this query on
the regression database:

select * from brintest where timestampcol = '1979-01-29 11:05:09'::timestamptz;

 Anyway I tried applying it on isolation, and found that it fails the
 assertion that tests the union support proc in brininsert.  That
 doesn't seem okay.  I mean, it's okay not to run the test for the
 inclusion opclasses, but why does it now fail in minmax which was
 previously passing?  Couldn't figure it out.

The regression tests passed when I tried it on the current master.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN range operator class

2015-05-10 Thread Emre Hasegeli
 I pushed patches 04 and 07, as well as adopting some of the changes to
 the regression test in 06.  I'm afraid I caused a bit of merge pain for
 you -- sorry about that.

No problem.  I rebased the remaining ones.


brin-inclusion-v09-02-strategy-numbers.patch
Description: Binary data


brin-inclusion-v09-03-remove-assert-checking.patch
Description: Binary data


brin-inclusion-v09-05-box-vs-point-operators.patch
Description: Binary data


brin-inclusion-v09-06-inclusion-opclasses.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN range operator class

2015-05-05 Thread Emre Hasegeli
 Indeed, I have done some testing of the patch but more people testing would
 be nice.

The inclusion opclass should work for other data types as long
required operators and SQL level support functions are supplied.
Maybe it would work for PostGIS, too.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN range operator class

2015-05-05 Thread Emre Hasegeli
 Nice, I think it is ready now other than the issues Alvaro raised in his
 review[1]. Have you given those any thought?

I already replied his email [1].  Which issues do you mean?

[1] 
http://www.postgresql.org/message-id/CAE2gYzxQ-Gk3q3jYWT=1enlebsgcgu28+1axml4omcwjbkp...@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN range operator class

2015-04-14 Thread Emre Hasegeli
 Judging from a quick look, I think patches 1 and 5 can be committed
 quickly; they imply no changes to other parts of BRIN.  (Not sure why 1
 and 5 are separate.  Any reason for this?)  Also patch 2.

Not much reason except that 1 includes only functions, but 5 includes operators.

 Patch 4 looks like a simple bugfix (or maybe a generalization) of BRIN
 framework code; should also be committable right away.  Needs a closer
 look of course.

 Patch 3 is a problem.  That code is there because the union proc is only
 used in a corner case in Minmax, so if we remove it, user-written Union
 procs are very likely to remain buggy for long.  If you have a better
 idea to test Union in Minmax, or some other way to turn that stuff off
 for the range stuff, I'm all ears.  Just lets make sure the support
 procs are tested to avoid stupid bugs.  Before I introduced that, my
 Minmax Union proc was all wrong.

I removed this test because I don't see a way to support it.  I
believe any other implementation that is more complicated than minmax
will fail in there.  It is better to cache them with the regression
tests, so I tried to improve them.  GiST, SP-GiST and GIN don't have
similar checks, but they have more complicated user defined functions.

 Patch 7 I don't understand.  Will have to look closer.  Are you saying
 Minmax will depend on Btree opclasses?  I remember thinking in doing it
 that way at some point, but wasn't convinced for some reason.

No, there isn't any additional dependency.  It makes minmax operator
classes use the procedures from the pg_amop instead of adding them to
pg_amproc.

It also makes the operator class safer for cross data type usage.
Actually, I just checked and find out that we got wrong answers from
index on the current master without this patch.  You can reproduce it
with this query on the regression database:

select * from brintest where timestampcol = '1979-01-29 11:05:09'::timestamptz;

inclusion-opclasses patch make it possible to add cross type brin
regression tests.  I will add more of them on the next version.

 Patch 6 seems the real meat of your own stuff.  I think there should be
 a patch 8 also but it's not attached ... ??

I had another commit not to intended to be sent.  Sorry about that.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] point_ops for GiST

2015-06-13 Thread Emre Hasegeli
 Emre Hasegeli just pointed out to me that this patch introduced
 box_contain_pt() and in doing so used straight C comparison (= etc)
 instead of FPlt() and friends.  I would think that that's a bug and
 needs to be changed -- but certainly not backpatched, because gist
 indexes would/might become corrupt.

The problem with this is BRIN inclusion opclass uses some operators to
implement others.  It was using box @ point operator to implement
point ~= point operator by indexing points in boxes.  The former
doesn't use the macros, but later does.  The opclass could return
wrong result when the point right near the index boundaries.

Currently, there are not BRIN opclasses for geometric types except box
because of this reason.  I would like to work on supporting them for
the next release.  I think the best way is to change the operators
which are not using the macros to be consistent with the others.  Here
is the list:

* polygon  polygon
* polygon  polygon
* polygon  polygon
* polygon  polygon
* polygon | polygon
* polygon | polygon
* polygon | polygon
* polygon | polygon
* box @ point
* point @ box
* lseg @ box
* circle @ point
* point @ circle

I can send a patch, if it is acceptable.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-20 Thread Emre Hasegeli
 to handle DROP dependency behaviors properly.  (On reflection, maybe
 better if it's bernoulli(internal) returns tablesample_handler,
 so as to guarantee that such a function doesn't create a conflict with
 any user-defined function of the same name.)

 The probability of conflict seems high with the system() so yeah we'd need
 some kind of differentiator.

Maybe it would be even better to have something like bernoulli(tablesample)
where tablesample defined as pseudo-type like trigger.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve BRIN documentation somewhat

2015-07-21 Thread Emre Hasegeli
 Emre Hasegeli told me on IM he's going to submit a patch to add
 something similar for the inclusion opclass framework.

It is attached.  Thank you for pushing forward to improve the documentation.


0001-Improve-BRIN-documentation-for-Inclusion-opclass.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Trigonometric functions in degrees

2015-10-25 Thread Emre Hasegeli
> Currently PostgreSQL only has trigonometric functions that work in
> radians. I think it would be quite useful to have an equivalent set of
> functions that worked in degrees. In other environments these are
> commonly spelled sind(), cosd(), etc.

I would prefer gradian over degree.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Improvements of Hunspell dictionaries support

2015-11-07 Thread Emre Hasegeli
Thank you for working on this.

I tried the patch with a Turkish dictionary [1] I could find on the
Internet.  It worked for some words, but not others:

> hasegeli=# create text search dictionary hunspell_tr (template = ispell, 
> dictfile = tr, afffile = tr);
> CREATE TEXT SEARCH DICTIONARY
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilki'); -- The root "fox"
> ---
> {tilki}
> (1 row)
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilkinin'); -- Genitive form, 
> affix 3290
> ts_lexize
> ---
> {tilki}
> (1 row)
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilkiler'); -- Plural form, affix 
> 4371
> ts_lexize
> ---
> {tilki}
> (1 row)
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilkiyi'); -- Accusative form, 
> affix 2646
> ts_lexize
> ---
>
> (1 row)

It seems to have something to do with the order of the affixes.  It
works, if I move affix 2646 to the beginning of the list.

[1] https://tr-spell.googlecode.com/files/dict_aff_5000_suffix_113_words.zip


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] point_ops for GiST

2015-10-12 Thread Emre Hasegeli
>> Pls. don't misunderstand my questions: They are directed to get an
>> even more useful spatial data handling of PostgreSQL. I'm working with
>> PostGIS since years and are interested in any work regarding spatial
>> types...
>>
>> Can anyone report use cases or applications of these built-in geometric
>> types?
>>
>> Would'nt it be even more useful to concentrate to PostGIS
>> geometry/geography types and extend BRIN to these types?
>
>
> Note, that PostGIS is a different project which is maintained by separate
> team. PostGIS have its own priorities, development plan etc.
> PostgreSQL have to be self-consistent. In particular, it should have
> reference implementation of operator classes which extensions can use as the
> pattern. This is why it's important to maintain built-in geometric types.
>
> In short: once we implement it for built-in geometric types, you can ask
> PostGIS team to do the same for their geometry/geography.

The problem is that geometric types don't even serve well to this
purpose in their current state.  We have to make the operators
consistent to better demonstrate index support of cross type
operators.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] point_ops for GiST

2015-10-12 Thread Emre Hasegeli
> This was already fixed for GiST.
> See following discussion
> http://www.postgresql.org/message-id/capphfdvgticgniaj88vchzhboxjobuhjlm6c09q_op_u9eo...@mail.gmail.com
> and commit
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3c29b196b0ce46662cb9bb7a1f91079fbacbcabb
> "Consistent" method of GiST influences only search and can't lead to corrupt
> indexes. However, "same" method can lead to corrupt indexes.
> However, this is not the reason to not backpatch the changes and preserve
> buggy behaviour; this is the reason to recommend reindexing to users. And it
> was already backpatched.

Fixing it on the opclass is not an option for BRIN.  We designed BRIN
opclasses extensible using extra SQL level support functions and
operators.  It is possible to support point datatype using box vs
point operators.  Doing so would lead to wrong results, because point
operators use FP macros, but box_contain_pt() doesn't.

GiST opclass could be more clean and extensible, if we wouldn't have
those macros.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Segfault while using an array domain

2015-11-29 Thread Emre Hasegeli
I was getting segfaults while working on the current master for a while.
This is the shortest way I could found to reproduce the problem:

create or replace function is_distinct_from(anyelement, anyelement)
returns boolean language sql
as 'select $1 is distinct from $2';

create operator !== (
procedure = is_distinct_from,
leftarg = anyelement,
rightarg = anyelement
);

create domain my_list int[] check (null !== all (value));

create table my_table (my_column my_list);

insert into my_table values ('{1}');
insert into my_table values ('{1}');

Here is the backtrace:

> * thread #1: tid = 0x108710, 0x0001040ebf82 
> postgres`MemoryContextDelete(context=0x7f7f7f7f7f7f7f7f) + 18 at mcxt.c:205, 
> queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=EXC_I386_GPFLT)
>   * frame #0: 0x0001040ebf82 
> postgres`MemoryContextDelete(context=0x7f7f7f7f7f7f7f7f) + 18 at mcxt.c:205
> frame #1: 0x000103ea60ac postgres`fmgr_sql(fcinfo=0x7fa5e50150a8) 
> + 252 at functions.c:1047
> frame #2: 0x000103e9f6f5 
> postgres`ExecEvalScalarArrayOp(sstate=0x7fa5e5015038, 
> econtext=, isNull="", isDone=) + 885 at 
> execQual.c:2660
> frame #3: 0x000103ea1bb4 
> postgres`ExecEvalCoerceToDomain(cstate=, 
> econtext=0x7fa5e6065110, isNull="", isDone=) + 180 at 
> execQual.c:4009
> frame #4: 0x000103ea208a postgres`ExecProject + 39 at execQual.c:5345
> frame #5: 0x000103ea2063 postgres`ExecProject(projInfo=, 
> isDone=0x7fff5bef58bc) + 387 at execQual.c:5560
> frame #6: 0x000103eb96a3 postgres`ExecResult(node=0x7fa5e6064ff8) 
> + 179 at nodeResult.c:155
> frame #7: 0x000103e9b57c 
> postgres`ExecProcNode(node=0x7fa5e6064ff8) + 92 at execProcnode.c:392
> frame #8: 0x000103eb5f12 
> postgres`ExecModifyTable(node=0x7fa5e6064ea0) + 434 at 
> nodeModifyTable.c:1331
> frame #9: 0x000103e9b5bb 
> postgres`ExecProcNode(node=0x7fa5e6064ea0) + 155 at execProcnode.c:396
> frame #10: 0x000103e97a90 postgres`standard_ExecutorRun [inlined] 
> ExecutePlan(estate=, planstate=0x7fa5e6064ea0, 
> use_parallel_mode='\0', operation=, numberTuples=0, 
> direction=, dest=) + 87 at execMain.c:1566
> frame #11: 0x000103e97a39 
> postgres`standard_ExecutorRun(queryDesc=0x7fa5e6061038, 
> direction=, count=0) + 201 at execMain.c:338
> frame #12: 0x000103fc18da 
> postgres`ProcessQuery(plan=0x7fa5e604fbd8, sourceText="insert into 
> my_table values ('{1}');", params=0x, 
> dest=0x7fa5e604fcd0, completionTag="") + 218 at pquery.c:185
> frame #13: 0x000103fc0ddb 
> postgres`PortalRunMulti(portal=0x7fa5e480a238, isTopLevel='\x01', 
> dest=0x7fa5e604fcd0, altdest=0x7fa5e604fcd0, completionTag="") + 331 
> at pquery.c:1283
> frame #14: 0x000103fc06f8 
> postgres`PortalRun(portal=0x7fa5e480a238, count=9223372036854775807, 
> isTopLevel='\x01', dest=0x7fa5e604fcd0, altdest=0x7fa5e604fcd0, 
> completionTag="") + 552 at pquery.c:812
> frame #15: 0x000103fbe8d6 postgres`PostgresMain + 48 at 
> postgres.c:1105
> frame #16: 0x000103fbe8a6 postgres`PostgresMain(argc=, 
> argv=, dbname=, username=) + 9414 at 
> postgres.c:4032
> frame #17: 0x000103f503c8 postgres`PostmasterMain [inlined] 
> BackendRun + 8328 at postmaster.c:4237
> frame #18: 0x000103f503a2 postgres`PostmasterMain [inlined] 
> BackendStartup at postmaster.c:3913
> frame #19: 0x000103f503a2 postgres`PostmasterMain at postmaster.c:1684
> frame #20: 0x000103f503a2 postgres`PostmasterMain(argc=, 
> argv=) + 8290 at postmaster.c:1292
> frame #21: 0x000103ed759f postgres`main(argc=, 
> argv=) + 1567 at main.c:223
> frame #22: 0x7fff8f1245c9 libdyld.dylib`start + 1

I can reproduce it on 9.5 branch too, but not on 9.4 branch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2016-01-03 Thread Emre Hasegeli
> But is it? Is it impossible for the BRIN bitmap index scan to return 0 rows
> (say, if the value being matched is outside the min/max boundary for every
> block range?) Granted, if we document that it always returns 0 and should be
> ignored, then confusing the actual 0 with the 0 as a representation of
> “unknown” would be less a problem.

How about -1 ?  It is an impossible value for sure.  Maybe we should
change BitmapAnd and BitmapOr nodes, too.  It is better to make it
obvious that it is not the correct value.  I don't think many people
would notice the note on the documentation.

On the other hand, returning -1 broke parser of explain.depesz.com [1].

[1] http://explain.depesz.com/s/tAkd


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Emre Hasegeli
> which is much closer to the actual number of rows removed by the index
> recheck + the one left.

Is it better to be closer?  We are saying those are the "actual"
values not the estimates.  If we cannot provide the actual rows, I
think it is better to provide nothing.  Something closer to the
reality would create more confusion.  Maybe, we just just return the
number of blocks, and put somewhere a note about it.  The actual row
count is already available on the upper part of the plan.

By the way, the estimation is a bigger problem than that.  Please see
my patch [1] about it.

[1] 
http://www.postgresql.org/message-id/cae2gyzzjvzpy-1csgzjjyh69izsa13segfc4i4r2z0qbq2p...@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Emre Hasegeli
> I don’t see how to solve this problem without changing explain analyze output 
> to accommodate for “unknown” value. I don’t think “0” is a non-confusing 
> representation of “unknown” for most people, and from the practical 
> standpoint, a “best effort” estimate is better than 0 (i.e. I will be able to 
> estimate how efficient BRIN index is for my tables in terms of the number of 
> tuples retrieved/thrown away)

The number of retrieved and thrown away rows are already available on
the upper part of the plan.  Bitmap Index Scan should provide the rows
that matched the index.  Another alternative would be just returning
the number of matching pages (by not multiplying with 10).  It might
be better understood.  The users who can understand the EXPLAIN
ANALYZE output shouldn't be expecting BRIN to return rows.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2015-12-24 Thread Emre Hasegeli
> The patch is attached.

Now, it is actually attached.


brin-correlation-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2015-12-24 Thread Emre Hasegeli
> Somebody wrote to me a few days ago that the BRIN cost estimation is
> rather poor.  One immediately obvious issue which I think is easily
> fixed is the correlation estimate, which is currently hardcoded to 1.
>
> Since a BRIN scan always entails a scan of the relation in physical
> order, it's simple to produce a better estimate for that, per the
> attached patch.  (Note: trying to run this on expression indexes will
> cause a crash.  I need to complete that part ...)
>
> There are other improvements we've been discussing, but I'm unclear that
> I will have time to study them to get a useful patch quickly enough.  If
> somebody else wants to mess with this, please get in touch.
>
> Thoughts?

As we have discusses, the correlation is not used for bitmap index
scan cost estimation.  I think the best we can do in here is to somehow
increase the selectivity.  I suggest something like this:

/*
* Index selectivity is important for the planner to calculate the cost
* of the bitmap heap scan.  Unfortunately, we don't have a robust way to
* estimate selectivity of BRIN.  It can dependent on many things.  This
* is a long rationale about the incomplete calculation we have at the
* moment.
*
* Our starting point is that BRIN selectivity has to be less than the
* selectivity of the btree.  We are using a product of logical and
* physical selectivities to achieve this.  The equality of
*
* (1 + logical_selectivity) * (1 + physical_selectivity) - 1
*
* is used to make sure the result is not less than any of the values.
*
* The logical selectivity is calculated using the indexable expressions
* of the WHERE clause.  The physical selectivity is calculated using
* the block proportion and the maximum correlation.  The block
* proportion is a comparable value with selectivity.  It is the
* selectivity of the smallest unit of the index.  The final selectivity
* can never be less than that.
*
* Using the contrary of the correlation by subtracting it from 1 is not
* not really a comparable value with the selectivity.  It is just a
* value between 0 and 1.  On the other hand, it is the only value
* related to the BRIN quality, we have available right now.  We are
* using the arithmetic of it with the block proportion to normalise it.
* This part of the physical selectivity is likely to be more effective
* than the block proportion in many circumstances as there would be
* many blocks on big tables.
*
* Using the contrary of the correlation of a column as selectivity of
* the index is wrong in many ways.  First of all, it cannot be applied
* to all BRIN operator classes.  It makes sense for the main built-in
* operator class "minmax", and makes a little sense for the other one
* "inclusion".  It wouldn't probably make any sense for a bloom filter
* implementation, if there would be any.  Maybe, we should push down
* this function to the operator class, but there is not enough reason
* to do it right now.
*
* Second, correlation is not dependent to any indexed expression.  It
* probably doesn't make any sense for the complicated operators.  It
* would probably effect basic comparison operators differently than
* equality operator.  The effect would even differ by count of those
* expressions.  For example, x IN (10, 20, 30) would be effected from
* correlation more than x = 15, even when their selectivities are the
* same.
*
* Last but not least, the correlation is a single value for the whole
* range.  The indexed table can partly be very well correlated, but
* the correlation value can still be very low.  For example, if a
* perfectly correlated table is copied 4 times, the correlation would
* be 0.25, although the index would be almost as good as the version on
* the initial table.  Or the expression can match the better correlated
* part of the table.  It is not hard to imagine more scenarios where
* the correlation is a bad value to use as the selectivity.  We should
* probably improve this by collecting more statistics, one day.
*
* Another problem in here is that the caller assumes the selectivity
* by tuples.  It might have been better, if we had a way to return it
* as some number of pages.  On the other hand, even though we know about
* the index, it is not too much easier for us to estimate the number of
* matching pages then it is for the caller.  We are likely to make too
* much mistake by relying on the correlation, anyway.  We are at least
* not making it worse in here.
*/
blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;
selec = (1 + qualSelectivity) * (1 + blockSelectivity) - 1;
CLAMP_PROBABILITY(selec);

The patch is attached.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new full text search configurations

2015-11-21 Thread Emre Hasegeli
> I checked new snowball site http://snowballstem.org/ and found several new
> stemmers appeared (as external contributions):
>
> Irish and Czech
> Object Pascal codegenerator for Snowball
> Two stemmers for Romanian
> Hungarian
> Turkish
> Armenian
> Basque (Euskera)
> Catalan
>
> Some of them we don't have in our list of default configurations. Since
> these are external, not official stemmers, it'd be nice if  people  look and
> test them. If they are fine, we can prepare new configurations for 9.6.

We have configurations for the ones included to the Snowball, namely
Romanian, Hungarian, and Turkish.  I don't know why the others are not
included but listed on the page as external contributions.  It might
be a good idea to wait for someone to commit them to the upstream.

I have checked the changes on the algorithms [1].  They don't seemed
to be updated much after 2007, but recently a new one for Tamil
language is added.  It might be a good candidate for a new
configuration.

[1] https://github.com/snowballstem/snowball/commits/master/algorithms


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regexp_match() returning text

2016-06-04 Thread Emre Hasegeli
> The main problem being solved is the use of a SETOF result.  I'm inclined to
> prefer that the final, single, result is still an array.

I have changed it like that.  New patch attached.

> I've got a style issue with the information_schema - I like to call it
> useless-use-of-E'' -  but that was there long before this patch...

We better not touch it.

> /* user mustn't specify 'g' for regexp_split */ - do we add "or
> regexp_match" or just removed the extraneous detail?

I don't think it would be a nice error message.

> There seems to be scope creep regarding "regexp_split_to_table" that I'm
> surprised to find.  Related to that is the unexpected removal of the
> "force_glob" parameter to setup_regexp_matches.  You took what was a single
> block of code and now duplicated it without any explanation in the commit
> message (a code comment wouldn't work for this kind of change).  The change
> to flags from passing a pointer to text to passing in a pointer to a
> previously derived pg_re_flags makes more sense on its face, and it is
> apparently a non-public API, but again constitutes a refactoring that at
> least would ideally be a separate commit from the one the introduces the new
> behavior.

That check doesn't belong to setup_regexp_matches() in the first place.
The arguments of the function are organised to be caller agnostic,
and then it gives an error on behalf of regexp_split().  The check
fits better to the regexp_split() functions even with duplication.

I can split it to another patch, but I think these kind of changes most
often go together.

Would you mind adding yourself to the reviewers on the Commitfest app?
I think you have already read though it.

Thanks for all the feedback.
From a6f24b34fdb39eaaca1d3819f7f528b1689725a4 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli <e...@hasegeli.com>
Date: Sun, 29 May 2016 18:53:37 +0200
Subject: [PATCH] Add regexp_match()

---
 doc/src/sgml/citext.sgml  |   5 ++
 doc/src/sgml/func.sgml|  62 +-
 src/backend/utils/adt/regexp.c| 119 +-
 src/include/catalog/pg_proc.h |   4 ++
 src/include/utils/builtins.h  |   2 +
 src/test/regress/expected/regex.out   |  28 
 src/test/regress/expected/strings.out |   4 +-
 src/test/regress/sql/regex.sql|   7 ++
 8 files changed, 183 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml
index 7fdf302..9b4c68f 100644
--- a/doc/src/sgml/citext.sgml
+++ b/doc/src/sgml/citext.sgml
@@ -119,20 +119,25 @@ SELECT * FROM users WHERE nick = 'Larry';
   
 
   
Similarly, all of the following functions perform matching
case-insensitively if their arguments are citext:
   
 
   

 
+  regexp_match()
+
+   
+   
+
   regexp_matches()
 


 
   regexp_replace()
 


 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ff7545d..64e975e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1935,32 +1935,49 @@
 or, if the argument is null, return NULL.
 Embedded single-quotes and backslashes are properly doubled.

quote_nullable(42.5)
'42.5'
   
 
   

 
+ regexp_match
+
+regexp_match(string text, pattern text [, flags text])
+   
+   text[]
+   
+Return a single captured substring resulting from matching a POSIX
+regular expression against the string. See
+ for more information.
+   
+   regexp_match('foobarbequebaz', '(bar)(beque)')
+   {bar,beque}
+  
+
+  
+   
+
  regexp_matches
 
 regexp_matches(string text, pattern text [, flags text])

setof text[]

 Return all captured substrings resulting from matching a POSIX regular
 expression against the string. See
  for more information.

-   regexp_matches('foobarbequebaz', '(bar)(beque)')
-   {bar,beque}
+   regexp_matches('foobarbequebaz', 'ba.', 'g')
+   {bar}{baz} (2 rows)
   
 
   

 
  regexp_replace
 
 regexp_replace(string text, pattern text, replacement text [, flags text])

text
@@ -4009,20 +4026,23 @@ substring('foobar' from '#"o_b#"%' for '#')NULLregular expression
 pattern matching


 substring


 regexp_replace


+regexp_match
+   
+   
 regexp_matches


 regexp_split_to_table


 regexp_split_to_array

 

@@ -4168,66 +4188,78 @@ substring('foobar' from 'o(.)b')   o
 regexp_replace('foobarbaz', 'b..', 'X')
fooXbaz
 regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
 regex

[HACKERS] Floating point comparison inconsistencies of the geometric types

2016-05-27 Thread Emre Hasegeli
There are those macros defined for the built-in geometric types:

> #define EPSILON 1.0E-06

> #define FPzero(A)   (fabs(A) <= EPSILON)
> #define FPeq(A,B)   (fabs((A) - (B)) <= EPSILON)
> #define FPne(A,B)   (fabs((A) - (B)) > EPSILON)
> #define FPlt(A,B)   ((B) - (A) > EPSILON)
> #define FPle(A,B)   ((A) - (B) <= EPSILON)
> #define FPgt(A,B)   ((A) - (B) > EPSILON)
> #define FPge(A,B)   ((B) - (A) <= EPSILON)

with this warning:

>  *XXX These routines were not written by a numerical analyst.

Most of the geometric operators use those macros for comparison, but
those  do not:

* polygon << polygon
* polygon &< polygon
* polygon &> polygon
* polygon >> polygon
* polygon <<| polygon
* polygon &<| polygon
* polygon |&> polygon
* polygon |>> polygon
* box @> point
* point <@ box
* lseg <@ box
* circle @> point
* point <@ circle

This is really a bug that needs to be fixed one way or another.  I think
that it is better to fix it by removing the macros all together.  I
am not sure how useful they are in practice.  I haven't seen anyone
complaining about the above operators not using the macros.  Though,
people often complain about the ones using the macros and the problems
caused by them.

The hackers evidently don't like the macros, either.  That should be
why they are not used on the new operators.  What annoys me most about
this situation is the inconsistency blocks demonstrating our indexes.
Because of this, we had to rip out point type support from BRIN
inclusion operator class which could be useful to PostGIS.

Fixing it has been discussed many times before [1][2][3][4] with
no result.  Here is my plan to fix the situation covering the problems
around it:

1) Reimplement some operators to avoid divisions

The attached patch does it on some of the line operators.

2) Use exact comparison on everywhere except the operators fuzzy
   comparison is certainly required

The attach patch changes all operators except some "lseg" operators.
"lseg" stores two points on a line.  Most of the operations done on it
are lossy.  I don't see a problem treating them differently, those
operators are very unlikely to be index supported.

3) Check numbers for underflow and overflow

I am thinking to use CHECKFLOATVAL on utils/adt/float.c, but it is not
exposed outside at the moment.  Would it be okay to create a new header
file utils/float.h to increase code sharing between float and geometric
types?

4) Implement relative fuzzy comparison for the remaining operators

It is better to completely get rid of those macros while we are on it.
I think we can get away by implementing just a single function for fuzzy
equality, not for other comparisons.  I am inclined to put it to
utils/adt/float.c.  Is this a good idea?

Tom Lane commented on the function posted to the list [1] on 2002:

> Not like that. Perhaps use a fraction of the absolute value of the
> one with larger absolute value.  As-is it's hard to tell how FLT_EPSILON
> is measured.

I cannot image how the function would look like.  I would appreciate
any guidance.

5) Implement default hash operator class for all geometric types

This would solve most complained problem of those types allowing them
to used with DISTINCT and GROUP BY.

6) Implement default btree operator class at least for the point type

This would let the point type to be used with ORDER BY.  It is
relatively straight forward to implement it for the point type.  Is it
a good idea to somehow implement it for other types?

7) Add GiST index support for missing cross type operators

Currently only contained by operators are supported by an out-of-range
strategy number.  I think we can make the operator classes much nicer
by allowing really cross type operator families.

Comments?

[1] 
https://www.postgresql.org/message-id/flat/d90a5a6c612a39408103e6ecdd77b8290fd...@voyager.corporate.connx.com
[2] https://www.postgresql.org/message-id/flat/4a7c2c4b.5020...@netspace.net.au
[3] https://www.postgresql.org/message-id/flat/12549.1346111...@sss.pgh.pa.us
[4] 
https://www.postgresql.org/message-id/flat/20150512181307.gj2...@alvh.no-ip.org
From aa79e331595860489cdbbdce2d5f35a7d1f33783 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli <e...@hasegeli.com>
Date: Wed, 25 May 2016 17:53:19 +0200
Subject: [PATCH] Stop using FP macros on geo_ops.c

---
 src/backend/access/gist/gistproc.c|  17 +-
 src/backend/access/spgist/spgkdtreeproc.c |  24 +--
 src/backend/utils/adt/geo_ops.c   | 312 +++---
 src/backend/utils/adt/geo_spgist.c|  24 +--
 src/include/utils/geo_decls.h |  16 --
 src/test/regress/expected/point.out   |   8 +-
 6 files changed, 193 insertions(+), 208 deletions(-)

diff --git a/src/backend/acces

Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

2016-05-29 Thread Emre Hasegeli
> My interpretation of the standard is that FILTER is not allowable for
> a window function, and IGNORE|RESPECT NULLS is not allowable for an
> ordinary aggregate.

Yes, it is clear.

> So if we support IGNORE|RESPECT NULLS for anything other than a window
> function, we have to come up with our own semantics.

I don't think this clause is useful for aggregates especially while we
already have the FILTER clause.  Though, I can see this error message
being useful:

> ERROR:  IGNORE NULLS is only implemented for the lead and lag window functions

Can we still give this message when the syntax is not part of the OVER clause?

Thank you for returning back to this patch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] regexp_match() returning text

2016-05-29 Thread Emre Hasegeli
Attached patch adds regexp_match() function which is a simple variant of
regexp_matches() that doesn't return a set.  It is based on Tom Lane's
comment to bug #10889 [1].

[1] https://www.postgresql.org/message-id/23769.1404747...@sss.pgh.pa.us
From f8c113c77864ef1ca6386195aea02b2090ff17b6 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli <e...@hasegeli.com>
Date: Sun, 29 May 2016 18:53:37 +0200
Subject: [PATCH] Add regexp_match()

---
 doc/src/sgml/citext.sgml   |   5 ++
 doc/src/sgml/func.sgml |  59 +++---
 src/backend/catalog/information_schema.sql |   2 +-
 src/backend/utils/adt/regexp.c | 125 +
 src/include/catalog/pg_proc.h  |   4 +
 src/include/utils/builtins.h   |   2 +
 src/test/regress/expected/regex.out|  28 +++
 src/test/regress/expected/strings.out  |   4 +-
 src/test/regress/sql/regex.sql |   7 ++
 9 files changed, 188 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml
index 7fdf302..9b4c68f 100644
--- a/doc/src/sgml/citext.sgml
+++ b/doc/src/sgml/citext.sgml
@@ -119,20 +119,25 @@ SELECT * FROM users WHERE nick = 'Larry';
   
 
   
Similarly, all of the following functions perform matching
case-insensitively if their arguments are citext:
   
 
   

 
+  regexp_match()
+
+   
+   
+
   regexp_matches()
 


 
   regexp_replace()
 


 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ff7545d..503dfe5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1935,20 +1935,37 @@
 or, if the argument is null, return NULL.
 Embedded single-quotes and backslashes are properly doubled.

quote_nullable(42.5)
'42.5'
   
 
   

 
+ regexp_match
+
+regexp_match(string text, pattern text [, flags text])
+   
+   text
+   
+Return a single captured substring resulting from matching a POSIX regular
+expression against the string. See
+ for more information.
+   
+   regexp_match('foobarbequebaz', 'barbeque')
+   barbeque
+  
+
+  
+   
+
  regexp_matches
 
 regexp_matches(string text, pattern text [, flags text])

setof text[]

 Return all captured substrings resulting from matching a POSIX regular
 expression against the string. See
  for more information.

@@ -4009,20 +4026,23 @@ substring('foobar' from '#"o_b#"%' for '#')NULLregular expression
 pattern matching


 substring


 regexp_replace


+regexp_match
+   
+   
 regexp_matches


 regexp_split_to_table


 regexp_split_to_array

 

@@ -4168,66 +4188,81 @@ substring('foobar' from 'o(.)b')   o
 regexp_replace('foobarbaz', 'b..', 'X')
fooXbaz
 regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
 regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g')
fooXarYXazY
 

 
 
+ The regexp_match function returns the first captured
+ substring resulting from matching a POSIX regular expression
+ pattern.  It has the syntax
+ regexp_match(string, pattern
+ , flags ).
+ If the pattern does not match, the function returns
+ NULL.  It ignores the parenthesized subexpressions in
+ the pattern.  regexp_matches can be used in
+ this case (see below for details).
+ The flags parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior.  Supported flags
+ are described in .
+
+
+
  The regexp_matches function returns a text array of
  all of the captured substrings resulting from matching a POSIX
- regular expression pattern.  It has the syntax
- regexp_matches(string, pattern
- , flags ).
+ regular expression pattern.  It has the same syntax as
+ regexp_match.
  The function can return no rows, one row, or multiple rows (see
  the g flag below).  If the pattern
  does not match, the function returns no rows.  If the pattern
  contains no parenthesized subexpressions, then each row
  returned is a single-element text array containing the substring
  matching the whole pattern.  If the pattern contains parenthesized
  subexpressions, the function returns a text array whose
  n'th element is the substring matching the
  n'th parenthesized subexpression of the pattern
  (not counting non-capturing parentheses; see below for
  details).
- The flags parameter is an optional text
- string containing zero or more single-letter flags that change the
- function's behavior

Re: [HACKERS] create opclass documentation outdated

2016-03-10 Thread Emre Hasegeli
>> In create_opclass.sgml, not only do we have the list of AMs supporting
>> STORAGE, but there is also a paragraph describing which AMs do what
>> for input datatypes of FUNCTION members (around line 175).
>
> I placed BRIN together with gist/gin/spgist here, namely that the optype
> defaults to the opclass' datatype.

Requiring STORAGE type for BRIN opclasses is more of a bug than a
feature.  None of the operator classes actually support storing values
with different type.  We had intended to support it for inclusion
opclass to index points by casting them to box, but then ripped it out
because of inconsistencies on the operators caused by floating point
arithmetics.

The problem is that the operator classes try to query the strategies
using the datatype they got from TupleDesc structure.  It fails at
least for cidr, because it is casted implicitly and indexed as inet.
All of the BRIN operator classes can fail the same way for user
defined types.  Adding STORAGE to all operator classes have seemed to
me like an easy fix at the time I noticed this problem, but what we
really need to do is to fix this than document.  We should to make
BRIN use the datatype of the operator class as btree does.

> +   of each operator class interpret the strategy nnumbers according to the

Typo: nnumbers.

> +   Operator classes based on the Inclusion framework can
> +   theoretically support cross-data-type operations, but there's no
> +   demonstrating implementation yet.

This part of the inclusion class is not committed, so this paragraph
shouldn't be there.

Other than those, the changes look good to me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-18 Thread Emre Hasegeli
Here are my first comments.  I haven't read the actual index
implementation, yet.

I think traversal value is a useful addition.  It is nice that the
implementation for the range types is also changed.  My questions
about them are:

Do reconstructedValues is required now?  Wouldn't it be cleaner to use
the new field on the prefix tree implementation, too?

We haven't had specific memory context for reconstructedValues.  Why
is it required for the new field?

> +   Memory for traversalValues should be allocated in
> +   the default context, but make sure to switch to
> +   traversalMemoryContext before allocating memory
> +   for pointers themselves.

This sentence is not understandable.  Shouldn't it say "the memory
should *not* be allocated in the default context"?  What are the
"pointers themselves"?

The box opclass is broken because of the floating point arithmetics of
the box type.  You can reproduce it with the attached script.  We
really need to fix the geometric types, before building more on them.
They fail to serve the only purpose they are useful for, demonstrating
features.

It think the opclass should support the cross data type operators like
box @> point.  Ideally we should do it by using multiple opclasses in
an opfamily.  The floating problem will bite us again in here, because
some of the operators are not using FP macros.

The tests check not supported operator "@".  It should be "<@".  It is
nice that the opclass doesn't support long deprecated operators.

We needs tests for the remaining operators and for adding new values
to the index.  I am not sure create_index.sql is the best place for
them.  Maybe we should use the test scripts of "box".

> + #define LT  -1
> + #define GT   1
> + #define EQ   0

Using these numbers is a very well established pattern.  I don't think
macros make the code any more readable.

> + /* SP-GiST API functions */
> + Datum   spg_box_quad_config(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_choose(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_picksplit(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);

I guess they should go to the header file.


box-index-test.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Emre Hasegeli
Here are my comments about the operator class implementation:

> + *implementation of quad-4d tree over boxes for SP-GiST.

Isn't the whole thing actually 3D?

> +  * For example consider the case of intersection.

No need for a new line after this.

> +  * A quadrant has bounds, but sp-gist keeps only 4-d point (box) in inner 
> nodes.

I couldn't get the term 4D point.  Maybe it means that we are using
box datatype as the prefix, but we are not treating them as boxes.

> + * We use traversalValue to calculate quadrant bounds from parent's quadrant
> + * bounds.

I couldn't understand this sentence.  We are using the parent's prefix
and the quadrant to generate the traversalValue.  I think this is the
crucial part of the opclass.  It would be nice to explain it more
clearly.

> +  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group

2016.

> + *  src/backend/utils/adt/boxtype_spgist.c

Maybe we should name this file as geo_spgist.c to support other types
in the future.

> + #include "utils/builtins.h";

Extra ;.

> + #include "utils/datum.h"

I think this is unnecessary.

> + /* InfR type implements doubles and +- infinity */
> + typedef struct
> + {
> +int infFlag;
> +double  val;
> + }   InfR;

Why do we need this?  Can't we store infinity in "double"?

> + /* wrap double to InfR */
> + static InfR
> + toInfR(double v, InfR * r)
> + {
> +r->infFlag = NotInf;
> +r->val = v;
> + }

This isn't returning the value.

> + typedef struct
> + {
> +Range   range_x;
> +Range   range_y;
> + }   Rectangle;

Wouldn't it be simpler to just using BOX instead of this?

> + static void
> + evalIRangeBox(const IRangeBox *range_box, const Range *range, const int 
> half1,
> +  const int half2, IRangeBox *new_range_box)
>
> + static void
> + evalIRectBox(const IRectBox *rect_box, const Rectangle *centroid,
> + const uint8 quadrant, IRectBox * new_rect_box)
>
> + inline static void
> + initializeUnboundedBox(IRectBox * rect_box)

Wouldn't it be simpler to palloc and return the value on those functions?

> + static int
> + intersect2D(const Range * range, const IRangeBox * range_box)

Wouldn't it be better to return "bool" on those functions.

> +return ((p1 >= 0) && (p2 <= 0));

Extra parentheses.

> + iconst spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
> + ispgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);

Many variables are defined with "const".  I am not sure they are all
that doesn't change.  If it applies to the pointer, "out" should also
not change in here.  Am I wrong?

> +/*
> + * Begin. This block evaluates the median of coordinates of boxes
> + */

I would rather explain what the function does on the function header.

> + memcpy(new_rect_box, rect_box, sizeof(IRectBox));

Do we really need to copy the traversalValues on allTheSame case.
Wouldn't it work if just the same value is passed for all of them.
The search shouldn't continue after allTheSame case.

> + for (i = 0; flag && i < in->nkeys && flag; i++)

It checks flag two times.

> +boxPointerToRectangle(
> +DatumGetBoxP(in->scankeys[i].sk_argument),
> +p_query_rect
> +);

I don't think this matches the project coding style.

> + int flag = 1,

Wouldn't it be better as "bool"?

The regression tests for the remaining operators are still missing.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN is missing in multicolumn indexes documentation

2016-03-23 Thread Emre Hasegeli
> I guess multicolumn BRIN behaves similarly to B-tree or GiST. But I'm
> no expert, so I need someone knowledgeable to confirm this. If the
> following wording is OK, I will update the patch.

Multicolumn BRIN is like GIN.  Every column is indexed separately.
The order of the columns doesn't matter.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-24 Thread Emre Hasegeli
> +  * boxtype_spgist.c

The names on the file header need to be changed, too.

> I'll try to explain with two-dimensional example over points. ASCII-art:
>|
>|
> 1  |  2
>|
> ---+-
>|P
>  3 |  4
>|
>|
>
> '+' with 'A' represents a centroid or, other words, point which splits plane
> for 4 quadrants and it strorend in parent node. 1,2,3,4 are labels of
> quadrants, each labeling will be the same for all pictures and all
> centriods, and i will not show them in pictures later to prevent too
> complicated images. Let we add C - child node (and again, it will split
> plane for 4 quads):
>
>
>| |
>+-+---
>  X |B|C
>| |
> ---+-+---
>|P|A
>| |
>|
>|
>
> A and B are points of intersection of lines. So, box PBCAis a bounding box
> for points contained in 3-rd (see labeling above). For example X labeled
> point is not a descendace of child node with centroid  C because it must be
> in branch of 1-st quad of parent node. So, each node (except root) will have
> a limitation in its quadrant. To transfer that limitation the traversalValue
> is used.
>
> To keep formatting I attached a txt file with this fragment.

Thank you for the explanation.  Should we incorporate this with the patch.

>>> +  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development
>>> Group
>>
>> 2016.
>
> fixed

Not on the patch.

> + cmp_double(const double a, const double b)

Does this function necessary?  We can just compare the doubles.

> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)

The "rectangle" variable in here should be renamed.

> + Assert(is_infinite(b) == 0);

This is failing on my test.  You can reproduce with the script I have sent.

>> Wouldn't it be simpler to palloc and return the value on those functions?
>
> evalRangeBox() initializes part of RectBox, so, it could not palloc its
> result.
> Other methods use the same signature just for consistency.

I couldn't understand it.  evalRangeBox() can palloc and return the
result.  evalRectBox() can return the result palloc'ed by
evalRangeBox().  The caller wouldn't need to palloc anything.

>> Many variables are defined with "const".  I am not sure they are all
>> that doesn't change.  If it applies to the pointer, "out" should also
>> not change in here.  Am I wrong?
>
> No, changes

I now read about "const".  I am sorry for not being experienced in C.
The new version of the patch marks them as "const" by mistake.

I went through all other variables:

> + int r = is_infinite(a);
>
> + double  x = *(double *) a;
> + double  y = *(double *) b;
>
> + spgInnerConsistentIn *in = (spgInnerConsistentIn *) 
> PG_GETARG_POINTER(0);
>
> + spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
>
> + BOX*leafBox = DatumGetBoxP(in->leafDatum);

Shouldn't they be "const", too?

>>> +/*
>>> + * Begin. This block evaluates the median of coordinates of boxes
>>> + */
>>
>> I would rather explain what the function does on the function header.
>
> fixed

The "end" part of it is still there.

>> Do we really need to copy the traversalValues on allTheSame case.
>> Wouldn't it work if just the same value is passed for all of them.
>> The search shouldn't continue after allTheSame case.
>
> Seems, yes. spgist tree could contain a locng branches with allTheSame.

Does SP-GiST allows any node under allTheSame to not being allTheSame?
 Not setting traversalValues for allTheSame worked fine with my test.

> + if (in->allTheSame)

Most of the things happening before this check is not used in there.
Shouldn't we move this to the top of the function?

> + out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);

This could go before allTheSame block.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-27 Thread Emre Hasegeli
>>> I'll try to explain with two-dimensional example over points. ASCII-art:
>>
>> Thank you for the explanation.  Should we incorporate this with the patch.
>
> added

I have worked on the comments of the patch.  It is attached.  I hope
it looks more clear than it was before.

>>> + cmp_double(const double a, const double b)
>>
>> Does this function necessary?  We can just compare the doubles.
>
> Yes, it compares with limited precision as it does by geometry operations.
> Renamed to point actual arguments.

I meant that we could use FP macros directly instead of this function.
Doing so is also more correct.  I haven't tried to produce the
problem, but this function is not same as using the macros directly.
For differences smaller that the epsilon, it can return different
results.  I have removed it on the attached version.

>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>
>> The "rectangle" variable in here should be renamed.
>
> fixed

I found a bunch of those too and renamed for clarity.  I have also
reordered the arguments of the helper functions to keep them
consistent.

> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
> evalRangeBox() will palloc its result then we need to copy its result into
> RangeBox struct and free. Let both fucntion use the same interface.

I found it nicer to copy and edit the existing value than creating it
in two steps like this.  It is also attached.

> it works until allthesame branch contains only one inner node. Otherwise
> traversalValue will be freeed several times, see spgWalk().

It just works, when traversalValues is not set.  It is also attached.

I have also added the missing regression tests.  While doing that I
noticed that some operators are missing and also added support for
them.  It is also attached with the updated version of my test script.

I hope I haven't changed the patch too much.  I have also pushed the
changes here:

https://github.com/hasegeli/postgres/commits/box-spgist


q4d-5.patch
Description: Binary data


box-index-test.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SP-GiST support for inet datatypes

2016-03-02 Thread Emre Hasegeli
Attached patches add SP-GiST support to the inet datatypes.  The operator
class comes with a small change on the SP-GiST framework to allow fixed
number of child nodes.

The index is like prefix tree except that it doesn't bother to split the
addresses into parts as text is split.  It also doesn't use labels to know
the part after the prefix, but relies on static node numbers.

The GiST index released with version 9.4 performs really bad with real
world data.  SP-GiST works much better with the query posted to the
performance list [1] a while ago:

> hasegeli=# SELECT DISTINCT route INTO hmm FROM routes_view WHERE asn =
2914;
> SELECT 732
>
> hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
>QUERY PLAN
>

>  Nested Loop  (cost=0.41..571742.27 rows=2248 width=7) (actual
time=12.643..20474.813 rows=8127 loops=1)
>->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
time=0.017..0.524 rows=732 loops=1)
>->  Index Only Scan using route_gist on routes  (cost=0.41..552.05
rows=22900 width=7) (actual time=4.851..27.948 rows=11 loops=732)
>  Index Cond: (route && (hmm.route)::inet)
>  Heap Fetches: 8127
>  Planning time: 1.507 ms
>  Execution time: 20475.605 ms
> (7 rows)
>
> hasegeli=# DROP INDEX route_gist;
> DROP INDEX
>
> hasegeli=# CREATE INDEX route_spgist ON routes USING spgist (route);
> CREATE INDEX
>
> hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
>   QUERY PLAN
>
-
>  Nested Loop  (cost=0.41..588634.27 rows=2248 width=7) (actual
time=0.081..16.961 rows=8127 loops=1)
>->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
time=0.022..0.079 rows=732 loops=1)
>->  Index Only Scan using route_spgist on routes  (cost=0.41..575.13
rows=22900 width=7) (actual time=0.014..0.021 rows=11 loops=732)
>  Index Cond: (route && (hmm.route)::inet)
>  Heap Fetches: 8127
>  Planning time: 1.376 ms
>  Execution time: 15.936 ms

[1]
http://www.postgresql.org/message-id/flat/alpine.DEB.2.11.1508251504160.31004@pyrite#alpine.DEB.2.11.1508251504160.31004@pyrite


spgist-fixed-nnodes.patch
Description: Binary data


inet-spgist-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST support for inet datatypes

2016-03-03 Thread Emre Hasegeli
> Emre, I checked original thread and didn't find sample data. Could you 
> provide them for testing ?

I found it on the Git history:

https://github.com/job/irrexplorer/blob/9e8b5330d7ef0022abbe1af18291257e044eb24b/data/irrexplorer_dump.sql.gz?raw=true


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST support for inet datatypes

2016-03-09 Thread Emre Hasegeli
>> Spgist index tree is much better  than gist - 12149 pages vs 1334760 !

I assume this is the reason why it is bigger.  IP addresses are very
well suited to SP-GiST.  They naturally do not overlap.

> I also noticed, that spgist is much faster than gist for other inet
> operators. I'd like to see in 9.6.

Unfortunately, I missed the deadline of the last commitfest.  It is on
the next one:

https://commitfest.postgresql.org/10/571/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Access method extendability

2016-04-04 Thread Emre Hasegeli
I think the variable "magick" should be "magic".  Patch attached.


bloom-magick.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-31 Thread Emre Hasegeli
> Thank you, pushed

Thank you for working on this.

I noticed that have overlooked this:

 static RectBox *
-initRectBox()
+initRectBox(void)
 {


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alter or rename enum value

2016-03-28 Thread Emre Hasegeli
> I do not know whether this would be a meaningful improvement for
> common use-cases, though.  (It'd help if people were more specific
> about the use-cases they need to work.)

For what its worth, in the company I am working for, InnoGames GmbH,
not being able to alter enums is the number one pain point with
PostgreSQL.  We have hundreds of small databases on the backend of the
game worlds.  They are heavily using enums.  New values to the enums
need to be added or to be removed for the new features.  We are
relying on the transactions for the database migrations, so we
couldn't make use of ALTER TYPE ADD VALUE.

Some functions found on the Internet [1] which change the values on
the catalog had been used, until they corrupted some indexes.  (I
believe those functions are still quite common.)  Now, we are using a
function to replace an enum type on all tables with another one, but
we are not at all happy with this solution.  It requires all objects
which were using the enum to be dropped and recreated, and it rewrites
the tables, so it greatly increases the migration time and effort.

[1] http://en.dklab.ru/lib/dklab_postgresql_enum/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per-statement-level INSTEAD OF triggers

2016-08-10 Thread Emre Hasegeli
> It might be more useful after we get the infrastructure that Kevin's been
> working on to allow collecting all the updates into a tuplestore that
> could be passed to a statement-level trigger.  Right now I tend to agree
> that there's little point.

Maybe, this can be used to re-implement FOREIGN KEYs.  Never-ending
bulk DELETEs caused by lack of indexes on foreign key columns are
biting novice users quite often in my experience.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-11 Thread Emre Hasegeli
> I think that probably the most reasonable answer is to replace all the
> raw "double" comparisons in this code with float8_cmp_internal() or
> something that's the moral equivalent of that.  Does anyone want to
> propose something else?

We can probably get away by changing the comparison on the GiST code.
It is not likely to cause inconsistent results.  Comparisons with NaN
coordinates don't return true to anything, anyway:

# select '(3,4),(nan,5)'::box = '(3,4),(nan,5)'::box;
 ?column?
--
 f
(1 row)

# select '(3,4),(nan,5)'::box < '(3,4),(nan,5)'::box;
 ?column?
--
 f
(1 row)

# select '(3,4),(nan,5)'::box > '(3,4),(nan,5)'::box;
 ?column?
--
 f
(1 row)

> More generally, this example makes me fearful that NaN coordinates in
> geometric values are likely to cause all sorts of issues.  It's too late
> to disallow them, probably, but I wonder how can we identify other bugs
> of this ilk.

Is it reasonable to disallow NaN coordinates on the next major
release.  Are there any reason to deal with them?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-02-01 Thread Emre Hasegeli
> Got it, but if other people don't agree then this is going nowhere.

Yes.  As I wrote, I don't particularly care about functions like "is
point on line".  I can prepare a patch to fix as many problems as
possible around those operators by preserving the current epsilon.

I though we were arguing about *all* operators.  Having containment
and placement operators consistent with each other, is the primary
thing I am trying to fix.  Is removing epsilon from them is
acceptable?

We can also stay away from changing operators like "~=" to minimise
compatibility break, if we keep the epsilon on some places.  We can
instead document this operator as "close enough", and introduce
another symbol for really "the same" operator.

That said, there are some places where it is hard to decide to apply
the epsilon or not.  For example, we can keep the epsilon to check for
two lines being parallel, but then should we return the intersection
point, or not?  Those issues may become more clear when I start
working on it, if preserving epsilon for those operators is the way to
go forward.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-02-01 Thread Emre Hasegeli
> I think this patch is already in a good shape.

I am sorry for introducing this bug.  This fix looks good to me as well.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-31 Thread Emre Hasegeli
> Backing up a bit here, have we lost track of the problem that we're
> trying to solve?  Tom gave his opinion here:
>
> https://www.postgresql.org/message-id/3895.1464791...@sss.pgh.pa.us
>
> But I don't see that the latest patch I can find does anything to fix
> that.

This is what he wrote:

> As I understand it, the key problem is that tests like "is point on line"
> would basically never succeed except in the most trivial cases, because of
> roundoff error.  That's not very nice, and it might cascade to larger
> problems like object-containment tests failing unexpectedly.  We would
> need to go through all the geometric operations and figure out where that
> kind of gotcha is significant and what we can do about it.  Seems like a
> fair amount of work :-(.  If somebody's willing to do that kind of
> investigation, then sure, but I don't think just blindly removing these
> macros is going to lead to anything good.

I re-implemented "is point on line" operator on my patch so that it
would, at least, work on the most trivial cases.  This is not very
nice, but it is predictable.  I have tried to prevent it cascade to
larger problems like object-containment tests failing unexpectedly.  I
have gone through all of the geometric operations and re-implemented
the ones with similar problems, too.  It is no work at all compared to
discussions we have to have :-).

My initial idea was to keep the fuzzy behaviour for some operators
like "is point on line", but the more I get into it the less value I
see in doing so.  The same family operators like "is point on line" is
currently badly broken:

> postgres=# select '(1,0)'::point ## '{1,2,0}'::line;
> ?column?
> --
> (2,2)
> (1 row)

This makes me wonder if anybody is ever using those operators.  In the
end, I don't care about those operators.  They are unlikely to be
supported by indexes.  I can simplify my patch to leave them as they
are.

> Now maybe that's not the problem that Emre is trying to solve,
> but then it is not very clear to me what problem he IS trying to
> solve.  And I think Kyotaro Horiguchi is trying to solve yet another
> problem which is again different.  So IMHO the first task here is to
> agree on a clear statement of what we'd like to fix, and then, given a
> patch, we can judge whether it's fixed.

I listed the problems I am trying to solve in here:

https://www.postgresql.org/message-id/CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw%40mail.gmail.com

> Maybe I'm being dumb here and it's clear to you guys what the issues
> under discussion are.  If so, apologies for that, but the thread has
> gotten too theoretical for me and I can't figure out what the
> top-level concern is any more.  I believe we all agree these macros
> are bad, but there seems to be no agreement that I can discern on what
> would be better or why.

We couldn't agree on how we should treat on floating point numbers.  I
think we should threat them just like the "float" datatype.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Press Release Draft - 2016-02-09 Cumulative Update

2017-02-07 Thread Emre Hasegeli
> As there are a lot of updates I did my best to consolidate some of the
> bullet points and as usual, people are directed to the release notes.
> Please let me know if there are any inaccuracies so I can fix them ASAP.

Just some minor points:

>  * Several fixes for PostgreSQL operating in hot standby mode

It sounded unnatural to me.  Maybe it is better without "PostgreSQL".

>  * Several vacuum and autovacuum fxies

Typo

>  * Several Unicode fixes

It sounded alarming to me.  I see just one related item on the release
notes.  Maybe we can clarify the problem.

>  * Sync our copy of the timezone library with IANA release tzcode2016j

This is repeated on the following sentence.

>  BEGIN;
>  DROP INDEX bad_index_name;
>  CREATE INDEX CONCURRENTLY bad_index_name ON table_name (column_name); /* 
> replace names with your original index definition */
>  COMMIT;

Maybe you meant CREATE INDEX without CONCURRENTLY?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-25 Thread Emre Hasegeli
I am responding both of your emails together.

> Perhaps I don't understand it. Many opclass are defined for
> btree. But since (ordinary) btree handles only one-dimentional,
> totally-orderable sets, geometric types even point don't fit
> it. Even if we relaxed it by removing EPSILON, multidimentional
> data still doesn't fit.

Yes, multidimentional data doesn't fit into btree.  Let's say we would
have to introduce operators called *<, *<=, *>, *>= to support btree
opclass for point.  I agree those operators would be useless, but
btree opclass is used for other purposes too.  "ORDER BY point",
merge-joins, btree index support for ~= would be useful.

Lack of ORDER BY, GROUP BY, and DISTINCT support is the major
inconvenience about the geometric types.  There are many complaints
about this on the mailing list.  Many frameworks and ORMs are not
prepared to deal with getting an error when they use certain types on
those clauses.

>> - Only some operators are using the epsilon.
>>
>> I included the list of the ones which doesn't use the epsilon on
>> initial post of this thread.  This makes the operators not compatible
>> with each other.  For example, you cannot say "if box_a @> box_b and
>> box_b @> point then box_a @> box_b".
>
> (It must be "then box_a @> point".)

Yes, I am sorry.

>  Both of the operators don't
> seem to use EPSILON so the transitivity holds, but perhaps we
> should change them to more appropriate shape, that is, to use
> EPSILON. Even having the tolerance, we can define the operators
> to keep the transitivity.

Yes, that would be another way.  In my view, this would make things
worse, because I believe epsilon approach is completely broken.  The
operators which are not using the epsilon are the only ones people can
sanely use to develop applications.

>> > - The application of epsilon is implementation dependent.
>> >
>> > Epsilon works between two numbers.  There is not always a single way
>> > of implementing geometric operators.  This is surprising to the users.
>> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
>> > epsilon".
>>
>> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
>> ..". I'm not sure off-hand. But I don't see the relationship is
>> significant.

What I meant was the behaviour being unclear to even people who knows
about the epsilon.  If two boxes are overlapping with each other, one
who knows about EPSILON can expect the distance between them to be
less than EPSILON, but this wouldn't be true.

>> - Some operators are violating commutative property.
>>
>> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
>
> Whether EPSILON is introduced or not, commutativity cannot be
> assured for it from calculation error, I suppose.

It can easily be assured by treating both sides of the operator the
same.  It is actually assured on my patch.

>> - Some operators are violating transitive property.
>>
>> For example, you cannot say "if point_a ~= point_b and point_b ~=
>> point_c then point_a ~= point_c".
>
> It holds only in ideal (or mathematical) world, or for similars
> of integer (which are strictly implemented mathematica world on
> computer).  Without the EPSILON, two points hardly match by
> floating point error, as I mentioned. I don't have an evidence
> though (sorry), mere equality among three floating point numbers
> is not assured.

Of course, it is assured.  Floating point numbers are deterministic.

>> - The operators with epsilon are not compatible with float operators.
>>
>> This is also surprising for the users.  For example, you cannot say
>> "if point_a << point_b then point_a[0] < point_b[0]".
>
> It is because "is strictly left of" just doesn't mean their
> x-coordinates are in such a relationship. The difference might be
> surprising but is a specification (truely not a bug:).

Then what does that mean?  Every operator with EPSILON is currently
surprising in different way.  People use databases to build
applications.  They often need to implement same operations on the
application side.  It is very hard to be bug-compatible with our
geometric types.

>> = Missing Bits on the Operator Classes

> This final section seems to mention the application but sorry, it
> still don't describe for me what application that this patch
> aiming to enable. But I can understand that you want to handle
> floating point numbers like integers. The epsilon is surely quite
> annoying for the purpose.

I will try to itemise what applications I am aiming to enable:

- Applications with very little GIS requirement

PostGIS can be too much work for an application in s different
business domain but has a little requirement to GIS.  The built-in
geometric types are incomplete to support real world geography.
Getting rid of epsilon would make this worse. In contrary, it would
allow people to deal with the difficulties on their own.

- Applications with geometric objects

I believe people could make use the geometric 

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-26 Thread Emre Hasegeli
> Even though I'm not sure but I don't see a "natural" (or
> agreeable by many poeple) ordering of geometric types in
> general. Anyway it's quite application (not application program
> but the relationship with the real world) specific.

We can just define it for point as "ORDER BY point.x, point.y".

> What we should not forget is that PostGIS does the same thing and
> it is widly used (I believe..). This means it not broken at least
> on a certain context. But it is a fact that it also quite
> inconvenient to get performance from, say, indexes.

I understand from Paul Ramsey's email [1] on this thread that PostGIS
doesn't currently have a tolerance.

> Yeah, the EPSILON is mere a fuzz factor so we cannot expect any
> kind of stable behavior for differences under the level. But on
> the analogy of comparisons of floating point numbers, I suppose
> that inequality comparison could be done without the tolerance.

What do you mean exactly?

>> >> - Some operators are violating commutative property.
>> >>
>> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
>> >
>> > Whether EPSILON is introduced or not, commutativity cannot be
>> > assured for it from calculation error, I suppose.
>>
>> It can easily be assured by treating both sides of the operator the
>> same.  It is actually assured on my patch.
>
> It surely holds for certain cases. Even how many applicable cases
> we guess, finally we cannot proof that it works generally. Just
> three times of 1/3 rotation breakes it.

It is a different story.  We cannot talk about commutative property of
rotation function that we currently don't have, because it would be an
asymmetrical operator.

The parallel operator is currently marked as commutative.  The planner
is free to switch the sides of the operation.  Therefore this is not
only a surprise, but a bug.

> Hmm, I have nothing more to say if you don't agree that floating
> point numbers involving any kind of arithmetic is hardly
> deterministic especially not defining its usage.

The floating point results are not random.  There are certain
guarantees.  The transitive property of equality is one of them.  Our
aim should be making things easier for our users by providing more
guarantees not breaking what is already there.

> The world of the geometric types in PostgreSQL *is* built
> so. There is nothing different with that Windows client can make
> different results from PostgreSQL on a Linux server for a simple
> floating point arithmetics, or even different binaries made by
> different version of compilers on the same platoform can. Relying
> on such coherency by accident is a bad policy.

Yes, the results are not portable.  We should only rely on the results
being stable on the same build.  The epsilon doesn't cure this
problem.  It arguably makes it worse.

> PostGIS or libgeos seems to prove it. They are designed exactly
> for this purpose and actually used.

Yes, PostGIS is a GIS application.  We are not.  Geometric types name
suggests to me them being useful for general purpose.

> So, the union of the two requirements seems to be having such
> parameter as a GUC.

That sounds doable to me.  We can use this opportunity to make all
operators consistent.  So the epsilon would apply to the ones that it
current is not.  We can still add btree and hash opclasses, and make
them give an error when this GUC is not 0.  We can even make this or
another GUC apply to floats making whole system more consistent.

Though, I know the community is against behaviour changing GUCs.  I
will not spend more time on this, before I get positive feedback from
others.

[1] 
https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gin and btree_gist for enums

2017-02-25 Thread Emre Hasegeli
> The reason this is kind of scary is that it's just blithely assuming
> that the function won't look at the *other* fields of the FmgrInfo.
> If it did, it would likely get very confused, since those fields
> would be describing the GIN support function, not the function we're
> calling.

I am sorry if it doesn't make sense, but wouldn't the whole thing be
better, if we refactor enum.c to call only he internal functions with
TypeCacheEntry just like the rangetypes?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-08-21 Thread Emre Hasegeli
> Here is v4, which changes the command from ALTER VALUE to RENAME VALUE,
> for consistency with RENAME ATTRIBUTE.

It looks like we always use "ALTER ... RENAME ... old_name TO
new_name" syntax, so it is better that way.  I have noticed that all
the other RENAMEs go through ExecRenameStmt().  We better do the same.

> +int nbr_index;
> +Form_pg_enum nbr_en;

What is "nbr"?  Why not calling them something like "i" and "val"?

> +   /* Locate the element to rename and check if the new label is already in

The project style for multi-line commands is to leave the first line
of the comment block empty.

> +if (strcmp(NameStr(nbr_en->enumlabel), oldVal) == 0)

I found namestrcmp() for this.

> +}
> +if (!old_tup)

I would leave a space in between.

> +ReleaseCatCacheList(list);
> +heap_close(pg_enum, RowExclusiveLock);

Maybe we better release them before reporting error, too.  I would
release the list after the loop, close the heap before ereport().


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST support for inet datatypes

2016-08-21 Thread Emre Hasegeli
> ... this part of the patch breaks the existing API for SP-GiST opclasses.
> That is a hard sell.  It may only matter for one existing opclass in core,
> but unless we have reason to think nobody is using any custom SP-GiST
> opclasses, that is not a pleasant thing to do.  How important is it really
> for this opclass?  Several of the existing opclasses use fixed numbers of
> child nodes, so why does this need something they don't?

This addition is required to implement the operator class cleanly.  We
could store the id of the child node as a "label", and add nodes when
labels are missing, but this complicates all operator class functions.
Other designs are also possible using the labels, but a simple design
with fixed number of nodes worked best for me.

Currently, SP-GiST framework supports fixed number of child nodes, if
the index is growing by page splits, not by prefix splits.  This is
not a fair API.  I assume it is designed by only having the prefix
tree in mind.  The change makes SP-GiST more reusable.

SP-GiST is not widely adopted in my experience.  Breaking this part of
the API would affect prefix tree implementations.  I don't think there
are much of them.  Supporting the new interface is easy for them.  We
can try to support the old interface by side of the new one, but this
wouldn't be very clean either.

I thought we were breaking the C interface in similar ways on major releases.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-09-08 Thread Emre Hasegeli
> Given that you are now familiar with the internals of how enums are
> implemented would it be possible to continue the work and add the "ALTER
> TYPE  DROP VALUE " command?

I was thinking to try developing it, but I would be more than happy to
help by testing and reviewing, if someone else would do.  I don't
think I have enough experience to think of all details of this
feature.

The main problem that has been discussed before was the indexes.  One
way is to tackle with it is to reindex all the tables after the
operation.  Currently we are doing it when the datatype of indexed
columns change.  So it should be possible, but very expensive.

Another way, Thomas Munro suggested when we were talking about it,
would be to add another column to mark the deleted rows to the catalog
table.  We can check for this column before allowing the value to be
used.  Indexes can continue using the deleted values.  We can also
re-use those entries when someone wants to add new value to the
matching place.  It should be safe as long as we don't update the sort
order.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Transactional enum additions - was Re: [HACKERS] Alter or rename enum value

2016-09-04 Thread Emre Hasegeli
> Got around to looking at this.  Attached is a revised version that I think
> is in committable shape.  The main non-cosmetic change is that the test
> for "type was created in same transaction as new value" now consists of
> comparing the xmins of the pg_type and pg_enum rows, without consulting
> GetCurrentTransactionId().  I did not like the original coding because
> it would pointlessly disallow references to enum values that were created
> in a parent transaction of the current subxact.  This way is still leaving
> some subxact use-cases on the table, as noted in the code comments, but
> it's more flexible than before.

Thank you for picking this up.  The patch looks good to me.  I think
this is a useful to support adding values to the enum created in the
same transaction.

> +   /*
> +* If the row is hinted as committed, it's surely safe.  This provides a
> +* fast path for all normal use-cases.
> +*/
> +   if (HeapTupleHeaderXminCommitted(enumval_tup->t_data))
> +   return;
> +
> +   /*
> +* Usually, a row would get hinted as committed when it's read or loaded
> +* into syscache; but just in case not, let's check the xmin directly.
> +*/
> +   xmin = HeapTupleHeaderGetXmin(enumval_tup->t_data);
> +   if (!TransactionIdIsInProgress(xmin) &&
> +   TransactionIdDidCommit(xmin))
> +   return;

This looks like transaction internal logic inside enum.c to me.  Maybe
it is because I am not much familiar with the internals.  I couldn't
find a similar pattern anywhere else, but it might still be useful to
invent something like HeapTupleHeaderXminReallyCommitted().

One risk about this feature is that the future enum functions would
not check if the value is safe to return.  Maybe we should append a
warning to the file header about this.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-09-04 Thread Emre Hasegeli
> I started looking at this patch.  I'm kind of unhappy with having *both*
> IF EXISTS and IF NOT EXISTS options on the statement, especially since
> the locations of those phrases in the syntax seem to have been chosen
> with a dartboard.  This feels way more confusing than it is useful.
> Is there really a strong use-case for either option?  I note that
> ALTER TABLE RENAME COLUMN, which is probably used a thousand times
> more often than this will be, has so far not grown either kind of option,
> which sure makes me think that this proposal is getting ahead of itself.

I think they can be useful.  I am writing a lot of migration scripts
for small projects.  It really helps to be able to run parts of them
again.  ALTER TYPE ... ADD VALUE already have IF NOT EXISTS option.  I
don't think we would lose anything to support both of them in here.

The syntax ALTER TYPE ... RENAME VALUE [ IF EXISTS ] ... TO ... [ IF
NOT EXISTS ] looks self-explaining to me.  I haven't confused when I
first saw.  IF EXISTS applying to the old value, IF NOT EXISTS
applying to the new value, are the only reasonable semantics one might
expect from renaming things.  Anybody is interpreting it wrong? or can
think of another syntax?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-09-11 Thread Emre Hasegeli
> Why not just disallow dropping a value that's still in use? That's certainly
> what I would prefer to happen by default...

Of course, we should disallow it.  That problem is what to do next.
We cannot just remove the value, because it might still be referenced
from the inner nodes of the indexes.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-09-07 Thread Emre Hasegeli
> Bottom line here is that I'd rather commit ALTER TYPE RENAME VALUE with
> no EXISTS features and then see it accrete those features together with
> other types of RENAME, when and if there's a will to make that happen.

This sounds like a good conclusion to me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regexp_match() returning text

2016-08-18 Thread Emre Hasegeli
> I did *not* push the hunk in citext.sgml, since that was alleging support
> that doesn't actually exist in this patch.  To make this work for citext,
> we need to add wrapper functions similar to citext's wrappers for
> regexp_matches.  And that in turn means a citext extension version bump,
> which makes it more notationally tedious than I felt like dealing with
> today.  I'm bouncing that requirement back to you to produce a separate
> patch for.

It is attached.


citext-regexp-match-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-08-18 Thread Emre Hasegeli
> Emre, I noticed you modified the commitfest entry
> (https://commitfest.postgresql.org/10/588/) to be for Andrew's
> transactional enum addition patch instead, but didn't change the title.
> I'll revert that as soon as it picks up this latest patch.  Do you wish
> to remain a reviewer for this patch, or should I remove you?

I confused with Andrew's transactional enum addition patch.  I guess
we need a separate entry on Commitfest App for that part of the thread
[1].  I am not sure, if that is possible.  I will do my best to review
both of them.

[1] https://www.postgresql.org/message-id/56ffe757.1090...@dunslane.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >