Re: [HACKERS] [PATCH] Improve geometric types

2017-11-09 Thread Emre Hasegeli
>> This is also effecting lseg ## box operator.
>
> Mmm.. It returns (1.5, 1.5) with the 0004 patch. It is surely a
> point on the second operand but I'm not sure it's right that the
> operator returns a specific point for two parallel segments.

I am changing it to return NULL, when they are parallel.

> I'd like to put comments on 0001 and 0004 only now:
>
>  - Adding [LR]DELIM_L looks good but they should be described in
>the comment just above.

I will mention it in the new version.

>  - I'm not sure the change of box_construct is good but currently
>all callers fits the new interface (giving two points, not
>four coordinates).

I tried to make things more consistent.  The other constructors take points.

>  - close_lseg seems forgetting the case where the two given
>segments are crossing (and parallel).

I am re-implementing it covering those cases.

> - make_bound_box operates directly on the poly->boundbox. I'm
>   afraid that such coding hinders compiers from using registers.

I am changing it back.

>   This is a bit apart from this patch, it would be better if we
>   could eliminate internal calls using DirectFunctionCall.

We don't seem to be able to fix all issues without doing that.  I will
incorporate the change.

Thank you for your review.  I will address your other email before
posting new versions.


Re: [HACKERS] Flexible configuration for full-text search

2017-10-31 Thread Emre Hasegeli
> I'm mostly happy with mentioned modifications, but I have few questions
> to clarify some points. I will send new patch in week or two.

I am glad you liked it.  Though, I think we should get approval for
the syntax from more senior community members or committers before we
put more effort into the code.

> But configuration:
>
> CASE english_noun WHEN MATCH THEN english_hunspell ELSE simple END
>
> is not (as I understand ELSE can be used only with KEEP).
>
> I think we should decide to allow or disallow usage of different
> dictionaries for match checking (between CASE and WHEN) and a result
> (after THEN). If answer is 'allow', maybe we should allow the
> third example too for consistency in configurations.

I think you are right.  We had better allow this too.  Then the CASE syntax becomes:

CASE config
WHEN [ NO ] MATCH THEN { KEEP | config }
[ ELSE config ]
END
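
For illustration, a full mapping under this extended grammar could look
like the following (the configuration name and token types are only
examples; the dictionaries are the ones from your message):

ALTER TEXT SEARCH CONFIGURATION nouns_or_simple
    ALTER MAPPING FOR asciiword, word WITH
        CASE english_noun
            WHEN MATCH THEN english_hunspell
            ELSE simple
        END;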

> Based on formal definition it is possible to describe this example in
> following manner:
> CASE english_noun WHEN MATCH THEN english_hunspell END
>
> The question is same as in the previous example.

I couldn't understand the question.

> Currently, stopwords increment position, for example:
> SELECT to_tsvector('english','a test message');
> -
>  'messag':3 'test':2
>
> A stopword 'a' has a position 1 but it is not in the vector.

Does this problem only apply to stopwords and the whole thing we are
inventing?  Shouldn't we preserve the positions through the pipeline?

> If we want to save this behavior, we should somehow pass a stopword to
> tsvector composition function (parsetext in ts_parse.c) for counter
> increment or increment it in another way. Currently, an empty lexemes
> array is passed as a result of LexizeExec.
>
> One of possible way to do so is something like:
> CASE polish_stopword
> WHEN MATCH THEN KEEP -- stopword counting
> ELSE polish_isspell
> END

This would mean keeping the stopwords.  What we want is

CASE polish_stopword    -- stopword counting
WHEN NO MATCH THEN polish_isspell
END

Do you think it is possible?


Re: [HACKERS] Flexible configuration for full-text search

2017-10-26 Thread Emre Hasegeli
> The patch introduces way to configure FTS based on CASE/WHEN/THEN/ELSE
> construction.

Interesting feature.  I needed this flexibility before when I was
implementing text search for a Turkish private listing application.
Aleksandr and Arthur were kind enough to discuss it with me off-list
today.

> 1) Multilingual search. Can be used for FTS on a set of documents in
> different languages (example for German and English languages).
>
> ALTER TEXT SEARCH CONFIGURATION multi
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part WITH CASE
> WHEN english_hunspell AND german_hunspell THEN
>   english_hunspell UNION german_hunspell
> WHEN english_hunspell THEN english_hunspell
> WHEN german_hunspell THEN german_hunspell
> ELSE german_stem UNION english_stem
>   END;

I understand the need to support branching, but this syntax is overly
complicated.  I don't think there is any need to support a different
set of dictionaries as condition and action.  Something like this
might work better:

ALTER TEXT SEARCH CONFIGURATION multi
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
  word, hword, hword_part WITH
CASE english_hunspell UNION german_hunspell
WHEN MATCH THEN KEEP
ELSE german_stem UNION english_stem
END;

To put it formally:

ALTER TEXT SEARCH CONFIGURATION name
ADD MAPPING FOR token_type [, ... ] WITH config

where config is one of:

dictionary_name
config { UNION | INTERSECT | EXCEPT } config
CASE config WHEN [ NO ] MATCH THEN [ KEEP ELSE ] config END

> 2) Combination of exact search with morphological one. This patch not
> fully solve the problem but it is a step toward solution. Currently, we
> should split exact and morphological search in query manually and use
> separate index for each part. With new way to configure FTS we can use
> following configuration:
>
> ALTER TEXT SEARCH CONFIGURATION exact_and_morph
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>   word, hword, hword_part WITH CASE
> WHEN english_hunspell THEN english_hunspell UNION simple
> ELSE english_stem UNION simple
>   END

This could be:

CASE english_hunspell
THEN KEEP
ELSE english_stem
END
UNION
simple
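
Spelled out against the grammar above, I guess this would read as
follows (just my reading of the proposed syntax, nothing final):

CASE english_hunspell
    WHEN MATCH THEN KEEP
    ELSE english_stem
END
UNION
simple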

> 3) Using different dictionaries for recognizing and output generation.
> As I mentioned before, in new syntax condition and command are separate
> and we can use it for some more complex text processing. Here an
> example for processing only nouns:
>
> ALTER TEXT SEARCH CONFIGURATION nouns_only
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part WITH CASE
>   WHEN english_noun THEN english_hunspell
> END

This would also still work with the simpler syntax because
"english_noun", still being a dictionary, would pass the tokens to the
next one.

> 4) Special stopword processing allows us to discard stopwords even if
> the main dictionary doesn't support such feature (in example pl_ispell
> dictionary keeps stopwords in text):
>
> ALTER TEXT SEARCH CONFIGURATION pl_without_stops
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part WITH CASE
> WHEN simple_pl IS NOT STOPWORD THEN pl_ispell
>   END

Instead of supporting the old way of putting stopwords on
dictionaries, we can make the stopwords dictionaries of their own.
This would then become something like:

CASE polish_stopword
WHEN NO MATCH THEN polish_isspell
END


Re: [HACKERS] [PATCH] Improve geometric types

2017-10-04 Thread Emre Hasegeli
> Now, it's also not clear that anything in PG really cares.  But if we
> do care, I think we should keep pg_hypot() ... and maybe clarify the
> comment a bit more.

I am not sure how useful NaNs are in the context of the geometric
types, but we allow them, so an inconsistent hypot() would be a
problem.  I will change my patches to keep pg_hypot().
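
For context, the place where users most visibly hit hypot() is point
distance; a trivial sanity check (output of a stock build assumed):

SELECT '(0,0)'::point <-> '(3,4)'::point;
-- 5, computed via HYPOT()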


Re: [HACKERS] [PATCH] Improve geometric types

2017-10-03 Thread Emre Hasegeli
> Uh, I thought pg_hypot() was still needed on our oldest supported
> platforms.  Or is hypot() already supported there?  If not, I suppose we
> can keep the HYPOT() macro, and have it use hypot() on platforms that
> have it and pg_hypot elsewhere?  That particular fraction of 0001 seemed
> a bit dubious to me, as the largest possible source of platform
> dependency issues.

What is our oldest supported platform?

We can also just keep pg_hypot().  I don't think getting rid of it
justifies much work.


Re: [HACKERS] [PATCH] Improve geometric types

2017-10-03 Thread Emre Hasegeli
> I think if we're going to do this it should be separated out as its
> own patch.  Also, I think someone should explain what the reasoning
> behind the change is.  Do we, for example, foresee that the built-in
> code might be faster or less likely to overflow?  Because we're
> clearly taking a risk -- most trivially, that the BF will break, or
> more seriously, that some machines will have versions of this function
> that don't actually behave quite the same.

I included the removal of pg_hypot() in my patch simply because the
comment on the function header suggests it.  I thought that if we are
going to clean this module up, we had better deal with it first.  I
understand the risk.  The patches include more changes.  It may be a
good idea to have those together.

> That brings up a related point.  How good is our test case coverage
> for hypot(), especially in strange corner cases, like this one
> mentioned in pg_hypot()'s comment:
>
>  * This implementation conforms to IEEE Std 1003.1 and GLIBC, in that the
>  * case of hypot(inf,nan) results in INF, and not NAN.

I don't see any tests of geometric types with INF or NaN.  Currently,
there isn't consistent behaviour for them.  I don't think we can
easily add portable ones in the current state, but we should be able
to do so with my patches.  I will look into it.

> I'm potentially willing to commit a patch that just makes the
> pg_hypot() -> hypot() change and does nothing else, if there are not
> objections to that change, but I want to be sure that we'll know right
> away if that turns out to break.

I can split this one into another patch.


Re: [HACKERS] [PATCH] Improve geometric types

2017-09-27 Thread Emre Hasegeli
> The patch replace pg_hypot with hypot in libc. The man page says
> as follows.
>
> man 3 hypot
>>   If the result overflows, a range error occurs, and the functions return
>>   HUGE_VAL, HUGE_VALF, or HUGE_VALL, respectively.
> ..
>>ERRORS
>>   See math_error(7) for information on how to determine whether an  error
>>   has occurred when calling these functions.
>>
>>   The following errors can occur:
>>
>>   Range error: result overflow
>>  errno  is  set  to ERANGE.  An overflow floating-point exception
>>  (FE_OVERFLOW) is raised.
>>
>>   Range error: result underflow
>>  An underflow floating-point exception (FE_UNDERFLOW) is raised.
>>
>>  These functions do not set errno for this case.
>
> So, the code seems to need some amendments following to this
> spec.

I included them in the latest version.


Re: [HACKERS] close_ps, NULLs, and DirectFunctionCall

2017-09-21 Thread Emre Hasegeli
> Does this need fixing, and if so how?

My "improve geometric types" patch [1] fixes this issue by considering
NaNs larger than any non-NaN, the same as the float types do.  There
are many issues with the geometric types similar to this one.  Let's
combine our efforts to put them into shape.

[1] 
https://www.postgresql.org/message-id/flat/CAE2gYzxF7-5djV6-cEvqQu-fNsnt=eqbourx7zdg+vv6zmt...@mail.gmail.com


Re: [HACKERS] [PATCH] Improve geometric types

2017-09-12 Thread Emre Hasegeli
> Hello, sorry to late for the party, but may I comment on this?

Thank you for picking this up again.

> The first patch reconstructs the operators in layers. These
> functions are called very frequently when used. Some function are
> already inlined in float.h but some static functions in float.h
> also can be and are better be inlined. Some of *_internal,
> point_construct, line_calculate_point and so on are the
> candidates.

They are static functions.  I thought the compiler could decide to
inline them.  Do you think adding "inline" to the function signatures
is necessary?

> You removed some DirectFunctionCall to the functions within the
> same file but other functions remain in the style,
> ex. poly_center or on_sl. The function called from the former
> seems large enough but the latter function calls a so small
> function that it could be inlined. Would you like to make some
> additional functions use C call (instead of DirectFunctionCall)
> and inlining them?

I tried to minimise my changes to make reviewing easier.  I can make
"_internal" functions for the remaining DirectFunctionCall()s, if you
find it necessary.

> This is not a fault of this patch, but some functions like on_pb
> seems missing comment to describe what it is. Would you like to
> add some?

I will add some in the next version.

> In the second patch, the additional include fmgrprotos.h in
> btree_gin.c seems needless.

It must be something I missed on rebase.  I will remove it.

> Some float[48] features were macros
> so that they share the same expressions between float4 and
> float8. They still seems sharing perfectly the same expressions
> in float.h. Is there any reason for converting them into typed
> inline functions?

Kevin Grittner suggested using inline functions instead of macros.
They are easier to use compared to macros, and avoid double-evaluation
hazards.

> In float.h, MAXDOUBLEWIDTH is redueced from 500 to 128, but the
> exponent of double is up to 308 so it doesn't seem sufficient. On
> the other hand we won't use non-scientific notation for extremely
> large numbers and it requires (perhaps) up to 26 bytes in the
> case. In the soruce code, most of them uses "%e" and one of them
> uses '%g". %e always takes the format of
> "-1.(17digits)e+308".. So it would be less than 26
> characters.
>
> =# set extra_float_digits to 3;
> =# select -1.221423424320453e308::float8;
>  ?column?
> ---
>  -1.22142342432045302e+308
>
> man printf: (linux)
>> Style e is used if the exponent from its conversion is less than
>> -4 or greater than or equal to the precision.
>
> So we should be safe to have a buffer with 26 byte length and 500
> bytes will apparently too large and even 128 will be too loose in
> most cases. So how about something like the following?
>
> #define MINDOUBLEWIDTH 32

Should it be the same for float4 and float8?

> ...
> float4out@float.c:
>>int  ndig = FLT_DIG + extra_float_digits;
>>
>>if (ndig < 1)
>>   ndig = 1;
>>
>>len = snprintf(ascii, MINDOUBLEWIDTH + 1, "%+.*g", ndig, num);
>> if (len > MINDOUBLEWIDTH + 1)
>>{
>>ascii = (char *) repalloc(ascii, len);
>>if (snprintf(ascii, len, "%+.*e", ndig, num) > len)
>>   error(ERROR, "something wrong happens...");
>> }
>
> I don't think the if part can be used so there would be no
> performance degradation, I believe.

Wouldn't this change the output of the float datatypes?  That would be
a backwards incompatible change.

> I'd like to pause here.

I will submit new versions after you are done with your review.


Re: [HACKERS] BRIN cost estimate

2017-04-06 Thread Emre Hasegeli
> Good point. That's wrong, but I'm confused at why you kept the:
>
> + *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
>
> at all if that's the case. All the BRIN scan is going to do is build a
> bitmap of the matching ranges found.

My mind was not clear when I was working on it a year ago.

> I've ended up with:
>
> + /*
> + * Charge a small amount per range tuple which we expect to match to. This
> + * is meant to reflect the costs of manipulating the bitmap. The BRIN scan
> + * will set a bit for each page in the range when we find a matching
> + * range, so we must multiply the charge by the number of pages in the
> + * range.
> + */
> + *indexTotalCost += 0.1 * cpu_operator_cost * estimatedRanges *
> + pagesPerRange;


Doesn't BRIN execute the operator only once per range?  I think we can
just multiply cpu_operator_cost by estimatedRanges.

> I also noticed that you were doing:
>
> + if (get_index_stats_hook &&
> + (*get_index_stats_hook) (root, index->indexoid, 1, ))
>
> and
>
> + vardata.statsTuple = SearchSysCache3(STATRELATTINH,
> + ObjectIdGetDatum(index->indexoid),
> + Int16GetDatum(1),
>
> I wondered why you picked to hardcode that as 1, and I thought that's
> surely a bug. But then looking deeper it seems to be copied from
> btcostestimate(), which also seems to be a long-standing bug
> introduced in a536ed53bca40cb0d199824e358a86fcfd5db7f2. I'll go write
> a patch for that one now.


Yes, I copy-pasted from btcostestimate().

> I do still have concerns about how the correlation value is used, and
> the fact that we use the highest value, but then the whole use of this
> value is far from perfect, and is just a rough indication of how
> things are.


I agree.  I tried to document how incomplete it is in the comments I
wrote, to help future developers improve the situation.


Re: [HACKERS] BRIN cost estimate

2017-04-04 Thread Emre Hasegeli
> Interested to hear comments on this.


I don't have a chance to test it right now, but I am sure it would be
an improvement over what we have now.  There is no single correct
equation with so many unknowns.

>   *indexTotalCost += (numTuples * *indexSelectivity) *
> (cpu_index_tuple_cost + qual_op_cost);

Have you preserved this line intentionally?  This would make the BRIN
index scan cost even higher, but the primary property of BRIN is being
cheap to scan.  Shouldn't the bitmap heap scan node account for the
cost of checking the tuples?


Re: [HACKERS] BRIN cost estimate

2017-04-02 Thread Emre Hasegeli
> - cond := format('%I %s %L', r.colname, r.oper, r.value);
> + cond := format('%s %s %L', r.colname, r.oper, r.value);
>
> Why did you change this? It seems unrelated.


Must be a mistake.

> + indexRel = index_open(index->indexoid, AccessShareLock);
> + pagesPerRange = Min(BrinGetPagesPerRange(indexRel), baserel->pages);
> + Assert(baserel->pages > 0);
> + Assert(pagesPerRange > 0);
> + rangeProportion = (double) pagesPerRange / baserel->pages;
> + numRanges = 1.0 + (double) baserel->pages / pagesPerRange;
> + index_close(indexRel, AccessShareLock);
>
> Why do you add 1.0 to numRanges? This gives one too many ranges.


I have tried to prevent division by zero that can happen later, but your
solution below sounds cleaner to me.

> I also don't really like the way you're setting pagesPerRange. I'd suggest
> keeping that as the raw value from the index metadata, and doing:
>
> If you want the actual number of ranges then I think this is best expressed
> as:
>
> numRanges = Max(ceil(baserel->pages / pagesPerRange), 1.0);
>
> The rangeProportion variable could be calculated based on that
> too rangeProportion = 1.0 / numRanges;
>
> That way you don't have to rely on relpages being > 0. It seems like a bad
> assumption to make. I see there's some hack in estimate_rel_size() making
> that true, but I think it's best not to rely on that hack.

> - qual_op_cost = cpu_operator_cost *
> - (list_length(indexQuals) + list_length(indexOrderBys));
> -
>   *indexStartupCost += qual_arg_cost;
>   *indexTotalCost += qual_arg_cost;
> - *indexTotalCost += (numTuples * *indexSelectivity) *
> (cpu_index_tuple_cost + qual_op_cost);
>   *indexPages = index->pages;
>
> - /* XXX what about pages_per_range? */
> + /*
> + * Add index qual op costs.  Unlike other indexes, we are not processing
> + * tuples but ranges.
> + */
> + qual_op_cost = cpu_operator_cost * list_length(indexQuals);
> + *indexTotalCost += numRanges * qual_op_cost;
>
> What's the reason for removing the  + list_length(indexOrderBys) here? I
> don't think it's the business of this patch to touch that. BRIN may one day
> support that by partition sorting non-overlapping ranges, so that could
> well be why it was there in the first place.


I thought it was a mistake.  I agree we had better not change it, even
though I have no idea how the cost of BRIN sorting could be
calculated.  I guess the indexOrderBys list would always be empty for
now, anyway.

> - *indexTotalCost += (numTuples * *indexSelectivity) *
> (cpu_index_tuple_cost + qual_op_cost);
> + *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
>
> Why is this not being charged qual_op_cost anymore?


I must have thought that BRIN doesn't execute the qual, unlike btree.
I am not sure what the best equation to put there is.  Previously, we
were returning good selectivity but high cost.  I believe things
should be the other way around.  Maybe what we really need is to use
"numRanges" here instead of "selec * numTuples".

> + * Our starting point is that BRIN selectivity has to be less than the
> + * selectivity of the btree.  We are using a product of logical and
>
> Can you explain why this is the case?


My wording sounds wrong here.  I should have said "worse than" instead
of "less than".

> + * class "minmax", and makes a little sense for the other one "inclusion".
>
> "and" I think should be "but"


I agree.

> I think it would also be good to write some regression tests for this. All
> the changes I see that you did make to brin.sql seem unrelated to this
> patch.


I added an expression index to test getting the correlation of
expressions.  The BRIN tests would call the estimation functions, but
we don't have any tests to check the results of the functions.  We can
maybe prepare a test to check that BRIN is preferred over btree when
it would perform better, and maybe also vice versa.
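
A rough sketch of such a test, just to illustrate the idea (the table
name is made up, and whether the planner actually picks BRIN here
depends on the very cost estimates we are changing):

CREATE TABLE brin_vs_btree AS
    SELECT generate_series(1, 100000) AS id;
CREATE INDEX ON brin_vs_btree USING brin (id);
CREATE INDEX ON brin_vs_btree USING btree (id);
ANALYZE brin_vs_btree;
-- on a perfectly correlated column we would hope to see the BRIN index
EXPLAIN (COSTS OFF)
    SELECT * FROM brin_vs_btree WHERE id BETWEEN 1000 AND 2000;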

Unfortunately, I am on vacation for two weeks without my computer.  I
can post another version after the 18th.  I know we are under time
pressure for the release.  I wouldn't mind if you or Alvaro changed it
any way you like.


Re: [HACKERS] BRIN cost estimate

2017-03-26 Thread Emre Hasegeli
> If we want to have a variable which stores the number of ranges, then
> I think numRanges is better than numBlocks. I can't imagine many
> people would disagree there.

I renamed it along with the other two.

> At the very least please write a comment to explain this in the code.
> Right now it looks broken. If I noticed this then one day in the
> future someone else will. If you write a comment then person of the
> future will likely read it, and then not raise any questions about the
> otherwise questionable code.

I added a sentence about it.

> I do strongly agree that the estimates need improved here. I've
> personally had issues with bad brin estimates before, and I'd like to
> see it improved. I think the patch is not quite complete without it
> also considering stats on expression indexes. If you have time to go
> do that I'd suggest you go ahead with that.

I copy-pasted expression index support from btcostestimate() as well,
and extended the regression test.

I think this function can use more polishing before committing, but I
don't know where to begin.  There is a single function for every
access method in here.  I don't like them being in there to begin
with.  selfuncs.c is quite long with all kinds of dependencies and
dependents.  I think it would be better to have the access method
selectivity estimation functions under src/access.  Maybe we should
start doing so by moving this one to src/access/brin/brin_selfuncs.c.
What do you think?


brin-correlation-v5.patch
Description: Binary data


Re: [HACKERS] BRIN cost estimate

2017-03-21 Thread Emre Hasegeli
> Not sure what you mean here. I'm not speaking of the brin index am, I
> mean the get_index_stats_hook call which you've added.

I see.  Actually this part was from Alvaro.  I hadn't noticed the
get_index_stats_hook call before, but it is still the same coding as
btcostestimate().  btcostestimate() also calls get_index_stats_hook,
and then Asserts nnumbers == 1.

> hmm, before what exactly? before your patch it didn't exist. You
> introduced it into brincostestimate().

I got confused looking at the changes in my repository that I made on
top of Alvaro's.  I will rename it in the next version.

> At the very least please write a comment to explain this in the code.
> Right now it looks broken. If I noticed this then one day in the
> future someone else will. If you write a comment then person of the
> future will likely read it, and then not raise any questions about the
> otherwise questionable code.

Will do.

> I do strongly agree that the estimates need improved here. I've
> personally had issues with bad brin estimates before, and I'd like to
> see it improved. I think the patch is not quite complete without it
> also considering stats on expression indexes. If you have time to go
> do that I'd suggest you go ahead with that.

I will look into it this week.


Re: [HACKERS] BRIN cost estimate

2017-03-17 Thread Emre Hasegeli
> 1.
>
> + Assert(nnumbers == 1);
>
> I think its a bad idea to Assert() this. The stat tuple can come from
> a plugin which could do anything. Seems like if we need to be certain
> of that then it should be an elog(ERROR), maybe mention that we
> expected a 1 element array, but got  elements.

But it is not coming from a plugin which can do anything.  It is the
plugin we know.  The Assert() is the exact same coding as in
btcostestimate().

> 2.
>
> + Assert(varCorrelation >= 0 && varCorrelation <= 1);
>
> same goes for that. I don't think we want to Assert() that either.
> elog(ERROR) seems better, or perhaps we should fall back on the old
> estimation method in this case?

Again, the correlation value is expected to be in this range.  I don't
think we even need to bother with an Assert().  Garbage in, garbage out.

> The Asserted range also seems to contradict the range mentioned in
> pg_statistic.h:

We are using Abs() of the value.

> 3.
>
> + numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
>
> should numBlocks be named numRanges? After all we call the option
> "pages_per_range".

It was named this way before.

> 4.
>
> + * correlated table is copied 4 times, the correlation would be 0.25,
> + * although the index would be almost as good as the version on the
>
> What's meant by "table is copied 4 times" ?
>
> As best as I can work out, it means.
>
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
>
> but I'm uncertain, and it's not very clear to me what it means.

This was exactly what I meant.  I included the INSERT INTO statement
to make it more clear.

> 5.
>
> + blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;
>
> I missed the comment that explains why we divide by two here.

To get the arithmetic mean.  The paragraph explaining this had a word
missing.  I fixed it.

> 6.
>
> Might want to consider not applying this estimate when the statistics
> don't contain any STATISTIC_KIND_CORRELATION array.

I am not sure what would be better than using the value as 0 in this case.

> 7.
>
> + blockProportion = (double) BrinGetPagesPerRange(indexRel) /
> + baserel->pages;
>
> Perhaps the blockProportion should also be clamped to the number of
> pages in the relation. Even a 1 page relation is estimated to have 128
> pages with the default pages_per_range, by the method used in the
> patch.

I had failed to consider the case when there are fewer pages than
pages_per_range.  The new version accounts for this.

> Good news is, I'm seeing much better plans coming out in cases when
> the index is unlikely to help. So +1 for fixing this issue.

It is also useful when the same columns are indexed with different
access methods.  If we allow HOT updates with BRIN indexes, people can
freely spread BRIN indexes around.  I have also noticed while testing
again that the patch positively affects selecting parallel bitmap
index scans.

The most doubtful part of the patch is the "(1 + logical_selectivity)
* (1 + physical_selectivity) - 1" equation I made up.
logical_selectivity is the value btree would return.  I am sure BRIN
should always return something greater.  physical_selectivity is the
value calculated to correct the logical_selectivity using the
correlation and pages_per_range.  I had no idea how to combine them,
and picked the current formula after random tests with different
datasets.  We can try to make an empirical formula using the actual
values, but it would be too tightly tied to the tests and datasets we
would use.
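
For example, with logical_selectivity = 0.01 and physical_selectivity
= 0.1, the combined estimate is (1 + 0.01) * (1 + 0.1) - 1 = 0.111, so
the result is always at least as large as either input.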

One of the things the patch makes worse is the case when the
selectivity is correctly estimated as 0.  For example, searching for
id = 101 where the ids range between 0 and 100.  That is bad, but I
don't see a way to improve it without pushing the job down to the
opclasses and going through a lot more complication.  The adjustment
would be useful when searching for id = 101 where the ids range
between 0 and 100 or 102 and 200.

I can look into supporting expression indexes, if you think something
very much incomplete like this has a chance to get committed.


brin-correlation-v4.patch
Description: Binary data


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Emre Hasegeli
> Hopefully, this time I got it correct.  Since I am unable to reproduce
> the issue so I will again need your help in verifying the fix.

It is not crashing with the new patch.  Thank you.


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Emre Hasegeli
> Are you getting the crash with the same test case?

Yes.  Here is the new backtrace:

> * thread #1: tid = 0x51828fd, 0x000100caf314 
> postgres`tbm_prepare_shared_iterate [inlined] pg_atomic_write_u32_impl(val=0) 
> at generic.h:57, queue = 'com.apple.main-thread', stop reason = 
> EXC_BAD_ACCESS (code=1, address=0x0)
>   * frame #0: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
> [inlined] pg_atomic_write_u32_impl(val=0) at generic.h:57 [opt]
> frame #1: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
> [inlined] pg_atomic_init_u32_impl(val_=0) at generic.h:163 [opt]
> frame #2: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
> [inlined] pg_atomic_init_u32(val=0) + 17 at atomics.h:237 [opt]
> frame #3: 0x000100caf303 
> postgres`tbm_prepare_shared_iterate(tbm=) + 723 at 
> tidbitmap.c:875 [opt]
> frame #4: 0x000100c74844 postgres`BitmapHeapNext(node=) 
> + 436 at nodeBitmapHeapscan.c:154 [opt]
> frame #5: 0x000100c615b0 
> postgres`ExecProcNode(node=0x7fdabf8189f0) + 224 at execProcnode.c:459 
> [opt]
> frame #6: 0x000100c76ca9 postgres`ExecGather [inlined] 
> gather_getnext(gatherstate=) + 520 at nodeGather.c:276 [opt]
> frame #7: 0x000100c76aa1 postgres`ExecGather(node=) + 
> 497 at nodeGather.c:212 [opt]
> frame #8: 0x000100c61692 
> postgres`ExecProcNode(node=0x7fdabf818558) + 450 at execProcnode.c:541 
> [opt]
> frame #9: 0x000100c5cf70 postgres`standard_ExecutorRun [inlined] 
> ExecutePlan(estate=, planstate=, 
> use_parallel_mode=, operation=, numberTuples=0, 
> direction=, dest=) + 29 at execMain.c:1616 [opt]
>frame #10: 0x000100c5cf53 
> postgres`standard_ExecutorRun(queryDesc=, 
> direction=, count=0) + 291 at execMain.c:348 [opt]
>frame #11: 0x000100dac0df 
> postgres`PortalRunSelect(portal=0x7fdac000b240, forward=, 
> count=0, dest=) + 255 at pquery.c:921 [opt]
>frame #12: 0x000100dabc84 
> postgres`PortalRun(portal=0x7fdac000b240, count=, 
> isTopLevel='\x01', dest=, altdest=, 
> completionTag=) + 500 at pquery.c:762 [opt]
>frame #13: 0x000100da989b postgres`PostgresMain + 44 at 
> postgres.c:1101 [opt]
>frame #14: 0x000100da986f postgres`PostgresMain(argc=, 
> argv=, dbname=, username=) + 8927 at 
> postgres.c:4066 [opt]
>frame #15: 0x000100d2c113 postgres`PostmasterMain [inlined] BackendRun 
> + 7587 at postmaster.c:4317 [opt]
>frame #16: 0x000100d2c0e8 postgres`PostmasterMain [inlined] 
> BackendStartup at postmaster.c:3989 [opt]
>frame #17: 0x000100d2c0e8 postgres`PostmasterMain at postmaster.c:1729 
> [opt]
>frame #18: 0x000100d2c0e8 postgres`PostmasterMain(argc=, 
> argv=) + 7544 at postmaster.c:1337 [opt]
>frame #19: 0x000100ca528f postgres`main(argc=, 
> argv=) + 1567 at main.c:228 [opt]
>frame #20: 0x7fffb4e28255 libdyld.dylib`start + 1
>frame #21: 0x7fffb4e28255 libdyld.dylib`start + 1


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli
> Please verify the fix.

The same test with both of the patches applied still crashes for me.


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli
> With my test case, I could not crash even with this patch applied.
> Can you provide your test case?

Yes:

> hasegeli=# create table r2 as select (random() * 3)::int as i from 
> generate_series(1, 100);
> SELECT 100
> hasegeli=# create index on r2 using brin (i);
> CREATE INDEX
> hasegeli=# analyze r2;
> ANALYZE
> hasegeli=# explain select * from only r2 where i = 10;
>  QUERY PLAN
> -
>  Gather  (cost=2867.50..9225.32 rows=1 width=4)
>Workers Planned: 2
>->  Parallel Bitmap Heap Scan on r2  (cost=1867.50..8225.22 rows=1 width=4)
>  Recheck Cond: (i = 10)
>  ->  Bitmap Index Scan on r2_i_idx  (cost=0.00..1867.50 rows=371082 
> width=0)
>Index Cond: (i = 10)
> (6 rows)
>
> hasegeli=# select * from only r2 where i = 10;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli
> This can crash at line:414, if either tuple is invalid memory(but I
> think it's not because we have already accessed this memory in above
> if check) or dtup is invalid (this is also not possible because
> brin_new_memtuple has already accessed this).

I was testing with the BRIN correlation patch [1] applied.  I cannot
crash it without the patch either.  I am sorry for not testing it
before.  The patch makes the BRIN selectivity estimation function
access more information.

[1] 
https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli
> I don't know if this is the only problem

> I'll be in this general area today, so will mention if I stumble over
> anything that looks broken.

I was testing the same patch with a large dataset and got a different segfault:

> hasegeli=# explain select * from only mp_notification_20170225 where 
> server_id = 7;
>QUERY PLAN
> --
> Gather  (cost=26682.94..476995.88 rows=1 width=215)
>   Workers Planned: 2
>   ->  Parallel Bitmap Heap Scan on mp_notification_20170225  
> (cost=25682.94..475995.78 rows=1 width=215)
> Recheck Cond: (server_id = 7)
> ->  Bitmap Index Scan on mp_notification_block_idx  
> (cost=0.00..25682.94 rows=4557665 width=0)
>   Index Cond: (server_id = 7)
> (6 rows)
>
> hasegeli=# select * from only mp_notification_20170225 where server_id = 7;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

> * thread #1: tid = 0x5045a8f, 0x00010ae44558 
> postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8, 
> tuple=0x7fea3c891040) + 40 at brin_tuple.c:414, queue = 
> 'com.apple.main-thread', stop reason = signal SIGUSR1
>  * frame #0: 0x00010ae44558 
> postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8, 
> tuple=0x7fea3c891040) + 40 at brin_tuple.c:414 [opt]
>frame #1: 0x00010ae4000c 
> postgres`bringetbitmap(scan=0x7fea3c875c20, tbm=) + 428 at 
> brin.c:398 [opt]
>frame #2: 0x00010ae9b451 
> postgres`index_getbitmap(scan=0x7fea3c875c20, bitmap=) + 65 
> at indexam.c:726 [opt]
>frame #3: 0x00010b0035a9 
> postgres`MultiExecBitmapIndexScan(node=) + 233 at 
> nodeBitmapIndexscan.c:91 [opt]
>frame #4: 0x00010b002840 postgres`BitmapHeapNext(node=) + 
> 400 at nodeBitmapHeapscan.c:143 [opt]
>frame #5: 0x00010afef5d0 
> postgres`ExecProcNode(node=0x7fea3c873948) + 224 at execProcnode.c:459 
> [opt]
>frame #6: 0x00010b004cc9 postgres`ExecGather [inlined] 
> gather_getnext(gatherstate=) + 520 at nodeGather.c:276 [opt]
>frame #7: 0x00010b004ac1 postgres`ExecGather(node=) + 497 
> at nodeGather.c:212 [opt]
>frame #8: 0x00010afef6b2 
> postgres`ExecProcNode(node=0x7fea3c872f58) + 450 at execProcnode.c:541 
> [opt]
>frame #9: 0x00010afeaf90 postgres`standard_ExecutorRun [inlined] 
> ExecutePlan(estate=, planstate=, 
> use_parallel_mode=, operation=, numberTuples=0, 
> direction=, dest=) + 29 at execMain.c:1616 [opt]
>frame #10: 0x00010afeaf73 
> postgres`standard_ExecutorRun(queryDesc=, 
> direction=, count=0) + 291 at execMain.c:348 [opt]
>frame #11: 0x00010af8b108 
> postgres`ExplainOnePlan(plannedstmt=0x7fea3c871040, 
> into=0x, es=0x7fea3c805360, queryString=, 
> params=, planduration=) + 328 at explain.c:533 [opt]
>frame #12: 0x00010af8ab98 
> postgres`ExplainOneQuery(query=0x7fea3c805890, 
> cursorOptions=, into=0x, es=0x7fea3c805360, 
> queryString=,params=0x) + 280 at explain.c:369 
> [opt]
>frame #13: 0x00010af8a773 postgres`ExplainQuery(pstate=, 
> stmt=0x7fea3d005450, queryString="explain analyze select * from only 
> mp_notification_20170225 where server_id > 6;",params=0x, 
> dest=0x7fea3c8052c8) + 819 at explain.c:254 [opt]
>frame #14: 0x00010b13b660 
> postgres`standard_ProcessUtility(pstmt=0x7fea3d005fa8, 
> queryString="explain analyze select * from only mp_notification_20170225 
> where server_id > 6;",context=PROCESS_UTILITY_TOPLEVEL, 
> params=0x, dest=0x7fea3c8052c8, 
> completionTag=) + 1104 at utility.c:675 [opt]
>frame #15: 0x00010b13ad2a 
> postgres`PortalRunUtility(portal=0x7fea3c837640, 
> pstmt=0x7fea3d005fa8, isTopLevel='\x01', setHoldSnapshot=, 
> dest=0x7fea3c8052c8, completionTag=) + 90 at pquery.c:1165 
> [opt]
>frame #16: 0x00010b139f56 
> postgres`FillPortalStore(portal=0x7fea3c837640, isTopLevel='\x01') + 182 
> at pquery.c:1025 [opt]
>frame #17: 0x00010b139c22 
> postgres`PortalRun(portal=0x7fea3c837640, count=, 
> isTopLevel='\x01', dest=, altdest=, 
> completionTag=) + 402 at pquery.c:757 [opt]
>frame #18: 0x00010b13789b postgres`PostgresMain + 44 at 
> postgres.c:1101 [opt]
>frame #19: 0x00010b13786f postgres`PostgresMain(argc=, 
> argv=, dbname=, username=) + 8927 at 
> postgres.c:4066 [opt]
>frame #20: 0x00010b0ba113 postgres`PostmasterMain [inlined] BackendRun 
> + 7587 at postmaster.c:4317 [opt]
>frame #21: 0x00010b0ba0e8 postgres`PostmasterMain [inlined] 
> BackendStartup at postmaster.c:3989 [opt]
>frame #22: 0x00010b0ba0e8 postgres`PostmasterMain at postmaster.c:1729 
> [opt]
>frame #23: 

Re: [HACKERS] BRIN cost estimate

2017-03-13 Thread Emre Hasegeli
> Will Emre be around to make the required changes to the patch? I see
> it's been a while since it was originally posted.

I am around.  I can post an update in a few days.  Thank you for
picking this up.


Re: [HACKERS] btree_gin and btree_gist for enums

2017-02-25 Thread Emre Hasegeli
> The reason this is kind of scary is that it's just blithely assuming
> that the function won't look at the *other* fields of the FmgrInfo.
> If it did, it would likely get very confused, since those fields
> would be describing the GIN support function, not the function we're
> calling.

I am sorry if it doesn't make sense, but wouldn't the whole thing be
better if we refactored enum.c to call only the internal functions
with a TypeCacheEntry, just like the rangetypes?


Re: [HACKERS] Press Release Draft - 2016-02-09 Cumulative Update

2017-02-07 Thread Emre Hasegeli
> As there are a lot of updates I did my best to consolidate some of the
> bullet points and as usual, people are directed to the release notes.
> Please let me know if there are any inaccuracies so I can fix them ASAP.

Just some minor points:

>  * Several fixes for PostgreSQL operating in hot standby mode

It sounded unnatural to me.  Maybe it is better without "PostgreSQL".

>  * Several vacuum and autovacuum fxies

Typo

>  * Several Unicode fixes

It sounded alarming to me.  I see just one related item on the release
notes.  Maybe we can clarify the problem.

>  * Sync our copy of the timezone library with IANA release tzcode2016j

This is repeated on the following sentence.

>  BEGIN;
>  DROP INDEX bad_index_name;
>  CREATE INDEX CONCURRENTLY bad_index_name ON table_name (column_name); /* 
> replace names with your original index definition */
>  COMMIT;

Maybe you meant CREATE INDEX without CONCURRENTLY?


Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-02-01 Thread Emre Hasegeli
> I think this patch is already in a good shape.

I am sorry for introducing this bug.  This fix looks good to me as well.


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-02-01 Thread Emre Hasegeli
> Got it, but if other people don't agree then this is going nowhere.

Yes.  As I wrote, I don't particularly care about functions like "is
point on line".  I can prepare a patch to fix as many problems as
possible around those operators while preserving the current epsilon.

I thought we were arguing about *all* operators.  Having the
containment and placement operators consistent with each other is the
primary thing I am trying to fix.  Is removing the epsilon from them
acceptable?

We can also stay away from changing operators like "~=" to minimise
the compatibility break, if we keep the epsilon in some places.  We
can instead document this operator as "close enough" and introduce
another symbol for the truly "the same" operator.

That said, there are some places where it is hard to decide whether to
apply the epsilon or not.  For example, we can keep the epsilon to
check whether two lines are parallel, but then should we return the
intersection point or not?  Those issues may become clearer when I
start working on it, if preserving the epsilon for those operators is
the way to go forward.


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-31 Thread Emre Hasegeli
> Backing up a bit here, have we lost track of the problem that we're
> trying to solve?  Tom gave his opinion here:
>
> https://www.postgresql.org/message-id/3895.1464791...@sss.pgh.pa.us
>
> But I don't see that the latest patch I can find does anything to fix
> that.

This is what he wrote:

> As I understand it, the key problem is that tests like "is point on line"
> would basically never succeed except in the most trivial cases, because of
> roundoff error.  That's not very nice, and it might cascade to larger
> problems like object-containment tests failing unexpectedly.  We would
> need to go through all the geometric operations and figure out where that
> kind of gotcha is significant and what we can do about it.  Seems like a
> fair amount of work :-(.  If somebody's willing to do that kind of
> investigation, then sure, but I don't think just blindly removing these
> macros is going to lead to anything good.

I re-implemented the "is point on line" operator in my patch so that
it would, at least, work in the most trivial cases.  This is not very
nice, but it is predictable.  I have tried to prevent it from
cascading to larger problems like object-containment tests failing
unexpectedly.  I have gone through all of the geometric operations and
re-implemented the ones with similar problems, too.  It is no work at
all compared to the discussions we have to have :-).

My initial idea was to keep the fuzzy behaviour for some operators
like "is point on line", but the more I get into it, the less value I
see in doing so.  The operators in the same family as "is point on
line" are currently badly broken:

> postgres=# select '(1,0)'::point ## '{1,2,0}'::line;
> ?column?
> --
> (2,2)
> (1 row)

This makes me wonder if anybody is ever using those operators.  In the
end, I don't care about those operators.  They are unlikely to be
supported by indexes.  I can simplify my patch to leave them as they
are.

> Now maybe that's not the problem that Emre is trying to solve,
> but then it is not very clear to me what problem he IS trying to
> solve.  And I think Kyotaro Horiguchi is trying to solve yet another
> problem which is again different.  So IMHO the first task here is to
> agree on a clear statement of what we'd like to fix, and then, given a
> patch, we can judge whether it's fixed.

I listed the problems I am trying to solve in here:

https://www.postgresql.org/message-id/CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw%40mail.gmail.com

> Maybe I'm being dumb here and it's clear to you guys what the issues
> under discussion are.  If so, apologies for that, but the thread has
> gotten too theoretical for me and I can't figure out what the
> top-level concern is any more.  I believe we all agree these macros
> are bad, but there seems to be no agreement that I can discern on what
> would be better or why.

We couldn't agree on how we should treat floating point numbers.  I
think we should treat them just like the "float" datatype.


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-26 Thread Emre Hasegeli
> Even though I'm not sure but I don't see a "natural" (or
> agreeable by many poeple) ordering of geometric types in
> general. Anyway it's quite application (not application program
> but the relationship with the real world) specific.

We can just define it for point as "ORDER BY point.x, point.y".

> What we should not forget is that PostGIS does the same thing and
> it is widly used (I believe..). This means it not broken at least
> on a certain context. But it is a fact that it also quite
> inconvenient to get performance from, say, indexes.

I understand from Paul Ramsey's email [1] on this thread that PostGIS
doesn't currently have a tolerance.

> Yeah, the EPSILON is mere a fuzz factor so we cannot expect any
> kind of stable behavior for differences under the level. But on
> the analogy of comparisons of floating point numbers, I suppose
> that inequality comparison could be done without the tolerance.

What do you mean exactly?

>> >> - Some operators are violating commutative property.
>> >>
>> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
>> >
>> > Whether EPSILON is introduced or not, commutativity cannot be
>> > assured for it from calculation error, I suppose.
>>
>> It can easily be assured by treating both sides of the operator the
>> same.  It is actually assured on my patch.
>
> It surely holds for certain cases. Even how many applicable cases
> we guess, finally we cannot proof that it works generally. Just
> three times of 1/3 rotation breakes it.

It is a different story.  We cannot talk about the commutative
property of a rotation function that we currently don't have, because
it would be an asymmetrical operator.

The parallel operator is currently marked as commutative.  The planner
is free to switch the sides of the operation.  Therefore this is not
only a surprise, but a bug.

> Hmm, I have nothing more to say if you don't agree that floating
> point numbers involving any kind of arithmetic is hardly
> deterministic especially not defining its usage.

The floating point results are not random.  There are certain
guarantees.  The transitive property of equality is one of them.  Our
aim should be making things easier for our users by providing more
guarantees not breaking what is already there.

> The world of the geometric types in PostgreSQL *is* built
> so. There is nothing different with that Windows client can make
> different results from PostgreSQL on a Linux server for a simple
> floating point arithmetics, or even different binaries made by
> different version of compilers on the same platoform can. Relying
> on such coherency by accident is a bad policy.

Yes, the results are not portable.  We should only rely on the results
being stable on the same build.  The epsilon doesn't cure this
problem.  It arguably makes it worse.

> PostGIS or libgeos seems to prove it. They are designed exactly
> for this purpose and actually used.

Yes, PostGIS is a GIS application.  We are not.  The name of the
geometric types suggests to me that they are meant to be useful for
general purposes.

> So, the union of the two requirements seems to be having such
> parameter as a GUC.

That sounds doable to me.  We can use this opportunity to make all
operators consistent, so the epsilon would apply to the ones it
currently does not.  We can still add btree and hash opclasses, and
make them give an error when this GUC is not 0.  We can even make this
or another GUC apply to floats, making the whole system more
consistent.

Though, I know the community is against behaviour-changing GUCs.  I
will not spend more time on this before I get positive feedback from
others.

[1] 
https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-25 Thread Emre Hasegeli
I am responding to both of your emails together.

> Perhaps I don't understand it. Many opclass are defined for
> btree. But since (ordinary) btree handles only one-dimentional,
> totally-orderable sets, geometric types even point don't fit
> it. Even if we relaxed it by removing EPSILON, multidimentional
> data still doesn't fit.

Yes, multidimensional data doesn't fit into btree.  Let's say we would
have to introduce operators called *<, *<=, *>, *>= to support a btree
opclass for point.  I agree those operators would be useless, but a
btree opclass is used for other purposes too: "ORDER BY point",
merge joins, and btree index support for ~= would be useful.

The lack of ORDER BY, GROUP BY, and DISTINCT support is the major
inconvenience of the geometric types.  There are many complaints about
this on the mailing list.  Many frameworks and ORMs are not prepared
to deal with getting an error when they use certain types in those
clauses.
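
For illustration, this is the kind of thing that fails today (the
table and column names are made up; exact error wording may vary):

CREATE TABLE places (name text, location point);
SELECT DISTINCT location FROM places;
-- ERROR:  could not identify an equality operator for type point
SELECT location FROM places ORDER BY location;
-- ERROR:  could not identify an ordering operator for type point
-- the workaround applications have to use today:
SELECT location FROM places ORDER BY location[0], location[1];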

>> - Only some operators are using the epsilon.
>>
>> I included the list of the ones which doesn't use the epsilon on
>> initial post of this thread.  This makes the operators not compatible
>> with each other.  For example, you cannot say "if box_a @> box_b and
>> box_b @> point then box_a @> box_b".
>
> (It must be "then box_a @> point".)

Yes, I am sorry.

>  Both of the operators don't
> seem to use EPSILON so the transitivity holds, but perhaps we
> should change them to more appropriate shape, that is, to use
> EPSILON. Even having the tolerance, we can define the operators
> to keep the transitivity.

Yes, that would be another way.  In my view, this would make things
worse, because I believe the epsilon approach is completely broken.
The operators which are not using the epsilon are the only ones people
can sanely use to develop applications.

>> > - The application of epsilon is implementation dependent.
>> >
>> > Epsilon works between two numbers.  There is not always a single way
>> > of implementing geometric operators.  This is surprising to the users.
>> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
>> > epsilon".
>>
>> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
>> ..". I'm not sure off-hand. But I don't see the relationship is
>> significant.

What I meant was that the behaviour is unclear even to people who know
about the epsilon.  If two boxes overlap each other, someone who knows
about EPSILON might expect the distance between them to be less than
EPSILON, but this wouldn't be true.

>> - Some operators are violating commutative property.
>>
>> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
>
> Whether EPSILON is introduced or not, commutativity cannot be
> assured for it from calculation error, I suppose.

It can easily be assured by treating both sides of the operator the
same.  It is actually assured in my patch.

>> - Some operators are violating transitive property.
>>
>> For example, you cannot say "if point_a ~= point_b and point_b ~=
>> point_c then point_a ~= point_c".
>
> It holds only in ideal (or mathematical) world, or for similars
> of integer (which are strictly implemented mathematica world on
> computer).  Without the EPSILON, two points hardly match by
> floating point error, as I mentioned. I don't have an evidence
> though (sorry), mere equality among three floating point numbers
> is not assured.

Of course, it is assured.  Floating point numbers are deterministic.
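
Meanwhile, the epsilon-based ~= we have now is exactly what breaks
transitivity.  A minimal illustration (assuming the default EPSILON of
1.0E-06 from geo_decls.h; the coordinates are chosen so the pairwise
differences straddle it):

SELECT '(0,0)'::point ~= '(0.0000007,0)'::point;          -- true
SELECT '(0.0000007,0)'::point ~= '(0.0000014,0)'::point;  -- true
SELECT '(0,0)'::point ~= '(0.0000014,0)'::point;          -- false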

>> - The operators with epsilon are not compatible with float operators.
>>
>> This is also surprising for the users.  For example, you cannot say
>> "if point_a << point_b then point_a[0] < point_b[0]".
>
> It is because "is strictly left of" just doesn't mean their
> x-coordinates are in such a relationship. The difference might be
> surprising but is a specification (truely not a bug:).

Then what does that mean?  Every operator with EPSILON is currently
surprising in a different way.  People use databases to build
applications.  They often need to implement the same operations on the
application side.  It is very hard to be bug-compatible with our
geometric types.

>> = Missing Bits on the Operator Classes

> This final section seems to mention the application but sorry, it
> still don't describe for me what application that this patch
> aiming to enable. But I can understand that you want to handle
> floating point numbers like integers. The epsilon is surely quite
> annoying for the purpose.

I will try to itemise what applications I am aiming to enable:

- Applications with very little GIS requirement

PostGIS can be too much work for an application that is in a different
business domain but has a small GIS requirement.  The built-in
geometric types are incomplete for supporting real-world geography.
Getting rid of the epsilon would make this worse.  On the other hand,
it would allow people to deal with the difficulties on their own.

- Applications with geometric objects

I believe people could make use of the geometric

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-11 Thread Emre Hasegeli
> - Floating point comparisons for gemetric types
>
>   Comparison related to geometric types is performed by FPeq
>   macro and similars defined in geo_decls.h. This intends to give
>   tolerance to the comparisons.
>
>  A
>   FPeq: |<=e-|-e=>|(<= means inclusive, e = epsilon = tolerance)
>   FPne:   ->|  e | e  |<-  (<- means exclusive)
>   FPlt:  | e  |<-
>   FPle: |<=e |
>   FPgt:   ->|  e |
>   FPge:  | e=>|
>
>   These seems reasonable ignoring the tolerance amount issue.

I tried to explain below why I don't find this method reasonable.

> At least for the point type, (bitmap) index scan is consistent
> with sequential scan.  I remember that the issue was
> "inconsistency between indexed and non-indexed scans over
> geometric types" but they seem consistent with each other.

You can see in the Git history that it was a lot of trouble to make
the indexes consistent with the operators, and this limits further
improvement of the indexing of the geometric types.

> You mentioned b-tree, which don't have predefined opclass for
> geometric types. So the "inconsistency" may be mentioning the
> difference between geometric values and combinations of plain
> (first-class) values. But the two are different things and
> apparently using different operators (~= and = for equality) so
> the issue is not fit for this. More specifically, "p ~= p0" and
> "x = x0 and y = y0" are completely different.

Yes, the default btree operator class doesn't have to implement
standard equality and basic comparison operator symbols.  We can
implement it with different symbols.
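
For illustration only (the operator names and the point_cmp() support
function below are made up; nothing like this exists today), such an
operator class could be declared along these lines:

CREATE OPERATOR CLASS point_btree_ops
    DEFAULT FOR TYPE point USING btree AS
        OPERATOR 1 *<  ,   -- hypothetical ordering operators,
        OPERATOR 2 *<= ,   -- deliberately not <, <=, =, >=, >
        OPERATOR 3 *=  ,
        OPERATOR 4 *>= ,
        OPERATOR 5 *>  ,
        FUNCTION 1 point_cmp(point, point);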

> Could you let me (or any other guys on this ml) have more precise
> description on the problem and/or what you want to do with this
> patch?

I will try to itemize the current problems with the epsilon method:

= Operator Inconsistencies

My current patches address all of them.

- Only some operators are using the epsilon.

I included the list of the ones which don't use the epsilon in the
initial post of this thread.  This makes the operators incompatible
with each other.  For example, you cannot say "if box_a @> box_b and
box_b @> point then box_a @> point".
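
A concrete case of the kind I mean (coordinates made up; box @> box goes
through the epsilon macros, while box @> point is on the list that
doesn't):

SELECT box '(1,1),(0,0)' @> box '(1.0000005,1),(0,0)';      -- t, fuzzy containment
SELECT box '(1.0000005,1),(0,0)' @> point '(1.0000005,1)';  -- t, exact comparison
SELECT box '(1,1),(0,0)' @> point '(1.0000005,1)';          -- f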

- The application of epsilon is implementation dependent.

Epsilon works between two numbers.  There is not always a single way
of implementing geometric operators.  This is surprising to the users.
For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
epsilon".

This is also creating a high barrier on developing those operators.
Fixing bugs and performance regressions are very likely to change the
results.

- Some operators are just wrong:

Line operators are not well maintained.  The following give wrong
results when neither A nor B of the lines is exactly 1:

* line # line
* line <-> line
* point ## line
* point ## lseg

They are broken independently of the epsilon, though it is not easy
to fix them without changing the results of the cases that are
currently working.

- Some operators are violating commutative property.

For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".

- Some operators are violating transitive property.

For example, you cannot say "if point_a ~= point_b and point_b ~=
point_c then point_a ~= point_c".
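
Given EPSILON = 1.0E-06 and made-up coordinates, the point equality
operator behaves, as far as I can tell, like this:

SELECT point '(0,0)' ~= point '(0.0000009,0)';          -- t
SELECT point '(0.0000009,0)' ~= point '(0.0000018,0)';  -- t
SELECT point '(0,0)' ~= point '(0.0000018,0)';          -- f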

- The operators with epsilon are not compatible with float operators.

This is also surprising for the users.  For example, you cannot say
"if point_a << point_b then point_a[0] < point_b[0]".

- The operators are not checking for underflow or overflow.
- NaN values are not treated same as the float datatype.

Those are independent problems, though checking them with the epsilon
would create an additional set of inconsistencies.  The epsilon creates
especially many problems around 0.

= Missing Bits on the Operator Classes

I want to address those in the future, if my patches are accepted.

- There is no GiST or SP-GiST support for cross datatype operators
other than containment.

We can develop nice cross-datatype operator families to support more
operators.  It would be easier if we could rely on some operators to
implement others.  Geometric types serve the purpose of demonstrating
our indexing capabilities.  We should make them good examples, not full
of special cases with hidden bugs.

- There is no BRIN support for types other than the box.

The BRIN inclusion operator class is datatype agnostic.  It uses some
operators to implement others, which doesn't work for anything more than
the box vs box operators, because of the inconsistencies.

- GROUP BY and DISTINCT are not working.
- ORDER BY is not working.

There are regular user complaints about this on the mailing list.  We
could easily provide default hash and btree operator classes that would
fix the problem, if we didn't have the epsilon.
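
What users run into is something like this (the exact error wording is
from memory):

CREATE TABLE points (p point);
SELECT DISTINCT p FROM points;
-- ERROR:  could not identify an equality operator for type point
SELECT p FROM points ORDER BY p;
-- ERROR:  could not identify an ordering operator for type point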


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Contains and is contained by operators of inet datatypes

2016-11-29 Thread Emre Hasegeli
> I do not like this bit from the original post:
>
> EH> The patch removes the recently committed SP-GiST index support for the
> EH> existing operator symbols to give move reason to the users to use the
> EH> new symbols.

I reverted this part.  The new version of the patch is attached.


0001-inet-contain-op-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-18 Thread Emre Hasegeli
> To keep such kind of integrity, we should deeply consider about
> errors.

My point is that I don't think we can keep integrity together with the
fuzzy behaviour, or at least I don't have the skills to do that.  I
can just leave the more complicated operators like "is a
point on a line" as they are, and only change the basic ones.  Do you
think a smaller patch like this would be acceptable?

> That's a fate of FP numbers. Btree is usable for such data since
> it is constructed using inequality operators but hash is almost
> unsuitable without rounding and normalization because of the
> nature of floating point numbers. Range scan on btree will work
> fine but on the other hand equality doesn't work even on btree
> index for the same reason as below.

Btree is very useful with floats.  There is no problem with it.

> Again, FP numbers generally cannot match exactly each other. You
> have to avoid that.

Again, they do match very well.  Floating point numbers are not magic.
They are perfectly deterministic.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Contains and is contained by operators of inet datatypes

2016-11-18 Thread Emre Hasegeli
> The new names might be better if we were starting in a green field,
> but in themselves they are not any more mnemonic than what we had, and
> what we had has been there for a lot of years.  Also, if we accept both
> old names and new (which it seems like we'd have to), that creates new
> opportunities for confusion, which is a cost that should not be
> disregarded.

This is true for existing users of those operators.  New names are
much easier for new users to pick up.  We are using them on all other
datatypes.  Datatypes like JSON are more popular than inet.  Many
people already know what <@ and @> mean.

> The original post proposed that we'd eventually get some benefit by
> being able to repurpose << and >> to mean something else, but the
> time scale over which that could happen is so long as to make it
> unlikely to ever happen.

I think we will immediately benefit from the new operators for other
reasons.  Repurposing them in the far future was a minor point.
I thought having a long term plan for this minor point is better than
having no plan.

> I'm inclined to think we should just reject this patch.  I'm certainly not
> going to commit it without seeing positive votes from multiple people.

It is a fair position.  Anybody care to vote?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-14 Thread Emre Hasegeli
> What way to deal with it is in your mind? The problem hides
> behind operators. To fix it a user should rewrite a expression
> using more primitive operators. For example, (line_a # point_a)
> should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in
> more primitive way. I regared this that the operator # just
> become useless.

Simple equations like this work well with the current algorithm:

> select '(0.1,0.1)'::point @ '(0,0),(1,1)'::line;

The operator does what you expect from it.  Users can use something
like what you described to get fuzzy behaviour with an epsilon they choose.
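
For example, something like this gives the fuzzy behaviour back, with
the tolerance spelled out by the user (using the same line as above):

SELECT point '(0.1,0.1)' <-> line '(0,0),(1,1)' < 1.0E-06;  -- t, with an explicit tolerance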

> Regarding optimization, at least gcc generates seemingly not so
> different code for the two. The both generally generates extended
> code directly calling isnan() and so. Have you measured the
> performance of the two implement (with -O2, without
> --enable-cassert)?  This kind of hand-optimization gets
> legitimacy when we see a siginificant difference, according to
> the convention here.. I suppose.

I tested it with this program:

> int main()
> {
>double  i,
>j;
>int result = 0;
>
>for (i = 0.1; i < 1.0; i += 1.0)
>for (j = 0.1; j < 1.0; j += 1.0)
>if (float8_lt(i, j))
>result = (result + 1) % 10;
>
>return result;
> }

The one calling cmp() was noticeably slower.

./test1  0.74s user 0.00s system 99% cpu 0.748 total
./test2  0.89s user 0.00s system 99% cpu 0.897 total

This would probably not be very noticeable when calling SQL functions
that do only a few comparisons, but it may be necessary to do
many more comparisons in some places.  I don't find the optimised
versions less clear than calling cmp().  I can change it the other
way, if you find that clearer.

> At least the comment you dropped by the patch,
>
>  int
>  float4_cmp_internal(float4 a, float4 b)
>  {
> -   /*
> -* We consider all NANs to be equal and larger than any non-NAN. This 
> is
> -* somewhat arbitrary; the important thing is to have a consistent 
> sort
> -* order.
> -*/
>
> seems very significant and should be kept anywhere relevant.

I will add it back in the next version.

> I seached pgsql-general ML but counldn't find a complaint about
> the current behavior. Even though I'm not familar with PostGIS, I
> found that it uses exactly the same EPSILON method with
> PostgreSQL.

Is it?  I understood from Paul Ramsey's comment on this thread [1]
that they don't.

> If we had an apparent plan to use them for other than
> earth-scale(?)  geometric usage, we could design what they should
> be. But without such a plan it is just a breakage of the current
> usage.

We make no promises about the geometric types being useful at earth scale.

> About What kind of (needless) complication you are saying? The
> fuzziness seems to me essential for geometric comparisons to work
> practically. Addition to that, I don't think that we're not
> allowed to change the behavior in such area of released versions
> the time after time.

Even when it is a total mess?

> I don't think index scan and tolerant comparison are not
> contradicting. Could you let me have an description about the
> indexing capabilities and the inconsistencies?

The first problem is that some operators are not using the epsilon.
This requires special treatment while developing index support for
operators.  I have tried to support point for BRIN using the box
operators, and failed because of that.

Comparing differences with the epsilon is not applicable in the same way
to every operator.  Even with simple operators like "point in box" it
covers different distances outside the box depending on where the
point is.  For example, "point <-> box < EPSILON" wouldn't be
equivalent to "point <@ box", when the point is outside the corner of
the box.  Things get more complicated with lines.  Because of these,
we easily end up violating basic expectations of the operators:

> regression=# select '{1000,0.01,0}'::line ?|| '{9,0.9,0}'::line;
>
> ?column?
> --
> f
> (1 row)
>
> regression=# select '{9,0.9,0}'::line ?|| '{1000,0.01,0}'::line;
> ?column?
> --
> t
> (1 row)

Another problem is the lack of hash and btree operator classes.  In my
experience, the point datatype is by far the most used one.  People often
try to use it in DISTINCT, GROUP BY, or ORDER BY clauses and complain
when it doesn't work.  There are many complaints like this in the
archives.  If we get rid of the epsilon, we can easily add those
operator classes.

[1] 
https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-13 Thread Emre Hasegeli
> We can remove the fuzz factor altogether but I think we also
> should provide a means usable to do similar things. At least "is
> a point on a line" might be useless for most cases without any
> fuzzing feature. (Nevertheless, it is a problem only when it is
> being used to do that:) If we don't find reasonable policy on
> fuzzing operations, it would be the proof that we shouldn't
> change the behavior.

It was my initial idea to keep the fuzzy comparison behaviour in some
places, but the more I got into it, the more I realised that it is
almost impossible to get this right.  Instead, I re-implemented some
operators to keep precision as much as possible.  The previous "is a
point on a line" operator would *never* give the true result without
the fuzzy comparison.  The new implementation returns true when
precision is not lost.  I think this is a behaviour people who are
working with floating points are prepared to deal with.  By the way,
the "is a point on a line" operator is quite wrong with the fuzzy
comparison at the moment [1].

> The 0001 patch adds many FP comparison functions individually
> considering NaN. As the result the sort order logic involving NaN
> is scattered around into the functions, then, you implement
> generic comparison function using them. It seems inside-out to
> me. Defining ordering at one place, then comparison using it
> seems to be reasonable.

I agree that it would be simpler to use the comparison function for
implementing the other operators.  I have done it the other way around
to make them more optimised.  They are called very often.  I don't think
checking the return value of the comparison function would be optimised
the same way.  I could have left the comparison functions as they were,
but I re-implemented them using the others to keep the documentation of
NaN comparison in a single place.

> If the center somehow goes extremely near to the origin, it could
> result in a false error.
>
>> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
>> ERROR:  value out of range: underflow
>
> I don't think this underflow is an error, and actually it is a
> change of the current behavior without a reasonable reason. More
> significant (and maybe unacceptable) side-effect is that it
> changes the behavior of ordinary operators. I don't think this is
> acceptable. More consideration is needed.
>
>> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
>> ERROR:  value out of range: underflow

This is the current behaviour of the float datatype.  My patch doesn't
change that.  This problem would probably also apply to multiplying
very small values.  I agree that this is not the ideal behaviour.
Though I am not sure if we should go in a different direction than
the float datatypes.

I think there is value in making the geometric types compatible with the
float types.  Users are going to mix them, anyway.  For example, users can
calculate the center of a box manually, and get confused when the built-in
operator behaves differently.

> In regard to fuzzy operations, libgeos seems to have several
> types of this kind of feature. (I haven't looked closer into
> them). Other than reducing precision seems overkill or
> unappliable for PostgreSQL bulitins. As Jim said, can we replace
> the fixed scale fuzz factor by precision reduction? Maybe, with a
> GUC variable (I hear someone's roaring..) to specify the amount
> defaults to fit the current assumption.

I am disinclined to try to implement something complicated for the
geometric types.  I think they are mostly useful for 2 purposes: use
cases simple enough not to be worth looking for better solutions, and
demonstrating our indexing capabilities.  The inconsistencies harm
both of those.

[1] 
https://www.postgresql.org/message-id/flat/CAE2gYzw_-z%3DV2kh8QqFjenu%3D8MJXzOP44wRW%3DAzzeamrmTT1%3DQ%40mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-13 Thread Emre Hasegeli
> The reason for that is that you forgot to edit an alternative
> exptect file, which matches for en_US locale.

Now I understand.  Thank you for the explanation.

> But the test doesn't for collation and the only difference
> between the alternative expect files comes from the difference of
> collation for the query. "the workaround" seems to be the right
> way to do it. I recommend rather to leave the workaroud as it is
> and remove select_views_1.out from the "expected" directory.

I will do.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Contains and is contained by operators of inet datatypes

2016-11-13 Thread Emre Hasegeli
> - I am not convinced that your changes to the descriptions of the operators
> necessarily make things clearer. For example "is contained by and smaller
> network (subnet)" only mentions subnets and not IP-addresses.

I was trying to avoid confusion.  <@ is the "contained by" operator,
which also returns true when both sides are equal.  We shouldn't
call <<@ "contained by" as well.  I removed the "(subnet)"
and "(supernet)" additions.  Can you think of any better wording?

> - Maybe change "deprecated and will eventually be removed." to "deprecated
> and may be removed in a future release.". I prefer that latter wording but I
> am fine with either.

I copied that note from the Geometric Functions and Operators page.

> - Won't renaming the functions which implement risk breaking people's
> applications? While the new names are a bit nicer I am not sure it is worth
> doing.

You are right.  I reverted that part.

> - The changes to the code look generally good.

Thank you for the review.  New version is attached.


0001-inet-contain-op-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-10 Thread Emre Hasegeli
> Returning to the issue, the following query should give you the
> expected result.
>
> SELECT name, #thepath  FROM iexit ORDER BY name COLLATE "C", 2;

Yes, I have worked around it like this.  What I couldn't understand is
how my patch can cause this regression.  How does it pass on master
without COLLATE?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gin and btree_gist for enums

2016-11-04 Thread Emre Hasegeli
>> Here is a patch to add enum support to btree_gin and btree_gist. I didn't
>> include distance operations, as I didn't think it terribly important, and
>> there isn't a simple way to compute it sanely and efficiently, so no KNN
>> support.

I don't think we can implement a meaningful distance operator for enums.

> This time including the data file for the gist regression tests.

It doesn't apply to HEAD anymore.  I have tested it on b12fd41.

The GiST part of it doesn't really work.  The current patch compares
oids.  We need to change it to compare enumsortorder.  I could make it
fail easily:

> regression=# create type e as enum ('0', '2', '3');
> CREATE TYPE
> regression=# alter type e add value '1' after '0';
> ALTER TYPE
> regression=# create table t as select (i % 4)::text::e from 
> generate_series(0, 10) as i;
> SELECT 11
> regression=# create index on t using gist (e);
> SEGFAULT

> +ALTER OPERATOR FAMILY gist_enum_ops USING gist ADD

Why don't we just create it with those changes?

> + * Use a similar trick to that used for numeric for enums, since we don't

Do you mean "similar trick that is used" or "trick similar to numeric"?

> + * actually know the leftmost value of any enum without knowing the concrete
> + * type, so we use a dummy leftmost value of InvalidOid.

> +return DatumGetBool(
> +DirectFunctionCall2(enum_gt, 
> ObjectIdGetDatum(*((const Oid *) a)), ObjectIdGetDatum(*((const Oid *) b)))
> +);

Wouldn't it be nice to refactor enum_cmp_internal() to accept typcache
and use it there like the rangetypes?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve output of BitmapAnd EXPLAIN ANALYZE

2016-10-31 Thread Emre Hasegeli
The BRIN Bitmap Index Scan has the same problem.  I have seen people
confused by this.  I think N/A would clearly improve the situation.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FTS Configuration option

2016-10-13 Thread Emre Hasegeli
> With such syntax we also don't need the TSL_FILTER flag for lexeme. At
> the current time unaccent extension set this flag to pass a lexeme to
> a next dictionary. This flag is used by the text-search parser. It
> looks like a hard coded solution. User can't change this behaviour.

Exactly.

> Maybe also better to use -> instead of AND? AND would has another
> behaviour. I could create the following configuration:
>
> => ALTER TEXT SEARCH CONFIGURATION multi_conf
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part
> WITH (german_ispell AND english_ispell) OR simple;
>
> which will return both german_ispell and english_ispell results. But
> I'm not sure that this is a good solution.

I see your use case for AND.  It might indeed be useful.  AND suits it well.

Maybe THEN can be the keyword instead of -> for passing the results to
subsequent dictionaries.  They are all reserved keywords.  I guess it
wouldn't be a problem to use them.
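
To make that concrete (purely hypothetical syntax, reusing dictionary
names from this thread):

ALTER TEXT SEARCH CONFIGURATION multi_conf
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH (unaccent THEN german_ispell) OR simple;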

> Of course if this syntax will be implemented, old syntax with commas
> also should be maintained.

Yes, we definitely should.  The comma can be interpreted as either one of
the keywords depending on the left hand side dictionary.

I would be glad to review, if you develop this feature.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FTS Configuration option

2016-10-12 Thread Emre Hasegeli
> => ALTER TEXT SEARCH CONFIGURATION multi_conf
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part
> WITH german_ispell (JOIN), english_ispell, simple;

I have had something like this in mind since I dealt with FTS for a
Turkish real estate listing application.  Being able to pipe the output
of some dictionaries is a nice feature we have had since 9.0, but it is
not always sufficient.  I think it is wrong to decide this on a
per-dictionary basis.  Something slightly more complicated that can
connect dictionaries in parallel or in series with each other might be
more useful.

My problem was related to the special characters of Turkish (ç, ğ, ı,
ö, ü).  It is very common to just type the 7-bit look-alike characters
(c, g, i, o, u) instead of those.  The unaccent extension changes them as
desired, and passes the altered words to the subsequent dictionary,
when the configuration is changed like this:

> ALTER TEXT SEARCH CONFIGURATION turkish
> ALTER MAPPING FOR word, hword, hword_part
> WITH unaccent, turkish_stem;

However then the stemmer doesn't do a good job on those words, because
the changed characters are important for the language.  What I really
needed was something like this:

> ALTER TEXT SEARCH CONFIGURATION turkish
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, 
> hword_part
> WITH (fix_mistyped_characters AND (turkish_hunspell OR turkish_stem) AND 
> unaccent);


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Contains and is contained by operators of inet datatypes

2016-10-01 Thread Emre Hasegeli
The attached patch adds <@, @>, <<@, and @>> operator symbols for the inet
datatype to replace <<=, >>=, <<, and >>.  The <@ and @> symbols are used
for containment for all datatypes except inet, particularly the geometric
types and arrays, and the cube, hstore, intarray, and ltree extensions.

The <@ and @> symbols were standardised as of version 8.2 by Tom Lane in
2006 [1].  The previous symbols were left in place but deprecated.  The
patch does exactly the same for the inet datatypes.

<< and >> are the standard symbols for the strictly-left-of and
strictly-right-of operators.  Those operators would also make sense for
the inet datatypes.  If we make this change now, we can remove the
symbols and reuse them for new operators in the distant future.

The patch removes the recently committed SP-GiST index support for the
existing operator symbols to give the users more reason to use the
new symbols.  This change will also indirectly deprecate the
undocumented non-transparent btree index support that sometimes works
for some of the existing operators [2].

The patch includes changes to the regression tests and the
documentation.  I will add it to the 2016-11 Commitfest.

[1] https://www.postgresql.org/message-id/14277.1157304...@sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/389.1042992...@sss.pgh.pa.us


0001-inet-contain-op-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-29 Thread Emre Hasegeli
> Well, those two results are not contradictory -- notice that you
> switched the order of the values in the comparison.  I don't think
> you've really found the explanation yet.

I am sorry I copy-pasted the wrong example:

"select_views" test runs:

> SELECT name, #thepath FROM iexit ORDER BY 1, 2

and expects to get rows in this order:

>  I- 580Ramp |8
>  I- 580/I-680  Ramp |2

with the collation on my laptop, this is actually true:

> regression=# select 'I- 580Ramp' < 'I- 580/I-680  
> Ramp';
>  ?column?
> --
>  t
> (1 row)

on the Linux server I am testing, it is not:

> regression=# select 'I- 580Ramp' < 'I- 580/I-680  
> Ramp';
>  ?column?
> --
>  f
> (1 row)

The latter should be the case in your environment, as the test was also
failing for you.  This is not consistent with the expected test
result.  Do you know how this test can still pass on master?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-28 Thread Emre Hasegeli
> Emre, are you going to address the above?  It would have to be Real
> Soon Now.

Yes, I am working on it.  I found more problems and replaced more
algorithms.  That took a lot of time.  I will post the new version
really soon.  I wouldn't feel bad if you don't have enough time to
review it in this commitfest.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-09-11 Thread Emre Hasegeli
> Why not just disallow dropping a value that's still in use? That's certainly
> what I would prefer to happen by default...

Of course, we should disallow it.  The problem is what to do next.
We cannot just remove the value, because it might still be referenced
from the inner nodes of the indexes.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-09-08 Thread Emre Hasegeli
> Given that you are now familiar with the internals of how enums are
> implemented would it be possible to continue the work and add the "ALTER
> TYPE  DROP VALUE " command?

I was thinking of trying to develop it, but I would be more than happy
to help by testing and reviewing if someone else does.  I don't
think I have enough experience to think of all the details of this
feature.

The main problem that has been discussed before was the indexes.  One
way to tackle it is to reindex all the tables after the
operation.  Currently we do that when the datatype of an indexed
column changes.  So it should be possible, but very expensive.

Another way, which Thomas Munro suggested when we were talking about it,
would be to add another column to the catalog table to mark the deleted
rows.  We can check this column before allowing the value to be
used.  Indexes can continue using the deleted values.  We can also
re-use those entries when someone wants to add a new value in the
matching place.  It should be safe as long as we don't update the sort
order.
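
A sketch of how that could look from the user's side (entirely
hypothetical; the type, the values, and DROP VALUE itself are made up):

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
ALTER TYPE mood DROP VALUE 'sad';             -- would only mark the pg_enum row as deleted
SELECT 'sad'::mood;                           -- would now be rejected
ALTER TYPE mood ADD VALUE 'meh' BEFORE 'ok';  -- could re-use the deleted entry's slot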


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-09-07 Thread Emre Hasegeli
> Bottom line here is that I'd rather commit ALTER TYPE RENAME VALUE with
> no EXISTS features and then see it accrete those features together with
> other types of RENAME, when and if there's a will to make that happen.

This sounds like a good conclusion to me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-09-04 Thread Emre Hasegeli
> I started looking at this patch.  I'm kind of unhappy with having *both*
> IF EXISTS and IF NOT EXISTS options on the statement, especially since
> the locations of those phrases in the syntax seem to have been chosen
> with a dartboard.  This feels way more confusing than it is useful.
> Is there really a strong use-case for either option?  I note that
> ALTER TABLE RENAME COLUMN, which is probably used a thousand times
> more often than this will be, has so far not grown either kind of option,
> which sure makes me think that this proposal is getting ahead of itself.

I think they can be useful.  I am writing a lot of migration scripts
for small projects.  It really helps to be able to run parts of them
again.  ALTER TYPE ... ADD VALUE already has an IF NOT EXISTS option.  I
don't think we would lose anything by supporting both of them here.

The syntax ALTER TYPE ... RENAME VALUE [ IF EXISTS ] ... TO ... [ IF
NOT EXISTS ] looks self-explanatory to me.  I wasn't confused when I
first saw it.  IF EXISTS applying to the old value and IF NOT EXISTS
applying to the new value are the only reasonable semantics one might
expect from renaming things.  Is anybody interpreting it differently, or
can anyone think of another syntax?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Transactional enum additions - was Re: [HACKERS] Alter or rename enum value

2016-09-04 Thread Emre Hasegeli
> Got around to looking at this.  Attached is a revised version that I think
> is in committable shape.  The main non-cosmetic change is that the test
> for "type was created in same transaction as new value" now consists of
> comparing the xmins of the pg_type and pg_enum rows, without consulting
> GetCurrentTransactionId().  I did not like the original coding because
> it would pointlessly disallow references to enum values that were created
> in a parent transaction of the current subxact.  This way is still leaving
> some subxact use-cases on the table, as noted in the code comments, but
> it's more flexible than before.

Thank you for picking this up.  The patch looks good to me.  I think
it is useful to support adding values to an enum created in the
same transaction.

> +   /*
> +* If the row is hinted as committed, it's surely safe.  This provides a
> +* fast path for all normal use-cases.
> +*/
> +   if (HeapTupleHeaderXminCommitted(enumval_tup->t_data))
> +   return;
> +
> +   /*
> +* Usually, a row would get hinted as committed when it's read or loaded
> +* into syscache; but just in case not, let's check the xmin directly.
> +*/
> +   xmin = HeapTupleHeaderGetXmin(enumval_tup->t_data);
> +   if (!TransactionIdIsInProgress(xmin) &&
> +   TransactionIdDidCommit(xmin))
> +   return;

This looks like transaction-internal logic inside enum.c to me.  Maybe
that is because I am not very familiar with the internals.  I couldn't
find a similar pattern anywhere else, but it might still be useful to
invent something like HeapTupleHeaderXminReallyCommitted().

One risk of this feature is that future enum functions might
not check whether the value is safe to return.  Maybe we should append
a warning about this to the file header.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST support for inet datatypes

2016-08-24 Thread Emre Hasegeli
> Aside from the disturbing frequency of 100-to-1 split ratios, it also
> looks like the inclusion of the masklen bit is hardly pulling its weight,
> though that might be a artifact of this data set.

I was expecting this.  Including the masklen bit in the decision was
something we could easily do.  It doesn't make the index more
complicated; if anything, it makes it simpler.  I think it would be
useful when host addresses with a masklen are indexed.  The IRRExplorer
dataset is just networks.

> I think that it'd be worth investigating changing to a split rule that
> uses the next two or three or four bits of the address, not just the
> single next bit, to index into more than four subnodes.  It's pretty clear
> how to adjust the insertion functions to support that, and a crude hack in
> that line suggested that the index does get noticeably smaller.  However,
> I didn't quite grok how to adjust the search (consistent) functions.
> Obviously we could apply the same rules in a loop, considering each
> successive address bit, but there may be a better way.

I experimented with such designs before posting this patch, but
couldn't make any of them work.  It gets very complicated when more
than one bit is used.  Even when only 2 bits are used, there would be 12
child nodes:

* network bits 00
* network bits 01
* network bits 10
* network bits 11
* network bit 0 and host bit 0
* network bit 0 and host bit 1
* network bit 1 and host bit 0
* network bit 1 and host bit 1
* host bits 00
* host bits 01
* host bits 10
* host bits 11

Another design would be a prefix tree.  We could split by using a
byte, and store that byte as the label.  Then there wouldn't be a static
number of nodes.  We would need to iterate through the labels and
re-execute the condition on all of them.  Surely this would perform
better for some working sets, but I don't think for all of them.  I will
experiment with this, if I get the chance.

I believe the current index is useful as it is.  The wasted space
depends on the distribution of the keys:

> # create table a3 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 8)::text || '.0.0')::inet from generate_series(1, 100) 
> as i;
> SELECT 100
> # create table a4 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 16)::text || '.0.0')::inet from generate_series(1, 100) 
> as i;
> SELECT 100
> # create table a5 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 32)::text || '.0.0')::inet from generate_series(1, 100) 
> as i;
> SELECT 100
> # create table a6 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 100) 
> as i;
> SELECT 100
> # create index on a3 using spgist (inet);
> CREATE INDEX
> # create index on a4 using spgist (inet);
> CREATE INDEX
> # create index on a5 using spgist (inet);
> CREATE INDEX
> # create index on a6 using spgist (inet);
> CREATE INDEX
> # \di+
>   List of relations
> Schema |Name | Type  |  Owner   | Table | Size  | Description
> ---+-+---+--+---+---+-
> public | a3_inet_idx | index | hasegeli | a3| 42 MB |
> public | a4_inet_idx | index | hasegeli | a4| 46 MB |
> public | a5_inet_idx | index | hasegeli | a5| 55 MB |
> public | a6_inet_idx | index | hasegeli | a6| 56 MB |
> (4 rows)

It also gets better with the number of keys:

> # create table a6 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 100) 
> as i;
> SELECT 100
> # create table b6 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 200) 
> as i;
> SELECT 200
> # create table c6 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 400) 
> as i;
> SELECT 400
> # create table d6 as select (trunc(random() * 256)::text || '.' || 
> trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 800) 
> as i;
> SELECT 800
> # create index on a6 using spgist (inet);
> CREATE INDEX
> # create index on b6 using spgist (inet);
> CREATE INDEX
> # create index on c6 using spgist (inet);
> CREATE INDEX
> # create index on d6 using spgist (inet);
> CREATE INDEX
> # \di+
>   List of relations
> Schema |Name | Type  |  Owner   | Table | Size   | Description
> ---+-+---+--+---++-
> public | a6_inet_idx | index | hasegeli | a6| 56 MB  |
> public | b6_inet_idx | index | hasegeli | b6| 109 MB |
> public | c6_inet_idx | index | hasegeli | c6| 181 MB |
> public | d6_inet_idx | index | hasegeli | a6| 336 MB |
> (4 rows)

Thank you for committing this.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] [PATCH] Alter or rename enum value

2016-08-24 Thread Emre Hasegeli
> That would require changing it from an AlterEnumStmt to a RenameStmt
> instead.  Those look to me like they're for renaming SQL objects,
> i.e. things addressed by identifiers, whereas enum labels are just
> strings.  Looking at the implementation of a few of the functions called
> by ExecRenameStmt(), they have very little in common with
> RenameEnumLabel() logic-wise.

Yes, you are right that this is not an identifier like the others.  On
the other hand, all RENAME xxx TO yyy statements are RenameStmt at the
moment.  I think we had better leave the decision to the committer.

>> What is "nbr"?  Why not calling them something like "i" and "val"?
>
> This is the same naming as used in AddEnumLabel(), which I used as a
> guide.

I see.  Judging from there, it must be short for "neighbour".  We
had better use something else.

>> Maybe we better release them before reporting error, too.  I would
>> release the list after the loop, close the heap before ereport().
>
> As Tom said, this gets released automatically on error, and is again
> similar to how AddEnumLabel() does it.

I still don't see any reason not to ReleaseCatCacheList() after the
loop instead of writing it 3 times.

> Here is an updated patch.

I don't know if it is used by the committer or not, but "existance"
is a typo in the commit message.

Other than these, it looks good to me.  I am marking it as Ready for Committer.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-08-21 Thread Emre Hasegeli
> Here is v4, which changes the command from ALTER VALUE to RENAME VALUE,
> for consistency with RENAME ATTRIBUTE.

It looks like we always use the "ALTER ... RENAME ... old_name TO
new_name" syntax, so it is better that way.  I have noticed that all
the other RENAMEs go through ExecRenameStmt().  We had better do the same.

> +int nbr_index;
> +Form_pg_enum nbr_en;

What is "nbr"?  Why not calling them something like "i" and "val"?

> +   /* Locate the element to rename and check if the new label is already in

The project style for multi-line comments is to leave the first line
of the comment block empty.

> +if (strcmp(NameStr(nbr_en->enumlabel), oldVal) == 0)

I found namestrcmp() for this.

> +}
> +if (!old_tup)

I would leave a space in between.

> +ReleaseCatCacheList(list);
> +heap_close(pg_enum, RowExclusiveLock);

Maybe we had better release them before reporting the error, too.  I
would release the list after the loop and close the heap before ereport().


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SP-GiST support for inet datatypes

2016-08-21 Thread Emre Hasegeli
> ... this part of the patch breaks the existing API for SP-GiST opclasses.
> That is a hard sell.  It may only matter for one existing opclass in core,
> but unless we have reason to think nobody is using any custom SP-GiST
> opclasses, that is not a pleasant thing to do.  How important is it really
> for this opclass?  Several of the existing opclasses use fixed numbers of
> child nodes, so why does this need something they don't?

This addition is required to implement the operator class cleanly.  We
could store the id of the child node as a "label", and add nodes when
labels are missing, but this complicates all the operator class functions.
Other designs are also possible using the labels, but a simple design
with a fixed number of nodes worked best for me.

Currently, the SP-GiST framework supports a fixed number of child nodes
only if the index grows by page splits, not by prefix splits.  This is
not a fair API.  I assume it was designed with only the prefix
tree in mind.  The change makes SP-GiST more reusable.

SP-GiST is not widely adopted in my experience.  Breaking this part of
the API would affect prefix tree implementations.  I don't think there
are many of them.  Supporting the new interface is easy for them.  We
can try to support the old interface alongside the new one, but this
wouldn't be very clean either.

I thought we were breaking the C interface in similar ways on major releases.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Alter or rename enum value

2016-08-18 Thread Emre Hasegeli
> Emre, I noticed you modified the commitfest entry
> (https://commitfest.postgresql.org/10/588/) to be for Andrew's
> transactional enum addition patch instead, but didn't change the title.
> I'll revert that as soon as it picks up this latest patch.  Do you wish
> to remain a reviewer for this patch, or should I remove you?

I confused it with Andrew's transactional enum addition patch.  I guess
we need a separate entry in the Commitfest app for that part of the
thread [1].  I am not sure if that is possible.  I will do my best to
review both of them.

[1] https://www.postgresql.org/message-id/56ffe757.1090...@dunslane.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regexp_match() returning text

2016-08-18 Thread Emre Hasegeli
> I did *not* push the hunk in citext.sgml, since that was alleging support
> that doesn't actually exist in this patch.  To make this work for citext,
> we need to add wrapper functions similar to citext's wrappers for
> regexp_matches.  And that in turn means a citext extension version bump,
> which makes it more notationally tedious than I felt like dealing with
> today.  I'm bouncing that requirement back to you to produce a separate
> patch for.

It is attached.


citext-regexp-match-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] per-statement-level INSTEAD OF triggers

2016-08-10 Thread Emre Hasegeli
> It might be more useful after we get the infrastructure that Kevin's been
> working on to allow collecting all the updates into a tuplestore that
> could be passed to a statement-level trigger.  Right now I tend to agree
> that there's little point.

Maybe this can be used to re-implement FOREIGN KEYs.  Never-ending
bulk DELETEs caused by the lack of indexes on foreign key columns are
biting novice users quite often in my experience.
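
The usual trap looks like this (made-up tables):

CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child  (parent_id int REFERENCES parent (id));  -- no index on parent_id

-- Every row deleted here makes the FK trigger scan the whole "child" table.
DELETE FROM parent WHERE id < 1000;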


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-11 Thread Emre Hasegeli
> I think that probably the most reasonable answer is to replace all the
> raw "double" comparisons in this code with float8_cmp_internal() or
> something that's the moral equivalent of that.  Does anyone want to
> propose something else?

We can probably get away with changing the comparison in the GiST code.
It is not likely to cause inconsistent results.  Comparisons with NaN
coordinates don't return true for anything, anyway:

# select '(3,4),(nan,5)'::box = '(3,4),(nan,5)'::box;
 ?column?
--
 f
(1 row)

# select '(3,4),(nan,5)'::box < '(3,4),(nan,5)'::box;
 ?column?
--
 f
(1 row)

# select '(3,4),(nan,5)'::box > '(3,4),(nan,5)'::box;
 ?column?
--
 f
(1 row)

> More generally, this example makes me fearful that NaN coordinates in
> geometric values are likely to cause all sorts of issues.  It's too late
> to disallow them, probably, but I wonder how can we identify other bugs
> of this ilk.

Is it reasonable to disallow NaN coordinates in the next major
release?  Is there any reason to deal with them?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regexp_match() returning text

2016-06-04 Thread Emre Hasegeli
> The main problem being solved is the use of a SETOF result.  I'm inclined to
> prefer that the final, single, result is still an array.

I have changed it like that.  New patch attached.

> I've got a style issue with the information_schema - I like to call it
> useless-use-of-E'' -  but that was there long before this patch...

We better not touch it.

> /* user mustn't specify 'g' for regexp_split */ - do we add "or
> regexp_match" or just removed the extraneous detail?

I don't think it would be a nice error message.

> There seems to be scope creep regarding "regexp_split_to_table" that I'm
> surprised to find.  Related to that is the unexpected removal of the
> "force_glob" parameter to setup_regexp_matches.  You took what was a single
> block of code and now duplicated it without any explanation in the commit
> message (a code comment wouldn't work for this kind of change).  The change
> to flags from passing a pointer to text to passing in a pointer to a
> previously derived pg_re_flags makes more sense on its face, and it is
> apparently a non-public API, but again constitutes a refactoring that at
> least would ideally be a separate commit from the one the introduces the new
> behavior.

That check doesn't belong in setup_regexp_matches() in the first place.
The arguments of the function are organised to be caller agnostic,
and then it gives an error on behalf of regexp_split().  The check
fits better in the regexp_split() functions even with the duplication.

I can split it into another patch, but I think these kinds of changes
most often go together.

Would you mind adding yourself to the reviewers on the Commitfest app?
I think you have already read through it.

Thanks for all the feedback.
From a6f24b34fdb39eaaca1d3819f7f528b1689725a4 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli <e...@hasegeli.com>
Date: Sun, 29 May 2016 18:53:37 +0200
Subject: [PATCH] Add regexp_match()

---
 doc/src/sgml/citext.sgml  |   5 ++
 doc/src/sgml/func.sgml|  62 +-
 src/backend/utils/adt/regexp.c| 119 +-
 src/include/catalog/pg_proc.h |   4 ++
 src/include/utils/builtins.h  |   2 +
 src/test/regress/expected/regex.out   |  28 
 src/test/regress/expected/strings.out |   4 +-
 src/test/regress/sql/regex.sql|   7 ++
 8 files changed, 183 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml
index 7fdf302..9b4c68f 100644
--- a/doc/src/sgml/citext.sgml
+++ b/doc/src/sgml/citext.sgml
@@ -119,20 +119,25 @@ SELECT * FROM users WHERE nick = 'Larry';
   
 
   
Similarly, all of the following functions perform matching
case-insensitively if their arguments are citext:
   
 
   

 
+  regexp_match()
+
+   
+   
+
   regexp_matches()
 


 
   regexp_replace()
 


 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ff7545d..64e975e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1935,32 +1935,49 @@
 or, if the argument is null, return NULL.
 Embedded single-quotes and backslashes are properly doubled.

quote_nullable(42.5)
'42.5'
   
 
   

 
+ regexp_match
+
+regexp_match(string text, pattern text [, flags text])
+   
+   text[]
+   
+Return a single captured substring resulting from matching a POSIX
+regular expression against the string. See
+ for more information.
+   
+   regexp_match('foobarbequebaz', '(bar)(beque)')
+   {bar,beque}
+  
+
+  
+   
+
  regexp_matches
 
 regexp_matches(string text, pattern text [, flags text])

setof text[]

 Return all captured substrings resulting from matching a POSIX regular
 expression against the string. See
  for more information.

-   regexp_matches('foobarbequebaz', '(bar)(beque)')
-   {bar,beque}
+   regexp_matches('foobarbequebaz', 'ba.', 'g')
+   {bar}{baz} (2 rows)
   
 
   

 
  regexp_replace
 
 regexp_replace(string text, pattern text, replacement text [, flags text])

text
@@ -4009,20 +4026,23 @@ substring('foobar' from '#"o_b#"%' for '#')NULLregular expression
 pattern matching


 substring


 regexp_replace


+regexp_match
+   
+   
 regexp_matches


 regexp_split_to_table


 regexp_split_to_array

 

@@ -4168,66 +4188,78 @@ substring('foobar' from 'o(.)b')   o
 regexp_replace('foobarbaz', 'b..', 'X')
fooXbaz
 regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
 regex

[HACKERS] regexp_match() returning text

2016-05-29 Thread Emre Hasegeli
Attached patch adds regexp_match() function which is a simple variant of
regexp_matches() that doesn't return a set.  It is based on Tom Lane's
comment to bug #10889 [1].

[1] https://www.postgresql.org/message-id/23769.1404747...@sss.pgh.pa.us
From f8c113c77864ef1ca6386195aea02b2090ff17b6 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli <e...@hasegeli.com>
Date: Sun, 29 May 2016 18:53:37 +0200
Subject: [PATCH] Add regexp_match()

---
 doc/src/sgml/citext.sgml   |   5 ++
 doc/src/sgml/func.sgml |  59 +++---
 src/backend/catalog/information_schema.sql |   2 +-
 src/backend/utils/adt/regexp.c | 125 +
 src/include/catalog/pg_proc.h  |   4 +
 src/include/utils/builtins.h   |   2 +
 src/test/regress/expected/regex.out|  28 +++
 src/test/regress/expected/strings.out  |   4 +-
 src/test/regress/sql/regex.sql |   7 ++
 9 files changed, 188 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml
index 7fdf302..9b4c68f 100644
--- a/doc/src/sgml/citext.sgml
+++ b/doc/src/sgml/citext.sgml
@@ -119,20 +119,25 @@ SELECT * FROM users WHERE nick = 'Larry';
   
 
   
Similarly, all of the following functions perform matching
case-insensitively if their arguments are citext:
   
 
   

 
+  regexp_match()
+
+   
+   
+
   regexp_matches()
 


 
   regexp_replace()
 


 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ff7545d..503dfe5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1935,20 +1935,37 @@
 or, if the argument is null, return NULL.
 Embedded single-quotes and backslashes are properly doubled.

quote_nullable(42.5)
'42.5'
   
 
   

 
+ regexp_match
+
+regexp_match(string text, pattern text [, flags text])
+   
+   text
+   
+Return a single captured substring resulting from matching a POSIX regular
+expression against the string. See
+ for more information.
+   
+   regexp_match('foobarbequebaz', 'barbeque')
+   barbeque
+  
+
+  
+   
+
  regexp_matches
 
 regexp_matches(string text, pattern text [, flags text])

setof text[]

 Return all captured substrings resulting from matching a POSIX regular
 expression against the string. See
  for more information.

@@ -4009,20 +4026,23 @@ substring('foobar' from '#"o_b#"%' for '#')NULLregular expression
 pattern matching


 substring


 regexp_replace


+regexp_match
+   
+   
 regexp_matches


 regexp_split_to_table


 regexp_split_to_array

 

@@ -4168,66 +4188,81 @@ substring('foobar' from 'o(.)b')   o
 regexp_replace('foobarbaz', 'b..', 'X')
fooXbaz
 regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
 regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g')
fooXarYXazY
 

 
 
+ The regexp_match function returns the first captured
+ substring resulting from matching a POSIX regular expression
+ pattern.  It has the syntax
+ regexp_match(string, pattern
+ , flags ).
+ If the pattern does not match, the function returns
+ NULL.  It ignores the parenthesized subexpressions in
+ the pattern.  regexp_matches can be used in
+ this case (see below for details).
+ The flags parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior.  Supported flags
+ are described in .
+
+
+
  The regexp_matches function returns a text array of
  all of the captured substrings resulting from matching a POSIX
- regular expression pattern.  It has the syntax
- regexp_matches(string, pattern
- , flags ).
+ regular expression pattern.  It has the same syntax as
+ regexp_match.
  The function can return no rows, one row, or multiple rows (see
  the g flag below).  If the pattern
  does not match, the function returns no rows.  If the pattern
  contains no parenthesized subexpressions, then each row
  returned is a single-element text array containing the substring
  matching the whole pattern.  If the pattern contains parenthesized
  subexpressions, the function returns a text array whose
  n'th element is the substring matching the
  n'th parenthesized subexpression of the pattern
  (not counting non-capturing parentheses; see below for
  details).
- The flags parameter is an optional text
- string containing zero or more single-letter flags that change the
- function's behavior

Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

2016-05-29 Thread Emre Hasegeli
> My interpretation of the standard is that FILTER is not allowable for
> a window function, and IGNORE|RESPECT NULLS is not allowable for an
> ordinary aggregate.

Yes, it is clear.

> So if we support IGNORE|RESPECT NULLS for anything other than a window
> function, we have to come up with our own semantics.

I don't think this clause is useful for aggregates, especially since we
already have the FILTER clause.  Though, I can see this error message
being useful:

> ERROR:  IGNORE NULLS is only implemented for the lead and lag window functions

Can we still give this message when the syntax is not part of the OVER clause?

Thank you for returning back to this patch.




[HACKERS] Floating point comparison inconsistencies of the geometric types

2016-05-27 Thread Emre Hasegeli
There are those macros defined for the built-in geometric types:

> #define EPSILON 1.0E-06

> #define FPzero(A)   (fabs(A) <= EPSILON)
> #define FPeq(A,B)   (fabs((A) - (B)) <= EPSILON)
> #define FPne(A,B)   (fabs((A) - (B)) > EPSILON)
> #define FPlt(A,B)   ((B) - (A) > EPSILON)
> #define FPle(A,B)   ((A) - (B) <= EPSILON)
> #define FPgt(A,B)   ((A) - (B) > EPSILON)
> #define FPge(A,B)   ((B) - (A) <= EPSILON)

with this warning:

>  *XXX These routines were not written by a numerical analyst.

Most of the geometric operators use those macros for comparison, but
those  do not:

* polygon << polygon
* polygon &< polygon
* polygon &> polygon
* polygon >> polygon
* polygon <<| polygon
* polygon &<| polygon
* polygon |&> polygon
* polygon |>> polygon
* box @> point
* point <@ box
* lseg <@ box
* circle @> point
* point <@ circle

This is really a bug that needs to be fixed one way or another.  I think
that it is better to fix it by removing the macros all together.  I
am not sure how useful they are in practice.  I haven't seen anyone
complaining about the above operators not using the macros.  Though,
people often complain about the ones using the macros and the problems
caused by them.

The hackers evidently don't like the macros, either.  That should be
why they are not used on the new operators.  What annoys me most about
this situation is that the inconsistency blocks us from demonstrating our indexes.
Because of this, we had to rip out point type support from BRIN
inclusion operator class which could be useful to PostGIS.

Fixing it has been discussed many times before [1][2][3][4] with
no result.  Here is my plan to fix the situation covering the problems
around it:

1) Reimplement some operators to avoid divisions

The attached patch does it on some of the line operators.

2) Use exact comparison on everywhere except the operators fuzzy
   comparison is certainly required

The attached patch changes all operators except some "lseg" operators.
"lseg" stores two points on a line.  Most of the operations done on it
are lossy.  I don't see a problem treating them differently, as those
operators are very unlikely to be index supported.

3) Check numbers for underflow and overflow

I am thinking to use CHECKFLOATVAL on utils/adt/float.c, but it is not
exposed outside at the moment.  Would it be okay to create a new header
file utils/float.h to increase code sharing between float and geometric
types?
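
For comparison, this is how the float8 operators already behave, which is the
behaviour I would like the geometric operators to share (illustrative example;
the exact error wording may differ between versions):

SELECT '1e308'::float8 + '1e308'::float8;
-- ERROR:  value out of range: overflow
SELECT '1e-308'::float8 / '1e308'::float8;
-- ERROR:  value out of range: underflow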

4) Implement relative fuzzy comparison for the remaining operators

It is better to completely get rid of those macros while we are on it.
I think we can get away by implementing just a single function for fuzzy
equality, not for other comparisons.  I am inclined to put it to
utils/adt/float.c.  Is this a good idea?

Tom Lane commented on the function posted to the list [1] on 2002:

> Not like that. Perhaps use a fraction of the absolute value of the
> one with larger absolute value.  As-is it's hard to tell how FLT_EPSILON
> is measured.

I cannot imagine how the function would look.  I would appreciate
any guidance.
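
To make the discussion concrete, here is a rough sketch of the kind of
relative comparison I have in mind, written in SQL only for illustration.
The real function would be in C under utils/adt/float.c, and the 1e-9 scale
factor is just a placeholder, not a proposal:

CREATE FUNCTION float8_fuzzy_eq(a float8, b float8) RETURNS boolean
    LANGUAGE sql IMMUTABLE AS $$
        SELECT abs(a - b) <= greatest(abs(a), abs(b)) * 1e-9
    $$;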

5) Implement default hash operator class for all geometric types

This would solve the most complained-about problem of those types by
allowing them to be used with DISTINCT and GROUP BY.
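
Today the lack of it shows up like this (error text quoted from memory, so
treat it as approximate):

SELECT DISTINCT p FROM (VALUES (point '(1,1)'), (point '(1,1)')) AS t(p);
-- ERROR:  could not identify an equality operator for type point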

6) Implement default btree operator class at least for the point type

This would let the point type be used with ORDER BY.  It is
relatively straightforward to implement it for the point type.  Is it
a good idea to somehow implement it for other types?
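
The corresponding failure today, again quoted from memory:

SELECT p FROM (VALUES (point '(2,1)'), (point '(1,2)')) AS t(p) ORDER BY p;
-- ERROR:  could not identify an ordering operator for type point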

7) Add GiST index support for missing cross type operators

Currently only contained by operators are supported by an out-of-range
strategy number.  I think we can make the operator classes much nicer
by allowing really cross type operator families.

Comments?

[1] 
https://www.postgresql.org/message-id/flat/d90a5a6c612a39408103e6ecdd77b8290fd...@voyager.corporate.connx.com
[2] https://www.postgresql.org/message-id/flat/4a7c2c4b.5020...@netspace.net.au
[3] https://www.postgresql.org/message-id/flat/12549.1346111...@sss.pgh.pa.us
[4] 
https://www.postgresql.org/message-id/flat/20150512181307.gj2...@alvh.no-ip.org
From aa79e331595860489cdbbdce2d5f35a7d1f33783 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli <e...@hasegeli.com>
Date: Wed, 25 May 2016 17:53:19 +0200
Subject: [PATCH] Stop using FP macros on geo_ops.c

---
 src/backend/access/gist/gistproc.c|  17 +-
 src/backend/access/spgist/spgkdtreeproc.c |  24 +--
 src/backend/utils/adt/geo_ops.c   | 312 +++---
 src/backend/utils/adt/geo_spgist.c|  24 +--
 src/include/utils/geo_decls.h |  16 --
 src/test/regress/expected/point.out   |   8 +-
 6 files changed, 193 insertions(+), 208 deletions(-)

diff --git a/src/backend/acces

Re: [HACKERS] WIP: Access method extendability

2016-04-04 Thread Emre Hasegeli
I think the variable "magick" should be "magic".  Patch attached.


bloom-magick.patch
Description: Binary data



Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-31 Thread Emre Hasegeli
> Thank you, pushed

Thank you for working on this.

I noticed that have overlooked this:

 static RectBox *
-initRectBox()
+initRectBox(void)
 {




Re: [HACKERS] Alter or rename enum value

2016-03-28 Thread Emre Hasegeli
> I do not know whether this would be a meaningful improvement for
> common use-cases, though.  (It'd help if people were more specific
> about the use-cases they need to work.)

For what it's worth, at the company I am working for, InnoGames GmbH,
not being able to alter enums is the number one pain point with
PostgreSQL.  We have hundreds of small databases on the backend of the
game worlds.  They are heavily using enums.  New values need to be
added to or removed from the enums for new features.  We are
relying on transactions for the database migrations, so we
couldn't make use of ALTER TYPE ADD VALUE.

Some functions found on the Internet [1] which change the values on
the catalog had been used, until they corrupted some indexes.  (I
believe those functions are still quite common.)  Now, we are using a
function to replace an enum type on all tables with another one, but
we are not at all happy with this solution.  It requires all objects
which were using the enum to be dropped and recreated, and it rewrites
the tables, so it greatly increases the migration time and effort.

[1] http://en.dklab.ru/lib/dklab_postgresql_enum/
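
For the record, the kind of migration we end up running looks roughly like
this (table, column, and value names are made up; in reality every default,
index, and view that references the type has to be dropped and recreated
around it, and the table is rewritten):

BEGIN;
CREATE TYPE building_kind_new AS ENUM ('castle', 'barracks', 'harbor');
ALTER TABLE building
    ALTER COLUMN kind TYPE building_kind_new
    USING kind::text::building_kind_new;
DROP TYPE building_kind;
ALTER TYPE building_kind_new RENAME TO building_kind;
COMMIT;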




Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-27 Thread Emre Hasegeli
>>> I'll try to explain with two-dimensional example over points. ASCII-art:
>>
>> Thank you for the explanation.  Should we incorporate this with the patch.
>
> added

I have worked on the comments of the patch.  It is attached.  I hope
it looks more clear than it was before.

>>> + cmp_double(const double a, const double b)
>>
>> Does this function necessary?  We can just compare the doubles.
>
> Yes, it compares with limited precision as it does by geometry operations.
> Renamed to point actual arguments.

I meant that we could use FP macros directly instead of this function.
Doing so is also more correct.  I haven't tried to produce the
problem, but this function is not same as using the macros directly.
For differences smaller that the epsilon, it can return different
results.  I have removed it on the attached version.

>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>
>> The "rectangle" variable in here should be renamed.
>
> fixed

I found a bunch of those too and renamed for clarity.  I have also
reordered the arguments of the helper functions to keep them
consistent.

> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
> evalRangeBox() will palloc its result then we need to copy its result into
> RangeBox struct and free. Let both functions use the same interface.

I found it nicer to copy and edit the existing value than creating it
in two steps like this.  It is also attached.

> it works until allthesame branch contains only one inner node. Otherwise
> traversalValue will be freed several times, see spgWalk().

It just works, when traversalValues is not set.  It is also attached.

I have also added the missing regression tests.  While doing that I
noticed that some operators are missing and also added support for
them.  It is also attached with the updated version of my test script.

I hope I haven't changed the patch too much.  I have also pushed the
changes here:

https://github.com/hasegeli/postgres/commits/box-spgist


q4d-5.patch
Description: Binary data


box-index-test.sql
Description: Binary data



Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-24 Thread Emre Hasegeli
> +  * boxtype_spgist.c

The names on the file header need to be changed, too.

> I'll try to explain with two-dimensional example over points. ASCII-art:
>|
>|
> 1  |  2
>|
> ---+-
>|P
>  3 |  4
>|
>|
>
> '+' with 'A' represents a centroid or, in other words, a point which splits the plane
> into 4 quadrants and is stored in the parent node. 1,2,3,4 are labels of
> quadrants, each labeling will be the same for all pictures and all
> centroids, and I will not show them in pictures later to prevent too
> complicated images. Let we add C - child node (and again, it will split
> plane for 4 quads):
>
>
>| |
>+-+---
>  X |B|C
>| |
> ---+-+---
>|P|A
>| |
>|
>|
>
> A and B are points of intersection of lines. So, box PBCA is a bounding box
> for points contained in the 3rd quadrant (see labeling above). For example, the X labeled
> point is not a descendant of the child node with centroid C because it must be
> in the branch of the 1st quad of the parent node. So, each node (except root) will have
> a limitation in its quadrant. To transfer that limitation the traversalValue
> is used.
>
> To keep formatting I attached a txt file with this fragment.

Thank you for the explanation.  Should we incorporate this with the patch?

>>> +  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development
>>> Group
>>
>> 2016.
>
> fixed

Not on the patch.

> + cmp_double(const double a, const double b)

Is this function necessary?  We can just compare the doubles.

> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)

The "rectangle" variable in here should be renamed.

> + Assert(is_infinite(b) == 0);

This is failing on my test.  You can reproduce with the script I have sent.

>> Wouldn't it be simpler to palloc and return the value on those functions?
>
> evalRangeBox() initializes part of RectBox, so, it could not palloc its
> result.
> Other methods use the same signature just for consistency.

I couldn't understand it.  evalRangeBox() can palloc and return the
result.  evalRectBox() can return the result palloc'ed by
evalRangeBox().  The caller wouldn't need to palloc anything.

>> Many variables are defined with "const".  I am not sure they are all
>> that doesn't change.  If it applies to the pointer, "out" should also
>> not change in here.  Am I wrong?
>
> No, changes

I now read about "const".  I am sorry for not being experienced in C.
The new version of the patch marks them as "const" by mistake.

I went through all other variables:

> + int r = is_infinite(a);
>
> + double  x = *(double *) a;
> + double  y = *(double *) b;
>
> + spgInnerConsistentIn *in = (spgInnerConsistentIn *) 
> PG_GETARG_POINTER(0);
>
> + spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
>
> + BOX*leafBox = DatumGetBoxP(in->leafDatum);

Shouldn't they be "const", too?

>>> +/*
>>> + * Begin. This block evaluates the median of coordinates of boxes
>>> + */
>>
>> I would rather explain what the function does on the function header.
>
> fixed

The "end" part of it is still there.

>> Do we really need to copy the traversalValues on allTheSame case.
>> Wouldn't it work if just the same value is passed for all of them.
>> The search shouldn't continue after allTheSame case.
>
> Seems, yes. spgist tree could contain long branches with allTheSame.

Does SP-GiST allow any node under allTheSame to not be allTheSame?
Not setting traversalValues for allTheSame worked fine with my test.

> + if (in->allTheSame)

Most of the things happening before this check is not used in there.
Shouldn't we move this to the top of the function?

> + out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);

This could go before allTheSame block.




Re: [HACKERS] BRIN is missing in multicolumn indexes documentation

2016-03-23 Thread Emre Hasegeli
> I guess multicolumn BRIN behaves similarly to B-tree or GiST. But I'm
> no expert, so I need someone knowledgeable to confirm this. If the
> following wording is OK, I will update the patch.

Multicolumn BRIN is like GIN.  Every column is indexed separately.
The order of the columns doesn't matter.




Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Emre Hasegeli
Here are my comments about the operator class implementation:

> + *implementation of quad-4d tree over boxes for SP-GiST.

Isn't the whole thing actually 3D?

> +  * For example consider the case of intersection.

No need for a new line after this.

> +  * A quadrant has bounds, but sp-gist keeps only 4-d point (box) in inner 
> nodes.

I couldn't get the term 4D point.  Maybe it means that we are using
box datatype as the prefix, but we are not treating them as boxes.

> + * We use traversalValue to calculate quadrant bounds from parent's quadrant
> + * bounds.

I couldn't understand this sentence.  We are using the parent's prefix
and the quadrant to generate the traversalValue.  I think this is the
crucial part of the opclass.  It would be nice to explain it more
clearly.

> +  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group

2016.

> + *  src/backend/utils/adt/boxtype_spgist.c

Maybe we should name this file as geo_spgist.c to support other types
in the future.

> + #include "utils/builtins.h";

Extra ;.

> + #include "utils/datum.h"

I think this is unnecessary.

> + /* InfR type implements doubles and +- infinity */
> + typedef struct
> + {
> +int infFlag;
> +double  val;
> + }   InfR;

Why do we need this?  Can't we store infinity in "double"?

> + /* wrap double to InfR */
> + static InfR
> + toInfR(double v, InfR * r)
> + {
> +r->infFlag = NotInf;
> +r->val = v;
> + }

This isn't returning the value.

> + typedef struct
> + {
> +Range   range_x;
> +Range   range_y;
> + }   Rectangle;

Wouldn't it be simpler to just use BOX instead of this?

> + static void
> + evalIRangeBox(const IRangeBox *range_box, const Range *range, const int 
> half1,
> +  const int half2, IRangeBox *new_range_box)
>
> + static void
> + evalIRectBox(const IRectBox *rect_box, const Rectangle *centroid,
> + const uint8 quadrant, IRectBox * new_rect_box)
>
> + inline static void
> + initializeUnboundedBox(IRectBox * rect_box)

Wouldn't it be simpler to palloc and return the value on those functions?

> + static int
> + intersect2D(const Range * range, const IRangeBox * range_box)

Wouldn't it be better to return "bool" from those functions?

> +return ((p1 >= 0) && (p2 <= 0));

Extra parentheses.

> + const spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
> + spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);

Many variables are defined with "const".  I am not sure they are all
that doesn't change.  If it applies to the pointer, "out" should also
not change in here.  Am I wrong?

> +/*
> + * Begin. This block evaluates the median of coordinates of boxes
> + */

I would rather explain what the function does on the function header.

> + memcpy(new_rect_box, rect_box, sizeof(IRectBox));

Do we really need to copy the traversalValues on allTheSame case.
Wouldn't it work if just the same value is passed for all of them.
The search shouldn't continue after allTheSame case.

> + for (i = 0; flag && i < in->nkeys && flag; i++)

It checks flag two times.

> +boxPointerToRectangle(
> +DatumGetBoxP(in->scankeys[i].sk_argument),
> +p_query_rect
> +);

I don't think this matches the project coding style.

> + int flag = 1,

Wouldn't it be better as "bool"?

The regression tests for the remaining operators are still missing.




Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-18 Thread Emre Hasegeli
Here are my first comments.  I haven't read the actual index
implementation, yet.

I think traversal value is a useful addition.  It is nice that the
implementation for the range types is also changed.  My questions
about them are:

Do reconstructedValues is required now?  Wouldn't it be cleaner to use
the new field on the prefix tree implementation, too?

We haven't had specific memory context for reconstructedValues.  Why
is it required for the new field?

> +   Memory for traversalValues should be allocated in
> +   the default context, but make sure to switch to
> +   traversalMemoryContext before allocating memory
> +   for pointers themselves.

This sentence is not understandable.  Shouldn't it say "the memory
should *not* be allocated in the default context"?  What are the
"pointers themselves"?

The box opclass is broken because of the floating point arithmetics of
the box type.  You can reproduce it with the attached script.  We
really need to fix the geometric types, before building more on them.
They fail to serve the only purpose they are useful for, demonstrating
features.

It think the opclass should support the cross data type operators like
box @> point.  Ideally we should do it by using multiple opclasses in
an opfamily.  The floating problem will bite us again in here, because
some of the operators are not using FP macros.

The tests check not supported operator "@".  It should be "<@".  It is
nice that the opclass doesn't support long deprecated operators.

We need tests for the remaining operators and for adding new values
to the index.  I am not sure create_index.sql is the best place for
them.  Maybe we should use the test scripts of "box".

> + #define LT  -1
> + #define GT   1
> + #define EQ   0

Using these numbers is a very well established pattern.  I don't think
macros make the code any more readable.

> + /* SP-GiST API functions */
> + Datum   spg_box_quad_config(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_choose(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_picksplit(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
> + Datum   spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);

I guess they should go to the header file.


box-index-test.sql
Description: Binary data



Re: [HACKERS] create opclass documentation outdated

2016-03-10 Thread Emre Hasegeli
>> In create_opclass.sgml, not only do we have the list of AMs supporting
>> STORAGE, but there is also a paragraph describing which AMs do what
>> for input datatypes of FUNCTION members (around line 175).
>
> I placed BRIN together with gist/gin/spgist here, namely that the optype
> defaults to the opclass' datatype.

Requiring STORAGE type for BRIN opclasses is more of a bug than a
feature.  None of the operator classes actually support storing values
with different type.  We had intended to support it for inclusion
opclass to index points by casting them to box, but then ripped it out
because of inconsistencies on the operators caused by floating point
arithmetics.

The problem is that the operator classes try to query the strategies
using the datatype they got from TupleDesc structure.  It fails at
least for cidr, because it is casted implicitly and indexed as inet.
All of the BRIN operator classes can fail the same way for user
defined types.  Adding STORAGE to all operator classes seemed to
me like an easy fix at the time I noticed this problem, but what we
really need to do is to fix this rather than document it.  We should make
BRIN use the datatype of the operator class as btree does.

> +   of each operator class interpret the strategy nnumbers according to the

Typo: nnumbers.

> +   Operator classes based on the Inclusion framework can
> +   theoretically support cross-data-type operations, but there's no
> +   demonstrating implementation yet.

This part of the inclusion class is not committed, so this paragraph
shouldn't be there.

Other than those, the changes look good to me.




Re: [HACKERS] SP-GiST support for inet datatypes

2016-03-09 Thread Emre Hasegeli
>> Spgist index tree is much better  than gist - 12149 pages vs 1334760 !

I assume this is the reason why it is bigger.  IP addresses are very
well suited to SP-GiST.  They naturally do not overlap.

> I also noticed, that spgist is much faster than gist for other inet
> operators. I'd like to see in 9.6.

Unfortunately, I missed the deadline of the last commitfest.  It is on
the next one:

https://commitfest.postgresql.org/10/571/




Re: [HACKERS] SP-GiST support for inet datatypes

2016-03-03 Thread Emre Hasegeli
> Emre, I checked original thread and didn't find sample data. Could you 
> provide them for testing ?

I found it on the Git history:

https://github.com/job/irrexplorer/blob/9e8b5330d7ef0022abbe1af18291257e044eb24b/data/irrexplorer_dump.sql.gz?raw=true




[HACKERS] SP-GiST support for inet datatypes

2016-03-02 Thread Emre Hasegeli
Attached patches add SP-GiST support to the inet datatypes.  The operator
class comes with a small change on the SP-GiST framework to allow fixed
number of child nodes.

The index is like a prefix tree except that it doesn't bother to split the
addresses into parts as text is split.  It also doesn't use labels to know
the part after the prefix, but relies on static node numbers.

The GiST index released with version 9.4 performs really bad with real
world data.  SP-GiST works much better with the query posted to the
performance list [1] a while ago:

> hasegeli=# SELECT DISTINCT route INTO hmm FROM routes_view WHERE asn =
2914;
> SELECT 732
>
> hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
>QUERY PLAN
>

>  Nested Loop  (cost=0.41..571742.27 rows=2248 width=7) (actual
time=12.643..20474.813 rows=8127 loops=1)
>->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
time=0.017..0.524 rows=732 loops=1)
>->  Index Only Scan using route_gist on routes  (cost=0.41..552.05
rows=22900 width=7) (actual time=4.851..27.948 rows=11 loops=732)
>  Index Cond: (route && (hmm.route)::inet)
>  Heap Fetches: 8127
>  Planning time: 1.507 ms
>  Execution time: 20475.605 ms
> (7 rows)
>
> hasegeli=# DROP INDEX route_gist;
> DROP INDEX
>
> hasegeli=# CREATE INDEX route_spgist ON routes USING spgist (route);
> CREATE INDEX
>
> hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
>   QUERY PLAN
>
-
>  Nested Loop  (cost=0.41..588634.27 rows=2248 width=7) (actual
time=0.081..16.961 rows=8127 loops=1)
>->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
time=0.022..0.079 rows=732 loops=1)
>->  Index Only Scan using route_spgist on routes  (cost=0.41..575.13
rows=22900 width=7) (actual time=0.014..0.021 rows=11 loops=732)
>  Index Cond: (route && (hmm.route)::inet)
>  Heap Fetches: 8127
>  Planning time: 1.376 ms
>  Execution time: 15.936 ms

[1]
http://www.postgresql.org/message-id/flat/alpine.DEB.2.11.1508251504160.31004@pyrite#alpine.DEB.2.11.1508251504160.31004@pyrite


spgist-fixed-nnodes.patch
Description: Binary data


inet-spgist-v1.patch
Description: Binary data



Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2016-01-03 Thread Emre Hasegeli
> But is it? Is it impossible for the BRIN bitmap index scan to return 0 rows
> (say, if the value being matched is outside the min/max boundary for every
> block range?) Granted, if we document that it always returns 0 and should be
> ignored, then confusing the actual 0 with the 0 as a representation of
> “unknown” would be less a problem.

How about -1 ?  It is an impossible value for sure.  Maybe we should
change BitmapAnd and BitmapOr nodes, too.  It is better to make it
obvious that it is not the correct value.  I don't think many people
would notice the note on the documentation.

On the other hand, returning -1 broke the parser of explain.depesz.com [1].

[1] http://explain.depesz.com/s/tAkd




Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Emre Hasegeli
> which is much closer to the actual number of rows removed by the index
> recheck + the one left.

Is it better to be closer?  We are saying those are the "actual"
values not the estimates.  If we cannot provide the actual rows, I
think it is better to provide nothing.  Something closer to the
reality would create more confusion.  Maybe we should just return the
number of blocks, and put a note about it somewhere.  The actual row
count is already available on the upper part of the plan.

By the way, the estimation is a bigger problem than that.  Please see
my patch [1] about it.

[1] 
http://www.postgresql.org/message-id/cae2gyzzjvzpy-1csgzjjyh69izsa13segfc4i4r2z0qbq2p...@mail.gmail.com




Re: [HACKERS] rows estimate in explain analyze for the BRIN index

2015-12-30 Thread Emre Hasegeli
> I don’t see how to solve this problem without changing explain analyze output 
> to accommodate for “unknown” value. I don’t think “0” is a non-confusing 
> representation of “unknown” for most people, and from the practical 
> standpoint, a “best effort” estimate is better than 0 (i.e. I will be able to 
> estimate how efficient BRIN index is for my tables in terms of the number of 
> tuples retrieved/thrown away)

The number of retrieved and thrown away rows is already available on
the upper part of the plan.  Bitmap Index Scan should provide the rows
that matched the index.  Another alternative would be just returning
the number of matching pages (by not multiplying by 10).  It might
be better understood.  The users who can understand the EXPLAIN
ANALYZE output shouldn't be expecting BRIN to return rows.




Re: [HACKERS] BRIN cost estimate

2015-12-24 Thread Emre Hasegeli
> The patch is attached.

Now, it is actually attached.


brin-correlation-v2.patch
Description: Binary data



Re: [HACKERS] BRIN cost estimate

2015-12-24 Thread Emre Hasegeli
> Somebody wrote to me a few days ago that the BRIN cost estimation is
> rather poor.  One immediately obvious issue which I think is easily
> fixed is the correlation estimate, which is currently hardcoded to 1.
>
> Since a BRIN scan always entails a scan of the relation in physical
> order, it's simple to produce a better estimate for that, per the
> attached patch.  (Note: trying to run this on expression indexes will
> cause a crash.  I need to complete that part ...)
>
> There are other improvements we've been discussing, but I'm unclear that
> I will have time to study them to get a useful patch quickly enough.  If
> somebody else wants to mess with this, please get in touch.
>
> Thoughts?

As we have discussed, the correlation is not used for bitmap index
scan cost estimation.  I think the best we can do in here is to somehow
increase the selectivity.  I suggest something like this:

/*
* Index selectivity is important for the planner to calculate the cost
* of the bitmap heap scan.  Unfortunately, we don't have a robust way to
* estimate selectivity of BRIN.  It can depend on many things.  This
* is a long rationale about the incomplete calculation we have at the
* moment.
*
* Our starting point is that BRIN selectivity has to be less than the
* selectivity of the btree.  We are using a product of logical and
* physical selectivities to achieve this.  The equality of
*
* (1 + logical_selectivity) * (1 + physical_selectivity) - 1
*
* is used to make sure the result is not less than any of the values.
*
* The logical selectivity is calculated using the indexable expressions
* of the WHERE clause.  The physical selectivity is calculated using
* the block proportion and the maximum correlation.  The block
* proportion is a comparable value with selectivity.  It is the
* selectivity of the smallest unit of the index.  The final selectivity
* can never be less than that.
*
* Using the contrary of the correlation by subtracting it from 1 is
* not really a comparable value with the selectivity.  It is just a
* value between 0 and 1.  On the other hand, it is the only value
* related to the BRIN quality that we have available right now.  We are
* using the arithmetic of it with the block proportion to normalise it.
* This part of the physical selectivity is likely to be more effective
* than the block proportion in many circumstances as there would be
* many blocks on big tables.
*
* Using the contrary of the correlation of a column as selectivity of
* the index is wrong in many ways.  First of all, it cannot be applied
* to all BRIN operator classes.  It makes sense for the main built-in
* operator class "minmax", and makes a little sense for the other one
* "inclusion".  It wouldn't probably make any sense for a bloom filter
* implementation, if there would be any.  Maybe, we should push down
* this function to the operator class, but there is not enough reason
* to do it right now.
*
* Second, correlation is not dependent to any indexed expression.  It
* probably doesn't make any sense for the complicated operators.  It
* would probably affect basic comparison operators differently than
* the equality operator.  The effect would even differ by the count of those
* expressions.  For example, x IN (10, 20, 30) would be affected by
* correlation more than x = 15, even when their selectivities are the
* same.
*
* Last but not least, the correlation is a single value for the whole
* range.  The indexed table can partly be very well correlated, but
* the correlation value can still be very low.  For example, if a
* perfectly correlated table is copied 4 times, the correlation would
* be 0.25, although the index would be almost as good as the version on
* the initial table.  Or the expression can match the better correlated
* part of the table.  It is not hard to imagine more scenarios where
* the correlation is a bad value to use as the selectivity.  We should
* probably improve this by collecting more statistics, one day.
*
* Another problem in here is that the caller assumes the selectivity
* by tuples.  It might have been better, if we had a way to return it
* as some number of pages.  On the other hand, even though we know about
* the index, it is not too much easier for us to estimate the number of
* matching pages than it is for the caller.  We are likely to make too
* many mistakes by relying on the correlation, anyway.  We are at least
* not making it worse in here.
*/
blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;
selec = (1 + qualSelectivity) * (1 + blockSelectivity) - 1;
CLAMP_PROBABILITY(selec);
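
A quick sanity check of that equality with made-up numbers:

-- qualSelectivity = 0.01, blockSelectivity = 0.1 (hypothetical values)
SELECT (1 + 0.01) * (1 + 0.1) - 1 AS selec;
-- 0.111, which is not less than either input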

The patch is attached.




[HACKERS] Segfault while using an array domain

2015-11-29 Thread Emre Hasegeli
I was getting segfaults while working on the current master for a while.
This is the shortest way I could found to reproduce the problem:

create or replace function is_distinct_from(anyelement, anyelement)
returns boolean language sql
as 'select $1 is distinct from $2';

create operator !== (
procedure = is_distinct_from,
leftarg = anyelement,
rightarg = anyelement
);

create domain my_list int[] check (null !== all (value));

create table my_table (my_column my_list);

insert into my_table values ('{1}');
insert into my_table values ('{1}');

Here is the backtrace:

> * thread #1: tid = 0x108710, 0x0001040ebf82 
> postgres`MemoryContextDelete(context=0x7f7f7f7f7f7f7f7f) + 18 at mcxt.c:205, 
> queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=EXC_I386_GPFLT)
>   * frame #0: 0x0001040ebf82 
> postgres`MemoryContextDelete(context=0x7f7f7f7f7f7f7f7f) + 18 at mcxt.c:205
> frame #1: 0x000103ea60ac postgres`fmgr_sql(fcinfo=0x7fa5e50150a8) 
> + 252 at functions.c:1047
> frame #2: 0x000103e9f6f5 
> postgres`ExecEvalScalarArrayOp(sstate=0x7fa5e5015038, 
> econtext=, isNull="", isDone=) + 885 at 
> execQual.c:2660
> frame #3: 0x000103ea1bb4 
> postgres`ExecEvalCoerceToDomain(cstate=, 
> econtext=0x7fa5e6065110, isNull="", isDone=) + 180 at 
> execQual.c:4009
> frame #4: 0x000103ea208a postgres`ExecProject + 39 at execQual.c:5345
> frame #5: 0x000103ea2063 postgres`ExecProject(projInfo=, 
> isDone=0x7fff5bef58bc) + 387 at execQual.c:5560
> frame #6: 0x000103eb96a3 postgres`ExecResult(node=0x7fa5e6064ff8) 
> + 179 at nodeResult.c:155
> frame #7: 0x000103e9b57c 
> postgres`ExecProcNode(node=0x7fa5e6064ff8) + 92 at execProcnode.c:392
> frame #8: 0x000103eb5f12 
> postgres`ExecModifyTable(node=0x7fa5e6064ea0) + 434 at 
> nodeModifyTable.c:1331
> frame #9: 0x000103e9b5bb 
> postgres`ExecProcNode(node=0x7fa5e6064ea0) + 155 at execProcnode.c:396
> frame #10: 0x000103e97a90 postgres`standard_ExecutorRun [inlined] 
> ExecutePlan(estate=, planstate=0x7fa5e6064ea0, 
> use_parallel_mode='\0', operation=, numberTuples=0, 
> direction=, dest=) + 87 at execMain.c:1566
> frame #11: 0x000103e97a39 
> postgres`standard_ExecutorRun(queryDesc=0x7fa5e6061038, 
> direction=, count=0) + 201 at execMain.c:338
> frame #12: 0x000103fc18da 
> postgres`ProcessQuery(plan=0x7fa5e604fbd8, sourceText="insert into 
> my_table values ('{1}');", params=0x, 
> dest=0x7fa5e604fcd0, completionTag="") + 218 at pquery.c:185
> frame #13: 0x000103fc0ddb 
> postgres`PortalRunMulti(portal=0x7fa5e480a238, isTopLevel='\x01', 
> dest=0x7fa5e604fcd0, altdest=0x7fa5e604fcd0, completionTag="") + 331 
> at pquery.c:1283
> frame #14: 0x000103fc06f8 
> postgres`PortalRun(portal=0x7fa5e480a238, count=9223372036854775807, 
> isTopLevel='\x01', dest=0x7fa5e604fcd0, altdest=0x7fa5e604fcd0, 
> completionTag="") + 552 at pquery.c:812
> frame #15: 0x000103fbe8d6 postgres`PostgresMain + 48 at 
> postgres.c:1105
> frame #16: 0x000103fbe8a6 postgres`PostgresMain(argc=, 
> argv=, dbname=, username=) + 9414 at 
> postgres.c:4032
> frame #17: 0x000103f503c8 postgres`PostmasterMain [inlined] 
> BackendRun + 8328 at postmaster.c:4237
> frame #18: 0x000103f503a2 postgres`PostmasterMain [inlined] 
> BackendStartup at postmaster.c:3913
> frame #19: 0x000103f503a2 postgres`PostmasterMain at postmaster.c:1684
> frame #20: 0x000103f503a2 postgres`PostmasterMain(argc=, 
> argv=) + 8290 at postmaster.c:1292
> frame #21: 0x000103ed759f postgres`main(argc=, 
> argv=) + 1567 at main.c:223
> frame #22: 0x7fff8f1245c9 libdyld.dylib`start + 1

I can reproduce it on 9.5 branch too, but not on 9.4 branch.




Re: [HACKERS] new full text search configurations

2015-11-21 Thread Emre Hasegeli
> I checked new snowball site http://snowballstem.org/ and found several new
> stemmers appeared (as external contributions):
>
> Irish and Czech
> Object Pascal codegenerator for Snowball
> Two stemmers for Romanian
> Hungarian
> Turkish
> Armenian
> Basque (Euskera)
> Catalan
>
> Some of them we don't have in our list of default configurations. Since
> these are external, not official stemmers, it'd be nice if  people  look and
> test them. If they are fine, we can prepare new configurations for 9.6.

We have configurations for the ones included in Snowball, namely
Romanian, Hungarian, and Turkish.  I don't know why the others are not
included but listed on the page as external contributions.  It might
be a good idea to wait for someone to commit them to the upstream.

I have checked the changes on the algorithms [1].  They don't seem
to have been updated much since 2007, but recently a new one for the Tamil
language was added.  It might be a good candidate for a new
configuration.

[1] https://github.com/snowballstem/snowball/commits/master/algorithms




Re: [HACKERS] [PROPOSAL] Improvements of Hunspell dictionaries support

2015-11-07 Thread Emre Hasegeli
Thank you for working on this.

I tried the patch with a Turkish dictionary [1] I could find on the
Internet.  It worked for some words, but not others:

> hasegeli=# create text search dictionary hunspell_tr (template = ispell, 
> dictfile = tr, afffile = tr);
> CREATE TEXT SEARCH DICTIONARY
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilki'); -- The root "fox"
> ---
> {tilki}
> (1 row)
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilkinin'); -- Genitive form, 
> affix 3290
> ts_lexize
> ---
> {tilki}
> (1 row)
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilkiler'); -- Plural form, affix 
> 4371
> ts_lexize
> ---
> {tilki}
> (1 row)
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilkiyi'); -- Accusative form, 
> affix 2646
> ts_lexize
> ---
>
> (1 row)

It seems to have something to do with the order of the affixes.  It
works, if I move affix 2646 to the beginning of the list.

[1] https://tr-spell.googlecode.com/files/dict_aff_5000_suffix_113_words.zip




Re: [HACKERS] Proposal: Trigonometric functions in degrees

2015-10-25 Thread Emre Hasegeli
> Currently PostgreSQL only has trigonometric functions that work in
> radians. I think it would be quite useful to have an equivalent set of
> functions that worked in degrees. In other environments these are
> commonly spelled sind(), cosd(), etc.

I would prefer gradian over degree.




Re: [HACKERS] point_ops for GiST

2015-10-12 Thread Emre Hasegeli
>> Pls. don't misunderstand my questions: They are directed to get an
>> even more useful spatial data handling of PostgreSQL. I'm working with
>> PostGIS since years and are interested in any work regarding spatial
>> types...
>>
>> Can anyone report use cases or applications of these built-in geometric
>> types?
>>
>> Would'nt it be even more useful to concentrate to PostGIS
>> geometry/geography types and extend BRIN to these types?
>
>
> Note, that PostGIS is a different project which is maintained by separate
> team. PostGIS have its own priorities, development plan etc.
> PostgreSQL have to be self-consistent. In particular, it should have
> reference implementation of operator classes which extensions can use as the
> pattern. This is why it's important to maintain built-in geometric types.
>
> In short: once we implement it for built-in geometric types, you can ask
> PostGIS team to do the same for their geometry/geography.

The problem is that the geometric types don't even serve this
purpose well in their current state.  We have to make the operators
consistent to better demonstrate index support of cross type
operators.




Re: [HACKERS] point_ops for GiST

2015-10-12 Thread Emre Hasegeli
> This was already fixed for GiST.
> See following discussion
> http://www.postgresql.org/message-id/capphfdvgticgniaj88vchzhboxjobuhjlm6c09q_op_u9eo...@mail.gmail.com
> and commit
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3c29b196b0ce46662cb9bb7a1f91079fbacbcabb
> "Consistent" method of GiST influences only search and can't lead to corrupt
> indexes. However, "same" method can lead to corrupt indexes.
> However, this is not the reason to not backpatch the changes and preserve
> buggy behaviour; this is the reason to recommend reindexing to users. And it
> was already backpatched.

Fixing it on the opclass is not an option for BRIN.  We designed BRIN
opclasses to be extensible using extra SQL level support functions and
operators.  It is possible to support the point datatype using box vs
point operators.  Doing so would lead to wrong results, because point
operators use FP macros, but box_contain_pt() doesn't.

GiST opclass could be more clean and extensible, if we wouldn't have
those macros.




Re: [HACKERS] [COMMITTERS] pgsql: Improve BRIN documentation somewhat

2015-07-21 Thread Emre Hasegeli
 Emre Hasegeli told me on IM he's going to submit a patch to add
 something similar for the inclusion opclass framework.

It is attached.  Thank you for pushing forward to improve the documentation.


0001-Improve-BRIN-documentation-for-Inclusion-opclass.patch
Description: Binary data



Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-20 Thread Emre Hasegeli
 to handle DROP dependency behaviors properly.  (On reflection, maybe
 better if it's bernoulli(internal) returns tablesample_handler,
 so as to guarantee that such a function doesn't create a conflict with
 any user-defined function of the same name.)

 The probability of conflict seems high with the system() so yeah we'd need
 some kind of differentiator.

Maybe it would be even better to have something like bernoulli(tablesample)
where tablesample is defined as a pseudo-type like trigger.
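
Purely as an illustration of that suggestion (nothing below exists; the names
are made up to mirror how trigger functions are declared):

CREATE FUNCTION bernoulli(tablesample) RETURNS tablesample_handler
    AS 'MODULE_PATHNAME', 'bernoulli_handler'
    LANGUAGE C STRICT;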




Re: [HACKERS] point_ops for GiST

2015-06-13 Thread Emre Hasegeli
 Emre Hasegeli just pointed out to me that this patch introduced
 box_contain_pt() and in doing so used straight C comparison (= etc)
 instead of FPlt() and friends.  I would think that that's a bug and
 needs to be changed -- but certainly not backpatched, because gist
 indexes would/might become corrupt.

The problem with this is that the BRIN inclusion opclass uses some operators
to implement others.  It was using the box @> point operator to implement
the point ~= point operator by indexing points in boxes.  The former
doesn't use the macros, but the latter does.  The opclass could return
a wrong result when the point is right near the index boundaries.

Currently, there are no BRIN opclasses for geometric types except box
because of this reason.  I would like to work on supporting them for
the next release.  I think the best way is to change the operators
which are not using the macros to be consistent with the others.  Here
is the list:

* polygon << polygon
* polygon &< polygon
* polygon &> polygon
* polygon >> polygon
* polygon <<| polygon
* polygon &<| polygon
* polygon |&> polygon
* polygon |>> polygon
* box @> point
* point <@ box
* lseg <@ box
* circle @> point
* point <@ circle

I can send a patch, if it is acceptable.




Re: [HACKERS] BRIN range operator class

2015-05-10 Thread Emre Hasegeli
 I pushed patches 04 and 07, as well as adopting some of the changes to
 the regression test in 06.  I'm afraid I caused a bit of merge pain for
 you -- sorry about that.

No problem.  I rebased the remaining ones.


brin-inclusion-v09-02-strategy-numbers.patch
Description: Binary data


brin-inclusion-v09-03-remove-assert-checking.patch
Description: Binary data


brin-inclusion-v09-05-box-vs-point-operators.patch
Description: Binary data


brin-inclusion-v09-06-inclusion-opclasses.patch
Description: Binary data



Re: [HACKERS] BRIN range operator class

2015-05-06 Thread Emre Hasegeli
 Looking at patch 04, it seems to me that it would be better to have
 the OpcInfo struct carry the typecache struct rather than the type OID,
 so that we can avoid repeated typecache lookups in brin_deform_tuple;

 Here's the patch.

Looks better to me.  I will incorporate with this patch.




Re: [HACKERS] BRIN range operator class

2015-05-06 Thread Emre Hasegeli
 Can you please explain what is the purpose of patch 07?  I'm not sure I
 understand; are we trying to avoid having to add pg_amproc entries for
 these operators and instead piggy-back on btree opclass definitions?
 Not too much in love with that idea; I see that there is less tedium in
 that the brin opclass definition is simpler.  One disadvantage is a 3x
 increase in the number of syscache lookups to get the function you need,
 unless I'm reading things wrong.  Maybe this is not performance critical.

It doesn't use btree opclass definitions.  It uses brin opclass
pg_amop entries instead of duplicating them in pg_amproc.
The pg_amproc.h header says:

 * The amproc table identifies support procedures associated with index
 * operator families and classes.  These procedures can't be listed in pg_amop
 * since they are not the implementation of any indexable operator.

In our case, these procedures can be listed in pg_amop as they
are implementations of indexable operators.

The more important change on this patch is to request procedures for
the right data types.  Minmax opclasses return wrong results without
this patch.  You can reproduce it with this query on
the regression database:

select * from brintest where timestampcol = '1979-01-29 11:05:09'::timestamptz;

 Anyway I tried applying it on isolation, and found that it fails the
 assertion that tests the union support proc in brininsert.  That
 doesn't seem okay.  I mean, it's okay not to run the test for the
 inclusion opclasses, but why does it now fail in minmax which was
 previously passing?  Couldn't figure it out.

The regression tests passed when I tried it on the current master.




Re: [HACKERS] BRIN range operator class

2015-05-05 Thread Emre Hasegeli
 Indeed, I have done some testing of the patch but more people testing would
 be nice.

The inclusion opclass should work for other data types as long as the
required operators and SQL level support functions are supplied.
Maybe it would work for PostGIS, too.
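
For anyone who wants to test, the simplest exercise is one of the types the
patch already covers, for example (a sketch, assuming the inclusion opclass
is the default BRIN opclass for range types):

CREATE TABLE meeting (during tstzrange);
CREATE INDEX meeting_brin_idx ON meeting USING brin (during);
EXPLAIN SELECT * FROM meeting
    WHERE during && tstzrange('2015-05-01', '2015-05-02');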




Re: [HACKERS] BRIN range operator class

2015-05-05 Thread Emre Hasegeli
 Nice, I think it is ready now other than the issues Alvaro raised in his
 review[1]. Have you given those any thought?

I already replied his email [1].  Which issues do you mean?

[1] 
http://www.postgresql.org/message-id/CAE2gYzxQ-Gk3q3jYWT=1enlebsgcgu28+1axml4omcwjbkp...@mail.gmail.com




Re: [HACKERS] BRIN range operator class

2015-04-14 Thread Emre Hasegeli
 Judging from a quick look, I think patches 1 and 5 can be committed
 quickly; they imply no changes to other parts of BRIN.  (Not sure why 1
 and 5 are separate.  Any reason for this?)  Also patch 2.

Not much reason except that 1 includes only functions, but 5 includes operators.

 Patch 4 looks like a simple bugfix (or maybe a generalization) of BRIN
 framework code; should also be committable right away.  Needs a closer
 look of course.

 Patch 3 is a problem.  That code is there because the union proc is only
 used in a corner case in Minmax, so if we remove it, user-written Union
 procs are very likely to remain buggy for long.  If you have a better
 idea to test Union in Minmax, or some other way to turn that stuff off
 for the range stuff, I'm all ears.  Just lets make sure the support
 procs are tested to avoid stupid bugs.  Before I introduced that, my
 Minmax Union proc was all wrong.

I removed this test because I don't see a way to support it.  I
believe any other implementation that is more complicated than minmax
will fail in there.  It is better to catch them with the regression
tests, so I tried to improve them.  GiST, SP-GiST and GIN don't have
similar checks, but they have more complicated user defined functions.

 Patch 7 I don't understand.  Will have to look closer.  Are you saying
 Minmax will depend on Btree opclasses?  I remember thinking in doing it
 that way at some point, but wasn't convinced for some reason.

No, there isn't any additional dependency.  It makes minmax operator
classes use the procedures from the pg_amop instead of adding them to
pg_amproc.

It also makes the operator class safer for cross data type usage.
Actually, I just checked and found out that we get wrong answers from
the index on the current master without this patch.  You can reproduce it
with this query on the regression database:

select * from brintest where timestampcol = '1979-01-29 11:05:09'::timestamptz;

The inclusion-opclasses patch makes it possible to add cross type BRIN
regression tests.  I will add more of them in the next version.

 Patch 6 seems the real meat of your own stuff.  I think there should be
 a patch 8 also but it's not attached ... ??

I had another commit that was not intended to be sent.  Sorry about that.




Re: [HACKERS] Selectivity estimation for inet operators

2015-02-15 Thread Emre Hasegeli
New version of the patch attached with the optimization to break the
loop before looking at all of the histogram values.  I can reduce
join selectivity estimation runtime by reducing the values of the
left hand side or both of the sides, if there is interest.

  Even if the above aspects of the code are really completely right, the
  comments fail to explain why.  I spent a lot of time on the comments,
  but so far as these points are concerned they still only explain what
  is being done and not why it's a useful calculation to make.

 I couldn't write better comments because I don't have strong arguments
 about it.  We can say that we don't try to make use of the both of
 the endpoints, because we don't know how to combine them.  We only use
 the one with matching family and masklen, and when both of them match
 we use the distant one to be on the safer side.

I added two more sentences to explain the calculation.
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index 73fc1ca..51a33c2 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -1,32 +1,972 @@
 /*-
  *
  * network_selfuncs.c
  *	  Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * This module provides estimators for the subnet inclusion and overlap
+ * operators.  Estimates are based on null fraction, most common values,
+ * and histogram of inet/cidr columns.
  *
  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
  *	  src/backend/utils/adt/network_selfuncs.c
  *
  *-
  */
 #include postgres.h
 
+#include math.h
+
+#include access/htup_details.h
+#include catalog/pg_operator.h
+#include catalog/pg_statistic.h
 #include utils/inet.h
+#include utils/lsyscache.h
+#include utils/selfuncs.h
 
 
+/* Default selectivity for the inet overlap operator */
+#define DEFAULT_OVERLAP_SEL 0.01
+
+/* Default selectivity for the various inclusion operators */
+#define DEFAULT_INCLUSION_SEL 0.005
+
+/* Default selectivity for specified operator */
+#define DEFAULT_SEL(operator) \
+	((operator) == OID_INET_OVERLAP_OP ? \
+	 DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL)
+
+static Selectivity networkjoinsel_inner(Oid operator,
+	 VariableStatData *vardata1, VariableStatData *vardata2);
+static Selectivity networkjoinsel_semi(Oid operator,
+	VariableStatData *vardata1, VariableStatData *vardata2);
+static Selectivity mcv_population(float4 *mcv_numbers, int mcv_nvalues);
+static Selectivity inet_hist_value_sel(Datum *values, int nvalues,
+	Datum constvalue, int opr_codenum);
+static Selectivity inet_mcv_join_sel(Datum *mcv1_values,
+  float4 *mcv1_numbers, int mcv1_nvalues, Datum *mcv2_values,
+  float4 *mcv2_numbers, int mcv2_nvalues, Oid operator);
+static Selectivity inet_mcv_hist_sel(Datum *mcv_values, float4 *mcv_numbers,
+  int mcv_nvalues, Datum *hist_values, int hist_nvalues,
+  int opr_codenum);
+static Selectivity inet_hist_inclusion_join_sel(Datum *hist1_values,
+			 int hist1_nvalues,
+			 Datum *hist2_values, int hist2_nvalues,
+			 int opr_codenum);
+static Selectivity inet_semi_join_sel(Datum lhs_value,
+   bool mcv_exists, Datum *mcv_values, int mcv_nvalues,
+   bool hist_exists, Datum *hist_values, int hist_nvalues,
+   double hist_weight,
+   FmgrInfo *proc, int opr_codenum);
+static int	inet_opr_codenum(Oid operator);
+static int	inet_inclusion_cmp(inet *left, inet *right, int opr_codenum);
+static int inet_masklen_inclusion_cmp(inet *left, inet *right,
+		   int opr_codenum);
+static int inet_hist_match_divider(inet *boundary, inet *query,
+		int opr_codenum);
+
+/*
+ * Selectivity estimation for the subnet inclusion/overlap operators
+ */
 Datum
 networksel(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_FLOAT8(0.001);
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	int			varRelid = PG_GETARG_INT32(3);
+	VariableStatData vardata;
+	Node	   *other;
+	bool		varonleft;
+	Selectivity selec,
+mcv_selec,
+non_mcv_selec;
+	Datum		constvalue,
+			   *hist_values;
+	int			hist_nvalues;
+	Form_pg_statistic stats;
+	double		sumcommon,
+nullfrac;
+	FmgrInfo	proc;
+
+	/*
+	 * If expression is not (variable op something) or (something op
+	 * variable), then punt and return a default estimate.
+	 */
+	if (!get_restriction_variable(root, args, varRelid,
+  &vardata, &other, &varonleft))
+		PG_RETURN_FLOAT8(DEFAULT_SEL(operator));
+
+	/*
+	 * Can't do anything useful if the something is not a constant, either.
+	 */
+	if (!IsA(other, Const))
+	{
+		

Re: [HACKERS] BRIN range operator class

2014-12-14 Thread Emre Hasegeli
 I thought we could do better than minmax for the inet data type,
 and ended up with a generalized opclass supporting both inet and range
 types.  Patch based on minmax-v20 attached.  It works well except for
 a few small problems.  I will improve the patch and add it to
 a commitfest after the BRIN framework is committed.

I wanted to send a new version before the commitfest to get some
feedback, but it is still work in progress.  Patch attached rebased
to the current HEAD.  This version supports more operators and
the box type from the geometric data types.  Opclasses are renamed to
inclusion_ops to be more generic.  The problems I mentioned remain
because I couldn't solve them without touching the BRIN framework.

 To support more operators I needed to change amstrategies and
 amsupport on the catalog.  It would be nice if amsupport can be set
 to 0 like amstrategies.

I think it would be nicer to get the functions from the operators
by using the strategy numbers instead of adding them directly as
support functions.  I looked around a bit but couldn't find
a sensible way to support it.  Is it possible without adding them
to the RelationData struct?
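
Roughly what I have in mind, as a sketch only (the function name is
made up, and whether the opfamily and the indexed type OID can be
reached from the opclass code without touching RelationData is exactly
the open question):

#include "postgres.h"

#include "utils/lsyscache.h"

/*
 * Sketch: resolve the procedure behind a strategy number instead of
 * registering it as an additional support procedure.
 */
static RegProcedure
strategy_procedure(Oid opfamily, Oid typid, int16 strategy)
{
	Oid			oproid;

	/* the operator registered for this strategy ... */
	oproid = get_opfamily_member(opfamily, typid, typid, strategy);
	if (!OidIsValid(oproid))
		elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
			 strategy, typid, typid, opfamily);

	/* ... and the function that implements it */
	return get_opcode(oproid);
}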

 Inet data types accept IP version 4 and version 6.  It isn't possible
 to represent the union of addresses from different versions with a valid
 inet type.  So, I made the union function return NULL in this case.
 Then, I tried to store whether the returned value is NULL in
 column->values[] as a boolean, but it failed on the pfree() inside
 brin_dtuple_initialize().  It doesn't seem right to free the values
 based on attr->attbyval.

This problem remains.  There is also a similar problem with the
range types, namely empty ranges.  There should be special cases
for them on some of the strategies.  I tried to solve the problems
in several different ways, but got a segfault on one line or another.
This makes me think that the BRIN framework doesn't support storing
types other than the indexed column's type in the values array.
For example, brin_deform_tuple() iterates over the values array and
copies the values using the length of the attr on the index, not the
length of the type defined by the OpcInfo function.  If storing other
types isn't supported, why is it required to return OIDs from the
OpcInfo function?  I am confused.
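
To illustrate the kind of behaviour I would expect, a sketch only;
stored_typid is a hypothetical parameter, because today only the index
attribute's attlen and attbyval are available at that point:

#include "postgres.h"

#include "utils/datum.h"
#include "utils/lsyscache.h"

/*
 * Sketch: copy a value from the BRIN values array using the type
 * reported by the opclass's OpcInfo function rather than the type of
 * the indexed column.
 */
static Datum
copy_stored_value(Datum value, Oid stored_typid)
{
	int16		typlen;
	bool		typbyval;

	get_typlenbyval(stored_typid, &typlen, &typbyval);
	return datumCopy(value, typbyval, typlen);
}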

I didn't try to support geometric types other than box, as I couldn't
manage to store a different type in the values array, but it would
be nice to get some feedback about the overall design.  I was
thinking of adding a STORAGE parameter to the index to support other
geometric types.  I am not sure that adding the STORAGE parameter
to be used by the opclass implementation is the right way.  It
wouldn't be the actual thing that is stored by the index; it would be
an element in the values array.  Maybe data-type-specific opclasses
are the way to go, not a generic one as I am trying.


brin-inclusion-v02.patch
Description: Binary data



Re: [HACKERS] Selectivity estimation for inet operators

2014-12-02 Thread Emre Hasegeli
 I spent a fair chunk of the weekend hacking on this patch to make
 it more understandable and fix up a lot of what seemed to me pretty
 clear arithmetic errors in the upper layers of the patch.  However,
 I couldn't quite convince myself to commit it, because the business
 around estimation for partial histogram-bucket matches still doesn't
 make any sense to me.  Specifically this:

 /* Partial bucket match. */
 left_divider = inet_hist_match_divider(left, query, opr_codenum);
 right_divider = inet_hist_match_divider(right, query, 
 opr_codenum);

 if (left_divider >= 0 || right_divider >= 0)
 match += 1.0 / pow(2.0, Max(left_divider, right_divider));

 Now unless I'm missing something pretty basic about the divider
 function, it returns larger numbers for inputs that are further away
 from each other (ie, have more not-in-common significant address bits).
 So the above calculation seems exactly backwards to me: if one endpoint
 of a bucket is close to the query, or even an exact match, and the
 other endpoint is further away, we completely ignore the close/exact
 match and assign a bucket match fraction based only on the further-away
 endpoint.  Isn't that exactly backwards?

You are right that partial bucket match calculation isn't fair on
some circumstances.

 I experimented with logic like this:

 if (left_divider >= 0 && right_divider >= 0)
 match += 1.0 / pow(2.0, Min(left_divider, right_divider));
 else if (left_divider >= 0 || right_divider >= 0)
 match += 1.0 / pow(2.0, Max(left_divider, right_divider));

 ie, consider the closer endpoint if both are valid.  But that didn't seem
 to work a whole lot better.  I think really we need to consider both
 endpoints not just one to the exclusion of the other.

I have tried many combinations like this, including arithmetic,
geometric, and logarithmic mean functions.  I could not get good
results with any of them, so I left it in a basic form.

Max() works pretty well most of the time, because if the query matches
the bucket, it is generally close to both of the endpoints.  By using
Max(), we are actually crediting the buckets which are close to both
of the endpoints.
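
To put some made-up numbers on it (a smaller divider means the query
is closer to that endpoint):

/*
 * Hypothetical dividers, only to illustrate the choice of Max():
 *
 *   bucket A:  left_divider = 1, right_divider = 2   (close to both)
 *   bucket B:  left_divider = 0, right_divider = 8   (close to one only)
 *
 *   Max():  A gets 1/2^2 = 0.25     B gets 1/2^8 = 0.004
 *   Min():  A gets 1/2^1 = 0.5      B gets 1/2^0 = 1.0
 *
 * Min() would count bucket B as a full match based on one endpoint
 * alone; Max() credits bucket A, which is close to both endpoints.
 */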

 I'm also not exactly convinced by the divider function itself,
 specifically about the decision to fail and return -1 if the masklen
 comparison comes out wrong.  This effectively causes the masklen to be
 the most significant part of the value (after the IP family), which seems
 totally wrong.  ISTM we ought to consider the number of leading bits in
 common as the primary indicator of how far apart a query and a
 histogram endpoint are.

The partial match calculation with Max() is especially unfair on
the buckets where more significant bits change, for example between
63/8 and 64/8.  Returning -1 instead of a high divider forces it to
use the divider for the other endpoint.  We consider the number of
leading bits in common as the primary indicator, just for the other
endpoint.

I have also experimented with the count of bits the two endpoints of
the bucket have in common to get a better partial match calculation.
I could not find a meaningful equation with it.

 Even if the above aspects of the code are really completely right, the
 comments fail to explain why.  I spent a lot of time on the comments,
 but so far as these points are concerned they still only explain what
 is being done and not why it's a useful calculation to make.

I couldn't write better comments because I don't have strong arguments
about it.  We can say that we don't try to make use of both of
the endpoints, because we don't know how to combine them.  We only use
the one with matching family and masklen, and when both of them match
we use the distant one to be on the safer side.

Thank you for looking at it.  Comments look much better now.




Re: [HACKERS] Selectivity estimation for inet operators

2014-12-02 Thread Emre Hasegeli
 Actually, there's a second large problem with this patch: blindly
 iterating through all combinations of MCV and histogram entries makes the
 runtime O(N^2) in the statistics target.  I made up some test data (by
 scanning my mail logs) and observed the following planning times, as
 reported by EXPLAIN ANALYZE:

 explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip;
 explain analyze select * from relays r1, relays r2 where r1.ip && r2.ip;

 stats target    eqjoinsel    networkjoinsel
 
 100             0.27 ms      1.85 ms
 1000            2.56 ms      167.2 ms
 10000           56.6 ms      13987.1 ms

 I don't think it's necessary for network selectivity to be quite as
 fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning
 time for a query that might need just milliseconds to execute :-(

 It seemed to me that it might be possible to reduce the runtime by
 exploiting knowledge about the ordering of the histograms, but
 I don't have time to pursue that right now.

I made it break the loop when we have passed the last possible match.
Patch attached.  It reduces the runtime by almost 50% with large
histograms.

We can also make it use only some elements of the MCV and histogram
for join estimation.  I have experimented with reducing the right and
the left hand sides of the lists in previous versions.  I remember
it was better to reduce only the left hand side.  I think it would be
enough to use log(n) elements of the right hand side MCV and histogram.
I can make the change, if you think the selectivity estimation function
is the right place for this optimization.
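
Something like the sketch below is what I would try.  It is not part
of the attached patch, and the name and the step calculation are made
up:

#include "postgres.h"

#include <math.h>

/*
 * Sketch: how many elements of the right hand side histogram to skip
 * so that only about log2(n) of them are visited.  The join estimator
 * would advance through the histogram by this step and weight each
 * visited bucket by it.
 */
static int
rhs_histogram_step(int nvalues)
{
	if (nvalues < 4)
		return 1;

	return Max(1, nvalues / (int) ceil(log2((double) nvalues)));
}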
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index f854847..16f39db 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -612,20 +612,23 @@ inet_hist_value_sel(Datum *values, int nvalues, Datum constvalue,
 		return 0.0;
 
 	query = DatumGetInetPP(constvalue);
 
 	/* left is the left boundary value of the current bucket ... */
 	left = DatumGetInetPP(values[0]);
 	left_order = inet_inclusion_cmp(left, query, opr_codenum);
 
 	for (i = 1; i < nvalues; i++)
 	{
+		if (left_order == 256)
+			break;
+
 		/* ... and right is the right boundary value */
 		right = DatumGetInetPP(values[i]);
 		right_order = inet_inclusion_cmp(right, query, opr_codenum);
 
 		if (left_order == 0 && right_order == 0)
 		{
 			/* The whole bucket matches, since both endpoints do. */
 			match += 1.0;
 		}
 		else if ((left_order <= 0 && right_order >= 0) ||
@@ -854,20 +857,23 @@ inet_opr_codenum(Oid operator)
 static int
 inet_inclusion_cmp(inet *left, inet *right, int opr_codenum)
 {
 	if (ip_family(left) == ip_family(right))
 	{
 		int			order;
 
 		order = bitncmp(ip_addr(left), ip_addr(right),
 		Min(ip_bits(left), ip_bits(right)));
 
+		if (order > 0)
+			return 256;
+
 		if (order != 0)
 			return order;
 
 		return inet_masklen_inclusion_cmp(left, right, opr_codenum);
 	}
 
 	return ip_family(left) - ip_family(right);
 }
 
 /*



[HACKERS] BRIN range operator class

2014-10-19 Thread Emre Hasegeli
 Once again, many thanks for the review.  Here's a new version.  I have
 added operator classes for int8, text, and actually everything that btree
 supports except:
 bool
 record
 oidvector
 anyarray
 tsvector
 tsquery
 jsonb
 range
 
 since I'm not sure that it makes sense to have opclasses for any of
 these -- at least not regular minmax opclasses.  There are some
 interesting possibilities, for example for range types, whereby we store
 in the index tuple the union of all the ranges in the block range.

I thought we could do better than minmax for the inet data type,
and ended up with a generalized opclass supporting both inet and range
types.  Patch based on minmax-v20 attached.  It works well except for
a few small problems.  I will improve the patch and add it to
a commitfest after the BRIN framework is committed.

To support more operators I needed to change amstrategies and
amsupport on the catalog.  It would be nice if amsupport can be set
to 0 like amstrategies.

Inet data types accept IP version 4 and version 6.  It is not possible
to represent the union of addresses from different versions with a valid
inet type.  So, I made the union function return NULL in this case.
Then, I tried to store whether the returned value is NULL in
column->values[] as a boolean, but it failed on the pfree() inside
brin_dtuple_initialize().  It doesn't seem right to free the values
based on attr->attbyval.
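
For reference, the check behind that decision boils down to something
like this sketch (the name is only illustrative, not the actual
support function):

#include "postgres.h"

#include "utils/inet.h"

/*
 * Sketch: the union of an IPv4 and an IPv6 value cannot be
 * represented as a single valid inet, so the union support function
 * gives up and returns NULL when the families differ.
 */
static bool
inet_union_representable(inet *left, inet *right)
{
	return ip_family(left) == ip_family(right);
}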

I think the same opclass can be used for geometric types.  I can
rename it to inclusion_ops instead of range_ops.  The GiST opclasses
for the geometric types use bounding boxes.  It wouldn't be possible
to use a different data type in a generic opclass.  Maybe the STORAGE
parameter can be used for that purpose.

 (I had an opclass for anyenum too, but on further thought I removed it
 because it is going to be pointless in nearly all cases.)

It can be useful in some circumstances.  We wouldn't lose anything
by supporting more types.  I think we should even add an operator
class for boolean.
diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index 12ba3f4..7663113 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -249,6 +249,18 @@
  </entry>
 </row>
 <row>
+ <entry><literal>inet_range_ops</literal></entry>
+ <entry><type>inet</type></entry>
+ <entry>
+  <literal>&amp;&amp;</literal>
+  <literal>&gt;&gt;</literal>
+  <literal>&gt;&gt;=</literal>
+  <literal>&lt;&lt;</literal>
+  <literal>&lt;&lt;=</literal>
+  <literal>=</literal>
+ </entry>
+</row>
+<row>
  <entry><literal>bpchar_minmax_ops</literal></entry>
  <entry><type>character</type></entry>
  <entry>
@@ -370,6 +382,23 @@
  </entry>
 </row>
 <row>
+ <entry><literal>range_ops</literal></entry>
+ <entry>any range type</entry>
+ <entry>
+  <literal>&amp;&amp;</literal>
+  <literal>&amp;&gt;</literal>
+  <literal>&amp;&lt;</literal>
+  <literal>&gt;&gt;</literal>
+  <literal>&lt;&lt;</literal>
+  <literal>&lt;@</literal>
+  <literal>=</literal>
+  <literal>@&gt;</literal>
+  <literal>@&gt;</literal>
+ </entry>
+ <entry>
+ </entry>
+</row>
+<row>
  <entry><literal>pg_lsn_minmax_ops</literal></entry>
  <entry><type>pg_lsn</type></entry>
  <entry>
diff --git a/src/backend/access/brin/Makefile b/src/backend/access/brin/Makefile
index ac44fcd..019c582 100644
--- a/src/backend/access/brin/Makefile
+++ b/src/backend/access/brin/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/access/brin
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = brin.o brin_pageops.o brin_revmap.o brin_tuple.o brin_xlog.o \
-	   brin_minmax.o
+OBJS = brin.o brin_pageops.o brin_range.o brin_revmap.o brin_tuple.o \
+	   brin_xlog.o brin_minmax.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/brin/brin_range.c b/src/backend/access/brin/brin_range.c
new file mode 100644
index 000..b63b80a
--- /dev/null
+++ b/src/backend/access/brin/brin_range.c
@@ -0,0 +1,323 @@
+/*
+ * brin_range.c
+ *		Implementation of range opclass for BRIN
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/brin/brin_range.c
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/brin_internal.h"
+#include "access/brin_tuple.h"
+#include "access/skey.h"
+#include "catalog/pg_type.h"
+#include "utils/datum.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+
+/*
+ * Procedure numbers must not collide with BRIN_PROCNUM defines in
+ * brin_internal.h.  Note we only need inequality functions.
+ */
+#define		RANGE_NUM_PROCNUMS		10	/* # support procs */
+#define		PROCNUM_CONTAINS		5
+#define		PROCNUM_UNION			6
+#define		PROCNUM_BEFORE			7	/* required for overright strategy */
+#define		PROCNUM_OVERLEFT		8	/* required for after strategy */
+#define		PROCNUM_OVERLAPS		9	/* required for overlaps strategy */
+#define		PROCNUM_OVERRIGHT		10	/* required for before strategy */
+#define		PROCNUM_AFTER			11	/* required for after strategy */
+#define		PROCNUM_ADJACENT		12	/* required for 

Re: [HACKERS] Shapes on the regression test for polygon

2014-10-14 Thread Emre Hasegeli
 I extracted Emre's diagram adjustments from the patch and applied it,
 and no tabs now.  Emre, I assume your regression changes did not affect
 the diagram contents.

Thank you for looking at it.  I wanted to make the tests consistent
with the diagrams.  Now they look better but they still don't make
sense with the tests.  I looked at it some more, and came to the
conclusion that removing them is better than changing the tests.



