subject:"Re\: \[HACKERS\] Floating point comparison inconsistencies of the geometric types"

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-02-01 Thread Emre Hasegeli

> Got it, but if other people don't agree then this is going nowhere.

Yes.  As I wrote, I don't particularly care about functions like "is
point on line".  I can prepare a patch to fix as many problems as
possible around those operators by preserving the current epsilon.

I though we were arguing about *all* operators.  Having containment
and placement operators consistent with each other, is the primary
thing I am trying to fix.  Is removing epsilon from them is
acceptable?

We can also stay away from changing operators like "~=" to minimise
compatibility break, if we keep the epsilon on some places.  We can
instead document this operator as "close enough", and introduce
another symbol for really "the same" operator.

That said, there are some places where it is hard to decide to apply
the epsilon or not.  For example, we can keep the epsilon to check for
two lines being parallel, but then should we return the intersection
point, or not?  Those issues may become more clear when I start
working on it, if preserving epsilon for those operators is the way to
go forward.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-31 Thread Robert Haas

On Tue, Jan 31, 2017 at 1:06 PM, Emre Hasegeli  wrote:
> This is what he wrote:
>
>> As I understand it, the key problem is that tests like "is point on line"
>> would basically never succeed except in the most trivial cases, because of
>> roundoff error.  That's not very nice, and it might cascade to larger
>> problems like object-containment tests failing unexpectedly.  We would
>> need to go through all the geometric operations and figure out where that
>> kind of gotcha is significant and what we can do about it.  Seems like a
>> fair amount of work :-(.  If somebody's willing to do that kind of
>> investigation, then sure, but I don't think just blindly removing these
>> macros is going to lead to anything good.
>
> I re-implemented "is point on line" operator on my patch so that it
> would, at least, work on the most trivial cases.

Right.  So you and he do not agree on the goal.  He wants it to work
in MORE than the most trivial cases, and you wrote it so that it will
work in AT LEAST the most trivial cases.  Those are not the same.

> I listed the problems I am trying to solve in here:
>
> https://www.postgresql.org/message-id/CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw%40mail.gmail.com

Those problems boil down to "let's get rid of the tolerance
completely", and there's no consensus on that being a good thing to
do.  In particular, I can't remember anybody but you being
unequivocally in favor of it.

> We couldn't agree on how we should treat on floating point numbers.  I
> think we should threat them just like the "float" datatype.

Got it, but if other people don't agree then this is going nowhere.

Personally, I've got my doubts about how sane it is to make anything
work here with the built-in floating point types.  The CPU is going to
do some kind of black magic, which may differ between machines, and
good luck getting consistent semantics that work everywhere.  For
example it seems entirely plausible (with some implementation) that
you might find that (1, 2) is on the line from (0, 0) to (2, 4) but
that (1, 3) is not on the line from (0, 0) to (2, 6) because the slope
1/2 can be represented exactly in base 2 and the slope 1/3 cannot.  If
you try to fix that kind of thing by introducing a tolerance, then you
might get it wrong when the line goes from (0, 0) to (10,
21).  Now, if we represented values using home-grown
arbitrary-precision arithmetic - like numeric does - then it might be
possible to get such cases right, especially if you supported exact
rather than approximate representations of fractional quantities.
Otherwise, I think there are inevitably going to be artifacts, and
tinkering with this is just changing which artifacts we get without
really solving the problem.  But I am notorious for my low opinion of
floating-point arithmetic; maybe somebody who understands it better or
likes it more can come up with something that's more clearly an
improvement.  Regardless, we have to have an agreement in order to
commit anything here, and I don't see one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-31 Thread Emre Hasegeli

> Backing up a bit here, have we lost track of the problem that we're
> trying to solve?  Tom gave his opinion here:
>
> https://www.postgresql.org/message-id/3895.1464791...@sss.pgh.pa.us
>
> But I don't see that the latest patch I can find does anything to fix
> that.

This is what he wrote:

> As I understand it, the key problem is that tests like "is point on line"
> would basically never succeed except in the most trivial cases, because of
> roundoff error.  That's not very nice, and it might cascade to larger
> problems like object-containment tests failing unexpectedly.  We would
> need to go through all the geometric operations and figure out where that
> kind of gotcha is significant and what we can do about it.  Seems like a
> fair amount of work :-(.  If somebody's willing to do that kind of
> investigation, then sure, but I don't think just blindly removing these
> macros is going to lead to anything good.

I re-implemented "is point on line" operator on my patch so that it
would, at least, work on the most trivial cases.  This is not very
nice, but it is predictable.  I have tried to prevent it cascade to
larger problems like object-containment tests failing unexpectedly.  I
have gone through all of the geometric operations and re-implemented
the ones with similar problems, too.  It is no work at all compared to
discussions we have to have :-).

My initial idea was to keep the fuzzy behaviour for some operators
like "is point on line", but the more I get into it the less value I
see in doing so.  The same family operators like "is point on line" is
currently badly broken:

> postgres=# select '(1,0)'::point ## '{1,2,0}'::line;
> ?column?
> --
> (2,2)
> (1 row)

This makes me wonder if anybody is ever using those operators.  In the
end, I don't care about those operators.  They are unlikely to be
supported by indexes.  I can simplify my patch to leave them as they
are.

> Now maybe that's not the problem that Emre is trying to solve,
> but then it is not very clear to me what problem he IS trying to
> solve.  And I think Kyotaro Horiguchi is trying to solve yet another
> problem which is again different.  So IMHO the first task here is to
> agree on a clear statement of what we'd like to fix, and then, given a
> patch, we can judge whether it's fixed.

I listed the problems I am trying to solve in here:

https://www.postgresql.org/message-id/CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw%40mail.gmail.com

> Maybe I'm being dumb here and it's clear to you guys what the issues
> under discussion are.  If so, apologies for that, but the thread has
> gotten too theoretical for me and I can't figure out what the
> top-level concern is any more.  I believe we all agree these macros
> are bad, but there seems to be no agreement that I can discern on what
> would be better or why.

We couldn't agree on how we should treat on floating point numbers.  I
think we should threat them just like the "float" datatype.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-31 Thread Robert Haas

On Thu, Jan 26, 2017 at 5:53 AM, Emre Hasegeli  wrote:
> Though, I know the community is against behaviour changing GUCs.  I
> will not spend more time on this, before I get positive feedback from
> others.

As if on cue, let me say that a behavior-changing GUC sounds like a
terrible idea to me.  It's almost never good when a GUC can cause the
same queries to return answers in different sessions, and even worse,
it seems like the GUC might have the effect of letting us build
indexes that are only valid for the value of the GUC with which they
were built.

Backing up a bit here, have we lost track of the problem that we're
trying to solve?  Tom gave his opinion here:

https://www.postgresql.org/message-id/3895.1464791...@sss.pgh.pa.us

But I don't see that the latest patch I can find does anything to fix
that.  Now maybe that's not the problem that Emre is trying to solve,
but then it is not very clear to me what problem he IS trying to
solve.  And I think Kyotaro Horiguchi is trying to solve yet another
problem which is again different.  So IMHO the first task here is to
agree on a clear statement of what we'd like to fix, and then, given a
patch, we can judge whether it's fixed.

Maybe I'm being dumb here and it's clear to you guys what the issues
under discussion are.  If so, apologies for that, but the thread has
gotten too theoretical for me and I can't figure out what the
top-level concern is any more.  I believe we all agree these macros
are bad, but there seems to be no agreement that I can discern on what
would be better or why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-30 Thread Michael Paquier

On Thu, Jan 26, 2017 at 9:11 PM, Kyotaro HORIGUCHI
 wrote:
>> Though, I know the community is against behaviour changing GUCs.  I
>> will not spend more time on this, before I get positive feedback from
>> others.
>
> That's too bad. I'm sorry that I wasn't very helpful..

You make a constructive discussion in a way that you think makes sense
by providing feedback, there's nothing bad in that IMO, quite the
contrary to be honest.

(I am just showing up here to tell that I have moved this patch to CF 2017-03).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-26 Thread Kyotaro HORIGUCHI

Hello,

At Thu, 26 Jan 2017 11:53:28 +0100, Emre Hasegeli  wrote in 

> > Even though I'm not sure but I don't see a "natural" (or
> > agreeable by many poeple) ordering of geometric types in
> > general. Anyway it's quite application (not application program
> > but the relationship with the real world) specific.
> 
> We can just define it for point as "ORDER BY point.x, point.y".

It's nonsense. Totally for a convenient. Anyone can do so on
their own application but PostgreSQL cannot have it as a
platform.

> > What we should not forget is that PostGIS does the same thing and
> > it is widly used (I believe..). This means it not broken at least
> > on a certain context. But it is a fact that it also quite
> > inconvenient to get performance from, say, indexes.
> 
> I understand from Paul Ramsey's email [1] on this thread that PostGIS
> doesn't currently have a tolerance.

Thank you for the pointer. (My memory is too small as 8bit CPU)
Looking into the source of PostGIS 2.3.0 (Maybe the latest)
Surely EPSILON is used is someplaces. FPeq and similars are also
defined. liblwgeom_internal.h even defines another tolerance
EPSILON_SQLMM..

Paul> The real answer for a GIS system is to have an explicit tolerance
Paul> parameter for calculations like distance/touching/containment, but
Paul> unfortunately we didn't do that so now we have our own
Paul> compatibility/boil the ocean problem if we ever wanted/were funded to
Paul> add one.

This doesn't seem saying PostGIS doesn't have fixed-amount
tolerance. Alhtough we don't necessarily need compatibility with
PostGIS, it makes the problem rather complex since we lose an
apparent start point for this. It would be good if we make geom
comparators to have explicit tolerances as Paul said but maybe it
is overdone. So I proposed the varialbe implicit tolerance.

> > Yeah, the EPSILON is mere a fuzz factor so we cannot expect any
> > kind of stable behavior for differences under the level. But on
> > the analogy of comparisons of floating point numbers, I suppose
> > that inequality comparison could be done without the tolerance.
> 
> What do you mean exactly?

Sorry for poor wording. I'll try in different way. It means
comparison on numbers that contains certain amount of error gives
unstable result for differences small enough comparing their
precision.

> >> >> - Some operators are violating commutative property.
> >> >>
> >> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| 
> >> >> line_a".
> >> >
> >> > Whether EPSILON is introduced or not, commutativity cannot be
> >> > assured for it from calculation error, I suppose.
> >>
> >> It can easily be assured by treating both sides of the operator the
> >> same.  It is actually assured on my patch.
> >
> > It surely holds for certain cases. Even how many applicable cases
> > we guess, finally we cannot proof that it works generally. Just
> > three times of 1/3 rotation breakes it.
> 
> It is a different story.  We cannot talk about commutative property of
> rotation function that we currently don't have, because it would be an
> asymmetrical operator.

That's wrong. Any shpaes represented by geometric types assumed
to get any geometric operatsions such as transision, rotation and
others. It is fundamental for geometric types.

> The parallel operator is currently marked as commutative.  The planner
> is free to switch the sides of the operation.  Therefore this is not
> only a surprise, but a bug.

Strictly it is not commutative, but assuming FP error or EPSILON,
commutation among them would be acceptable. Having larger
tolerance, the defference from commutated expression becomes
relatively small for the type's domain. As far as I know, even
summation of two floating points is not guaranteed to yield
strictly the same result for commutated opeation. But the
difference doesn't affect for most cases and programmers make a
program so that such differences don't matter in the
objective. This is basically the same thing with the case of
geo-types.

> > Hmm, I have nothing more to say if you don't agree that floating
> > point numbers involving any kind of arithmetic is hardly
> > deterministic especially not defining its usage.
> 
> The floating point results are not random.  There are certain
> guarantees.  The transitive property of equality is one of them.  Our
> aim should be making things easier for our users by providing more
> guarantees not breaking what is already there.

Sorry for repeating the same thing but floating point numbers
after getting arithmetic operations must considered that they
have fuzziness or error (the same can occur for even just after
assigning). They must not be handled as exact numbers. Please
study about handling floating point numbers. If such strictness
is required and no arithmetic involved, it is proper to store
them, say, in a string form.

If such a behavior is required but want to

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-26 Thread Emre Hasegeli

> Even though I'm not sure but I don't see a "natural" (or
> agreeable by many poeple) ordering of geometric types in
> general. Anyway it's quite application (not application program
> but the relationship with the real world) specific.

We can just define it for point as "ORDER BY point.x, point.y".

> What we should not forget is that PostGIS does the same thing and
> it is widly used (I believe..). This means it not broken at least
> on a certain context. But it is a fact that it also quite
> inconvenient to get performance from, say, indexes.

I understand from Paul Ramsey's email [1] on this thread that PostGIS
doesn't currently have a tolerance.

> Yeah, the EPSILON is mere a fuzz factor so we cannot expect any
> kind of stable behavior for differences under the level. But on
> the analogy of comparisons of floating point numbers, I suppose
> that inequality comparison could be done without the tolerance.

What do you mean exactly?

>> >> - Some operators are violating commutative property.
>> >>
>> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
>> >
>> > Whether EPSILON is introduced or not, commutativity cannot be
>> > assured for it from calculation error, I suppose.
>>
>> It can easily be assured by treating both sides of the operator the
>> same.  It is actually assured on my patch.
>
> It surely holds for certain cases. Even how many applicable cases
> we guess, finally we cannot proof that it works generally. Just
> three times of 1/3 rotation breakes it.

It is a different story.  We cannot talk about commutative property of
rotation function that we currently don't have, because it would be an
asymmetrical operator.

The parallel operator is currently marked as commutative.  The planner
is free to switch the sides of the operation.  Therefore this is not
only a surprise, but a bug.

> Hmm, I have nothing more to say if you don't agree that floating
> point numbers involving any kind of arithmetic is hardly
> deterministic especially not defining its usage.

The floating point results are not random.  There are certain
guarantees.  The transitive property of equality is one of them.  Our
aim should be making things easier for our users by providing more
guarantees not breaking what is already there.

> The world of the geometric types in PostgreSQL *is* built
> so. There is nothing different with that Windows client can make
> different results from PostgreSQL on a Linux server for a simple
> floating point arithmetics, or even different binaries made by
> different version of compilers on the same platoform can. Relying
> on such coherency by accident is a bad policy.

Yes, the results are not portable.  We should only rely on the results
being stable on the same build.  The epsilon doesn't cure this
problem.  It arguably makes it worse.

> PostGIS or libgeos seems to prove it. They are designed exactly
> for this purpose and actually used.

Yes, PostGIS is a GIS application.  We are not.  Geometric types name
suggests to me them being useful for general purpose.

> So, the union of the two requirements seems to be having such
> parameter as a GUC.

That sounds doable to me.  We can use this opportunity to make all
operators consistent.  So the epsilon would apply to the ones that it
current is not.  We can still add btree and hash opclasses, and make
them give an error when this GUC is not 0.  We can even make this or
another GUC apply to floats making whole system more consistent.

Though, I know the community is against behaviour changing GUCs.  I
will not spend more time on this, before I get positive feedback from
others.

[1] 
https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-26 Thread Kyotaro HORIGUCHI

Hello, Emre.

At Wed, 25 Jan 2017 12:24:18 +0100, Emre Hasegeli  wrote in 

> I am responding both of your emails together.
> 
> > Perhaps I don't understand it. Many opclass are defined for
> > btree. But since (ordinary) btree handles only one-dimentional,
> > totally-orderable sets, geometric types even point don't fit
> > it. Even if we relaxed it by removing EPSILON, multidimentional
> > data still doesn't fit.
> 
> Yes, multidimentional data doesn't fit into btree.  Let's say we would
> have to introduce operators called *<, *<=, *>, *>= to support btree
> opclass for point.  I agree those operators would be useless, but
> btree opclass is used for other purposes too.  "ORDER BY point",
> merge-joins, btree index support for ~= would be useful.

Btree cannot be used unless any kind of total-orderedness is
defined but hash index would be usable so.

> Lack of ORDER BY, GROUP BY, and DISTINCT support is the major
> inconvenience about the geometric types.  There are many complaints
> about this on the mailing list.  Many frameworks and ORMs are not
> prepared to deal with getting an error when they use certain types on
> those clauses.

Even though I'm not sure but I don't see a "natural" (or
agreeable by many poeple) ordering of geometric types in
general. Anyway it's quite application (not application program
but the relationship with the real world) specific.

> >> - Only some operators are using the epsilon.
> >>
> >> I included the list of the ones which doesn't use the epsilon on
> >> initial post of this thread.  This makes the operators not compatible
> >> with each other.  For example, you cannot say "if box_a @> box_b and
> >> box_b @> point then box_a @> box_b".
> >
> > (It must be "then box_a @> point".)
> 
> Yes, I am sorry.
> 
> >  Both of the operators don't
> > seem to use EPSILON so the transitivity holds, but perhaps we
> > should change them to more appropriate shape, that is, to use
> > EPSILON. Even having the tolerance, we can define the operators
> > to keep the transitivity.
> 
> Yes, that would be another way.  In my view, this would make things
> worse, because I believe epsilon approach is completely broken.  The
> operators which are not using the epsilon are the only ones people can
> sanely use to develop applications.

What we should not forget is that PostGIS does the same thing and
it is widly used (I believe..). This means it not broken at least
on a certain context. But it is a fact that it also quite
inconvenient to get performance from, say, indexes.

> >> > - The application of epsilon is implementation dependent.
> >> >
> >> > Epsilon works between two numbers.  There is not always a single way
> >> > of implementing geometric operators.  This is surprising to the users.
> >> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
> >> > epsilon".
> >>
> >> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
> >> ..". I'm not sure off-hand. But I don't see the relationship is
> >> significant.
> 
> What I meant was the behaviour being unclear to even people who knows
> about the epsilon.  If two boxes are overlapping with each other, one
> who knows about EPSILON can expect the distance between them to be
> less than EPSILON, but this wouldn't be true.

Yeah, the EPSILON is mere a fuzz factor so we cannot expect any
kind of stable behavior for differences under the level. But on
the analogy of comparisons of floating point numbers, I suppose
that inequality comparison could be done without the tolerance.

> >> - Some operators are violating commutative property.
> >>
> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
> >
> > Whether EPSILON is introduced or not, commutativity cannot be
> > assured for it from calculation error, I suppose.
> 
> It can easily be assured by treating both sides of the operator the
> same.  It is actually assured on my patch.

It surely holds for certain cases. Even how many applicable cases
we guess, finally we cannot proof that it works generally. Just
three times of 1/3 rotation breakes it.

> >> - Some operators are violating transitive property.
> >>
> >> For example, you cannot say "if point_a ~= point_b and point_b ~=
> >> point_c then point_a ~= point_c".
> >
> > It holds only in ideal (or mathematical) world, or for similars
> > of integer (which are strictly implemented mathematica world on
> > computer).  Without the EPSILON, two points hardly match by
> > floating point error, as I mentioned. I don't have an evidence
> > though (sorry), mere equality among three floating point numbers
> > is not assured.
> 
> Of course, it is assured.  Floating point numbers are deterministic.

Hmm, I have nothing more to say if you don't agree that floating
point numbers involving any kind of arithmetic is hardly
deterministic especially not defining its usage.

> >> - The operators with

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-25 Thread Emre Hasegeli

I am responding both of your emails together.

> Perhaps I don't understand it. Many opclass are defined for
> btree. But since (ordinary) btree handles only one-dimentional,
> totally-orderable sets, geometric types even point don't fit
> it. Even if we relaxed it by removing EPSILON, multidimentional
> data still doesn't fit.

Yes, multidimentional data doesn't fit into btree.  Let's say we would
have to introduce operators called *<, *<=, *>, *>= to support btree
opclass for point.  I agree those operators would be useless, but
btree opclass is used for other purposes too.  "ORDER BY point",
merge-joins, btree index support for ~= would be useful.

Lack of ORDER BY, GROUP BY, and DISTINCT support is the major
inconvenience about the geometric types.  There are many complaints
about this on the mailing list.  Many frameworks and ORMs are not
prepared to deal with getting an error when they use certain types on
those clauses.

>> - Only some operators are using the epsilon.
>>
>> I included the list of the ones which doesn't use the epsilon on
>> initial post of this thread.  This makes the operators not compatible
>> with each other.  For example, you cannot say "if box_a @> box_b and
>> box_b @> point then box_a @> box_b".
>
> (It must be "then box_a @> point".)

Yes, I am sorry.

>  Both of the operators don't
> seem to use EPSILON so the transitivity holds, but perhaps we
> should change them to more appropriate shape, that is, to use
> EPSILON. Even having the tolerance, we can define the operators
> to keep the transitivity.

Yes, that would be another way.  In my view, this would make things
worse, because I believe epsilon approach is completely broken.  The
operators which are not using the epsilon are the only ones people can
sanely use to develop applications.

>> > - The application of epsilon is implementation dependent.
>> >
>> > Epsilon works between two numbers.  There is not always a single way
>> > of implementing geometric operators.  This is surprising to the users.
>> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
>> > epsilon".
>>
>> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
>> ..". I'm not sure off-hand. But I don't see the relationship is
>> significant.

What I meant was the behaviour being unclear to even people who knows
about the epsilon.  If two boxes are overlapping with each other, one
who knows about EPSILON can expect the distance between them to be
less than EPSILON, but this wouldn't be true.

>> - Some operators are violating commutative property.
>>
>> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
>
> Whether EPSILON is introduced or not, commutativity cannot be
> assured for it from calculation error, I suppose.

It can easily be assured by treating both sides of the operator the
same.  It is actually assured on my patch.

>> - Some operators are violating transitive property.
>>
>> For example, you cannot say "if point_a ~= point_b and point_b ~=
>> point_c then point_a ~= point_c".
>
> It holds only in ideal (or mathematical) world, or for similars
> of integer (which are strictly implemented mathematica world on
> computer).  Without the EPSILON, two points hardly match by
> floating point error, as I mentioned. I don't have an evidence
> though (sorry), mere equality among three floating point numbers
> is not assured.

Of course, it is assured.  Floating point numbers are deterministic.

>> - The operators with epsilon are not compatible with float operators.
>>
>> This is also surprising for the users.  For example, you cannot say
>> "if point_a << point_b then point_a[0] < point_b[0]".
>
> It is because "is strictly left of" just doesn't mean their
> x-coordinates are in such a relationship. The difference might be
> surprising but is a specification (truely not a bug:).

Then what does that mean?  Every operator with EPSILON is currently
surprising in different way.  People use databases to build
applications.  They often need to implement same operations on the
application side.  It is very hard to be bug-compatible with our
geometric types.

>> = Missing Bits on the Operator Classes

> This final section seems to mention the application but sorry, it
> still don't describe for me what application that this patch
> aiming to enable. But I can understand that you want to handle
> floating point numbers like integers. The epsilon is surely quite
> annoying for the purpose.

I will try to itemise what applications I am aiming to enable:

- Applications with very little GIS requirement

PostGIS can be too much work for an application in s different
business domain but has a little requirement to GIS.  The built-in
geometric types are incomplete to support real world geography.
Getting rid of epsilon would make this worse. In contrary, it would
allow people to deal with the difficulties on their own.

- Applications with geometric objects

I believe people could make use the geometric

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-24 Thread Kyotaro HORIGUCHI

Sorry, this is the continuation of the previous comment.

At Wed, 18 Jan 2017 17:36:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170118.173612.13506164.horiguchi.kyot...@lab.ntt.co.jp>
> Hello.
> 
> (I apologize in advance for possible inaccurate wording on maths,
> which might cause confusion..)
> 
> At Wed, 11 Jan 2017 16:37:59 +0100, Emre Hasegeli  wrote 
> in 
> > > - Floating point comparisons for gemetric types
> > >
> > >   Comparison related to geometric types is performed by FPeq
> > >   macro and similars defined in geo_decls.h. This intends to give
> > >   tolerance to the comparisons.
> > >
> > >  A
> > >   FPeq: |<=e-|-e=>|(<= means inclusive, e = epsilon = tolerance)
> > >   FPne:   ->|  e | e  |<-  (<- means exclusive)
> > >   FPlt:  | e  |<-
> > >   FPle: |<=e |
> > >   FPgt:   ->|  e |
> > >   FPge:  | e=>|
> > >
> > >   These seems reasonable ignoring the tolerance amount issue.
> > 
> > I tried to explain below why I don't find this method reasonable.
> 
> Thank you very much for the explanation.
> 
> > > At least for the point type, (bitmap) index scan is consistent
> > > with sequential scan.  I remember that the issue was
> > > "inconsistency between indexed and non-indexed scans over
> > > geometric types" but they seem consistent with each other.
> > 
> > You can see on the Git history that it was a lot of trouble to make
> > the indexes consistent with the operators, and this is limiting
> > improving indexing of the geometric types.
> 
> Yeah. geo_ops.c has the following comment from its first version
> in the current repository.
> 
> | *   Relational operators for Points.
> | *   Since we do have a sense of coordinates being
> | *   "equal" to a given accuracy (point_vert, point_horiz),
> 
> So, we take it as a requirement for geometric types. All the
> difficulties come from the "nature".
> 
> > > You mentioned b-tree, which don't have predefined opclass for
> > > geometric types. So the "inconsistency" may be mentioning the
> > > difference between geometric values and combinations of plain
> > > (first-class) values. But the two are different things and
> > > apparently using different operators (~= and = for equality) so
> > > the issue is not fit for this. More specifically, "p ~= p0" and
> > > "x = x0 and y = y0" are completely different.
> > 
> > Yes, the default btree operator class doesn't have to implement
> > standard equality and basic comparison operator symbols.  We can
> > implement it with different symbols.
> 
> Perhaps I don't understand it. Many opclass are defined for
> btree. But since (ordinary) btree handles only one-dimentional,
> totally-orderable sets, geometric types even point don't fit
> it. Even if we relaxed it by removing EPSILON, multidimentional
> data still doesn't fit.
> 
> Still we can implement equality mechanism *over* multiple btree
> indexes, but it cannot be a opclass for btree.
> 
> > > Could you let me (or any other guys on this ml) have more precise
> > > description on the problem and/or what you want to do with this
> > > patch?
> > 
> > I will try to itemize the current problems with the epsilon method:
> > 
> > = Operator Inconsistencies
> > 
> > My current patches address all of them.
> 
> No. It just removes tolerans that is a "requirement" for
> coordinates of geometric types. I think we don't have enough
> reason to remove it. We could newly define geometric types with
> no-tolerance as 'exact_point" or something but it still doesn't
> fit btree.
> 
> > - Only some operators are using the epsilon.
> > 
> > I included the list of the ones which doesn't use the epsilon on
> > initial post of this thread.  This makes the operators not compatible
> > with each other.  For example, you cannot say "if box_a @> box_b and
> > box_b @> point then box_a @> box_b".
> 
> (It must be "then box_a @> point".)  Both of the operators don't
> seem to use EPSILON so the transitivity holds, but perhaps we
> should change them to more appropriate shape, that is, to use
> EPSILON. Even having the tolerance, we can define the operators
> to keep the transitivity.
> 
> > - The application of epsilon is implementation dependent.
> > 
> > Epsilon works between two numbers.  There is not always a single way
> > of implementing geometric operators.  This is surprising to the users.
> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
> > epsilon".
> 
> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
> ..". I'm not sure off-hand. But I don't see the relationship is
> significant.
> 
> > This is also creating a high barrier on developing those operators.
> > Fixing bugs and performance regressions are very likely to change the
> > results.
> 
> I agree to you. The effect of the EPSILON is quite arbitrary and
> difficult

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-18 Thread Kyotaro HORIGUCHI

Hello.

(I apologize in advance for possible inaccurate wording on maths,
which might cause confusion..)

At Wed, 11 Jan 2017 16:37:59 +0100, Emre Hasegeli  wrote in 

> > - Floating point comparisons for gemetric types
> >
> >   Comparison related to geometric types is performed by FPeq
> >   macro and similars defined in geo_decls.h. This intends to give
> >   tolerance to the comparisons.
> >
> >  A
> >   FPeq: |<=e-|-e=>|(<= means inclusive, e = epsilon = tolerance)
> >   FPne:   ->|  e | e  |<-  (<- means exclusive)
> >   FPlt:  | e  |<-
> >   FPle: |<=e |
> >   FPgt:   ->|  e |
> >   FPge:  | e=>|
> >
> >   These seems reasonable ignoring the tolerance amount issue.
> 
> I tried to explain below why I don't find this method reasonable.

Thank you very much for the explanation.

> > At least for the point type, (bitmap) index scan is consistent
> > with sequential scan.  I remember that the issue was
> > "inconsistency between indexed and non-indexed scans over
> > geometric types" but they seem consistent with each other.
> 
> You can see on the Git history that it was a lot of trouble to make
> the indexes consistent with the operators, and this is limiting
> improving indexing of the geometric types.

Yeah. geo_ops.c has the following comment from its first version
in the current repository.

| * Relational operators for Points.
| * Since we do have a sense of coordinates being
| * "equal" to a given accuracy (point_vert, point_horiz),

So, we take it as a requirement for geometric types. All the
difficulties come from the "nature".

> > You mentioned b-tree, which don't have predefined opclass for
> > geometric types. So the "inconsistency" may be mentioning the
> > difference between geometric values and combinations of plain
> > (first-class) values. But the two are different things and
> > apparently using different operators (~= and = for equality) so
> > the issue is not fit for this. More specifically, "p ~= p0" and
> > "x = x0 and y = y0" are completely different.
> 
> Yes, the default btree operator class doesn't have to implement
> standard equality and basic comparison operator symbols.  We can
> implement it with different symbols.

Perhaps I don't understand it. Many opclass are defined for
btree. But since (ordinary) btree handles only one-dimentional,
totally-orderable sets, geometric types even point don't fit
it. Even if we relaxed it by removing EPSILON, multidimentional
data still doesn't fit.

Still we can implement equality mechanism *over* multiple btree
indexes, but it cannot be a opclass for btree.

> > Could you let me (or any other guys on this ml) have more precise
> > description on the problem and/or what you want to do with this
> > patch?
> 
> I will try to itemize the current problems with the epsilon method:
> 
> = Operator Inconsistencies
> 
> My current patches address all of them.

No. It just removes tolerans that is a "requirement" for
coordinates of geometric types. I think we don't have enough
reason to remove it. We could newly define geometric types with
no-tolerance as 'exact_point" or something but it still doesn't
fit btree.

> - Only some operators are using the epsilon.
> 
> I included the list of the ones which doesn't use the epsilon on
> initial post of this thread.  This makes the operators not compatible
> with each other.  For example, you cannot say "if box_a @> box_b and
> box_b @> point then box_a @> box_b".

(It must be "then box_a @> point".)  Both of the operators don't
seem to use EPSILON so the transitivity holds, but perhaps we
should change them to more appropriate shape, that is, to use
EPSILON. Even having the tolerance, we can define the operators
to keep the transitivity.

> - The application of epsilon is implementation dependent.
> 
> Epsilon works between two numbers.  There is not always a single way
> of implementing geometric operators.  This is surprising to the users.
> For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
> epsilon".

Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
..". I'm not sure off-hand. But I don't see the relationship is
significant.

> This is also creating a high barrier on developing those operators.
> Fixing bugs and performance regressions are very likely to change the
> results.

I agree to you. The effect of the EPSILON is quite arbitrary and
difficult to see their chemistry.

> - Some operators are just wrong:
> 
> Line operators are not well maintained.  The following are giving
> wrong results when neither A or B of lines are not exactly 1:
> 
> * line # line
> * line <-> line
> * point ## line
> * point ## lseg

These are not comparison operators so EPSILON is unrelated. And I
agree that they needs fix.

> They are broken independently from the epsilon, though it is not easy
> to fix them without

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-11 Thread Emre Hasegeli

> - Floating point comparisons for gemetric types
>
>   Comparison related to geometric types is performed by FPeq
>   macro and similars defined in geo_decls.h. This intends to give
>   tolerance to the comparisons.
>
>  A
>   FPeq: |<=e-|-e=>|(<= means inclusive, e = epsilon = tolerance)
>   FPne:   ->|  e | e  |<-  (<- means exclusive)
>   FPlt:  | e  |<-
>   FPle: |<=e |
>   FPgt:   ->|  e |
>   FPge:  | e=>|
>
>   These seems reasonable ignoring the tolerance amount issue.

I tried to explain below why I don't find this method reasonable.

> At least for the point type, (bitmap) index scan is consistent
> with sequential scan.  I remember that the issue was
> "inconsistency between indexed and non-indexed scans over
> geometric types" but they seem consistent with each other.

You can see on the Git history that it was a lot of trouble to make
the indexes consistent with the operators, and this is limiting
improving indexing of the geometric types.

> You mentioned b-tree, which don't have predefined opclass for
> geometric types. So the "inconsistency" may be mentioning the
> difference between geometric values and combinations of plain
> (first-class) values. But the two are different things and
> apparently using different operators (~= and = for equality) so
> the issue is not fit for this. More specifically, "p ~= p0" and
> "x = x0 and y = y0" are completely different.

Yes, the default btree operator class doesn't have to implement
standard equality and basic comparison operator symbols.  We can
implement it with different symbols.

> Could you let me (or any other guys on this ml) have more precise
> description on the problem and/or what you want to do with this
> patch?

I will try to itemize the current problems with the epsilon method:

= Operator Inconsistencies

My current patches address all of them.

- Only some operators are using the epsilon.

I included the list of the ones which doesn't use the epsilon on
initial post of this thread.  This makes the operators not compatible
with each other.  For example, you cannot say "if box_a @> box_b and
box_b @> point then box_a @> box_b".

- The application of epsilon is implementation dependent.

Epsilon works between two numbers.  There is not always a single way
of implementing geometric operators.  This is surprising to the users.
For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
epsilon".

This is also creating a high barrier on developing those operators.
Fixing bugs and performance regressions are very likely to change the
results.

- Some operators are just wrong:

Line operators are not well maintained.  The following are giving
wrong results when neither A or B of lines are not exactly 1:

* line # line
* line <-> line
* point ## line
* point ## lseg

They are broken independently from the epsilon, though it is not easy
to fix them without changing the results of the cases that are
currently working.

- Some operators are violating commutative property.

For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".

- Some operators are violating transitive property.

For example, you cannot say "if point_a ~= point_b and point_b ~=
point_c then point_a ~= point_c".

- The operators with epsilon are not compatible with float operators.

This is also surprising for the users.  For example, you cannot say
"if point_a << point_b then point_a[0] < point_b[0]".

- The operators are not checking for underflow or overflow.
- NaN values are not treated same as the float datatype.

Those are independent problems, though checking them with epsilon
would create additional set of inconsistencies.  Epsilon creates
especially more problems around 0.

= Missing Bits on the Operator Classes

I want to address those in the future, if my patches are accepted.

- There is no GiST or SP-GiST support for cross datatype operators
other than containment.

We can develop nice cross-datatype operator families to support more
operators.  It would have been easier, if we could rely on some
operators to implement others.  Geometric types serve the purpose of
demonstrating our indexing capabilities.  We should make them good
examples, not full of special cases with hidden bugs.

- There is no BRIN support for types other than the box.

BRIN inclusion operator class is datatype agnostic.  It uses some
operators to implement others which doesn't work for anything more the
box vs box operators, because of the inconsistencies.

- GROUP BY and DISTINCT are not working.
- ORDER BY is not working.

There are regular user complaints about this on the mailing list.  We
can easily provide default hash and btree operator classes that would
fix the problem, if we haven't had the epsilon.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-10 Thread Kyotaro HORIGUCHI

Hello, 

At Fri, 18 Nov 2016 10:58:27 +0100, Emre Hasegeli  wrote in 

> > To keep such kind of integrity, we should deeply consider about
> > errors.
> 
> My point is that I don't think we can keep integrity together with the
> fuzzy behaviour, or at least I don't have the skills to do that.  I
> can just leave the more complicated operators like "is a
> point on a line" as it is, and only change the basic ones.  Do you
> think a smaller patch like this would be acceptable?

The size of the patch is not a problem. I regret that I haven't
made your requirement clear. So as the startpoint of the new
discussion, I briefly summarize the current implement of
geometric comparisons.

- Floating point comparisons for gemetric types

  Comparison related to geometric types is performed by FPeq
  macro and similars defined in geo_decls.h. This intends to give
  tolerance to the comparisons.

 A
  FPeq: |<=e-|-e=>|(<= means inclusive, e = epsilon = tolerance)
  FPne:   ->|  e | e  |<-  (<- means exclusive)
  FPlt:  | e  |<-  
  FPle: |<=e |
  FPgt:   ->|  e |
  FPge:  | e=>|

  These seems reasonable ignoring the tolerance amount issue.


- Consistency between index and non-index scans.

 GIST supports geometric types.

 =# create table tpnt1(id int, p point);
 =# insert into tpnt1 (select i + 200, point(i*1.0e-6 / 100.0, i * 1.0e-6 / 
100.0) from generate_series(-200, 200) as i);
 =# create index on tpnt1 using gist (p);
 =# set enable_seqscan to false;
 =# set enable_bitmapscan to true;
 =# select count(*) from tpnt1 where p ~= point(0, 0);
201
 =# select count(*) from tpnt1 where p << point(0, 0);
100
 =# set enable_seqscan to true;
 =# set enable_bitmapscan to false;
 =# select count(*) from tpnt1 where p ~= point(0, 0);
201
 =# select count(*) from tpnt1 where p << point(0, 0);
100

At least for the point type, (bitmap) index scan is consistent
with sequential scan.  I remember that the issue was
"inconsistency between indexed and non-indexed scans over
geometric types" but they seem consistent with each other.

You mentioned b-tree, which don't have predefined opclass for
geometric types. So the "inconsistency" may be mentioning the
difference between geometric values and combinations of plain
(first-class) values. But the two are different things and
apparently using different operators (~= and = for equality) so
the issue is not fit for this. More specifically, "p ~= p0" and
"x = x0 and y = y0" are completely different.


Could you let me (or any other guys on this ml) have more precise
description on the problem and/or what you want to do with this
patch?

regards,
 
-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-20 Thread Kyotaro HORIGUCHI

Hello,

# Sorry for a trash I emitted before..

Perhaps you don't assume any calculations applied on stored
geo-type values. Please be patient to disucuss with me. (Sorry
for perhas hard-to-read English..)

At Fri, 18 Nov 2016 10:58:27 +0100, Emre Hasegeli  wrote in 

> > To keep such kind of integrity, we should deeply consider about
> > errors.
> 
> My point is that I don't think we can keep integrity together with the
> fuzzy behaviour, or at least I don't have the skills to do that.  I
> can just leave the more complicated operators like "is a
> point on a line" as it is, and only change the basic ones.  Do you
> think a smaller patch like this would be acceptable?

The size of the patch is not a matter. It broken the current
behavior is a matter. It shows strange behavior for some cases
but it designed to work reasonable for some cases.

> > That's a fate of FP numbers. Btree is uasble for such data since
> > it is constructed using inequality operators but hash is almost
> > unsuitable without rounding and normalization because of the
> > nature of floating point numbers. Range scan on bree will work
> > find but on the other hand equality doesn't work even on btree
> > index from the same reason to the below.
> 
> Btree is very useful with floats.  There is no problem with it.

I don't mind btrees on bare floats. It will 'work' if error is
properly considered, or error can be omitted for the
purpose. Point has some kind of physical context or expected
behavior other than that of bare floats which the EPSILON
implies.

> > Again, FP numbers generally cannot match exactly each other. You
> > have to avoid that.
> 
> Again, they do match very well.  Floating point numbers are no magic.
> They are perfectly deterministic.

No magic, yes, so it is always accompanied by error. If you store
points, then searching for *any of them*, it works fine. It's no
magic. If you obtained a ponit from any physical data source, say
GPS receiver, then search identical points from a table, it would
be almost impossible to find a "exactly the same" location. It's
no magic.  Once a calculation has been applied on them, it is no
longer guaranteed to work in the same way. It's also no magic.

CREATE OR REPLACE FUNCTION rotate_point (integer, float8) RETURNS void AS $$ 
UPDATE p SET x = x * cos($2) - y * sin($2), y = x * sin($2) + y * cos($2) WHERE 
i = $1 $$ LANGUAGE SQL;
ALTER TABLE p (i int, x float8, y float8);
INSERT INTO p VALUES (0, 1.0, 1.0), (1, 1.0, 1.0);
SELECT rotate_point(0, pi() * 2.0 / 3.0);
SELECT rotate_point(0, pi() * 2.0 / 3.0);
SELECT rotate_point(0, pi() * 2.0 / 3.0);
SELECT * FROM p WHERE i = 0;
 i | x | y 
---+---+---
 0 | 1 | 0.999

So

SELECT p0.x = p1.x AND p0.y = p1.y FROM p p0, p p1 WHERE p0.i = 0 and p1.i = 1;
 ?column? 
--
 f
(1 row)

This just repeats 1/3 rotation 3 times. Ideally p0 and p1 are
identical but this primitive operation generates visible error
that makes the comparison fail. Even though these two points are
expected be identical in some geometric viewpoint. This is a part
of the meaning of the EPSILON.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-18 Thread Emre Hasegeli

> To keep such kind of integrity, we should deeply consider about
> errors.

My point is that I don't think we can keep integrity together with the
fuzzy behaviour, or at least I don't have the skills to do that.  I
can just leave the more complicated operators like "is a
point on a line" as it is, and only change the basic ones.  Do you
think a smaller patch like this would be acceptable?

> That's a fate of FP numbers. Btree is uasble for such data since
> it is constructed using inequality operators but hash is almost
> unsuitable without rounding and normalization because of the
> nature of floating point numbers. Range scan on bree will work
> find but on the other hand equality doesn't work even on btree
> index from the same reason to the below.

Btree is very useful with floats.  There is no problem with it.

> Again, FP numbers generally cannot match exactly each other. You
> have to avoid that.

Again, they do match very well.  Floating point numbers are no magic.
They are perfectly deterministic.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-17 Thread Kyotaro HORIGUCHI

Hello,

At Mon, 14 Nov 2016 11:41:52 +0100, Emre Hasegeli  wrote in 

> > What way to deal with it is in your mind? The problem hides
> > behind operators. To fix it a user should rewrite a expression
> > using more primitive operators. For example, (line_a # point_a)
> > should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in
> > more primitive way. I regared this that the operator # just
> > become useless.
> 
> Simple equations like this works well with the current algorithm:
> 
> > select '(0.1,0.1)'::point @ '(0,0),(1,1)'::line;

That's too simple for this discussion. If it satisfies someone's
requirement, a geometiric type with integer would go. Any
trigonometric calculations (by other than right angles) would
break it.

> The operator does what you expect from it.  Users can use something
> like you described to get fuzzy behaviour with an epsilon they choose.
> 
> > Regarding optimization, at least gcc generates seemingly not so
> > different code for the two. The both generally generates extended
> > code directly calling isnan() and so. Have you measured the
> > performance of the two implement (with -O2, without
> > --enable-cassert)?  This kind of hand-optimization gets
> > legitimacy when we see a siginificant difference, according to
> > the convention here.. I suppose.
> 
> I tested it with this program:
> 
> > int main()
> > {
> >double  i,
> >j;
> >int result = 0;
> >
> >for (i = 0.1; i < 1.0; i += 1.0)
> >for (j = 0.1; j < 1.0; j += 1.0)
> >if (float8_lt(i, j))
> >result = (result + 1) % 10;
> >
> >return result;
> > }
> 
> The one calling cmp() was noticeable slower.
> 
> ./test1  0.74s user 0.00s system 99% cpu 0.748 total
> ./test2  0.89s user 0.00s system 99% cpu 0.897 total
> 
> This would probably be not much noticeable by calling SQL functions
> which are doing a few comparisons only, but it may be necessary to do
> many more comparisons on some places.  I don't find the optimised
> versions less clear than calling the cmp().  I can change it the other
> way, if you find it more clear.

Thanks for the measurment. I had similar numbers (14:13) for
separate code from Pg, but probably this difference is buried
under other steps. The law of stay-stupid says that this kind of
optimization should be consider only after it found to be
problematic, I think.


> > At least the comment you dropped by the patch,
> >
> >  int
> >  float4_cmp_internal(float4 a, float4 b)
> >  {
> > -   /*
> > -* We consider all NANs to be equal and larger than any non-NAN. 
> > This is
> > -* somewhat arbitrary; the important thing is to have a consistent 
> > sort
> > -* order.
> > -*/
> >
> > seems very significant and should be kept anywhere relevant.
> 
> I will add it back on the next version.
> 
> > I seached pgsql-general ML but counldn't find a complaint about
> > the current behavior. Even though I'm not familar with PostGIS, I
> > found that it uses exactly the same EPSILON method with
> > PostgreSQL.
> 
> Is it?  I understood from Paul Ramsey's comment on this thread [1]
> that they don't.

I saw the repository of PostGIS by myself, but only in
box2d.c. Other objects have their own EPSILONs, EPSILOG_SQLMM,
DBL_EPSLON, FLT_EPSILON... Maybe that's all

|  * Copyright (C) 2008-2011 Paul Ramsey 
|  * Copyright (C) 2008 Mark Cave-Ayland 
| 
| #ifndef EPSILON
| #define EPSILON1.0E-06
| #endif
| #ifndef FPeq
| #define FPeq(A,B) (fabs((A) - (B)) <= EPSILON)
| #endif

I agree that we should have our own tolerance method, and it migh
be an explicit one. But anyway we need any kind of tolerance on
comparing FP numbers. Removing such tolerance as preparation for
implementing another one is quite natural, but released version
should have any kind of tolerance.

> > If we had an apparent plan to use them for other than
> > earth-scale(?)  geometric usage, we could design what they should
> > be. But without such a plan it is just a breakage of the current
> > usage.
> 
> We give no promises about the geometric types being useful in earth scale.

So, we shouldn't just remove it.

> > About What kind of (needless) complication you are saying? The
> > fuzziness seems to me essential for geometric comparisons to work
> > practically. Addition to that, I don't think that we're not
> > allowed to change the behavior in such area of released versions
> > the time after time.
> 
> Even when it is a total mess?

Yes. The seemingly 'total-mess' is intended for a use with
lon/lat values and probably gave useful result at the time of
development. The mess is rather better than just ruined.

> > I don't think index scan and tolerant comparison are not
> > contradicting. Could you let me have an description about the
> > indexing capabilities and the

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-14 Thread Emre Hasegeli

> What way to deal with it is in your mind? The problem hides
> behind operators. To fix it a user should rewrite a expression
> using more primitive operators. For example, (line_a # point_a)
> should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in
> more primitive way. I regared this that the operator # just
> become useless.

Simple equations like this works well with the current algorithm:

> select '(0.1,0.1)'::point @ '(0,0),(1,1)'::line;

The operator does what you expect from it.  Users can use something
like you described to get fuzzy behaviour with an epsilon they choose.

> Regarding optimization, at least gcc generates seemingly not so
> different code for the two. The both generally generates extended
> code directly calling isnan() and so. Have you measured the
> performance of the two implement (with -O2, without
> --enable-cassert)?  This kind of hand-optimization gets
> legitimacy when we see a siginificant difference, according to
> the convention here.. I suppose.

I tested it with this program:

> int main()
> {
>double  i,
>j;
>int result = 0;
>
>for (i = 0.1; i < 1.0; i += 1.0)
>for (j = 0.1; j < 1.0; j += 1.0)
>if (float8_lt(i, j))
>result = (result + 1) % 10;
>
>return result;
> }

The one calling cmp() was noticeable slower.

./test1  0.74s user 0.00s system 99% cpu 0.748 total
./test2  0.89s user 0.00s system 99% cpu 0.897 total

This would probably be not much noticeable by calling SQL functions
which are doing a few comparisons only, but it may be necessary to do
many more comparisons on some places.  I don't find the optimised
versions less clear than calling the cmp().  I can change it the other
way, if you find it more clear.

> At least the comment you dropped by the patch,
>
>  int
>  float4_cmp_internal(float4 a, float4 b)
>  {
> -   /*
> -* We consider all NANs to be equal and larger than any non-NAN. This 
> is
> -* somewhat arbitrary; the important thing is to have a consistent 
> sort
> -* order.
> -*/
>
> seems very significant and should be kept anywhere relevant.

I will add it back on the next version.

> I seached pgsql-general ML but counldn't find a complaint about
> the current behavior. Even though I'm not familar with PostGIS, I
> found that it uses exactly the same EPSILON method with
> PostgreSQL.

Is it?  I understood from Paul Ramsey's comment on this thread [1]
that they don't.

> If we had an apparent plan to use them for other than
> earth-scale(?)  geometric usage, we could design what they should
> be. But without such a plan it is just a breakage of the current
> usage.

We give no promises about the geometric types being useful in earth scale.

> About What kind of (needless) complication you are saying? The
> fuzziness seems to me essential for geometric comparisons to work
> practically. Addition to that, I don't think that we're not
> allowed to change the behavior in such area of released versions
> the time after time.

Even when it is a total mess?

> I don't think index scan and tolerant comparison are not
> contradicting. Could you let me have an description about the
> indexing capabilities and the inconsistencies?

The first problem is that some operators are not using the epsilon.
This requires special treatment while developing index support for
operators.  I have tried to support point for BRIN using the box
operators, and failed because of that.

Comparing differences with epsilon is not applicable the same way to
every operator.  Even with simple operators like "point in box" it
covers different distances outside the box depending on where the
point is.  For example, "point <-> box < EPSILON" wouldn't be
equivalent with "point <@ box", when the point is outside corner of
the box.  Things get more complicated with lines.  Because of these,
we are easily violating basic expectations of the operators:

> regression=# select '{1000,0.01,0}'::line ?|| '{9,0.9,0}'::line;
>
> ?column?
> --
> f
> (1 row)
>
> regression=# select '{9,0.9,0}'::line ?|| '{1000,0.01,0}'::line;
> ?column?
> --
> t
> (1 row)

Another problem is lack of hash and btree operator classes.  In my
experience, the point datatype is by far most used one.  People often
try to use it on DISTINCT, GROUP BY, or ORDER BY clauses and complain
when it doesn't work.  There are many complaints like this on the
archives.  If we get rid of the epsilon, we can easily add those
operator classes.

[1] 
https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-14 Thread Kyotaro HORIGUCHI

Hello,

At Sun, 13 Nov 2016 22:41:01 +0100, Emre Hasegeli  wrote in 

> > We can remove the fuzz factor altogether but I think we also
> > should provide a means usable to do similar things. At least "is
> > a point on a line" might be useless for most cases without any
> > fuzzing feature. (Nevertheless, it is a problem only when it is
> > being used to do that:) If we don't find reasonable policy on
> > fuzzing operations, it would be the proof that we shouldn't
> > change the behavior.
> 
> It was my initial idea to keep the fuzzy comparison behaviour on some
> places, but the more I get into I realised that it is almost
> impossible to get this right.  Instead, I re-implemented some
> operators to keep precision as much as possible.  The previous "is a
> point on a line" operator would *never* give the true result without
> the fuzzy comparison.  The new implementation would return true, when
> precision is not lost.

There's no way to accurately store numbers other than some
special ones in floating point format where mantissa precision is
limited. Even 0.1 is not stored accurately with a binary
mantissa. (It would be concealed by machine rounding for most
cases though.)  The same is said for numeric. It cannot store 1/3
accurately and doesn't even conceal the incaccuracy. Even if they
could store numbers accurately, anyway , say, the constant pi
does not have infinite precision.  As the result, we have a
tradition that equal operator on real numbers should be avoided
generally. Numerics are provided mainly for financial use, on
which finite-but-high precision decimal arithmetic are performed.

> I think this is a behaviour people, who are
> working with floating points, are prepared to deal with.  

What way to deal with it is in your mind? The problem hides
behind operators. To fix it a user should rewrite a expression
using more primitive operators. For example, (line_a # point_a)
should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in
more primitive way. I regared this that the operator # just
become useless.


> By the way, "is a point on a line" operator is quite wrong with
> the fuzzy comparison at the moment [1].

Th bug is a isolate problem from the epsilon behavior. It of
course should be fixed, but in "appropriate" way that may defined
in this discussion.

> > The 0001 patch adds many FP comparison functions individually
> > considering NaN. As the result the sort order logic involving NaN
> > is scattered around into the functions, then, you implement
> > generic comparison function using them. It seems inside-out to
> > me. Defining ordering at one place, then comparison using it
> > seems to be reasonable.
> 
> I agree that it would be simpler to use the comparison function for
> implementing other operators.  I have done it other way around to make
> them more optimised.  They are called very often.  I don't think
> checking exit code of the comparison function would be optimised the
> same way.  I could leave the comparison functions as they are, but
> re-implemented them using the others to keep documentation of NaN
> comparison in the single place.

Regarding optimization, at least gcc generates seemingly not so
different code for the two. The both generally generates extended
code directly calling isnan() and so. Have you measured the
performance of the two implement (with -O2, without
--enable-cassert)?  This kind of hand-optimization gets
legitimacy when we see a siginificant difference, according to
the convention here.. I suppose.

At least the comment you dropped by the patch,

 int
 float4_cmp_internal(float4 a, float4 b)
 {
-   /*
-* We consider all NANs to be equal and larger than any non-NAN. This is
-* somewhat arbitrary; the important thing is to have a consistent sort
-* order.
-*/

seems very significant and should be kept anywhere relevant.


> > If the center somehow goes extremely near to the origin, it could
> > result in a false error.
> >
> >> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
> >> ERROR:  value out of range: underflow
> >
> > I don't think this underflow is an error, and actually it is a
> > change of the current behavior without a reasonable reason. More
> > significant (and maybe unacceptable) side-effect is that it
> > changes the behavior of ordinary operators. I don't think this is
> > acceptable. More consideration is needed.
> >
> >> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
> >> ERROR:  value out of range: underflow
> 
> This is the current behaviour of float datatype.  My patch doesn't
> change that. 

Very sorry for the mistake! I somehow saw float8pl on master and
float8_div with your patch. Pleas forget it.

> This problem would probably also apply to multiplying
> very small values.  I agree that this is not the ideal behaviour.
> Though I am not sure, if we should go to a

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-13 Thread Emre Hasegeli

> We can remove the fuzz factor altogether but I think we also
> should provide a means usable to do similar things. At least "is
> a point on a line" might be useless for most cases without any
> fuzzing feature. (Nevertheless, it is a problem only when it is
> being used to do that:) If we don't find reasonable policy on
> fuzzing operations, it would be the proof that we shouldn't
> change the behavior.

It was my initial idea to keep the fuzzy comparison behaviour on some
places, but the more I get into I realised that it is almost
impossible to get this right.  Instead, I re-implemented some
operators to keep precision as much as possible.  The previous "is a
point on a line" operator would *never* give the true result without
the fuzzy comparison.  The new implementation would return true, when
precision is not lost.  I think this is a behaviour people, who are
working with floating points, are prepared to deal with.  By the way,
"is a point on a line" operator is quite wrong with the fuzzy
comparison at the moment [1].

> The 0001 patch adds many FP comparison functions individually
> considering NaN. As the result the sort order logic involving NaN
> is scattered around into the functions, then, you implement
> generic comparison function using them. It seems inside-out to
> me. Defining ordering at one place, then comparison using it
> seems to be reasonable.

I agree that it would be simpler to use the comparison function for
implementing other operators.  I have done it other way around to make
them more optimised.  They are called very often.  I don't think
checking exit code of the comparison function would be optimised the
same way.  I could leave the comparison functions as they are, but
re-implemented them using the others to keep documentation of NaN
comparison in the single place.

> If the center somehow goes extremely near to the origin, it could
> result in a false error.
>
>> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
>> ERROR:  value out of range: underflow
>
> I don't think this underflow is an error, and actually it is a
> change of the current behavior without a reasonable reason. More
> significant (and maybe unacceptable) side-effect is that it
> changes the behavior of ordinary operators. I don't think this is
> acceptable. More consideration is needed.
>
>> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
>> ERROR:  value out of range: underflow

This is the current behaviour of float datatype.  My patch doesn't
change that.  This problem would probably also apply to multiplying
very small values.  I agree that this is not the ideal behaviour.
Though I am not sure, if we should go to a different direction than
the float datatypes.

I think there is value in making geometric types compatible with the
float.  Users are going to mix them, anyway.  For example, users can
calculate the center of a box manually, and confuse when the built-in
operator behaves differently.

> In regard to fuzzy operations, libgeos seems to have several
> types of this kind of feature. (I haven't looked closer into
> them). Other than reducing precision seems overkill or
> unappliable for PostgreSQL bulitins. As Jim said, can we replace
> the fixed scale fuzz factor by precision reduction? Maybe, with a
> GUC variable (I hear someone's roaring..) to specify the amount
> defaults to fit the current assumption.

I am disinclined to try to implement something complicated for the
geometric types.  I think they are mostly useful for 2 purposes: uses
simple enough to not worth looking for better solutions, and
demonstrating our indexing capabilities.  The inconsistencies harm
both of those.

[1] 
https://www.postgresql.org/message-id/flat/CAE2gYzw_-z%3DV2kh8QqFjenu%3D8MJXzOP44wRW%3DAzzeamrmTT1%3DQ%40mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-13 Thread Emre Hasegeli

> The reason for that is that you forgot to edit an alternative
> exptect file, which matches for en_US locale.

Now I understand.  Thank you for the explanation.

> But the test doesn't for collation and the only difference
> between the alternative expect files comes from the difference of
> collation for the query. "the workaround" seems to be the right
> way to do it. I recommend rather to leave the workaroud as it is
> and remove select_views_1.out from the "expected" directory.

I will do.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-10 Thread Kyotaro HORIGUCHI

Hello,

> Aside from that, I'd like to comment this patch on other points
> later.

The start of this patch is that the fact that most of but not all
geometric operators use fuzzy comparson. But Tom pointed out that
the fixed fuzz factor is not reasonable but hard to find how to
modify it.

We can remove the fuzz factor altogether but I think we also
should provide a means usable to do similar things. At least "is
a point on a line" might be useless for most cases without any
fuzzing feature. (Nevertheless, it is a problem only when it is
being used to do that:) If we don't find reasonable policy on
fuzzing operations, it would be the proof that we shouldn't
change the behavior.

The 0001 patch adds many FP comparison functions individually
considering NaN. As the result the sort order logic involving NaN
is scattered around into the functions, then, you implement
generic comparison function using them. It seems inside-out to
me. Defining ordering at one place, then comparison using it
seems to be reasonable.

The 0002 patch repalces many native operators for floating point
numbers by functions having sanity check. This seems to be
addressing to Tom's comment. It is reasonable for comparison
operators but I don't think replacing all arithmetics is so. For
example, float8_div checks that

- If both of operands are not INF, result should not be INF.
- If the devident is not exactly zero, the result should not be zero.

The second assumption is wrong by itself. For an extreme case,
4.9e-324 / 1.7e+308 becomes exactly zero (by underflow). We
canont assert it to be wrong but the devedent is not zero. The
validity of the result varies according to its meaning. For the
case of box_cn,

> center->x = float8_div(float8_pl(box->high.x, box->low.x), 2.0);
> center->y = float8_div(float8_pl(box->high.y, box->low.y), 2.0);

If the center somehow goes extremely near to the origin, it could
result in a false error.

> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
> ERROR: value out of range: underflow

I don't think this underflow is an error, and actually it is a
change of the current behavior without a reasonable reason. More
significant (and maybe unacceptable) side-effect is that it
changes the behavior of ordinary operators. I don't think this is
acceptable. More consideration is needed.

> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
> ERROR: value out of range: underflow

In regard to fuzzy operations, libgeos seems to have several
types of this kind of feature. (I haven't looked closer into
them). Other than reducing precision seems overkill or
unappliable for PostgreSQL bulitins. As Jim said, can we replace
the fixed scale fuzz factor by precision reduction? Maybe, with a
GUC variable (I hear someone's roaring..) to specify the amount
defaults to fit the current assumption.

https://apt-browse.org/browse/ubuntu/trusty/universe/i386/libgeos++-dev/3.4.2-4ubuntu1/file/usr/include/geos/geom/BinaryOp.h

/*
* Define this to use PrecisionReduction policy
* in an attempt at by-passing binary operation
* robustness problems (handles TopologyExceptions)
*/
...
* Define this to use TopologyPreserving simplification policy
* in an attempt at by-passing binary operation
* robustness problems (handles TopologyExceptions)
(seems not activated)
...
* Use common bits removal policy.
* If enabled, this would be tried /before/
* Geometry snapping.
...
/*
* Use snapping policy
*/

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-10 Thread Kyotaro HORIGUCHI

Hello,

> > Returning to the issue, the following query should give you the
> > expected result.
> >
> > SELECT name, #thepath  FROM iexit ORDER BY name COLLATE "C", 2;
> 
> Yes, I have worked around it like this.  What I couldn't understand is
> how my patch can cause this regression.  How is it passes on master
> without COLLATE?

Ah, sorry, I understand that *you* added the COLLATE.  Revering
select_views.sql/out to master actually causes a regression error.

The reason for that is that you forgot to edit an alternative
exptect file, which matches for en_US locale.

But the test doesn't for collation and the only difference
between the alternative expect files comes from the difference of
collation for the query. "the workaround" seems to be the right
way to do it. I recommend rather to leave the workaroud as it is
and remove select_views_1.out from the "expected" directory.

Aside from that, I'd like to comment this patch on other points
later.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-10 Thread Emre Hasegeli

> Returning to the issue, the following query should give you the
> expected result.
>
> SELECT name, #thepath  FROM iexit ORDER BY name COLLATE "C", 2;

Yes, I have worked around it like this.  What I couldn't understand is
how my patch can cause this regression.  How is it passes on master
without COLLATE?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-10 Thread Kyotaro HORIGUCHI

Hello,

At Thu, 29 Sep 2016 10:37:30 +0200, Emre Hasegeli  wrote in 

> > regression=# select 'I- 580Ramp' < 'I- 580/I-680
> >   Ramp';
> >  ?column?
> > --
> >  t
> > (1 row)
> 
> on the Linux server I am testing, it is not:
> 
> > regression=# select 'I- 580Ramp' < 'I- 580/I-680
> >   Ramp';
> >  ?column?
> > --
> >  f
> > (1 row)
> 
> The later should be the case on your environment as the test was also
> failing for you.  This is not consistent with the expected test
> result.  Do you know how this test can still pass on the master?

Perhaps you ran the test under the environment LC_COLLATE (or
LANG) of something other than C. The reson for the result is
collation. The following returns expected result.


=# select 'I- 580Ramp' < ('I- 580/I-680 
 Ramp' COLLATE "C");
 ?column? 
--
 t
(1 row)

For a reason I don't know, say, en_US locale behaves in an
unintuitive way.

regression=# select ' ' < ('/' COLLATE "en_US"), ' x' < ('/a' COLLATE "en_US");
 ?column? | ?column? 
--+--
 t| f
(1 row)


Returning to the issue, the following query should give you the
expected result.

SELECT name, #thepath  FROM iexit ORDER BY name COLLATE "C", 2;

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-29 Thread Emre Hasegeli

> Well, those two results are not contradictory -- notice that you
> switched the order of the values in the comparison.  I don't think
> you've really found the explanation yet.

I am sorry I copy-pasted the wrong example:

"select_views" test runs:

> SELECT name, #thepath FROM iexit ORDER BY 1, 2

and expects to get rows in this order:

>  I- 580Ramp |8
>  I- 580/I-680  Ramp |2

with the collation on my laptop, this is actually true:

> regression=# select 'I- 580Ramp' < 'I- 580/I-680  
> Ramp';
>  ?column?
> --
>  t
> (1 row)

on the Linux server I am testing, it is not:

> regression=# select 'I- 580Ramp' < 'I- 580/I-680  
> Ramp';
>  ?column?
> --
>  f
> (1 row)

The later should be the case on your environment as the test was also
failing for you.  This is not consistent with the expected test
result.  Do you know how this test can still pass on the master?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-28 Thread Kevin Grittner

On Wed, Sep 28, 2016 at 2:04 PM, Kevin Grittner  wrote:

> Will take a look and post again.

I am moving this patch to the next CF.  You'll be hearing from me
sometime after this CF is closed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-28 Thread Kevin Grittner

On Wed, Sep 28, 2016 at 12:02 PM, Emre Hasegeli  wrote:

>> `make check` finds differences per the attached.  Please
>> investigate why the regression tests are failing and what the
>> appropriate response is.
>
> I fixed the first one and workaround the second with COLLATE "C".  I
> have how my changes caused this regression.
>
> "select_views" test runs "SELECT name, #thepath FROM iexit ORDER BY 1,
> 2" and expects to get rows in this order:
>
>>  I- 580Ramp |8
>>  I- 580/I-680  Ramp |2
>
> With the collation on my laptop, this is actually true:
>
>> regression=# select 'I- 580/I-680  Ramp' < 'I- 580   
>>  Ramp';
>>  ?column?
>> --
>>  t
>> (1 row)
>
> However, on the Linux server, I am testing it is not:
>
>> regression=# select 'I- 580Ramp' < 'I- 580/I-680 
>>  Ramp';
>>  ?column?
>> --
>>  f
>> (1 row)
>
> Do you know how it is not failing on the master?

Well, those two results are not contradictory -- notice that you
switched the order of the values in the comparison.  I don't think
you've really found the explanation yet.

>> [discussing inline static functions compared to macros for min()/max(), etc.]
>> I suspect that they will be as fast or faster, and they eliminate
>> the hazard of multiple evaluation, where a programmer might not be
>> aware of the multiple evaluation or of some side-effect of an
>> argument.
>
> I reworked the the patches to use inline functions and fixed the
> problems I found.  The new versions are attached.

Will take a look and post again.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-28 Thread Emre Hasegeli

> Emre, are you going to address the above?  It would have to be Real
> Soon Now.

Yes, I am working on it.  I found more problems, replaced more
algorithms.  That took a lot of time.  I will post the new version
really soon.  I wouldn't feel bad, if you wouldn't have enough time to
review it in this commitfest.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-27 Thread Kevin Grittner

On Fri, Sep 9, 2016 at 4:25 PM, Kevin Grittner  wrote:
> On Sun, Sep 4, 2016 at 12:42 PM, Emre Hasegeli  wrote:

> These patches apply and build on top of 5c609a74 with no problems,
> but `make check` finds differences per the attached.  Please
> investigate why the regression tests are failing and what the
> appropriate response is.

>> I am not much experienced in C.  If you think that inline functions
>> are better suited, I can rework the patches.
>
> I suspect that they will be as fast or faster, and they eliminate
> the hazard of multiple evaluation, where a programmer might not be
> aware of the multiple evaluation or of some side-effect of an
> argument.

Emre, are you going to address the above?  It would have to be Real
Soon Now.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-09 Thread Kevin Grittner

On Sun, Sep 4, 2016 at 12:42 PM, Emre Hasegeli  wrote:
>> The first patch fails to apply due to bit-rot.  That's easy enough
>> to correct, but then it runs into warnings on make:
>
> Rebased and fixed.

These patches apply and build on top of 5c609a74 with no problems,
but `make check` finds differences per the attached.  Please
investigate why the regression tests are failing and what the
appropriate response is.

>> Something to consider before posting new version -- should we
>> change some of those macros to static inline (in the .h files) to
>> avoid double-evaluation hazards?  They might perform as well or
>> even better that way, and remove a subtle programmer foot-gun.
>
> One reason to keep them as macros is that they are currently
> supporting both float4 and float8.  Currently they look like this:
>
> FLOAT_EQ(val1, val2)
> FLOAT_PL(result, val1, val2)
>
> I guess if we would use inline functions, they would look like:
>
> FLOAT8_EQ(val1, val2)
> result = FLOAT8_PL(val1, val2)
>
> Result would need to be defined again in the function to be checked
> for overflow.  I think this way would be more developer-friendly.  I
> have gone some trouble to allocate the result to be able to use the
> macros.  Do you know if the performance of both would be the same?

In one case where I was concerned that a static inline definition
would not perform as well as a macro, I went so far as to compare
the compiled code for both at a number of call sites in an
optimized build and found that a reasonably current gcc generated
*identical* code for both.  Of course, this will not always be the
case -- in particular for macros which require multiple evaluations
of one or more parameters and they are not simple literals.  In
such cases, the static inline often benchmarks faster because the
results from the potentially expensive expression can be stored on
the stack or in a register, and just referenced again.

> I am not much experienced in C.  If you think that inline functions
> are better suited, I can rework the patches.

I suspect that they will be as fast or faster, and they eliminate
the hazard of multiple evaluation, where a programmer might not be
aware of the multiple evaluation or of some side-effect of an
argument.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-02 Thread Kevin Grittner

On Mon, Jul 18, 2016 at 3:54 PM, Emre Hasegeli  wrote:
> My progress so far is attached as 2 patches.  First one introduces an
> header file for adt/float.c.  Second one refactors the geometric
> operations.

The first patch fails to apply due to bit-rot.  That's easy enough
to correct, but then it runs into warnings on make:

btree_gin.c: In function ‘leftmostvalue_float4’:
btree_gin.c:234:2: error: implicit declaration of function
‘get_float4_infinity’ [-Werror=implicit-function-declaration]
  return Float4GetDatum(-get_float4_infinity());
  ^
btree_gin.c: In function ‘leftmostvalue_float8’:
btree_gin.c:242:2: error: implicit declaration of function
‘get_float8_infinity’ [-Werror=implicit-function-declaration]
  return Float8GetDatum(-get_float8_infinity());
  ^

Please fix.

Something to consider before posting new version -- should we
change some of those macros to static inline (in the .h files) to
avoid double-evaluation hazards?  They might perform as well or
even better that way, and remove a subtle programmer foot-gun.

Changing status to "Waiting on Author".

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-07-07 Thread Robert Haas

On Wed, Jun 1, 2016 at 10:27 AM, Tom Lane  wrote:
> As I understand it, the key problem is that tests like "is point on line"
> would basically never succeed except in the most trivial cases, because of
> roundoff error.  That's not very nice, and it might cascade to larger
> problems like object-containment tests failing unexpectedly.  We would
> need to go through all the geometric operations and figure out where that
> kind of gotcha is significant and what we can do about it.  Seems like a
> fair amount of work :-(.  If somebody's willing to do that kind of
> investigation, then sure, but I don't think just blindly removing these
> macros is going to lead to anything good.

Yeah, it does seem to need some research.

> Also, I suppose this means that Robert promises not to make any of his
> usual complaints about breaking compatibility?  Because we certainly
> would be.

Pot, meet Mr. Kettle!

Obviously, the inconvenience caused by any backward incompatibility
has to be balanced against the fact that the new behavior is
presumably better.  But I stridently object to the accusation that of
the two of us I'm the one more concerned with backward-compatibility.
There may be some instances where I've had a more conservative
judgement than you about breaking user-facing stuff, but you've
blocked dozens of changes to the C API that would have enabled
meaningful extension development on the grounds that somebody might
complain when a future release changes the API!  I think behavior
changes that users will notice are of vastly greater significance than
those which will only be observed by developers.

In this particular case, I think that the current behavior is pretty
stupid, and that the built-in geometric types are barely used,
possibly because they have stupid behavior.  So I would be willing to
bet on a well-thought-out change in this area coming out to a net
positive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Tom Lane

Paul Ramsey  writes:
> One of the things people find annoying about postgis is that
> ST_Intersects(ST_Intersection(a, b), a) can come out as false (a
> derived point at a crossing of lines may not exactly intersect either
> of the input lines), which is a direct result of our use of exact math
> for the boolean intersects test.

That's an interesting comment, because it's more or less exactly the
type of failure that we could expect to get if we remove fuzzy comparisons
from the built-in types.  How much of a problem is it in practice for
PostGIS users?  Do you have any plans to change it?

> Anyways, go forth and do whatever makes sense for PgSQL

I think we're trying to figure out what that is ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Paul Ramsey

On Wed, Jun 1, 2016 at 8:59 AM, Jim Nasby  wrote:
> On 6/1/16 10:03 AM, Paul Ramsey wrote:
>>
>> We don't depend on these, we have our own :/
>> The real answer for a GIS system is to have an explicit tolerance
>> parameter for calculations like distance/touching/containment, but
>> unfortunately we didn't do that so now we have our own
>> compatibility/boil the ocean problem if we ever wanted/were funded to
>> add one.
>
>
> Well it sounds like what's currently happening in Postgres is probably going
> to change, so how might we structure that to help PostGIS? Would simply
> lopping off the last few bits of the significand/mantissa work, or is that
> not enough when different GRSes are involved?

PostGIS doesn't look at all at what the PgSQL geotypes do, so go
forward w/o fear. Tolerance in geo world is more than vertex rounding
though, it's things like saying that when distance(pt,line) < epsilon
then distance(pt,line) == 0, or similarly for shape touching, etc. One
of the things people find annoying about postgis is that

ST_Intersects(ST_Intersection(a, b), a) can come out as false (a
derived point at a crossing of lines may not exactly intersect either
of the input lines), which is a direct result of our use of exact math
for the boolean intersects test.

Anyways, go forth and do whatever makes sense for PgSQL

P

>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Jim Nasby


On 6/1/16 10:03 AM, Paul Ramsey wrote:

We don't depend on these, we have our own :/
The real answer for a GIS system is to have an explicit tolerance
parameter for calculations like distance/touching/containment, but
unfortunately we didn't do that so now we have our own
compatibility/boil the ocean problem if we ever wanted/were funded to
add one.


Well it sounds like what's currently happening in Postgres is probably 
going to change, so how might we structure that to help PostGIS? Would 
simply lopping off the last few bits of the significand/mantissa work, 
or is that not enough when different GRSes are involved?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Joe Conway

On 06/01/2016 07:52 AM, Jim Nasby wrote:
> On 6/1/16 9:27 AM, Tom Lane wrote:
>> Kevin Grittner  writes:
>>> > On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas 
>>> wrote:
 >> Figuring out what to do about it is harder.  Your proposal seems
 to be
 >> to remove them except where we need the fuzzy behavior, which
 doesn't
 >> sound unreasonable, but I don't personally understand why we need it
 >> in some places and not others.
>>> > +1
>>> > My first inclination is to remove those macros in version 10, but
>>> > it would be good to hear from some people using these types on what
>>> > the impact of that would be.
>> As I understand it, the key problem is that tests like "is point on line"
>> would basically never succeed except in the most trivial cases,
>> because of
>> roundoff error.  That's not very nice, and it might cascade to larger
>> problems like object-containment tests failing unexpectedly.  We would
>> need to go through all the geometric operations and figure out where that
>> kind of gotcha is significant and what we can do about it.  Seems like a
>> fair amount of work :-(.  If somebody's willing to do that kind of
>> investigation, then sure, but I don't think just blindly removing these
>> macros is going to lead to anything good.
> 
> I suspect another wrinkle here is that in the GIS world a single point
> can be represented it multiple reference/coordinate systems, and it
> would have different values in each of them. AIUI the transforms between
> those systems can be rather complicated if different projection methods
> are involved. I don't know if PostGIS depends on what these macros are
> doing or not. If it doesn't, perhaps it would be sufficient to lop of
> the last few bits of the significand. ISTM that'd be much better than
> what the macros currently do.

IIRC PostGIS uses a function from libgeos to do things like "point
equals" (ST_Equals). I've never looked at that source, but would be
unsurprised to find that it does something similar to this although
probably also more sophisticated.

(looks) yes -- ST_Equals calls GEOSEquals() after some sanity checking...

> BTW, I suspect the macro values were chosen specifically for dealing
> with LAT/LONG.

I think that is it exactly.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Paul Ramsey

On Wed, Jun 1, 2016 at 7:52 AM, Jim Nasby  wrote:

>
> I suspect another wrinkle here is that in the GIS world a single point can
> be represented it multiple reference/coordinate systems, and it would have
> different values in each of them. AIUI the transforms between those systems
> can be rather complicated if different projection methods are involved. I
> don't know if PostGIS depends on what these macros are doing or not. If it
> doesn't, perhaps it would be sufficient to lop of the last few bits of the
> significand. ISTM that'd be much better than what the macros currently do.

We don't depend on these, we have our own :/
The real answer for a GIS system is to have an explicit tolerance
parameter for calculations like distance/touching/containment, but
unfortunately we didn't do that so now we have our own
compatibility/boil the ocean problem if we ever wanted/were funded to
add one.
P.

> BTW, I suspect the macro values were chosen specifically for dealing with
> LAT/LONG.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Jim Nasby

On 6/1/16 9:27 AM, Tom Lane wrote:

Kevin Grittner  writes:

> On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas  wrote:

>> Figuring out what to do about it is harder.  Your proposal seems to be
>> to remove them except where we need the fuzzy behavior, which doesn't
>> sound unreasonable, but I don't personally understand why we need it
>> in some places and not others.

> +1
> My first inclination is to remove those macros in version 10, but
> it would be good to hear from some people using these types on what
> the impact of that would be.

As I understand it, the key problem is that tests like "is point on line"
would basically never succeed except in the most trivial cases, because of
roundoff error.  That's not very nice, and it might cascade to larger
problems like object-containment tests failing unexpectedly.  We would
need to go through all the geometric operations and figure out where that
kind of gotcha is significant and what we can do about it.  Seems like a
fair amount of work :-(.  If somebody's willing to do that kind of
investigation, then sure, but I don't think just blindly removing these
macros is going to lead to anything good.

I suspect another wrinkle here is that in the GIS world a single point 
can be represented it multiple reference/coordinate systems, and it 
would have different values in each of them. AIUI the transforms between 
those systems can be rather complicated if different projection methods 
are involved. I don't know if PostGIS depends on what these macros are 
doing or not. If it doesn't, perhaps it would be sufficient to lop of 
the last few bits of the significand. ISTM that'd be much better than 
what the macros currently do.

BTW, I suspect the macro values were chosen specifically for dealing 
with LAT/LONG.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Tom Lane

Kevin Grittner  writes:
> On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas  wrote:
>> Figuring out what to do about it is harder.  Your proposal seems to be
>> to remove them except where we need the fuzzy behavior, which doesn't
>> sound unreasonable, but I don't personally understand why we need it
>> in some places and not others.

> +1

> My first inclination is to remove those macros in version 10, but
> it would be good to hear from some people using these types on what
> the impact of that would be.

As I understand it, the key problem is that tests like "is point on line"
would basically never succeed except in the most trivial cases, because of
roundoff error.  That's not very nice, and it might cascade to larger
problems like object-containment tests failing unexpectedly.  We would
need to go through all the geometric operations and figure out where that
kind of gotcha is significant and what we can do about it.  Seems like a
fair amount of work :-(.  If somebody's willing to do that kind of
investigation, then sure, but I don't think just blindly removing these
macros is going to lead to anything good.

Also, I suppose this means that Robert promises not to make any of his
usual complaints about breaking compatibility?  Because we certainly
would be.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Kevin Grittner

On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas  wrote:
> On Fri, May 27, 2016 at 6:43 AM, Emre Hasegeli  wrote:
>> There are those macros defined for the built-in geometric types:
>>
>>> #define EPSILON 1.0E-06
>>
>>> #define FPzero(A)   (fabs(A) <= EPSILON)
>>> #define FPeq(A,B)   (fabs((A) - (B)) <= EPSILON)
>>> #define FPne(A,B)   (fabs((A) - (B)) > EPSILON)
>>> #define FPlt(A,B)   ((B) - (A) > EPSILON)
>>> #define FPle(A,B)   ((A) - (B) <= EPSILON)
>>> #define FPgt(A,B)   ((A) - (B) > EPSILON)
>>> #define FPge(A,B)   ((B) - (A) <= EPSILON)
>>
>> with this warning:
>>
>>>  *XXX These routines were not written by a numerical analyst.
>
> I agree that those macros looks like a pile of suck.

+1

> It's unclear to
> me what purpose they're trying to accomplish, but regardless of what
> it is, it's hard for me to believe that they are accomplishing it.
> Whether 1.0E-06 is a correct fuzz factor presumably depends greatly on
> the scale of the number; e.g. if the values are all like "3" it's
> probably fine but if they are like "37142816124856" it's probably not
> enough fuzz and if they are all like ".0004" it's probably way too
> much fuzz.

Also, it's a pretty lame heuristic.  It doesn't really eliminate
*any* problems; it just aims to make them less frequent by shifting
the edges around.  In doing so, it creates whole new classes of
problems:

test=# \set A '''(1.996,1.996),(1,1)''::box'
test=# \set B '''(2,2),(1,1)''::box'
test=# \set C '''(2.004,2.004),(1,1)''::box'
test=# select :A = :B, :B = :C, :A = :C;
 ?column? | ?column? | ?column?
--+--+--
 t| t| f
(1 row)

Is the benefit we get from the macros worth destroying the
transitive property of the comparison operators on these types?

> Figuring out what to do about it is harder.  Your proposal seems to be
> to remove them except where we need the fuzzy behavior, which doesn't
> sound unreasonable, but I don't personally understand why we need it
> in some places and not others.

+1

My first inclination is to remove those macros in version 10, but
it would be good to hear from some people using these types on what
the impact of that would be.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-06-01 Thread Robert Haas

On Fri, May 27, 2016 at 6:43 AM, Emre Hasegeli  wrote:
> There are those macros defined for the built-in geometric types:
>
>> #define EPSILON 1.0E-06
>
>> #define FPzero(A)   (fabs(A) <= EPSILON)
>> #define FPeq(A,B)   (fabs((A) - (B)) <= EPSILON)
>> #define FPne(A,B)   (fabs((A) - (B)) > EPSILON)
>> #define FPlt(A,B)   ((B) - (A) > EPSILON)
>> #define FPle(A,B)   ((A) - (B) <= EPSILON)
>> #define FPgt(A,B)   ((A) - (B) > EPSILON)
>> #define FPge(A,B)   ((B) - (A) <= EPSILON)
>
> with this warning:
>
>>  *XXX These routines were not written by a numerical analyst.

I agree that those macros looks like a pile of suck.  It's unclear to
me what purpose they're trying to accomplish, but regardless of what
it is, it's hard for me to believe that they are accomplishing it.
Whether 1.0E-06 is a correct fuzz factor presumably depends greatly on
the scale of the number; e.g. if the values are all like "3" it's
probably fine but if they are like "37142816124856" it's probably not
enough fuzz and if they are all like ".0004" it's probably way too
much fuzz.

Figuring out what to do about it is harder.  Your proposal seems to be
to remove them except where we need the fuzzy behavior, which doesn't
sound unreasonable, but I don't personally understand why we need it
in some places and not others.  It would be good if some of the people
who are more numerically inclined than I am (and hate floats less, but
then that's everyone) could jump in here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

41 matches

Mail list logo