Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> Got it, but if other people don't agree then this is going nowhere. Yes. As I wrote, I don't particularly care about functions like "is point on line". I can prepare a patch to fix as many problems as possible around those operators by preserving the current epsilon. I though we were arguing about *all* operators. Having containment and placement operators consistent with each other, is the primary thing I am trying to fix. Is removing epsilon from them is acceptable? We can also stay away from changing operators like "~=" to minimise compatibility break, if we keep the epsilon on some places. We can instead document this operator as "close enough", and introduce another symbol for really "the same" operator. That said, there are some places where it is hard to decide to apply the epsilon or not. For example, we can keep the epsilon to check for two lines being parallel, but then should we return the intersection point, or not? Those issues may become more clear when I start working on it, if preserving epsilon for those operators is the way to go forward. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Tue, Jan 31, 2017 at 1:06 PM, Emre Hasegeliwrote: > This is what he wrote: > >> As I understand it, the key problem is that tests like "is point on line" >> would basically never succeed except in the most trivial cases, because of >> roundoff error. That's not very nice, and it might cascade to larger >> problems like object-containment tests failing unexpectedly. We would >> need to go through all the geometric operations and figure out where that >> kind of gotcha is significant and what we can do about it. Seems like a >> fair amount of work :-(. If somebody's willing to do that kind of >> investigation, then sure, but I don't think just blindly removing these >> macros is going to lead to anything good. > > I re-implemented "is point on line" operator on my patch so that it > would, at least, work on the most trivial cases. Right. So you and he do not agree on the goal. He wants it to work in MORE than the most trivial cases, and you wrote it so that it will work in AT LEAST the most trivial cases. Those are not the same. > I listed the problems I am trying to solve in here: > > https://www.postgresql.org/message-id/CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw%40mail.gmail.com Those problems boil down to "let's get rid of the tolerance completely", and there's no consensus on that being a good thing to do. In particular, I can't remember anybody but you being unequivocally in favor of it. > We couldn't agree on how we should treat on floating point numbers. I > think we should threat them just like the "float" datatype. Got it, but if other people don't agree then this is going nowhere. Personally, I've got my doubts about how sane it is to make anything work here with the built-in floating point types. The CPU is going to do some kind of black magic, which may differ between machines, and good luck getting consistent semantics that work everywhere. For example it seems entirely plausible (with some implementation) that you might find that (1, 2) is on the line from (0, 0) to (2, 4) but that (1, 3) is not on the line from (0, 0) to (2, 6) because the slope 1/2 can be represented exactly in base 2 and the slope 1/3 cannot. If you try to fix that kind of thing by introducing a tolerance, then you might get it wrong when the line goes from (0, 0) to (10, 21). Now, if we represented values using home-grown arbitrary-precision arithmetic - like numeric does - then it might be possible to get such cases right, especially if you supported exact rather than approximate representations of fractional quantities. Otherwise, I think there are inevitably going to be artifacts, and tinkering with this is just changing which artifacts we get without really solving the problem. But I am notorious for my low opinion of floating-point arithmetic; maybe somebody who understands it better or likes it more can come up with something that's more clearly an improvement. Regardless, we have to have an agreement in order to commit anything here, and I don't see one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> Backing up a bit here, have we lost track of the problem that we're > trying to solve? Tom gave his opinion here: > > https://www.postgresql.org/message-id/3895.1464791...@sss.pgh.pa.us > > But I don't see that the latest patch I can find does anything to fix > that. This is what he wrote: > As I understand it, the key problem is that tests like "is point on line" > would basically never succeed except in the most trivial cases, because of > roundoff error. That's not very nice, and it might cascade to larger > problems like object-containment tests failing unexpectedly. We would > need to go through all the geometric operations and figure out where that > kind of gotcha is significant and what we can do about it. Seems like a > fair amount of work :-(. If somebody's willing to do that kind of > investigation, then sure, but I don't think just blindly removing these > macros is going to lead to anything good. I re-implemented "is point on line" operator on my patch so that it would, at least, work on the most trivial cases. This is not very nice, but it is predictable. I have tried to prevent it cascade to larger problems like object-containment tests failing unexpectedly. I have gone through all of the geometric operations and re-implemented the ones with similar problems, too. It is no work at all compared to discussions we have to have :-). My initial idea was to keep the fuzzy behaviour for some operators like "is point on line", but the more I get into it the less value I see in doing so. The same family operators like "is point on line" is currently badly broken: > postgres=# select '(1,0)'::point ## '{1,2,0}'::line; > ?column? > -- > (2,2) > (1 row) This makes me wonder if anybody is ever using those operators. In the end, I don't care about those operators. They are unlikely to be supported by indexes. I can simplify my patch to leave them as they are. > Now maybe that's not the problem that Emre is trying to solve, > but then it is not very clear to me what problem he IS trying to > solve. And I think Kyotaro Horiguchi is trying to solve yet another > problem which is again different. So IMHO the first task here is to > agree on a clear statement of what we'd like to fix, and then, given a > patch, we can judge whether it's fixed. I listed the problems I am trying to solve in here: https://www.postgresql.org/message-id/CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw%40mail.gmail.com > Maybe I'm being dumb here and it's clear to you guys what the issues > under discussion are. If so, apologies for that, but the thread has > gotten too theoretical for me and I can't figure out what the > top-level concern is any more. I believe we all agree these macros > are bad, but there seems to be no agreement that I can discern on what > would be better or why. We couldn't agree on how we should treat on floating point numbers. I think we should threat them just like the "float" datatype. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Thu, Jan 26, 2017 at 5:53 AM, Emre Hasegeliwrote: > Though, I know the community is against behaviour changing GUCs. I > will not spend more time on this, before I get positive feedback from > others. As if on cue, let me say that a behavior-changing GUC sounds like a terrible idea to me. It's almost never good when a GUC can cause the same queries to return answers in different sessions, and even worse, it seems like the GUC might have the effect of letting us build indexes that are only valid for the value of the GUC with which they were built. Backing up a bit here, have we lost track of the problem that we're trying to solve? Tom gave his opinion here: https://www.postgresql.org/message-id/3895.1464791...@sss.pgh.pa.us But I don't see that the latest patch I can find does anything to fix that. Now maybe that's not the problem that Emre is trying to solve, but then it is not very clear to me what problem he IS trying to solve. And I think Kyotaro Horiguchi is trying to solve yet another problem which is again different. So IMHO the first task here is to agree on a clear statement of what we'd like to fix, and then, given a patch, we can judge whether it's fixed. Maybe I'm being dumb here and it's clear to you guys what the issues under discussion are. If so, apologies for that, but the thread has gotten too theoretical for me and I can't figure out what the top-level concern is any more. I believe we all agree these macros are bad, but there seems to be no agreement that I can discern on what would be better or why. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Thu, Jan 26, 2017 at 9:11 PM, Kyotaro HORIGUCHIwrote: >> Though, I know the community is against behaviour changing GUCs. I >> will not spend more time on this, before I get positive feedback from >> others. > > That's too bad. I'm sorry that I wasn't very helpful.. You make a constructive discussion in a way that you think makes sense by providing feedback, there's nothing bad in that IMO, quite the contrary to be honest. (I am just showing up here to tell that I have moved this patch to CF 2017-03). -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello, At Thu, 26 Jan 2017 11:53:28 +0100, Emre Hasegeliwrote in > > Even though I'm not sure but I don't see a "natural" (or > > agreeable by many poeple) ordering of geometric types in > > general. Anyway it's quite application (not application program > > but the relationship with the real world) specific. > > We can just define it for point as "ORDER BY point.x, point.y". It's nonsense. Totally for a convenient. Anyone can do so on their own application but PostgreSQL cannot have it as a platform. > > What we should not forget is that PostGIS does the same thing and > > it is widly used (I believe..). This means it not broken at least > > on a certain context. But it is a fact that it also quite > > inconvenient to get performance from, say, indexes. > > I understand from Paul Ramsey's email [1] on this thread that PostGIS > doesn't currently have a tolerance. Thank you for the pointer. (My memory is too small as 8bit CPU) Looking into the source of PostGIS 2.3.0 (Maybe the latest) Surely EPSILON is used is someplaces. FPeq and similars are also defined. liblwgeom_internal.h even defines another tolerance EPSILON_SQLMM.. Paul> The real answer for a GIS system is to have an explicit tolerance Paul> parameter for calculations like distance/touching/containment, but Paul> unfortunately we didn't do that so now we have our own Paul> compatibility/boil the ocean problem if we ever wanted/were funded to Paul> add one. This doesn't seem saying PostGIS doesn't have fixed-amount tolerance. Alhtough we don't necessarily need compatibility with PostGIS, it makes the problem rather complex since we lose an apparent start point for this. It would be good if we make geom comparators to have explicit tolerances as Paul said but maybe it is overdone. So I proposed the varialbe implicit tolerance. > > Yeah, the EPSILON is mere a fuzz factor so we cannot expect any > > kind of stable behavior for differences under the level. But on > > the analogy of comparisons of floating point numbers, I suppose > > that inequality comparison could be done without the tolerance. > > What do you mean exactly? Sorry for poor wording. I'll try in different way. It means comparison on numbers that contains certain amount of error gives unstable result for differences small enough comparing their precision. > >> >> - Some operators are violating commutative property. > >> >> > >> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| > >> >> line_a". > >> > > >> > Whether EPSILON is introduced or not, commutativity cannot be > >> > assured for it from calculation error, I suppose. > >> > >> It can easily be assured by treating both sides of the operator the > >> same. It is actually assured on my patch. > > > > It surely holds for certain cases. Even how many applicable cases > > we guess, finally we cannot proof that it works generally. Just > > three times of 1/3 rotation breakes it. > > It is a different story. We cannot talk about commutative property of > rotation function that we currently don't have, because it would be an > asymmetrical operator. That's wrong. Any shpaes represented by geometric types assumed to get any geometric operatsions such as transision, rotation and others. It is fundamental for geometric types. > The parallel operator is currently marked as commutative. The planner > is free to switch the sides of the operation. Therefore this is not > only a surprise, but a bug. Strictly it is not commutative, but assuming FP error or EPSILON, commutation among them would be acceptable. Having larger tolerance, the defference from commutated expression becomes relatively small for the type's domain. As far as I know, even summation of two floating points is not guaranteed to yield strictly the same result for commutated opeation. But the difference doesn't affect for most cases and programmers make a program so that such differences don't matter in the objective. This is basically the same thing with the case of geo-types. > > Hmm, I have nothing more to say if you don't agree that floating > > point numbers involving any kind of arithmetic is hardly > > deterministic especially not defining its usage. > > The floating point results are not random. There are certain > guarantees. The transitive property of equality is one of them. Our > aim should be making things easier for our users by providing more > guarantees not breaking what is already there. Sorry for repeating the same thing but floating point numbers after getting arithmetic operations must considered that they have fuzziness or error (the same can occur for even just after assigning). They must not be handled as exact numbers. Please study about handling floating point numbers. If such strictness is required and no arithmetic involved, it is proper to store them, say, in a string form. If such a behavior is required but want to
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> Even though I'm not sure but I don't see a "natural" (or > agreeable by many poeple) ordering of geometric types in > general. Anyway it's quite application (not application program > but the relationship with the real world) specific. We can just define it for point as "ORDER BY point.x, point.y". > What we should not forget is that PostGIS does the same thing and > it is widly used (I believe..). This means it not broken at least > on a certain context. But it is a fact that it also quite > inconvenient to get performance from, say, indexes. I understand from Paul Ramsey's email [1] on this thread that PostGIS doesn't currently have a tolerance. > Yeah, the EPSILON is mere a fuzz factor so we cannot expect any > kind of stable behavior for differences under the level. But on > the analogy of comparisons of floating point numbers, I suppose > that inequality comparison could be done without the tolerance. What do you mean exactly? >> >> - Some operators are violating commutative property. >> >> >> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a". >> > >> > Whether EPSILON is introduced or not, commutativity cannot be >> > assured for it from calculation error, I suppose. >> >> It can easily be assured by treating both sides of the operator the >> same. It is actually assured on my patch. > > It surely holds for certain cases. Even how many applicable cases > we guess, finally we cannot proof that it works generally. Just > three times of 1/3 rotation breakes it. It is a different story. We cannot talk about commutative property of rotation function that we currently don't have, because it would be an asymmetrical operator. The parallel operator is currently marked as commutative. The planner is free to switch the sides of the operation. Therefore this is not only a surprise, but a bug. > Hmm, I have nothing more to say if you don't agree that floating > point numbers involving any kind of arithmetic is hardly > deterministic especially not defining its usage. The floating point results are not random. There are certain guarantees. The transitive property of equality is one of them. Our aim should be making things easier for our users by providing more guarantees not breaking what is already there. > The world of the geometric types in PostgreSQL *is* built > so. There is nothing different with that Windows client can make > different results from PostgreSQL on a Linux server for a simple > floating point arithmetics, or even different binaries made by > different version of compilers on the same platoform can. Relying > on such coherency by accident is a bad policy. Yes, the results are not portable. We should only rely on the results being stable on the same build. The epsilon doesn't cure this problem. It arguably makes it worse. > PostGIS or libgeos seems to prove it. They are designed exactly > for this purpose and actually used. Yes, PostGIS is a GIS application. We are not. Geometric types name suggests to me them being useful for general purpose. > So, the union of the two requirements seems to be having such > parameter as a GUC. That sounds doable to me. We can use this opportunity to make all operators consistent. So the epsilon would apply to the ones that it current is not. We can still add btree and hash opclasses, and make them give an error when this GUC is not 0. We can even make this or another GUC apply to floats making whole system more consistent. Though, I know the community is against behaviour changing GUCs. I will not spend more time on this, before I get positive feedback from others. [1] https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello, Emre. At Wed, 25 Jan 2017 12:24:18 +0100, Emre Hasegeliwrote in > I am responding both of your emails together. > > > Perhaps I don't understand it. Many opclass are defined for > > btree. But since (ordinary) btree handles only one-dimentional, > > totally-orderable sets, geometric types even point don't fit > > it. Even if we relaxed it by removing EPSILON, multidimentional > > data still doesn't fit. > > Yes, multidimentional data doesn't fit into btree. Let's say we would > have to introduce operators called *<, *<=, *>, *>= to support btree > opclass for point. I agree those operators would be useless, but > btree opclass is used for other purposes too. "ORDER BY point", > merge-joins, btree index support for ~= would be useful. Btree cannot be used unless any kind of total-orderedness is defined but hash index would be usable so. > Lack of ORDER BY, GROUP BY, and DISTINCT support is the major > inconvenience about the geometric types. There are many complaints > about this on the mailing list. Many frameworks and ORMs are not > prepared to deal with getting an error when they use certain types on > those clauses. Even though I'm not sure but I don't see a "natural" (or agreeable by many poeple) ordering of geometric types in general. Anyway it's quite application (not application program but the relationship with the real world) specific. > >> - Only some operators are using the epsilon. > >> > >> I included the list of the ones which doesn't use the epsilon on > >> initial post of this thread. This makes the operators not compatible > >> with each other. For example, you cannot say "if box_a @> box_b and > >> box_b @> point then box_a @> box_b". > > > > (It must be "then box_a @> point".) > > Yes, I am sorry. > > > Both of the operators don't > > seem to use EPSILON so the transitivity holds, but perhaps we > > should change them to more appropriate shape, that is, to use > > EPSILON. Even having the tolerance, we can define the operators > > to keep the transitivity. > > Yes, that would be another way. In my view, this would make things > worse, because I believe epsilon approach is completely broken. The > operators which are not using the epsilon are the only ones people can > sanely use to develop applications. What we should not forget is that PostGIS does the same thing and it is widly used (I believe..). This means it not broken at least on a certain context. But it is a fact that it also quite inconvenient to get performance from, say, indexes. > >> > - The application of epsilon is implementation dependent. > >> > > >> > Epsilon works between two numbers. There is not always a single way > >> > of implementing geometric operators. This is surprising to the users. > >> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <= > >> > epsilon". > >> > >> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then > >> ..". I'm not sure off-hand. But I don't see the relationship is > >> significant. > > What I meant was the behaviour being unclear to even people who knows > about the epsilon. If two boxes are overlapping with each other, one > who knows about EPSILON can expect the distance between them to be > less than EPSILON, but this wouldn't be true. Yeah, the EPSILON is mere a fuzz factor so we cannot expect any kind of stable behavior for differences under the level. But on the analogy of comparisons of floating point numbers, I suppose that inequality comparison could be done without the tolerance. > >> - Some operators are violating commutative property. > >> > >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a". > > > > Whether EPSILON is introduced or not, commutativity cannot be > > assured for it from calculation error, I suppose. > > It can easily be assured by treating both sides of the operator the > same. It is actually assured on my patch. It surely holds for certain cases. Even how many applicable cases we guess, finally we cannot proof that it works generally. Just three times of 1/3 rotation breakes it. > >> - Some operators are violating transitive property. > >> > >> For example, you cannot say "if point_a ~= point_b and point_b ~= > >> point_c then point_a ~= point_c". > > > > It holds only in ideal (or mathematical) world, or for similars > > of integer (which are strictly implemented mathematica world on > > computer). Without the EPSILON, two points hardly match by > > floating point error, as I mentioned. I don't have an evidence > > though (sorry), mere equality among three floating point numbers > > is not assured. > > Of course, it is assured. Floating point numbers are deterministic. Hmm, I have nothing more to say if you don't agree that floating point numbers involving any kind of arithmetic is hardly deterministic especially not defining its usage. > >> - The operators with
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
I am responding both of your emails together. > Perhaps I don't understand it. Many opclass are defined for > btree. But since (ordinary) btree handles only one-dimentional, > totally-orderable sets, geometric types even point don't fit > it. Even if we relaxed it by removing EPSILON, multidimentional > data still doesn't fit. Yes, multidimentional data doesn't fit into btree. Let's say we would have to introduce operators called *<, *<=, *>, *>= to support btree opclass for point. I agree those operators would be useless, but btree opclass is used for other purposes too. "ORDER BY point", merge-joins, btree index support for ~= would be useful. Lack of ORDER BY, GROUP BY, and DISTINCT support is the major inconvenience about the geometric types. There are many complaints about this on the mailing list. Many frameworks and ORMs are not prepared to deal with getting an error when they use certain types on those clauses. >> - Only some operators are using the epsilon. >> >> I included the list of the ones which doesn't use the epsilon on >> initial post of this thread. This makes the operators not compatible >> with each other. For example, you cannot say "if box_a @> box_b and >> box_b @> point then box_a @> box_b". > > (It must be "then box_a @> point".) Yes, I am sorry. > Both of the operators don't > seem to use EPSILON so the transitivity holds, but perhaps we > should change them to more appropriate shape, that is, to use > EPSILON. Even having the tolerance, we can define the operators > to keep the transitivity. Yes, that would be another way. In my view, this would make things worse, because I believe epsilon approach is completely broken. The operators which are not using the epsilon are the only ones people can sanely use to develop applications. >> > - The application of epsilon is implementation dependent. >> > >> > Epsilon works between two numbers. There is not always a single way >> > of implementing geometric operators. This is surprising to the users. >> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <= >> > epsilon". >> >> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then >> ..". I'm not sure off-hand. But I don't see the relationship is >> significant. What I meant was the behaviour being unclear to even people who knows about the epsilon. If two boxes are overlapping with each other, one who knows about EPSILON can expect the distance between them to be less than EPSILON, but this wouldn't be true. >> - Some operators are violating commutative property. >> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a". > > Whether EPSILON is introduced or not, commutativity cannot be > assured for it from calculation error, I suppose. It can easily be assured by treating both sides of the operator the same. It is actually assured on my patch. >> - Some operators are violating transitive property. >> >> For example, you cannot say "if point_a ~= point_b and point_b ~= >> point_c then point_a ~= point_c". > > It holds only in ideal (or mathematical) world, or for similars > of integer (which are strictly implemented mathematica world on > computer). Without the EPSILON, two points hardly match by > floating point error, as I mentioned. I don't have an evidence > though (sorry), mere equality among three floating point numbers > is not assured. Of course, it is assured. Floating point numbers are deterministic. >> - The operators with epsilon are not compatible with float operators. >> >> This is also surprising for the users. For example, you cannot say >> "if point_a << point_b then point_a[0] < point_b[0]". > > It is because "is strictly left of" just doesn't mean their > x-coordinates are in such a relationship. The difference might be > surprising but is a specification (truely not a bug:). Then what does that mean? Every operator with EPSILON is currently surprising in different way. People use databases to build applications. They often need to implement same operations on the application side. It is very hard to be bug-compatible with our geometric types. >> = Missing Bits on the Operator Classes > This final section seems to mention the application but sorry, it > still don't describe for me what application that this patch > aiming to enable. But I can understand that you want to handle > floating point numbers like integers. The epsilon is surely quite > annoying for the purpose. I will try to itemise what applications I am aiming to enable: - Applications with very little GIS requirement PostGIS can be too much work for an application in s different business domain but has a little requirement to GIS. The built-in geometric types are incomplete to support real world geography. Getting rid of epsilon would make this worse. In contrary, it would allow people to deal with the difficulties on their own. - Applications with geometric objects I believe people could make use the geometric
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Sorry, this is the continuation of the previous comment. At Wed, 18 Jan 2017 17:36:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHIwrote in <20170118.173612.13506164.horiguchi.kyot...@lab.ntt.co.jp> > Hello. > > (I apologize in advance for possible inaccurate wording on maths, > which might cause confusion..) > > At Wed, 11 Jan 2017 16:37:59 +0100, Emre Hasegeli wrote > in > > > - Floating point comparisons for gemetric types > > > > > > Comparison related to geometric types is performed by FPeq > > > macro and similars defined in geo_decls.h. This intends to give > > > tolerance to the comparisons. > > > > > > A > > > FPeq: |<=e-|-e=>|(<= means inclusive, e = epsilon = tolerance) > > > FPne: ->| e | e |<- (<- means exclusive) > > > FPlt: | e |<- > > > FPle: |<=e | > > > FPgt: ->| e | > > > FPge: | e=>| > > > > > > These seems reasonable ignoring the tolerance amount issue. > > > > I tried to explain below why I don't find this method reasonable. > > Thank you very much for the explanation. > > > > At least for the point type, (bitmap) index scan is consistent > > > with sequential scan. I remember that the issue was > > > "inconsistency between indexed and non-indexed scans over > > > geometric types" but they seem consistent with each other. > > > > You can see on the Git history that it was a lot of trouble to make > > the indexes consistent with the operators, and this is limiting > > improving indexing of the geometric types. > > Yeah. geo_ops.c has the following comment from its first version > in the current repository. > > | * Relational operators for Points. > | * Since we do have a sense of coordinates being > | * "equal" to a given accuracy (point_vert, point_horiz), > > So, we take it as a requirement for geometric types. All the > difficulties come from the "nature". > > > > You mentioned b-tree, which don't have predefined opclass for > > > geometric types. So the "inconsistency" may be mentioning the > > > difference between geometric values and combinations of plain > > > (first-class) values. But the two are different things and > > > apparently using different operators (~= and = for equality) so > > > the issue is not fit for this. More specifically, "p ~= p0" and > > > "x = x0 and y = y0" are completely different. > > > > Yes, the default btree operator class doesn't have to implement > > standard equality and basic comparison operator symbols. We can > > implement it with different symbols. > > Perhaps I don't understand it. Many opclass are defined for > btree. But since (ordinary) btree handles only one-dimentional, > totally-orderable sets, geometric types even point don't fit > it. Even if we relaxed it by removing EPSILON, multidimentional > data still doesn't fit. > > Still we can implement equality mechanism *over* multiple btree > indexes, but it cannot be a opclass for btree. > > > > Could you let me (or any other guys on this ml) have more precise > > > description on the problem and/or what you want to do with this > > > patch? > > > > I will try to itemize the current problems with the epsilon method: > > > > = Operator Inconsistencies > > > > My current patches address all of them. > > No. It just removes tolerans that is a "requirement" for > coordinates of geometric types. I think we don't have enough > reason to remove it. We could newly define geometric types with > no-tolerance as 'exact_point" or something but it still doesn't > fit btree. > > > - Only some operators are using the epsilon. > > > > I included the list of the ones which doesn't use the epsilon on > > initial post of this thread. This makes the operators not compatible > > with each other. For example, you cannot say "if box_a @> box_b and > > box_b @> point then box_a @> box_b". > > (It must be "then box_a @> point".) Both of the operators don't > seem to use EPSILON so the transitivity holds, but perhaps we > should change them to more appropriate shape, that is, to use > EPSILON. Even having the tolerance, we can define the operators > to keep the transitivity. > > > - The application of epsilon is implementation dependent. > > > > Epsilon works between two numbers. There is not always a single way > > of implementing geometric operators. This is surprising to the users. > > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <= > > epsilon". > > Maybe it should be like "if box_a ~= box_b && box_b @< box_a then > ..". I'm not sure off-hand. But I don't see the relationship is > significant. > > > This is also creating a high barrier on developing those operators. > > Fixing bugs and performance regressions are very likely to change the > > results. > > I agree to you. The effect of the EPSILON is quite arbitrary and > difficult
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello. (I apologize in advance for possible inaccurate wording on maths, which might cause confusion..) At Wed, 11 Jan 2017 16:37:59 +0100, Emre Hasegeliwrote in > > - Floating point comparisons for gemetric types > > > > Comparison related to geometric types is performed by FPeq > > macro and similars defined in geo_decls.h. This intends to give > > tolerance to the comparisons. > > > > A > > FPeq: |<=e-|-e=>|(<= means inclusive, e = epsilon = tolerance) > > FPne: ->| e | e |<- (<- means exclusive) > > FPlt: | e |<- > > FPle: |<=e | > > FPgt: ->| e | > > FPge: | e=>| > > > > These seems reasonable ignoring the tolerance amount issue. > > I tried to explain below why I don't find this method reasonable. Thank you very much for the explanation. > > At least for the point type, (bitmap) index scan is consistent > > with sequential scan. I remember that the issue was > > "inconsistency between indexed and non-indexed scans over > > geometric types" but they seem consistent with each other. > > You can see on the Git history that it was a lot of trouble to make > the indexes consistent with the operators, and this is limiting > improving indexing of the geometric types. Yeah. geo_ops.c has the following comment from its first version in the current repository. | * Relational operators for Points. | * Since we do have a sense of coordinates being | * "equal" to a given accuracy (point_vert, point_horiz), So, we take it as a requirement for geometric types. All the difficulties come from the "nature". > > You mentioned b-tree, which don't have predefined opclass for > > geometric types. So the "inconsistency" may be mentioning the > > difference between geometric values and combinations of plain > > (first-class) values. But the two are different things and > > apparently using different operators (~= and = for equality) so > > the issue is not fit for this. More specifically, "p ~= p0" and > > "x = x0 and y = y0" are completely different. > > Yes, the default btree operator class doesn't have to implement > standard equality and basic comparison operator symbols. We can > implement it with different symbols. Perhaps I don't understand it. Many opclass are defined for btree. But since (ordinary) btree handles only one-dimentional, totally-orderable sets, geometric types even point don't fit it. Even if we relaxed it by removing EPSILON, multidimentional data still doesn't fit. Still we can implement equality mechanism *over* multiple btree indexes, but it cannot be a opclass for btree. > > Could you let me (or any other guys on this ml) have more precise > > description on the problem and/or what you want to do with this > > patch? > > I will try to itemize the current problems with the epsilon method: > > = Operator Inconsistencies > > My current patches address all of them. No. It just removes tolerans that is a "requirement" for coordinates of geometric types. I think we don't have enough reason to remove it. We could newly define geometric types with no-tolerance as 'exact_point" or something but it still doesn't fit btree. > - Only some operators are using the epsilon. > > I included the list of the ones which doesn't use the epsilon on > initial post of this thread. This makes the operators not compatible > with each other. For example, you cannot say "if box_a @> box_b and > box_b @> point then box_a @> box_b". (It must be "then box_a @> point".) Both of the operators don't seem to use EPSILON so the transitivity holds, but perhaps we should change them to more appropriate shape, that is, to use EPSILON. Even having the tolerance, we can define the operators to keep the transitivity. > - The application of epsilon is implementation dependent. > > Epsilon works between two numbers. There is not always a single way > of implementing geometric operators. This is surprising to the users. > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <= > epsilon". Maybe it should be like "if box_a ~= box_b && box_b @< box_a then ..". I'm not sure off-hand. But I don't see the relationship is significant. > This is also creating a high barrier on developing those operators. > Fixing bugs and performance regressions are very likely to change the > results. I agree to you. The effect of the EPSILON is quite arbitrary and difficult to see their chemistry. > - Some operators are just wrong: > > Line operators are not well maintained. The following are giving > wrong results when neither A or B of lines are not exactly 1: > > * line # line > * line <-> line > * point ## line > * point ## lseg These are not comparison operators so EPSILON is unrelated. And I agree that they needs fix. > They are broken independently from the epsilon, though it is not easy > to fix them without
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> - Floating point comparisons for gemetric types > > Comparison related to geometric types is performed by FPeq > macro and similars defined in geo_decls.h. This intends to give > tolerance to the comparisons. > > A > FPeq: |<=e-|-e=>|(<= means inclusive, e = epsilon = tolerance) > FPne: ->| e | e |<- (<- means exclusive) > FPlt: | e |<- > FPle: |<=e | > FPgt: ->| e | > FPge: | e=>| > > These seems reasonable ignoring the tolerance amount issue. I tried to explain below why I don't find this method reasonable. > At least for the point type, (bitmap) index scan is consistent > with sequential scan. I remember that the issue was > "inconsistency between indexed and non-indexed scans over > geometric types" but they seem consistent with each other. You can see on the Git history that it was a lot of trouble to make the indexes consistent with the operators, and this is limiting improving indexing of the geometric types. > You mentioned b-tree, which don't have predefined opclass for > geometric types. So the "inconsistency" may be mentioning the > difference between geometric values and combinations of plain > (first-class) values. But the two are different things and > apparently using different operators (~= and = for equality) so > the issue is not fit for this. More specifically, "p ~= p0" and > "x = x0 and y = y0" are completely different. Yes, the default btree operator class doesn't have to implement standard equality and basic comparison operator symbols. We can implement it with different symbols. > Could you let me (or any other guys on this ml) have more precise > description on the problem and/or what you want to do with this > patch? I will try to itemize the current problems with the epsilon method: = Operator Inconsistencies My current patches address all of them. - Only some operators are using the epsilon. I included the list of the ones which doesn't use the epsilon on initial post of this thread. This makes the operators not compatible with each other. For example, you cannot say "if box_a @> box_b and box_b @> point then box_a @> box_b". - The application of epsilon is implementation dependent. Epsilon works between two numbers. There is not always a single way of implementing geometric operators. This is surprising to the users. For example, you cannot say "if box_a @> box_b then box_a <-> box_b <= epsilon". This is also creating a high barrier on developing those operators. Fixing bugs and performance regressions are very likely to change the results. - Some operators are just wrong: Line operators are not well maintained. The following are giving wrong results when neither A or B of lines are not exactly 1: * line # line * line <-> line * point ## line * point ## lseg They are broken independently from the epsilon, though it is not easy to fix them without changing the results of the cases that are currently working. - Some operators are violating commutative property. For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a". - Some operators are violating transitive property. For example, you cannot say "if point_a ~= point_b and point_b ~= point_c then point_a ~= point_c". - The operators with epsilon are not compatible with float operators. This is also surprising for the users. For example, you cannot say "if point_a << point_b then point_a[0] < point_b[0]". - The operators are not checking for underflow or overflow. - NaN values are not treated same as the float datatype. Those are independent problems, though checking them with epsilon would create additional set of inconsistencies. Epsilon creates especially more problems around 0. = Missing Bits on the Operator Classes I want to address those in the future, if my patches are accepted. - There is no GiST or SP-GiST support for cross datatype operators other than containment. We can develop nice cross-datatype operator families to support more operators. It would have been easier, if we could rely on some operators to implement others. Geometric types serve the purpose of demonstrating our indexing capabilities. We should make them good examples, not full of special cases with hidden bugs. - There is no BRIN support for types other than the box. BRIN inclusion operator class is datatype agnostic. It uses some operators to implement others which doesn't work for anything more the box vs box operators, because of the inconsistencies. - GROUP BY and DISTINCT are not working. - ORDER BY is not working. There are regular user complaints about this on the mailing list. We can easily provide default hash and btree operator classes that would fix the problem, if we haven't had the epsilon. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello, At Fri, 18 Nov 2016 10:58:27 +0100, Emre Hasegeliwrote in > > To keep such kind of integrity, we should deeply consider about > > errors. > > My point is that I don't think we can keep integrity together with the > fuzzy behaviour, or at least I don't have the skills to do that. I > can just leave the more complicated operators like "is a > point on a line" as it is, and only change the basic ones. Do you > think a smaller patch like this would be acceptable? The size of the patch is not a problem. I regret that I haven't made your requirement clear. So as the startpoint of the new discussion, I briefly summarize the current implement of geometric comparisons. - Floating point comparisons for gemetric types Comparison related to geometric types is performed by FPeq macro and similars defined in geo_decls.h. This intends to give tolerance to the comparisons. A FPeq: |<=e-|-e=>|(<= means inclusive, e = epsilon = tolerance) FPne: ->| e | e |<- (<- means exclusive) FPlt: | e |<- FPle: |<=e | FPgt: ->| e | FPge: | e=>| These seems reasonable ignoring the tolerance amount issue. - Consistency between index and non-index scans. GIST supports geometric types. =# create table tpnt1(id int, p point); =# insert into tpnt1 (select i + 200, point(i*1.0e-6 / 100.0, i * 1.0e-6 / 100.0) from generate_series(-200, 200) as i); =# create index on tpnt1 using gist (p); =# set enable_seqscan to false; =# set enable_bitmapscan to true; =# select count(*) from tpnt1 where p ~= point(0, 0); 201 =# select count(*) from tpnt1 where p << point(0, 0); 100 =# set enable_seqscan to true; =# set enable_bitmapscan to false; =# select count(*) from tpnt1 where p ~= point(0, 0); 201 =# select count(*) from tpnt1 where p << point(0, 0); 100 At least for the point type, (bitmap) index scan is consistent with sequential scan. I remember that the issue was "inconsistency between indexed and non-indexed scans over geometric types" but they seem consistent with each other. You mentioned b-tree, which don't have predefined opclass for geometric types. So the "inconsistency" may be mentioning the difference between geometric values and combinations of plain (first-class) values. But the two are different things and apparently using different operators (~= and = for equality) so the issue is not fit for this. More specifically, "p ~= p0" and "x = x0 and y = y0" are completely different. Could you let me (or any other guys on this ml) have more precise description on the problem and/or what you want to do with this patch? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello, # Sorry for a trash I emitted before.. Perhaps you don't assume any calculations applied on stored geo-type values. Please be patient to disucuss with me. (Sorry for perhas hard-to-read English..) At Fri, 18 Nov 2016 10:58:27 +0100, Emre Hasegeliwrote in > > To keep such kind of integrity, we should deeply consider about > > errors. > > My point is that I don't think we can keep integrity together with the > fuzzy behaviour, or at least I don't have the skills to do that. I > can just leave the more complicated operators like "is a > point on a line" as it is, and only change the basic ones. Do you > think a smaller patch like this would be acceptable? The size of the patch is not a matter. It broken the current behavior is a matter. It shows strange behavior for some cases but it designed to work reasonable for some cases. > > That's a fate of FP numbers. Btree is uasble for such data since > > it is constructed using inequality operators but hash is almost > > unsuitable without rounding and normalization because of the > > nature of floating point numbers. Range scan on bree will work > > find but on the other hand equality doesn't work even on btree > > index from the same reason to the below. > > Btree is very useful with floats. There is no problem with it. I don't mind btrees on bare floats. It will 'work' if error is properly considered, or error can be omitted for the purpose. Point has some kind of physical context or expected behavior other than that of bare floats which the EPSILON implies. > > Again, FP numbers generally cannot match exactly each other. You > > have to avoid that. > > Again, they do match very well. Floating point numbers are no magic. > They are perfectly deterministic. No magic, yes, so it is always accompanied by error. If you store points, then searching for *any of them*, it works fine. It's no magic. If you obtained a ponit from any physical data source, say GPS receiver, then search identical points from a table, it would be almost impossible to find a "exactly the same" location. It's no magic. Once a calculation has been applied on them, it is no longer guaranteed to work in the same way. It's also no magic. CREATE OR REPLACE FUNCTION rotate_point (integer, float8) RETURNS void AS $$ UPDATE p SET x = x * cos($2) - y * sin($2), y = x * sin($2) + y * cos($2) WHERE i = $1 $$ LANGUAGE SQL; ALTER TABLE p (i int, x float8, y float8); INSERT INTO p VALUES (0, 1.0, 1.0), (1, 1.0, 1.0); SELECT rotate_point(0, pi() * 2.0 / 3.0); SELECT rotate_point(0, pi() * 2.0 / 3.0); SELECT rotate_point(0, pi() * 2.0 / 3.0); SELECT * FROM p WHERE i = 0; i | x | y ---+---+--- 0 | 1 | 0.999 So SELECT p0.x = p1.x AND p0.y = p1.y FROM p p0, p p1 WHERE p0.i = 0 and p1.i = 1; ?column? -- f (1 row) This just repeats 1/3 rotation 3 times. Ideally p0 and p1 are identical but this primitive operation generates visible error that makes the comparison fail. Even though these two points are expected be identical in some geometric viewpoint. This is a part of the meaning of the EPSILON. -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> To keep such kind of integrity, we should deeply consider about > errors. My point is that I don't think we can keep integrity together with the fuzzy behaviour, or at least I don't have the skills to do that. I can just leave the more complicated operators like "is a point on a line" as it is, and only change the basic ones. Do you think a smaller patch like this would be acceptable? > That's a fate of FP numbers. Btree is uasble for such data since > it is constructed using inequality operators but hash is almost > unsuitable without rounding and normalization because of the > nature of floating point numbers. Range scan on bree will work > find but on the other hand equality doesn't work even on btree > index from the same reason to the below. Btree is very useful with floats. There is no problem with it. > Again, FP numbers generally cannot match exactly each other. You > have to avoid that. Again, they do match very well. Floating point numbers are no magic. They are perfectly deterministic. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello, At Mon, 14 Nov 2016 11:41:52 +0100, Emre Hasegeliwrote in > > What way to deal with it is in your mind? The problem hides > > behind operators. To fix it a user should rewrite a expression > > using more primitive operators. For example, (line_a # point_a) > > should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in > > more primitive way. I regared this that the operator # just > > become useless. > > Simple equations like this works well with the current algorithm: > > > select '(0.1,0.1)'::point @ '(0,0),(1,1)'::line; That's too simple for this discussion. If it satisfies someone's requirement, a geometiric type with integer would go. Any trigonometric calculations (by other than right angles) would break it. > The operator does what you expect from it. Users can use something > like you described to get fuzzy behaviour with an epsilon they choose. > > > Regarding optimization, at least gcc generates seemingly not so > > different code for the two. The both generally generates extended > > code directly calling isnan() and so. Have you measured the > > performance of the two implement (with -O2, without > > --enable-cassert)? This kind of hand-optimization gets > > legitimacy when we see a siginificant difference, according to > > the convention here.. I suppose. > > I tested it with this program: > > > int main() > > { > >double i, > >j; > >int result = 0; > > > >for (i = 0.1; i < 1.0; i += 1.0) > >for (j = 0.1; j < 1.0; j += 1.0) > >if (float8_lt(i, j)) > >result = (result + 1) % 10; > > > >return result; > > } > > The one calling cmp() was noticeable slower. > > ./test1 0.74s user 0.00s system 99% cpu 0.748 total > ./test2 0.89s user 0.00s system 99% cpu 0.897 total > > This would probably be not much noticeable by calling SQL functions > which are doing a few comparisons only, but it may be necessary to do > many more comparisons on some places. I don't find the optimised > versions less clear than calling the cmp(). I can change it the other > way, if you find it more clear. Thanks for the measurment. I had similar numbers (14:13) for separate code from Pg, but probably this difference is buried under other steps. The law of stay-stupid says that this kind of optimization should be consider only after it found to be problematic, I think. > > At least the comment you dropped by the patch, > > > > int > > float4_cmp_internal(float4 a, float4 b) > > { > > - /* > > -* We consider all NANs to be equal and larger than any non-NAN. > > This is > > -* somewhat arbitrary; the important thing is to have a consistent > > sort > > -* order. > > -*/ > > > > seems very significant and should be kept anywhere relevant. > > I will add it back on the next version. > > > I seached pgsql-general ML but counldn't find a complaint about > > the current behavior. Even though I'm not familar with PostGIS, I > > found that it uses exactly the same EPSILON method with > > PostgreSQL. > > Is it? I understood from Paul Ramsey's comment on this thread [1] > that they don't. I saw the repository of PostGIS by myself, but only in box2d.c. Other objects have their own EPSILONs, EPSILOG_SQLMM, DBL_EPSLON, FLT_EPSILON... Maybe that's all | * Copyright (C) 2008-2011 Paul Ramsey | * Copyright (C) 2008 Mark Cave-Ayland | | #ifndef EPSILON | #define EPSILON1.0E-06 | #endif | #ifndef FPeq | #define FPeq(A,B) (fabs((A) - (B)) <= EPSILON) | #endif I agree that we should have our own tolerance method, and it migh be an explicit one. But anyway we need any kind of tolerance on comparing FP numbers. Removing such tolerance as preparation for implementing another one is quite natural, but released version should have any kind of tolerance. > > If we had an apparent plan to use them for other than > > earth-scale(?) geometric usage, we could design what they should > > be. But without such a plan it is just a breakage of the current > > usage. > > We give no promises about the geometric types being useful in earth scale. So, we shouldn't just remove it. > > About What kind of (needless) complication you are saying? The > > fuzziness seems to me essential for geometric comparisons to work > > practically. Addition to that, I don't think that we're not > > allowed to change the behavior in such area of released versions > > the time after time. > > Even when it is a total mess? Yes. The seemingly 'total-mess' is intended for a use with lon/lat values and probably gave useful result at the time of development. The mess is rather better than just ruined. > > I don't think index scan and tolerant comparison are not > > contradicting. Could you let me have an description about the > > indexing capabilities and the
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> What way to deal with it is in your mind? The problem hides > behind operators. To fix it a user should rewrite a expression > using more primitive operators. For example, (line_a # point_a) > should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in > more primitive way. I regared this that the operator # just > become useless. Simple equations like this works well with the current algorithm: > select '(0.1,0.1)'::point @ '(0,0),(1,1)'::line; The operator does what you expect from it. Users can use something like you described to get fuzzy behaviour with an epsilon they choose. > Regarding optimization, at least gcc generates seemingly not so > different code for the two. The both generally generates extended > code directly calling isnan() and so. Have you measured the > performance of the two implement (with -O2, without > --enable-cassert)? This kind of hand-optimization gets > legitimacy when we see a siginificant difference, according to > the convention here.. I suppose. I tested it with this program: > int main() > { >double i, >j; >int result = 0; > >for (i = 0.1; i < 1.0; i += 1.0) >for (j = 0.1; j < 1.0; j += 1.0) >if (float8_lt(i, j)) >result = (result + 1) % 10; > >return result; > } The one calling cmp() was noticeable slower. ./test1 0.74s user 0.00s system 99% cpu 0.748 total ./test2 0.89s user 0.00s system 99% cpu 0.897 total This would probably be not much noticeable by calling SQL functions which are doing a few comparisons only, but it may be necessary to do many more comparisons on some places. I don't find the optimised versions less clear than calling the cmp(). I can change it the other way, if you find it more clear. > At least the comment you dropped by the patch, > > int > float4_cmp_internal(float4 a, float4 b) > { > - /* > -* We consider all NANs to be equal and larger than any non-NAN. This > is > -* somewhat arbitrary; the important thing is to have a consistent > sort > -* order. > -*/ > > seems very significant and should be kept anywhere relevant. I will add it back on the next version. > I seached pgsql-general ML but counldn't find a complaint about > the current behavior. Even though I'm not familar with PostGIS, I > found that it uses exactly the same EPSILON method with > PostgreSQL. Is it? I understood from Paul Ramsey's comment on this thread [1] that they don't. > If we had an apparent plan to use them for other than > earth-scale(?) geometric usage, we could design what they should > be. But without such a plan it is just a breakage of the current > usage. We give no promises about the geometric types being useful in earth scale. > About What kind of (needless) complication you are saying? The > fuzziness seems to me essential for geometric comparisons to work > practically. Addition to that, I don't think that we're not > allowed to change the behavior in such area of released versions > the time after time. Even when it is a total mess? > I don't think index scan and tolerant comparison are not > contradicting. Could you let me have an description about the > indexing capabilities and the inconsistencies? The first problem is that some operators are not using the epsilon. This requires special treatment while developing index support for operators. I have tried to support point for BRIN using the box operators, and failed because of that. Comparing differences with epsilon is not applicable the same way to every operator. Even with simple operators like "point in box" it covers different distances outside the box depending on where the point is. For example, "point <-> box < EPSILON" wouldn't be equivalent with "point <@ box", when the point is outside corner of the box. Things get more complicated with lines. Because of these, we are easily violating basic expectations of the operators: > regression=# select '{1000,0.01,0}'::line ?|| '{9,0.9,0}'::line; > > ?column? > -- > f > (1 row) > > regression=# select '{9,0.9,0}'::line ?|| '{1000,0.01,0}'::line; > ?column? > -- > t > (1 row) Another problem is lack of hash and btree operator classes. In my experience, the point datatype is by far most used one. People often try to use it on DISTINCT, GROUP BY, or ORDER BY clauses and complain when it doesn't work. There are many complaints like this on the archives. If we get rid of the epsilon, we can easily add those operator classes. [1] https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello, At Sun, 13 Nov 2016 22:41:01 +0100, Emre Hasegeliwrote in > > We can remove the fuzz factor altogether but I think we also > > should provide a means usable to do similar things. At least "is > > a point on a line" might be useless for most cases without any > > fuzzing feature. (Nevertheless, it is a problem only when it is > > being used to do that:) If we don't find reasonable policy on > > fuzzing operations, it would be the proof that we shouldn't > > change the behavior. > > It was my initial idea to keep the fuzzy comparison behaviour on some > places, but the more I get into I realised that it is almost > impossible to get this right. Instead, I re-implemented some > operators to keep precision as much as possible. The previous "is a > point on a line" operator would *never* give the true result without > the fuzzy comparison. The new implementation would return true, when > precision is not lost. There's no way to accurately store numbers other than some special ones in floating point format where mantissa precision is limited. Even 0.1 is not stored accurately with a binary mantissa. (It would be concealed by machine rounding for most cases though.) The same is said for numeric. It cannot store 1/3 accurately and doesn't even conceal the incaccuracy. Even if they could store numbers accurately, anyway , say, the constant pi does not have infinite precision. As the result, we have a tradition that equal operator on real numbers should be avoided generally. Numerics are provided mainly for financial use, on which finite-but-high precision decimal arithmetic are performed. > I think this is a behaviour people, who are > working with floating points, are prepared to deal with. What way to deal with it is in your mind? The problem hides behind operators. To fix it a user should rewrite a expression using more primitive operators. For example, (line_a # point_a) should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in more primitive way. I regared this that the operator # just become useless. > By the way, "is a point on a line" operator is quite wrong with > the fuzzy comparison at the moment [1]. Th bug is a isolate problem from the epsilon behavior. It of course should be fixed, but in "appropriate" way that may defined in this discussion. > > The 0001 patch adds many FP comparison functions individually > > considering NaN. As the result the sort order logic involving NaN > > is scattered around into the functions, then, you implement > > generic comparison function using them. It seems inside-out to > > me. Defining ordering at one place, then comparison using it > > seems to be reasonable. > > I agree that it would be simpler to use the comparison function for > implementing other operators. I have done it other way around to make > them more optimised. They are called very often. I don't think > checking exit code of the comparison function would be optimised the > same way. I could leave the comparison functions as they are, but > re-implemented them using the others to keep documentation of NaN > comparison in the single place. Regarding optimization, at least gcc generates seemingly not so different code for the two. The both generally generates extended code directly calling isnan() and so. Have you measured the performance of the two implement (with -O2, without --enable-cassert)? This kind of hand-optimization gets legitimacy when we see a siginificant difference, according to the convention here.. I suppose. At least the comment you dropped by the patch, int float4_cmp_internal(float4 a, float4 b) { - /* -* We consider all NANs to be equal and larger than any non-NAN. This is -* somewhat arbitrary; the important thing is to have a consistent sort -* order. -*/ seems very significant and should be kept anywhere relevant. > > If the center somehow goes extremely near to the origin, it could > > result in a false error. > > > >> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)'; > >> ERROR: value out of range: underflow > > > > I don't think this underflow is an error, and actually it is a > > change of the current behavior without a reasonable reason. More > > significant (and maybe unacceptable) side-effect is that it > > changes the behavior of ordinary operators. I don't think this is > > acceptable. More consideration is needed. > > > >> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0; > >> ERROR: value out of range: underflow > > This is the current behaviour of float datatype. My patch doesn't > change that. Very sorry for the mistake! I somehow saw float8pl on master and float8_div with your patch. Pleas forget it. > This problem would probably also apply to multiplying > very small values. I agree that this is not the ideal behaviour. > Though I am not sure, if we should go to a
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> We can remove the fuzz factor altogether but I think we also > should provide a means usable to do similar things. At least "is > a point on a line" might be useless for most cases without any > fuzzing feature. (Nevertheless, it is a problem only when it is > being used to do that:) If we don't find reasonable policy on > fuzzing operations, it would be the proof that we shouldn't > change the behavior. It was my initial idea to keep the fuzzy comparison behaviour on some places, but the more I get into I realised that it is almost impossible to get this right. Instead, I re-implemented some operators to keep precision as much as possible. The previous "is a point on a line" operator would *never* give the true result without the fuzzy comparison. The new implementation would return true, when precision is not lost. I think this is a behaviour people, who are working with floating points, are prepared to deal with. By the way, "is a point on a line" operator is quite wrong with the fuzzy comparison at the moment [1]. > The 0001 patch adds many FP comparison functions individually > considering NaN. As the result the sort order logic involving NaN > is scattered around into the functions, then, you implement > generic comparison function using them. It seems inside-out to > me. Defining ordering at one place, then comparison using it > seems to be reasonable. I agree that it would be simpler to use the comparison function for implementing other operators. I have done it other way around to make them more optimised. They are called very often. I don't think checking exit code of the comparison function would be optimised the same way. I could leave the comparison functions as they are, but re-implemented them using the others to keep documentation of NaN comparison in the single place. > If the center somehow goes extremely near to the origin, it could > result in a false error. > >> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)'; >> ERROR: value out of range: underflow > > I don't think this underflow is an error, and actually it is a > change of the current behavior without a reasonable reason. More > significant (and maybe unacceptable) side-effect is that it > changes the behavior of ordinary operators. I don't think this is > acceptable. More consideration is needed. > >> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0; >> ERROR: value out of range: underflow This is the current behaviour of float datatype. My patch doesn't change that. This problem would probably also apply to multiplying very small values. I agree that this is not the ideal behaviour. Though I am not sure, if we should go to a different direction than the float datatypes. I think there is value in making geometric types compatible with the float. Users are going to mix them, anyway. For example, users can calculate the center of a box manually, and confuse when the built-in operator behaves differently. > In regard to fuzzy operations, libgeos seems to have several > types of this kind of feature. (I haven't looked closer into > them). Other than reducing precision seems overkill or > unappliable for PostgreSQL bulitins. As Jim said, can we replace > the fixed scale fuzz factor by precision reduction? Maybe, with a > GUC variable (I hear someone's roaring..) to specify the amount > defaults to fit the current assumption. I am disinclined to try to implement something complicated for the geometric types. I think they are mostly useful for 2 purposes: uses simple enough to not worth looking for better solutions, and demonstrating our indexing capabilities. The inconsistencies harm both of those. [1] https://www.postgresql.org/message-id/flat/CAE2gYzw_-z%3DV2kh8QqFjenu%3D8MJXzOP44wRW%3DAzzeamrmTT1%3DQ%40mail.gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> The reason for that is that you forgot to edit an alternative > exptect file, which matches for en_US locale. Now I understand. Thank you for the explanation. > But the test doesn't for collation and the only difference > between the alternative expect files comes from the difference of > collation for the query. "the workaround" seems to be the right > way to do it. I recommend rather to leave the workaroud as it is > and remove select_views_1.out from the "expected" directory. I will do. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello, > Aside from that, I'd like to comment this patch on other points > later. The start of this patch is that the fact that most of but not all geometric operators use fuzzy comparson. But Tom pointed out that the fixed fuzz factor is not reasonable but hard to find how to modify it. We can remove the fuzz factor altogether but I think we also should provide a means usable to do similar things. At least "is a point on a line" might be useless for most cases without any fuzzing feature. (Nevertheless, it is a problem only when it is being used to do that:) If we don't find reasonable policy on fuzzing operations, it would be the proof that we shouldn't change the behavior. The 0001 patch adds many FP comparison functions individually considering NaN. As the result the sort order logic involving NaN is scattered around into the functions, then, you implement generic comparison function using them. It seems inside-out to me. Defining ordering at one place, then comparison using it seems to be reasonable. The 0002 patch repalces many native operators for floating point numbers by functions having sanity check. This seems to be addressing to Tom's comment. It is reasonable for comparison operators but I don't think replacing all arithmetics is so. For example, float8_div checks that - If both of operands are not INF, result should not be INF. - If the devident is not exactly zero, the result should not be zero. The second assumption is wrong by itself. For an extreme case, 4.9e-324 / 1.7e+308 becomes exactly zero (by underflow). We canont assert it to be wrong but the devedent is not zero. The validity of the result varies according to its meaning. For the case of box_cn, > center->x = float8_div(float8_pl(box->high.x, box->low.x), 2.0); > center->y = float8_div(float8_pl(box->high.y, box->low.y), 2.0); If the center somehow goes extremely near to the origin, it could result in a false error. > =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)'; > ERROR: value out of range: underflow I don't think this underflow is an error, and actually it is a change of the current behavior without a reasonable reason. More significant (and maybe unacceptable) side-effect is that it changes the behavior of ordinary operators. I don't think this is acceptable. More consideration is needed. > =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0; > ERROR: value out of range: underflow In regard to fuzzy operations, libgeos seems to have several types of this kind of feature. (I haven't looked closer into them). Other than reducing precision seems overkill or unappliable for PostgreSQL bulitins. As Jim said, can we replace the fixed scale fuzz factor by precision reduction? Maybe, with a GUC variable (I hear someone's roaring..) to specify the amount defaults to fit the current assumption. https://apt-browse.org/browse/ubuntu/trusty/universe/i386/libgeos++-dev/3.4.2-4ubuntu1/file/usr/include/geos/geom/BinaryOp.h /* * Define this to use PrecisionReduction policy * in an attempt at by-passing binary operation * robustness problems (handles TopologyExceptions) */ ... * Define this to use TopologyPreserving simplification policy * in an attempt at by-passing binary operation * robustness problems (handles TopologyExceptions) (seems not activated) ... * Use common bits removal policy. * If enabled, this would be tried /before/ * Geometry snapping. ... /* * Use snapping policy */ regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello, > > Returning to the issue, the following query should give you the > > expected result. > > > > SELECT name, #thepath FROM iexit ORDER BY name COLLATE "C", 2; > > Yes, I have worked around it like this. What I couldn't understand is > how my patch can cause this regression. How is it passes on master > without COLLATE? Ah, sorry, I understand that *you* added the COLLATE. Revering select_views.sql/out to master actually causes a regression error. The reason for that is that you forgot to edit an alternative exptect file, which matches for en_US locale. But the test doesn't for collation and the only difference between the alternative expect files comes from the difference of collation for the query. "the workaround" seems to be the right way to do it. I recommend rather to leave the workaroud as it is and remove select_views_1.out from the "expected" directory. Aside from that, I'd like to comment this patch on other points later. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> Returning to the issue, the following query should give you the > expected result. > > SELECT name, #thepath FROM iexit ORDER BY name COLLATE "C", 2; Yes, I have worked around it like this. What I couldn't understand is how my patch can cause this regression. How is it passes on master without COLLATE? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Hello, At Thu, 29 Sep 2016 10:37:30 +0200, Emre Hasegeliwrote in > > regression=# select 'I- 580Ramp' < 'I- 580/I-680 > > Ramp'; > > ?column? > > -- > > t > > (1 row) > > on the Linux server I am testing, it is not: > > > regression=# select 'I- 580Ramp' < 'I- 580/I-680 > > Ramp'; > > ?column? > > -- > > f > > (1 row) > > The later should be the case on your environment as the test was also > failing for you. This is not consistent with the expected test > result. Do you know how this test can still pass on the master? Perhaps you ran the test under the environment LC_COLLATE (or LANG) of something other than C. The reson for the result is collation. The following returns expected result. =# select 'I- 580Ramp' < ('I- 580/I-680 Ramp' COLLATE "C"); ?column? -- t (1 row) For a reason I don't know, say, en_US locale behaves in an unintuitive way. regression=# select ' ' < ('/' COLLATE "en_US"), ' x' < ('/a' COLLATE "en_US"); ?column? | ?column? --+-- t| f (1 row) Returning to the issue, the following query should give you the expected result. SELECT name, #thepath FROM iexit ORDER BY name COLLATE "C", 2; regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> Well, those two results are not contradictory -- notice that you > switched the order of the values in the comparison. I don't think > you've really found the explanation yet. I am sorry I copy-pasted the wrong example: "select_views" test runs: > SELECT name, #thepath FROM iexit ORDER BY 1, 2 and expects to get rows in this order: > I- 580Ramp |8 > I- 580/I-680 Ramp |2 with the collation on my laptop, this is actually true: > regression=# select 'I- 580Ramp' < 'I- 580/I-680 > Ramp'; > ?column? > -- > t > (1 row) on the Linux server I am testing, it is not: > regression=# select 'I- 580Ramp' < 'I- 580/I-680 > Ramp'; > ?column? > -- > f > (1 row) The later should be the case on your environment as the test was also failing for you. This is not consistent with the expected test result. Do you know how this test can still pass on the master? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Sep 28, 2016 at 2:04 PM, Kevin Grittnerwrote: > Will take a look and post again. I am moving this patch to the next CF. You'll be hearing from me sometime after this CF is closed. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Sep 28, 2016 at 12:02 PM, Emre Hasegeliwrote: >> `make check` finds differences per the attached. Please >> investigate why the regression tests are failing and what the >> appropriate response is. > > I fixed the first one and workaround the second with COLLATE "C". I > have how my changes caused this regression. > > "select_views" test runs "SELECT name, #thepath FROM iexit ORDER BY 1, > 2" and expects to get rows in this order: > >> I- 580Ramp |8 >> I- 580/I-680 Ramp |2 > > With the collation on my laptop, this is actually true: > >> regression=# select 'I- 580/I-680 Ramp' < 'I- 580 >> Ramp'; >> ?column? >> -- >> t >> (1 row) > > However, on the Linux server, I am testing it is not: > >> regression=# select 'I- 580Ramp' < 'I- 580/I-680 >> Ramp'; >> ?column? >> -- >> f >> (1 row) > > Do you know how it is not failing on the master? Well, those two results are not contradictory -- notice that you switched the order of the values in the comparison. I don't think you've really found the explanation yet. >> [discussing inline static functions compared to macros for min()/max(), etc.] >> I suspect that they will be as fast or faster, and they eliminate >> the hazard of multiple evaluation, where a programmer might not be >> aware of the multiple evaluation or of some side-effect of an >> argument. > > I reworked the the patches to use inline functions and fixed the > problems I found. The new versions are attached. Will take a look and post again. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> Emre, are you going to address the above? It would have to be Real > Soon Now. Yes, I am working on it. I found more problems, replaced more algorithms. That took a lot of time. I will post the new version really soon. I wouldn't feel bad, if you wouldn't have enough time to review it in this commitfest. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Fri, Sep 9, 2016 at 4:25 PM, Kevin Grittnerwrote: > On Sun, Sep 4, 2016 at 12:42 PM, Emre Hasegeli wrote: > These patches apply and build on top of 5c609a74 with no problems, > but `make check` finds differences per the attached. Please > investigate why the regression tests are failing and what the > appropriate response is. >> I am not much experienced in C. If you think that inline functions >> are better suited, I can rework the patches. > > I suspect that they will be as fast or faster, and they eliminate > the hazard of multiple evaluation, where a programmer might not be > aware of the multiple evaluation or of some side-effect of an > argument. Emre, are you going to address the above? It would have to be Real Soon Now. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Sun, Sep 4, 2016 at 12:42 PM, Emre Hasegeliwrote: >> The first patch fails to apply due to bit-rot. That's easy enough >> to correct, but then it runs into warnings on make: > > Rebased and fixed. These patches apply and build on top of 5c609a74 with no problems, but `make check` finds differences per the attached. Please investigate why the regression tests are failing and what the appropriate response is. >> Something to consider before posting new version -- should we >> change some of those macros to static inline (in the .h files) to >> avoid double-evaluation hazards? They might perform as well or >> even better that way, and remove a subtle programmer foot-gun. > > One reason to keep them as macros is that they are currently > supporting both float4 and float8. Currently they look like this: > > FLOAT_EQ(val1, val2) > FLOAT_PL(result, val1, val2) > > I guess if we would use inline functions, they would look like: > > FLOAT8_EQ(val1, val2) > result = FLOAT8_PL(val1, val2) > > Result would need to be defined again in the function to be checked > for overflow. I think this way would be more developer-friendly. I > have gone some trouble to allocate the result to be able to use the > macros. Do you know if the performance of both would be the same? In one case where I was concerned that a static inline definition would not perform as well as a macro, I went so far as to compare the compiled code for both at a number of call sites in an optimized build and found that a reasonably current gcc generated *identical* code for both. Of course, this will not always be the case -- in particular for macros which require multiple evaluations of one or more parameters and they are not simple literals. In such cases, the static inline often benchmarks faster because the results from the potentially expensive expression can be stored on the stack or in a register, and just referenced again. > I am not much experienced in C. If you think that inline functions > are better suited, I can rework the patches. I suspect that they will be as fast or faster, and they eliminate the hazard of multiple evaluation, where a programmer might not be aware of the multiple evaluation or of some side-effect of an argument. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company regression.diffs Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Mon, Jul 18, 2016 at 3:54 PM, Emre Hasegeliwrote: > My progress so far is attached as 2 patches. First one introduces an > header file for adt/float.c. Second one refactors the geometric > operations. The first patch fails to apply due to bit-rot. That's easy enough to correct, but then it runs into warnings on make: btree_gin.c: In function ‘leftmostvalue_float4’: btree_gin.c:234:2: error: implicit declaration of function ‘get_float4_infinity’ [-Werror=implicit-function-declaration] return Float4GetDatum(-get_float4_infinity()); ^ btree_gin.c: In function ‘leftmostvalue_float8’: btree_gin.c:242:2: error: implicit declaration of function ‘get_float8_infinity’ [-Werror=implicit-function-declaration] return Float8GetDatum(-get_float8_infinity()); ^ Please fix. Something to consider before posting new version -- should we change some of those macros to static inline (in the .h files) to avoid double-evaluation hazards? They might perform as well or even better that way, and remove a subtle programmer foot-gun. Changing status to "Waiting on Author". -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 10:27 AM, Tom Lanewrote: > As I understand it, the key problem is that tests like "is point on line" > would basically never succeed except in the most trivial cases, because of > roundoff error. That's not very nice, and it might cascade to larger > problems like object-containment tests failing unexpectedly. We would > need to go through all the geometric operations and figure out where that > kind of gotcha is significant and what we can do about it. Seems like a > fair amount of work :-(. If somebody's willing to do that kind of > investigation, then sure, but I don't think just blindly removing these > macros is going to lead to anything good. Yeah, it does seem to need some research. > Also, I suppose this means that Robert promises not to make any of his > usual complaints about breaking compatibility? Because we certainly > would be. Pot, meet Mr. Kettle! Obviously, the inconvenience caused by any backward incompatibility has to be balanced against the fact that the new behavior is presumably better. But I stridently object to the accusation that of the two of us I'm the one more concerned with backward-compatibility. There may be some instances where I've had a more conservative judgement than you about breaking user-facing stuff, but you've blocked dozens of changes to the C API that would have enabled meaningful extension development on the grounds that somebody might complain when a future release changes the API! I think behavior changes that users will notice are of vastly greater significance than those which will only be observed by developers. In this particular case, I think that the current behavior is pretty stupid, and that the built-in geometric types are barely used, possibly because they have stupid behavior. So I would be willing to bet on a well-thought-out change in this area coming out to a net positive. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Paul Ramseywrites: > One of the things people find annoying about postgis is that > ST_Intersects(ST_Intersection(a, b), a) can come out as false (a > derived point at a crossing of lines may not exactly intersect either > of the input lines), which is a direct result of our use of exact math > for the boolean intersects test. That's an interesting comment, because it's more or less exactly the type of failure that we could expect to get if we remove fuzzy comparisons from the built-in types. How much of a problem is it in practice for PostGIS users? Do you have any plans to change it? > Anyways, go forth and do whatever makes sense for PgSQL I think we're trying to figure out what that is ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 8:59 AM, Jim Nasbywrote: > On 6/1/16 10:03 AM, Paul Ramsey wrote: >> >> We don't depend on these, we have our own :/ >> The real answer for a GIS system is to have an explicit tolerance >> parameter for calculations like distance/touching/containment, but >> unfortunately we didn't do that so now we have our own >> compatibility/boil the ocean problem if we ever wanted/were funded to >> add one. > > > Well it sounds like what's currently happening in Postgres is probably going > to change, so how might we structure that to help PostGIS? Would simply > lopping off the last few bits of the significand/mantissa work, or is that > not enough when different GRSes are involved? PostGIS doesn't look at all at what the PgSQL geotypes do, so go forward w/o fear. Tolerance in geo world is more than vertex rounding though, it's things like saying that when distance(pt,line) < epsilon then distance(pt,line) == 0, or similarly for shape touching, etc. One of the things people find annoying about postgis is that ST_Intersects(ST_Intersection(a, b), a) can come out as false (a derived point at a crossing of lines may not exactly intersect either of the input lines), which is a direct result of our use of exact math for the boolean intersects test. Anyways, go forth and do whatever makes sense for PgSQL P > > -- > Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX > Experts in Analytics, Data Architecture and PostgreSQL > Data in Trouble? Get it in Treble! http://BlueTreble.com > 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On 6/1/16 10:03 AM, Paul Ramsey wrote: We don't depend on these, we have our own :/ The real answer for a GIS system is to have an explicit tolerance parameter for calculations like distance/touching/containment, but unfortunately we didn't do that so now we have our own compatibility/boil the ocean problem if we ever wanted/were funded to add one. Well it sounds like what's currently happening in Postgres is probably going to change, so how might we structure that to help PostGIS? Would simply lopping off the last few bits of the significand/mantissa work, or is that not enough when different GRSes are involved? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On 06/01/2016 07:52 AM, Jim Nasby wrote: > On 6/1/16 9:27 AM, Tom Lane wrote: >> Kevin Grittnerwrites: >>> > On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas >>> wrote: >> Figuring out what to do about it is harder. Your proposal seems to be >> to remove them except where we need the fuzzy behavior, which doesn't >> sound unreasonable, but I don't personally understand why we need it >> in some places and not others. >>> > +1 >>> > My first inclination is to remove those macros in version 10, but >>> > it would be good to hear from some people using these types on what >>> > the impact of that would be. >> As I understand it, the key problem is that tests like "is point on line" >> would basically never succeed except in the most trivial cases, >> because of >> roundoff error. That's not very nice, and it might cascade to larger >> problems like object-containment tests failing unexpectedly. We would >> need to go through all the geometric operations and figure out where that >> kind of gotcha is significant and what we can do about it. Seems like a >> fair amount of work :-(. If somebody's willing to do that kind of >> investigation, then sure, but I don't think just blindly removing these >> macros is going to lead to anything good. > > I suspect another wrinkle here is that in the GIS world a single point > can be represented it multiple reference/coordinate systems, and it > would have different values in each of them. AIUI the transforms between > those systems can be rather complicated if different projection methods > are involved. I don't know if PostGIS depends on what these macros are > doing or not. If it doesn't, perhaps it would be sufficient to lop of > the last few bits of the significand. ISTM that'd be much better than > what the macros currently do. IIRC PostGIS uses a function from libgeos to do things like "point equals" (ST_Equals). I've never looked at that source, but would be unsurprised to find that it does something similar to this although probably also more sophisticated. (looks) yes -- ST_Equals calls GEOSEquals() after some sanity checking... > BTW, I suspect the macro values were chosen specifically for dealing > with LAT/LONG. I think that is it exactly. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 7:52 AM, Jim Nasbywrote: > > I suspect another wrinkle here is that in the GIS world a single point can > be represented it multiple reference/coordinate systems, and it would have > different values in each of them. AIUI the transforms between those systems > can be rather complicated if different projection methods are involved. I > don't know if PostGIS depends on what these macros are doing or not. If it > doesn't, perhaps it would be sufficient to lop of the last few bits of the > significand. ISTM that'd be much better than what the macros currently do. We don't depend on these, we have our own :/ The real answer for a GIS system is to have an explicit tolerance parameter for calculations like distance/touching/containment, but unfortunately we didn't do that so now we have our own compatibility/boil the ocean problem if we ever wanted/were funded to add one. P. > BTW, I suspect the macro values were chosen specifically for dealing with > LAT/LONG. > -- > Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX > Experts in Analytics, Data Architecture and PostgreSQL > Data in Trouble? Get it in Treble! http://BlueTreble.com > 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On 6/1/16 9:27 AM, Tom Lane wrote: Kevin Grittnerwrites: > On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas wrote: >> Figuring out what to do about it is harder. Your proposal seems to be >> to remove them except where we need the fuzzy behavior, which doesn't >> sound unreasonable, but I don't personally understand why we need it >> in some places and not others. > +1 > My first inclination is to remove those macros in version 10, but > it would be good to hear from some people using these types on what > the impact of that would be. As I understand it, the key problem is that tests like "is point on line" would basically never succeed except in the most trivial cases, because of roundoff error. That's not very nice, and it might cascade to larger problems like object-containment tests failing unexpectedly. We would need to go through all the geometric operations and figure out where that kind of gotcha is significant and what we can do about it. Seems like a fair amount of work :-(. If somebody's willing to do that kind of investigation, then sure, but I don't think just blindly removing these macros is going to lead to anything good. I suspect another wrinkle here is that in the GIS world a single point can be represented it multiple reference/coordinate systems, and it would have different values in each of them. AIUI the transforms between those systems can be rather complicated if different projection methods are involved. I don't know if PostGIS depends on what these macros are doing or not. If it doesn't, perhaps it would be sufficient to lop of the last few bits of the significand. ISTM that'd be much better than what the macros currently do. BTW, I suspect the macro values were chosen specifically for dealing with LAT/LONG. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Kevin Grittnerwrites: > On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas wrote: >> Figuring out what to do about it is harder. Your proposal seems to be >> to remove them except where we need the fuzzy behavior, which doesn't >> sound unreasonable, but I don't personally understand why we need it >> in some places and not others. > +1 > My first inclination is to remove those macros in version 10, but > it would be good to hear from some people using these types on what > the impact of that would be. As I understand it, the key problem is that tests like "is point on line" would basically never succeed except in the most trivial cases, because of roundoff error. That's not very nice, and it might cascade to larger problems like object-containment tests failing unexpectedly. We would need to go through all the geometric operations and figure out where that kind of gotcha is significant and what we can do about it. Seems like a fair amount of work :-(. If somebody's willing to do that kind of investigation, then sure, but I don't think just blindly removing these macros is going to lead to anything good. Also, I suppose this means that Robert promises not to make any of his usual complaints about breaking compatibility? Because we certainly would be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 8:08 AM, Robert Haaswrote: > On Fri, May 27, 2016 at 6:43 AM, Emre Hasegeli wrote: >> There are those macros defined for the built-in geometric types: >> >>> #define EPSILON 1.0E-06 >> >>> #define FPzero(A) (fabs(A) <= EPSILON) >>> #define FPeq(A,B) (fabs((A) - (B)) <= EPSILON) >>> #define FPne(A,B) (fabs((A) - (B)) > EPSILON) >>> #define FPlt(A,B) ((B) - (A) > EPSILON) >>> #define FPle(A,B) ((A) - (B) <= EPSILON) >>> #define FPgt(A,B) ((A) - (B) > EPSILON) >>> #define FPge(A,B) ((B) - (A) <= EPSILON) >> >> with this warning: >> >>> *XXX These routines were not written by a numerical analyst. > > I agree that those macros looks like a pile of suck. +1 > It's unclear to > me what purpose they're trying to accomplish, but regardless of what > it is, it's hard for me to believe that they are accomplishing it. > Whether 1.0E-06 is a correct fuzz factor presumably depends greatly on > the scale of the number; e.g. if the values are all like "3" it's > probably fine but if they are like "37142816124856" it's probably not > enough fuzz and if they are all like ".0004" it's probably way too > much fuzz. Also, it's a pretty lame heuristic. It doesn't really eliminate *any* problems; it just aims to make them less frequent by shifting the edges around. In doing so, it creates whole new classes of problems: test=# \set A '''(1.996,1.996),(1,1)''::box' test=# \set B '''(2,2),(1,1)''::box' test=# \set C '''(2.004,2.004),(1,1)''::box' test=# select :A = :B, :B = :C, :A = :C; ?column? | ?column? | ?column? --+--+-- t| t| f (1 row) Is the benefit we get from the macros worth destroying the transitive property of the comparison operators on these types? > Figuring out what to do about it is harder. Your proposal seems to be > to remove them except where we need the fuzzy behavior, which doesn't > sound unreasonable, but I don't personally understand why we need it > in some places and not others. +1 My first inclination is to remove those macros in version 10, but it would be good to hear from some people using these types on what the impact of that would be. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Fri, May 27, 2016 at 6:43 AM, Emre Hasegeliwrote: > There are those macros defined for the built-in geometric types: > >> #define EPSILON 1.0E-06 > >> #define FPzero(A) (fabs(A) <= EPSILON) >> #define FPeq(A,B) (fabs((A) - (B)) <= EPSILON) >> #define FPne(A,B) (fabs((A) - (B)) > EPSILON) >> #define FPlt(A,B) ((B) - (A) > EPSILON) >> #define FPle(A,B) ((A) - (B) <= EPSILON) >> #define FPgt(A,B) ((A) - (B) > EPSILON) >> #define FPge(A,B) ((B) - (A) <= EPSILON) > > with this warning: > >> *XXX These routines were not written by a numerical analyst. I agree that those macros looks like a pile of suck. It's unclear to me what purpose they're trying to accomplish, but regardless of what it is, it's hard for me to believe that they are accomplishing it. Whether 1.0E-06 is a correct fuzz factor presumably depends greatly on the scale of the number; e.g. if the values are all like "3" it's probably fine but if they are like "37142816124856" it's probably not enough fuzz and if they are all like ".0004" it's probably way too much fuzz. Figuring out what to do about it is harder. Your proposal seems to be to remove them except where we need the fuzzy behavior, which doesn't sound unreasonable, but I don't personally understand why we need it in some places and not others. It would be good if some of the people who are more numerically inclined than I am (and hate floats less, but then that's everyone) could jump in here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers