Hello,

At Sun, 13 Nov 2016 22:41:01 +0100, Emre Hasegeli <e...@hasegeli.com> wrote in <cae2gyzymeqxggmhu1vc35dpugwfrd-qrk3bm-6tgg0rwhcd...@mail.gmail.com>
> > We can remove the fuzz factor altogether, but I think we should
> > also provide a means to do similar things. At least "is a point
> > on a line" might be useless for most cases without any fuzzing
> > feature. (Nevertheless, it is a problem only when it is being
> > used to do that :) If we cannot find a reasonable policy on
> > fuzzing operations, that would be proof that we shouldn't change
> > the behavior.
>
> It was my initial idea to keep the fuzzy comparison behaviour in some
> places, but the more I got into it, the more I realised that it is
> almost impossible to get this right. Instead, I re-implemented some
> operators to keep precision as much as possible. The previous "is a
> point on a line" operator would *never* return true without the
> fuzzy comparison. The new implementation returns true when no
> precision is lost.

There's no way to accurately store numbers other than certain special
ones in a floating point format, where mantissa precision is limited.
Even 0.1 is not stored accurately with a binary mantissa. (It is
concealed by machine rounding in most cases, though.) The same can be
said of numeric: it cannot store 1/3 accurately, and doesn't even
conceal the inaccuracy. And even if these types could store numbers
accurately, a constant such as pi still does not have infinite
precision. As a result, we have a tradition that the equality operator
on real numbers should generally be avoided. Numerics are provided
mainly for financial use, where finite-but-high-precision decimal
arithmetic is performed.

> I think this is a behaviour people, who are working with floating
> points, are prepared to deal with.

What way of dealing with it do you have in mind? The problem hides
behind operators. To fix it, a user would have to rewrite an expression
using more primitive operators. For example, (line_a # point_a) would
have to be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in an
even more primitive way. I regard this as making the operator # simply
useless.

> By the way, "is a point on a line" operator is quite wrong with
> the fuzzy comparison at the moment [1].

That bug is a problem separate from the epsilon behavior. It of course
should be fixed, but in an "appropriate" way that may be defined in
this discussion.

> > The 0001 patch adds many FP comparison functions individually
> > considering NaN. As a result, the sort order logic involving NaN
> > is scattered around in those functions, and then you implement a
> > generic comparison function using them. That seems inside-out to
> > me. Defining the ordering in one place, then implementing the
> > comparisons using it, seems more reasonable.
>
> I agree that it would be simpler to use the comparison function for
> implementing other operators. I have done it the other way around to
> make them more optimised. They are called very often.
> I don't think checking the exit code of the comparison function
> would be optimised the same way. I could leave the comparison
> functions as they are, but re-implement them using the others to
> keep the documentation of NaN comparison in a single place.

Regarding optimization, at least gcc generates seemingly not so
different code for the two. Both generally generate extended code
directly calling isnan() and so on. Have you measured the performance
of the two implementations (with -O2, without --enable-cassert)? This
kind of hand-optimization gains legitimacy when we see a significant
difference, according to the convention here, I suppose.

At least the comment you dropped with the patch,

 int
 float4_cmp_internal(float4 a, float4 b)
 {
-	/*
-	 * We consider all NANs to be equal and larger than any non-NAN. This is
-	 * somewhat arbitrary; the important thing is to have a consistent sort
-	 * order.
-	 */

seems very significant and should be kept anywhere relevant.

> > If the center somehow goes extremely near to the origin, it could
> > result in a false error.
> >
> >> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
> >> ERROR:  value out of range: underflow
> >
> > I don't think this underflow is an error, and actually it is a
> > change of the current behavior without a reasonable reason. A more
> > significant (and maybe unacceptable) side-effect is that it
> > changes the behavior of ordinary operators. I don't think this is
> > acceptable. More consideration is needed.
> >
> >> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
> >> ERROR:  value out of range: underflow
>
> This is the current behaviour of the float datatype. My patch doesn't
> change that.

Very sorry for the mistake! I somehow compared float8pl on master with
float8_div with your patch. Please forget it.

> This problem would probably also apply to multiplying very small
> values. I agree that this is not the ideal behaviour.
> Though I am not sure, if we should go in a different direction than
> the float datatypes.

Yes, perhaps the behavior of geometric arithmetic should be considered
differently from bare float calculation, but that is not the topic for
now.

> I think there is value in making geometric types compatible with the
> float. Users are going to mix them, anyway. For example, users can
> calculate the center of a box manually, and be confused when the
> built-in operator behaves differently.

That doesn't resolve the problem that some comparison operators become
practically useless, and it would mean that we (maybe) should create
another geometric data type based on numeric. But that would also be
broken, for the same reason as the float-based one.

At a glance, with no-tolerance comparison, the following geometric
operators seem to be hardly usable.

  #, &&, ?#, ?-, ?|, ?-|, ?||, @>, <@, ~=

And the inclusive (le, ge) and exclusive (lt, gt) comparisons seem to
lose their distinction practically.

  <<, >>, &<, &>, <<|, |>>, &<|, |&>, <^, >^

I searched the pgsql-general ML but couldn't find a complaint about
the current behavior. Even though I'm not familiar with PostGIS, I
found that it uses exactly the same EPSILON method as PostgreSQL. And
libgeos also applies tolerance to comparisons in several ways. From
these facts, I think that the users of geometric types require some
kind of tolerance/fuzziness factor on comparison. If we had a concrete
plan to use them for something other than earth-scale(?) geometric
usage, we could design what they should be. But without such a plan it
is just a breakage of the current usage.

# By the way, 2D geometric functions are inaccurate for earth-scale
# spherical figures, but are we ignoring that?

> > In regard to fuzzy operations, libgeos seems to have several
> > types of this kind of feature. (I haven't looked closely into
> > them.) Anything other than reducing precision seems overkill or
> > inapplicable for PostgreSQL builtins.
> > As Jim said, can we replace the fixed-scale fuzz factor by
> > precision reduction? Maybe with a GUC variable (I hear someone's
> > roaring..) to specify the amount, defaulting to fit the current
> > assumption.
>
> I am disinclined to try to implement something complicated for the
> geometric types. I think they are mostly useful for 2 purposes: uses
> simple enough to not be worth looking for better solutions, and
> demonstrating our indexing capabilities. The inconsistencies harm
> both of those.

What kind of (needless) complication are you thinking of? The
fuzziness seems to me essential for geometric comparisons to work
practically. In addition, I don't think we are allowed to change the
behavior in such an area of released versions time after time.

I don't think index scans and tolerant comparisons are contradictory.
Could you give me a description of the indexing capabilities and the
inconsistencies you mean?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers