Re: Float comparison

Andy Seaborne Wed, 19 Aug 2020 07:08:53 -0700



On 19/08/2020 10:44, Marco Neumann wrote:

Andy, yes I would agree xsd:float can lead to some funky behavior here due
to precision. While you are at it this could also explain why ?y is bound
to ?x in the example below on blazegraph but still "correctly" mapped in
Jena. Simply a bug in wikidata/blazegraph that doesn't throw an error and
is not caught on the server side.


Blazegraph has inlined the value as a float so the lexical form is lost.

In Jena, in memory, it is keeping the lexical form around but as soon as you touch it that's lost.


See the example:

qexpr '"0.1111234521234567"^^xsd:float+0'

qexpr '"0.1111234591234567"^^xsd:float+0'


What happens is:

In F&O, arithmetic is done in one of integer, decimal, float or double. Anything else is cast to one of those before the operation.


1/ Get float value of the LHS - precision lost.
2/ Cast 0 to float, the least promotion
3/ Do floating point add.

You now have a FP number. It has about 6 digits of precision.

So X+0 != X.
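
The three steps above can be sketched in plain Java. This is only an illustration under assumptions: `LexicalFormDemo`, `plusZero` and `roundTrips` are made-up names, and the code uses Java's own float parsing and printing, not Jena's expression evaluator.

```java
// Illustrative sketch in plain Java; not Jena's expression evaluator.
class LexicalFormDemo {
    /** Mimics '"lex"^^xsd:float + 0' following the three steps above. */
    static float plusZero(String lexicalForm) {
        float lhs = Float.parseFloat(lexicalForm); // 1/ float value of the LHS, precision lost
        float rhs = (float) 0;                     // 2/ cast integer 0 to float
        return lhs + rhs;                          // 3/ floating point add
    }

    /** True if printing the result gives back the original lexical form. */
    static boolean roundTrips(String lexicalForm) {
        return Float.toString(plusZero(lexicalForm)).equals(lexicalForm);
    }

    public static void main(String[] args) {
        String[] inputs = {"0.1111234521234567", "0.1111234591234567", "0.5"};
        for (String lex : inputs) {
            System.out.println(lex + " -> " + plusZero(lex)
                    + "  round-trips: " + roundTrips(lex));
        }
    }
}
```

The long lexical forms do not survive the trip through a float, while a short one such as "0.5" does.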

Don't exact compare floats or doubles!
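
One common alternative to exact comparison is to compare within a tolerance. The following is a minimal sketch, assuming a tolerance measured in units in the last place (ULPs); the class name `FloatCompare`, the method `nearlyEqual` and the choice of 4 ULPs are illustrative assumptions, not something SPARQL or Jena provides.

```java
// Sketch of a tolerance-based comparison; names and tolerance are illustrative.
class FloatCompare {
    /** True if a and b differ by at most maxUlps units in the last place. */
    static boolean nearlyEqual(float a, float b, int maxUlps) {
        if (Float.isNaN(a) || Float.isNaN(b)) {
            return false;                 // NaN is never equal to anything
        }
        if (a == b) {
            return true;                  // exact match, including equal infinities
        }
        if (Float.isInfinite(a) || Float.isInfinite(b)) {
            return false;                 // an infinity only matches itself
        }
        // Spacing between adjacent floats at the larger of the two magnitudes.
        float tolerance = maxUlps * Math.max(Math.ulp(a), Math.ulp(b));
        return Math.abs(a - b) <= tolerance;
    }

    /** Adds 0.1f ten times; rounding error accumulates along the way. */
    static float tenTenths() {
        float sum = 0f;
        for (int i = 0; i < 10; i++) {
            sum += 0.1f;
        }
        return sum;
    }

    public static void main(String[] args) {
        float sum = tenTenths();
        System.out.println("exact compare:     " + (sum == 1.0f));
        System.out.println("tolerance compare: " + nearlyEqual(sum, 1.0f, 4));
    }
}
```

The right tolerance depends on the application; a fixed absolute epsilon would behave badly for values as large as 1e8, which is why the sketch scales with the magnitude.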


Try this in Java:

        float f =  12345.6789f ;
        System.out.println(f);

==>
        12345.679

6 digits of precision then noise.
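
To see what the float actually holds, the binary value can be expanded exactly with `java.math.BigDecimal`. This is a small sketch; `ExactFloatValue` and `exactValue` are illustrative names.

```java
import java.math.BigDecimal;

// Sketch: show the exact decimal expansion of the value stored in a float.
class ExactFloatValue {
    /** Exact decimal expansion of the binary value held by f. */
    static BigDecimal exactValue(float f) {
        // Widening float -> double is exact, and BigDecimal(double) is exact too.
        return new BigDecimal((double) f);
    }

    public static void main(String[] args) {
        float f = 12345.6789f;
        System.out.println(f);             // shortest form that identifies the float
        System.out.println(exactValue(f)); // the value actually stored: 12345.6787109375
    }
}
```

The stored value is a multiple of 2^-10 (about 0.001 at this magnitude), so the digits after the third decimal place are not under the programmer's control.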

Now try TDB2 with

PREFIX : <http://example/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

:s :p "12345.678987654321"^^xsd:float .

    Andy


PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>

SELECT ?x ?y ?z WHERE{
values ?x { "100123456.01"^^xsd:float }
values ?y { "100123459.01"^^xsd:float }
values ?z { "100123451.01"^^xsd:float}
}

Blazegraph

https://query.wikidata.org/#PREFIX%20xsd%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0A%0ASELECT%20%3Fx%20%3Fy%20%3Fz%20WHERE%7B%0Avalues%20%3Fx%20%7B%20%22100123456.01%22%5E%5Exsd%3Afloat%20%7D%0Avalues%20%3Fy%20%7B%20%22100123459.01%22%5E%5Exsd%3Afloat%20%7D%0Avalues%20%3Fz%20%7B%20%22100123451.01%22%5E%5Exsd%3Afloat%7D%0A%7D


x                      y                      z
100123456.01 100123456.01 100123451.01

Jena 3.15

http://www.lotico.com:3030/lotico/sparql?query=PREFIX+xsd%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0A%0D%0ASELECT+%3Fx+%3Fy+%3Fz+WHERE%7B%0D%0Avalues+%3Fx+%7B+%22100123456.01%22%5E%5Exsd%3Afloat+%7D%0D%0Avalues+%3Fy+%7B+%22100123459.01%22%5E%5Exsd%3Afloat+%7D%0D%0Avalues+%3Fz+%7B+%22100123451.01%22%5E%5Exsd%3Afloat%7D%0D%0A%7D&output=text


x                      y                      z
100123456.01 100123459.01 100123451.01
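
The difference between the two result tables is consistent with what happens when only the 32-bit value is kept. A hedged sketch in plain Java follows; `InlinedFloatDemo` and `inlineBits` are made-up names, and this is not Blazegraph's or Jena's storage code.

```java
// Sketch: what is left of each literal if only the 32-bit float value is kept.
class InlinedFloatDemo {
    /** The 32 bits a store would hold after dropping the lexical form. */
    static int inlineBits(String lexicalForm) {
        return Float.floatToIntBits(Float.parseFloat(lexicalForm));
    }

    public static void main(String[] args) {
        String[] literals = {"100123456.01", "100123459.01", "100123451.01"};
        for (String lex : literals) {
            System.out.println(lex + " -> 0x" + Integer.toHexString(inlineBits(lex)));
        }
        // Adjacent floats near 1e8 are 8 apart, so ?x and ?y land on the same value.
        System.out.println("spacing near 1e8: " + Math.ulp(100123456f));
    }
}
```

Here ?x and ?y produce the same bits and ?z does not, so a store that has only the bits cannot tell ?x and ?y apart, while a store that still holds the lexical form can print all three as written.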


On Wed, Aug 19, 2020 at 10:13 AM Andy Seaborne <[email protected]> wrote:



On 18/08/2020 22:17, Dr. Chavdar Ivanov wrote:

Andy, Richard,
Thank you for the feedback.

In the graph I have the 2 values as xsd:float, so this is how the data is coming.

In the SPARQL query I tried to cast the float to decimal by using
FILTER (xsd:decimal(?value1)!=xsd:decimal(?value1)).

I am not sure if this is the correct way, but I am now seeing a difference in the comparison result


0.1001244561 Is different from 0.1001234590 which is OK

           ^^ typo?

But these are reported as same 100123456.1     and  100123459.0



100123456.1 is not a floating point number. It has more precision than
xsd:float can represent.

It's "1.00123456E8"^^xsd:float


(Please copy and paste expressions into email.)

xsd:decimal(?value1)

is:
evaluate ?value1 to get an xsd:float.

which is

"0.1001244561"^^xsd:float

Using Jena's expression evaluator:

qexpr '"0.1001244561"^^xsd:float+0'
   ==>
"0.100124456"^^xsd:float

See? Already lost precision.

Then turn it into a decimal.

It is different to:
xsd:decimal(str(?value1))

which takes the lexical form, not the floating point value, of ?value1.
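
The two casts can be mimicked in Java to show why they disagree. This is a rough analogue under assumptions: `DecimalCastDemo`, `decimalFromValue` and `decimalFromLexical` are illustrative names, and the exact digits Jena produces when casting a float to xsd:decimal may differ from `BigDecimal`'s exact expansion.

```java
import java.math.BigDecimal;

// Sketch: decimal built from the float value vs. decimal built from the lexical form.
class DecimalCastDemo {
    /** Rough analogue of xsd:decimal(?v): the float value is computed first. */
    static BigDecimal decimalFromValue(String lexicalForm) {
        float value = Float.parseFloat(lexicalForm); // precision is lost here
        return new BigDecimal((double) value);       // exact expansion of what is left
    }

    /** Rough analogue of xsd:decimal(str(?v)): the digits are used as written. */
    static BigDecimal decimalFromLexical(String lexicalForm) {
        return new BigDecimal(lexicalForm);
    }

    public static void main(String[] args) {
        String[] inputs = {"0.1001244561", "100123456.1", "100123459.0"};
        for (String lex : inputs) {
            System.out.println(lex + "  via value: " + decimalFromValue(lex)
                    + "  via lexical form: " + decimalFromLexical(lex));
        }
    }
}
```

With these inputs, the two large values collapse to the same decimal when they go through the float value, and stay distinct when the lexical form is used.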

If I get the value before the comparison is executed, the xsd:decimal of
the two values appears to be the same, 100123456.0, so this is why != does
not report the difference.

Here the decimal does not seem to help,


Because precision was lost making the decimal.  Start with a decimal.

xsd:decimal("0.1001244561")
   or "0.1001244561"^^xsd:decimal
   or 0.1001244561   (in Turtle and SPARQL).

but I guess this falls in the same category that large absolute values
are less precise. So same effect as for xsd:float.


Best regards
Chavdar



-----Original Message-----
From: Andy Seaborne <[email protected]>
Sent: Tuesday, 18 August, 2020 19:07
To: [email protected]
Subject: Re: Float comparison



On 18/08/2020 10:31, Richard Cyganiak wrote:

The xsd:float datatype represents IEEE 754 single-precision floating
point numbers.

As with any floating-point datatype, the precision depends on the size
of the number. Numbers close to zero are very precise. Numbers with a large
absolute value (large positive or large negative) are less precise. For the
gory details see for example here:

https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limitations_on_decimal_values_in_[1,_16777216]
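
How the precision shrinks with magnitude can be checked directly with `Math.ulp`, which returns the gap between a float and the next larger one. A small sketch, with `UlpTable` and `spacingAt` as illustrative names:

```java
// Sketch: spacing between adjacent float values at different magnitudes.
class UlpTable {
    /** Gap between x and the next larger representable float. */
    static float spacingAt(float x) {
        return Math.ulp(x);
    }

    public static void main(String[] args) {
        float[] samples = {0.1f, 1f, 30f, 16777216f, 100123456f};
        for (float x : samples) {
            System.out.println(x + " -> spacing " + spacingAt(x));
        }
    }
}
```

At 16777216 (2^24) the gap is already 2, so not every integer above that is representable, and near 1e8 the gap is 8.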

There is rarely a good reason to use xsd:float in RDF. xsd:double is
much more precise at a small increase of storage cost (4 more bytes, which
is negligible given the total size of an RDF triple). xsd:decimal provides
arbitrary precision (in theory), but is more expensive in storage and
computation.
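
The trade-off can be illustrated by reading the same pair of lexical forms as float, double and decimal. This is a sketch in plain Java with made-up names (`DatatypeChoice` and its methods), using Java's `float`, `double` and `BigDecimal` as stand-ins for xsd:float, xsd:double and xsd:decimal.

```java
import java.math.BigDecimal;

// Sketch: the same two lexical forms read as float, double and decimal.
class DatatypeChoice {
    static boolean distinctAsFloat(String a, String b) {
        return Float.parseFloat(a) != Float.parseFloat(b);
    }

    static boolean distinctAsDouble(String a, String b) {
        return Double.parseDouble(a) != Double.parseDouble(b);
    }

    static boolean distinctAsDecimal(String a, String b) {
        // compareTo ignores trailing zeros, so 1.0 and 1 compare as equal values.
        return new BigDecimal(a).compareTo(new BigDecimal(b)) != 0;
    }

    public static void main(String[] args) {
        String a = "100123456.1";
        String b = "100123459.0";
        System.out.println("float   (" + Float.BYTES + " bytes): " + distinctAsFloat(a, b));
        System.out.println("double  (" + Double.BYTES + " bytes): " + distinctAsDouble(a, b));
        System.out.println("decimal (variable):  " + distinctAsDecimal(a, b));
    }
}
```

Only the float reading fails to separate the two values from the original question.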


My general view is that if storage size and performance of mathematical
computations are a major concern for the application, RDF is probably not
the best choice, since RDF optimises for other concerns. Therefore the best
choice for representing non-integer numbers in RDF is usually xsd:decimal:
more expensive, but no issues with precision.


Richard


xsd:decimal can record any decimal precision but division may lose
precision - otherwise "1/3" is infinite storage.


Jena uses 24 digit precision for division for inexact results like 1/3.
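
The same issue shows up with `java.math.BigDecimal`: an exact divide of 1 by 3 throws, so a precision has to be chosen. The sketch below assumes "24 digit precision" means 24 significant digits and picks HALF_EVEN rounding; both are assumptions for illustration, and the class `DecimalDivision` is not Jena's code.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Sketch: exact division where possible, 24 significant digits otherwise.
class DecimalDivision {
    // Assumed reading of "24 digit precision"; the rounding mode is also an assumption.
    static final MathContext INEXACT = new MathContext(24, RoundingMode.HALF_EVEN);

    static BigDecimal divide(BigDecimal a, BigDecimal b) {
        try {
            return a.divide(b);          // exact when the expansion terminates
        } catch (ArithmeticException e) {
            return a.divide(b, INEXACT); // non-terminating expansion: cut off
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(BigDecimal.ONE, new BigDecimal(4)));
        System.out.println(divide(BigDecimal.ONE, new BigDecimal(3)));
    }
}
```

Division by zero still ends in an ArithmeticException, because the second divide throws as well.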

On 18 Aug 2020, at 05:48, Dr. Chavdar Ivanov <[email protected]> wrote:


Hello



I posted the message below to the TopBraid users mailing list and
already clarified that as sh:equals is based on RDF node equality,
values such as "1.0"^^xsd:float and "1"^^xsd:float count as distinct.
So I am keeping this for the interest of others in the list


SPARQL has both comparisons.

The "sameTerm()" operator is for RDF term equality, and SPARQL "=" is for value
comparison (by op:numeric-equal).
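
The two kinds of comparison can be imitated in Java for literals that are all assumed to be xsd:float. This is only an analogy with made-up names (`TermVsValue`, `sameTerm`, `valueEqual`): term equality is modelled as string equality of the lexical forms, and value comparison as equality of the parsed floats.

```java
// Sketch: term equality vs. value equality for xsd:float lexical forms.
class TermVsValue {
    /** Analogue of sameTerm(): same datatype assumed, so compare lexical forms. */
    static boolean sameTerm(String lexA, String lexB) {
        return lexA.equals(lexB);
    }

    /** Analogue of "=": compare the float values the lexical forms denote. */
    static boolean valueEqual(String lexA, String lexB) {
        return Float.parseFloat(lexA) == Float.parseFloat(lexB);
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"1.0", "1"},
            {"100123456", "100.123456E+06"},
            {"-0.123456789", "-123.456789E-3"},
        };
        for (String[] p : pairs) {
            System.out.println(p[0] + " vs " + p[1]
                    + "  sameTerm: " + sameTerm(p[0], p[1])
                    + "  value equal: " + valueEqual(p[0], p[1]));
        }
    }
}
```

Each pair is a different term but the same value, which matches the split between the sh:equals results and the SPARQL results reported in the original message.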


       Andy




But on SPARQL float comparison I was advised to check this
mailing list for other opinions.


I understand that SPARQL comparison is mathematically based so 1.0
should be equal to 1. However below in item 2 you will see the numbers I
compared and I am getting confused. Take into account that in the data
graph the 2 compared properties are typed literals with datatype float.


I wanted to know what the precision is when floats are compared. So I
have 2 questions

*       What is the precision? Is it the 6th decimal, and is it OK to
compare different forms of float, e.g. when one is in scientific form?

*       Why am I getting a wrong comparison result for bigger values
such as 100123456.1 and 100123459, which are found to be the same?




Best regards

Chavdar





========





Dear all,



I have a very basic question...

I need to compare literals that are floats and tried to use two ways.
1) using sh:equals to compare 2 properties and 2) using SPARQL where
I filter != different values



For the filter I tried using

FILTER (xsd:float(?value1)!=xsd:float(?value1)).

or

FILTER (?value1!=?value1).

Both give the same outcome.



Below I listed a summary of the tests I did



I think sh:equals treats the literals as strings even though they are
floats. It also gives 2 results. I think this is in line with the
SHACL spec, although I didn't check whether sh:equals ignores the datatype.




However, in some cases the result from SPARQL is kind of strange.

It looks like the precision is 10^-6, but for the big numbers and when
the scientific form of a float number is used we have something different.




What rule is followed to decide that two values are different?

If I use google calculator

100123456.1-100.123459E+06=-2.90000000596



Normally it should be OK to compare different forms of float.





1) using sh:equals in the property shape

Value 1 ; value 2 ; comparison result

1.123456 ; 1.123456 ; same

1.1234560 ; 1.1234561 ; different (sh:equals reports it twice)

31.1234560 ; 31.1234561 ;different (sh:equals reports it twice)

30    ;      30.0000001 ; different (sh:equals reports it twice)

30     ;      30.000001 ; different (sh:equals reports it twice)

100123456.0  ; 100123456.1 ; different (sh:equals reports it twice)

100123456.0  ; 100123456.0 ; same

100123456    ;  100.123456E6 ; different (sh:equals reports it twice)

100123456    ;  100.123456E+06 ; different (sh:equals reports it twice)

-0.123456789  ;  -123.456789E-3 ; different (sh:equals reports it twice)

-0.123456789  ;  -123.456789E-03 ; different (sh:equals reports it twice)

100123456.1    ;  100.123456E+06  ; different (sh:equals reports it twice)

100123456.1     ;   100.123459E+06 ; different (sh:equals reports it twice)

100123456.1     ;  100123459      ; different (sh:equals reports it twice)

100123456.1     ;  100123459.0    ; different (sh:equals reports it twice)




2) using SPARQL (in the property shape)

1.123456 ; 1.123456 ; same

1.1234560 ; 1.1234561 ; different

31.1234560 ; 31.1234561 ;different

30    ;      30.0000001 ; same

30     ;      30.000001 ; different

100123456.0  ; 100123456.1 ; same

100123456.0  ; 100123456.0 ; same

100123456    ;  100.123456E6 ; same

100123456    ;  100.123456E+06 ; same

-0.123456789  ;  -123.456789E-3 ; same

-0.123456789  ;  -123.456789E-03 ; same

100123456.1    ;  100.123456E+06  ; same

100123456.1     ;   100.123459E+06 ; same

100123456.1     ;  100123459      ; same

100123456.1     ;  100123459.0    ; same



Best regards

Chavdar
