Andy, Richard,
Thanks a lot.
Indeed, starting from xsd:float is what causes the problem, together with the
fact that it is better for the float to be in scientific form.
I think all these explanations also help a lot in putting some requirements on
the data source side.

Best regards
Chavdar

-----Original Message-----
From: Marco Neumann <[email protected]>
Sent: Wednesday, 19 August, 2020 11:45
To: [email protected]
Subject: Re: Float comparison

Andy, yes I would agree xsd:float can lead to some funky behavior here due to 
precision. While you are at it this could also explain why ?y is bound to ?x in 
the example below on blazegraph but still "correctly" mapped in Jena. Simply a 
bug in wikidata/blazegraph that doesn't throw an error and is not caught on the 
server side.

PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>

SELECT ?x ?y ?z WHERE{
values ?x { "100123456.01"^^xsd:float }
values ?y { "100123459.01"^^xsd:float }
values ?z { "100123451.01"^^xsd:float}
}

Blazegraph

https://query.wikidata.org/#PREFIX%20xsd%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0A%0ASELECT%20%3Fx%20%3Fy%20%3Fz%20WHERE%7B%0Avalues%20%3Fx%20%7B%20%22100123456.01%22%5E%5Exsd%3Afloat%20%7D%0Avalues%20%3Fy%20%7B%20%22100123459.01%22%5E%5Exsd%3Afloat%20%7D%0Avalues%20%3Fz%20%7B%20%22100123451.01%22%5E%5Exsd%3Afloat%7D%0A%7D


x             y             z
100123456.01  100123456.01  100123451.01

Jena 3.15

http://www.lotico.com:3030/lotico/sparql?query=PREFIX+xsd%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0A%0D%0ASELECT+%3Fx+%3Fy+%3Fz+WHERE%7B%0D%0Avalues+%3Fx+%7B+%22100123456.01%22%5E%5Exsd%3Afloat+%7D%0D%0Avalues+%3Fy+%7B+%22100123459.01%22%5E%5Exsd%3Afloat+%7D%0D%0Avalues+%3Fz+%7B+%22100123451.01%22%5E%5Exsd%3Afloat%7D%0D%0A%7D&output=text


x             y             z
100123456.01  100123459.01  100123451.01
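
For what it is worth, the Blazegraph row is what IEEE 754 single precision predicts for the values themselves: near 1e8 adjacent floats are 8 apart, so ?x and ?y round to the same number. A minimal Python sketch using only the standard library (the helper name to_float32 is mine, not something from Jena or Blazegraph):

```python
import struct

def to_float32(text: str) -> float:
    """Round a decimal lexical form to the nearest IEEE 754 single-precision value."""
    # Packing as 'f' rounds the double to float32; unpacking widens it back exactly.
    return struct.unpack("<f", struct.pack("<f", float(text)))[0]

x = to_float32("100123456.01")
y = to_float32("100123459.01")
z = to_float32("100123451.01")

print(x, y, z)   # 100123456.0 100123456.0 100123448.0
print(x == y)    # True: both round to the same float
print(x == z)    # False: z lands on the next representable value down
```

So the two engines seem to differ in which lexical form they print back, not necessarily in the float values they would compare.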


On Wed, Aug 19, 2020 at 10:13 AM Andy Seaborne <[email protected]> wrote:

>
>
> On 18/08/2020 22:17, Dr. Chavdar Ivanov wrote:
> > Andy, Richard,
> > Thank you for the feedback.
> >
> > In the graph I have the 2 values as xsd:float, so this is how the
> > data is coming.
> >
> > In the SPARQL query I tried to cast the float to decimal by using
> > FILTER (xsd:decimal(?value1)!=xsd:decimal(?value1)).
> >
> > I am not sure if this is the correct way, but I am now seeing a
> > difference in the comparison result
> >
> > 0.1001244561 Is different from 0.1001234590 which is OK
>           ^^ typo?
>
>
> > But these are reported as same 100123456.1     and  100123459.0
>
>
> 100123456.1 is not a floating point number. It has more precision than
> xsd:float can represent.
>
> It's "1.00123456E8"^^xsd:float
>
>
> (Please copy and paste expressions into email.)
>
> xsd:decimal(?value1)
>
> is:
> evaluate ?value1 to get an xsd:float.
>
> which is
>
> "0.1001244561"^^xsd:float
>
> Using Jena's expression evaluator:
>
> qexpr '"0.1001244561"^^xsd:float+0'
>   ==>
> "0.100124456"^^xsd:float
>
> See? Already lost precision.
>
> Then turn it into a decimal.
>
> it is different to:
> xsd:decimal(str(?value1))
>
> which takes the lexical form, not the floating point value, of ?value1.
>
> > If I get the value before the comparison is executed, the xsd:decimal
> > of the two values appears to be the same, 100123456.0, so this is why
> > != does not report the difference.
> > Here the decimal does not seem to help,
>
> Because precision was lost making the decimal.  Start with a decimal.
>
> xsd:decimal("0.1001244561")
>   or "0.1001244561"^^xsd:decimal
>   or 0.1001244561   (in Turtle and SPARQL).
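
The difference between casting the value and casting the lexical form can be sketched outside SPARQL. This is only an analogy in Python, assuming float32 rounding via the standard struct module; to_float32 is a hypothetical helper and the exact digits Jena produces for a float-to-decimal cast are not modelled here:

```python
import struct
from decimal import Decimal

def to_float32(text: str) -> float:
    """Round a decimal lexical form to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", float(text)))[0]

lexical = "0.1001244561"

# Analogous to xsd:decimal(?value1): the value has already been rounded to a float.
via_float = Decimal(to_float32(lexical))
# Analogous to xsd:decimal(str(?value1)): the original digits are kept.
via_string = Decimal(lexical)

print(via_string)               # 0.1001244561
print(via_float == via_string)  # False: the precision was lost before the cast

# The large values from the question collapse before the decimal is ever made.
big_a = Decimal(to_float32("100123456.1"))
big_b = Decimal(to_float32("100123459.0"))
print(big_a == big_b)           # True
```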
>
> > but I guess this falls in the same category: large absolute values
> > are less precise. So same effect as for xsd:float.
> >
> > Best regards
> > Chavdar
> >
> >
> >
> > -----Original Message-----
> > From: Andy Seaborne <[email protected]>
> > Sent: Tuesday, 18 August, 2020 19:07
> > To: [email protected]
> > Subject: Re: Float comparison
> >
> >
> >
> > On 18/08/2020 10:31, Richard Cyganiak wrote:
> >> The xsd:float datatype represents IEEE 754 single-precision
> >> floating point numbers.
> >>
> >> As with any floating-point datatype, the precision depends on the
> >> size of the number. Numbers close to zero are very precise. Numbers
> >> with a large absolute value (large positive or large negative) are
> >> less precise. For the gory details see for example here:
> >>
> >> https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limitations_on_decimal_values_in_[1,_16777216]
> >>
> >> There is rarely a good reason to use xsd:float in RDF. xsd:double
> >> is much more precise at a small increase of storage cost (4 more
> >> bytes, which is negligible given the total size of an RDF triple).
> >> xsd:decimal provides arbitrary precision (in theory), but is more
> >> expensive in storage and computation.
> >>
> >> My general view is that if storage size and performance of
> >> mathematical computations are a major concern for the application,
> >> RDF is probably not the best choice; RDF optimises for other
> >> concerns. Therefore the best choice for representing non-integer
> >> numbers in RDF is usually xsd:decimal: more expensive, but no issues
> >> with precision.
> >>
> >> Richard
> >
> > xsd:decimal can record any decimal precision, but division may lose
> > precision; otherwise "1/3" would need infinite storage.
> >
> > Jena uses 24 digit precision for division for inexact results like 1/3.
> >
> >>
> >>
> >>> On 18 Aug 2020, at 05:48, Dr. Chavdar Ivanov <[email protected]> wrote:
> >>>
> >>> Hello
> >>>
> >>>
> >>>
> >>> I posted the message below to the TopBraid users mailing list and
> >>> already clarified that as sh:equals is based on RDF node equality,
> >>> values such as "1.0"^^xsd:float and "1"^^xsd:float count as distinct.
> >>> So I am keeping this for the interest of others in the list
> >
> > SPARQL has both comparisons.
> >
> > The "sameTerm()" operator for RDF term equality, and SPARQL "=" for
> > value comparison (by op:numeric-equal).
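
The two kinds of comparison can be illustrated with a small Python model. This is a sketch under stated assumptions, not how any engine implements it: Literal, same_term and value_equal are hypothetical names, and value_equal assumes both literals are xsd:float.

```python
import struct
from dataclasses import dataclass

XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"

@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: str = XSD_FLOAT

def same_term(a: Literal, b: Literal) -> bool:
    # Term equality: lexical form and datatype must both match exactly.
    return a == b

def value_equal(a: Literal, b: Literal) -> bool:
    # Value equality, assuming both literals are xsd:float:
    # compare the single-precision values the lexical forms denote.
    def f32(s: str) -> float:
        return struct.unpack("<f", struct.pack("<f", float(s)))[0]
    return f32(a.lexical) == f32(b.lexical)

one_a, one_b = Literal("1.0"), Literal("1")
print(same_term(one_a, one_b))    # False: different lexical forms
print(value_equal(one_a, one_b))  # True: same number
print(value_equal(Literal("100123456"), Literal("100.123456E6")))  # True
```

This matches the pattern in the test tables further down: a term-based check reports "different" whenever the spelling differs, while a value-based check does not care about the spelling.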
> >
> >       Andy
> >
> >>>
> >>>
> >>>
> >>> But on SPARQL float comparison I got advice to check in this
> >>> mailing list for other opinions.
> >>>
> >>> I understand that SPARQL comparison is mathematically based so 1.0
> >>> should be equal to 1. However below in item 2 you will see the
> >>> numbers I compared and I am getting confused. Take into account that
> >>> in the data graph the 2 compared properties are typed literals with
> >>> datatype float.
> >>>
> >>> I wanted to know what is the precision when float is compared. So
> >>> I have 2 questions
> >>>
> >>> *       What is the precision? Is it the 6th decimal, and is it OK to
> >>>         compare different forms of float, i.e. one in scientific form?
> >>> *       Why am I getting a wrong comparison result for bigger values
> >>>         such as 100123456.1 and 100123459, which are found to be the same?
> >>>
> >>>
> >>>
> >>> Best regards
> >>>
> >>> Chavdar
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> ========
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Dear all,
> >>>
> >>>
> >>>
> >>> I have a very basic question...
> >>>
> >>> I need to compare literals that are floats and tried to use two ways.
> >>> 1) using sh:equals to compare 2 properties and 2) using SPARQL
> >>> where I filter != different values
> >>>
> >>>
> >>>
> >>> For the filter I tried using
> >>>
> >>> FILTER (xsd:float(?value1)!=xsd:float(?value1)).
> >>>
> >>> or
> >>>
> >>> FILTER (?value1!=?value1).
> >>>
> >>> Both give the same outcome.
> >>>
> >>>
> >>>
> >>> Below I listed a summary of the tests I did
> >>>
> >>>
> >>>
> >>> I think sh:equals treats the literals as strings even though they
> >>> are floats. It also gives 2 results. I think this looks to be in
> >>> line with the SHACL spec, although I didn't check whether sh:equals
> >>> ignores the datatype.
> >>>
> >>>
> >>>
> >>> However, in some cases the result from the SPARQL is kind of strange.
> >>> It looks like the precision is 10^-6, but for the big numbers and
> >>> when the scientific form of the float number is used we have
> >>> something different.
> >>>
> >>>
> >>>
> >>> What rule is followed to decide whether two values are different?
> >>>
> >>> If I use google calculator
> >>>
> >>> 100123456.1-100.123459E+06=-2.90000000596
> >>>
> >>>
> >>>
> >>> Normally it should be OK to compare different forms of float.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> 1) using sh:equals in the property shape
> >>>
> >>> Value 1 ; value 2 ; comparison result
> >>>
> >>> 1.123456 ; 1.123456 ; same
> >>>
> >>> 1.1234560 ; 1.1234561 ; different (sh:equals reports it twice)
> >>>
> >>> 31.1234560 ; 31.1234561 ;different (sh:equals reports it twice)
> >>>
> >>> 30    ;      30.0000001 ; different (sh:equals reports it twice)
> >>>
> >>> 30     ;      30.000001 ; different (sh:equals reports it twice)
> >>>
> >>> 100123456.0  ; 100123456.1 ; different (sh:equals reports it twice)
> >>>
> >>> 100123456.0  ; 100123456.0 ; same
> >>>
> >>> 100123456    ;  100.123456E6 ; different (sh:equals reports it twice)
> >>>
> >>> 100123456    ;  100.123456E+06 ; different (sh:equals reports it twice)
> >>>
> >>> -0.123456789  ;  -123.456789E-3 ; different (sh:equals reports it twice)
> >>>
> >>> -0.123456789  ;  -123.456789E-03 ; different (sh:equals reports it twice)
> >>>
> >>> 100123456.1    ;  100.123456E+06  ; different (sh:equals reports it twice)
> >>>
> >>> 100123456.1     ;   100.123459E+06 ; different (sh:equals reports it twice)
> >>>
> >>> 100123456.1     ;  100123459      ; different (sh:equals reports it twice)
> >>>
> >>> 100123456.1     ;  100123459.0    ; different (sh:equals reports it twice)
> >>>
> >>>
> >>>
> >>> 2) using SPARQL (in the property shape)
> >>>
> >>> 1.123456 ; 1.123456 ; same
> >>>
> >>> 1.1234560 ; 1.1234561 ; different
> >>>
> >>> 31.1234560 ; 31.1234561 ;different
> >>>
> >>> 30    ;      30.0000001 ; same
> >>>
> >>> 30     ;      30.000001 ; different
> >>>
> >>> 100123456.0  ; 100123456.1 ; same
> >>>
> >>> 100123456.0  ; 100123456.0 ; same
> >>>
> >>> 100123456    ;  100.123456E6 ; same
> >>>
> >>> 100123456    ;  100.123456E+06 ; same
> >>>
> >>> -0.123456789  ;  -123.456789E-3 ; same
> >>>
> >>> -0.123456789  ;  -123.456789E-03 ; same
> >>>
> >>> 100123456.1    ;  100.123456E+06  ; same
> >>>
> >>> 100123456.1     ;   100.123459E+06 ; same
> >>>
> >>> 100123456.1     ;  100123459      ; same
> >>>
> >>> 100123456.1     ;  100123459.0    ; same
> >>>
> >>>
> >>>
> >>> Best regards
> >>>
> >>> Chavdar
> >>>
> >>>
> >>>
> >>
>


--


---
Marco Neumann
KONA
