Sorry, make that "user" feedback.  ;-)

Mike

Mike Pogue wrote:
> 
> I suspect that converting to a double/float type also does *other* validation 
> that a lexical min/max
> routine wouldn't, such as detecting whether any invalid-in-this-context 
> characters were used.
> 
> So, a comparison of performance of "lexical min/max" vs. "the full-blown try 
> to convert it to
> double/float" is probably an apples-to-oranges comparison.
> 
> My experience with customer feedback so far is that they do not want lexical 
> types.
> They want direct mappings to built-in types (double, float, etc.), and 
> accessors for them.
> 
> Mike
> 
> "Arnold, Curt" wrote:
> >
> > Some background:
> >
> > I've been lobbying the XML Schema working group to reestablish the "real" 
> > datatype that had been in Datatypes until the 17 Dec 1999 draft.  Prior to 
> > the 17 Dec draft, there had been a "real" datatype that was an arbitrary 
> > range and precision floating point value.  There was a minAbsoluteValue 
> > facet that I didn't like (for details, see 
> > http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999OctDec/0024.html).
> >
> > In the 17 Dec draft, the minAbsoluteValue facet disappeared, the double and 
> > float datatypes were added as primitive datatypes (corresponding to IEEE), 
> > and the "real" datatype was removed.  I don't have any problems with the 
> > first two, but I still think that there is substantial justification for an 
> > arbitrary range and precision floating point type.
> >
> > One of the points I tried to make (for others, see 
> > http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000JanMar/0133.html) 
> > was that a lexical-based comparison for evaluation of min and max 
> > constraints could be substantially faster than conversion to double 
> > followed by an IEEE floating point comparison, and that unless you really 
> > want to mimic the rounding effects of IEEE, you should not have to pay that 
> > penalty.  Basically, a lexical-based comparison would see 
> > 1.00000000000000000000000001 as greater than one and would accept it if you 
> > had a <minExclusive value="1.0"/>.  An IEEE-based comparison would have to 
> > realize that 1.00000000000000000000000001 rounds to 1.0 and that 1.0 is not 
> > greater than 1.0.
> >
> > Unfortunately, I could not provide benchmarks to quantify the difference.  
> > So today, I finally did some benchmarks.  Hopefully, these can be useful 
> > independent of the schema debate.  All times are reported in GetTickCount() 
> > values for a Pentium II 400 running Windows 2000 Professional, with code 
> > compiled using VC6 with the default Release settings.
> >
> > First was a timing of 600000 floating point comparisons against 0 (the 0 
> > was converted or parsed outside the loop).  All values were within the 
> > range and precision of double, and no NaNs, Infinities, or illegal lexicals 
> > were in the test set.
> >
> > a) 600000 Unicode strings compared using a home-grown lexical comparison: 
> > 300 ticks
> > b) Equivalent ASCII strings converted using atof() and compared: 5308 ticks
> > c) Equivalent ASCII strings converted using sscanf() and compared: 6379 
> > ticks
> > d) Unicode strings converted using VarR8FromStr (COM Automation support 
> > routine) and compared: 1552 ticks
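> >
> > For reference, each of these boils down to a timing loop of roughly this 
> > shape (a sketch with placeholder data, not the actual harness; only the 
> > line inside the loop changes between the variants):
> >
> > #include <windows.h>   // GetTickCount()
> > #include <cstdio>
> > #include <cstdlib>     // atof()
> > #include <string>
> > #include <vector>
> >
> > int main()
> > {
> >     // Placeholder data; the real test set was 600000 lexicals from our files.
> >     std::vector<std::string> values(600000, "12.34");
> >     const double bound = 0.0;       // converted once, outside the timed loop
> >     int failures = 0;
> >
> >     DWORD start = GetTickCount();
> >     for (std::vector<std::string>::size_type i = 0; i < values.size(); ++i)
> >     {
> >         // Variant b): convert with atof() and compare as doubles.
> >         if (atof(values[i].c_str()) <= bound)
> >             ++failures;
> >     }
> >     DWORD elapsed = GetTickCount() - start;
> >
> >     std::printf("%d failures, %lu ticks\n", failures, (unsigned long)elapsed);
> >     return 0;
> > }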
> >
> > I wasn't successful in the time I had allotted to benchmark conversion 
> > using std::basic_istream<XMLCh>.
> >
> > So conversion was between 5 and 20 times slower than a lexically-based 
> > comparison.  VarR8FromStr is much more efficient than the C RTL's atof() 
> > function, but at the cost of platform independence.
> >
> > atof() or sscanf() will also not give you consistent results on platforms 
> > where C++'s double is not IEEE, whereas the lexical comparison should give 
> > identical results on all platforms.
> >
> > To put this in perspective against total parsing time, I benchmarked 
> > reading a data file for one of our programs (262000 numbers compared 
> > against a preconverted 0) in various modes.
> >
> > Expat, no numeric comparison: 2610
> > Expat, lexical comparison: 2804
> > Expat, VarR8FromStr comparison: 3525
> > Expat, atof comparison: 5384
> >
> > Xerces (1_1_0_d05), non-validating, no comparison: 6679
> > Xerces, validating, no comparison: 6789
> > Xerces, non-validating, lexical comparison: 8800
> > Xerces, validating, lexical comparison: 9000
> > Xerces, validating, VarR8FromStr comparison: 9900
> > Xerces, non-validating, VarR8FromStr comparison: 9850
> >
> > So, using conversion for bounds checking can almost double the parse time 
> > for a numerically intensive file.
> >
> > I'd like to see:
> >
> > 1) Put a real type back in, and make float, double, and decimal derived 
> > from real.
> > 2) Use lexical validation for real.
> > 3) Use lexical validation as a first (and typically only) phase for 
> > validation of the float and double types.  From the lexical validation, you 
> > can detect whether you are potentially in an area where rounding could give 
> > you different answers (see the sketch after this list).  Since rounding 
> > should affect only a tiny fraction of comparisons, a conversion-based 
> > comparison would rarely ever be needed.  Alternatively, min and max 
> > constraints could be explicitly stated in the schema draft to be performed 
> > lexically.
> > 4) Don't depend on the C RTL for conversion of doubles in type-aware DOMs, 
> > etc.
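> >
> > Roughly, what I have in mind for 3) is a cheap test for that rounding-
> > sensitive area, along these lines (a simplified, self-contained sketch; it 
> > treats 17 significant decimal digits as the round-trip precision of an IEEE 
> > double, assumes no redundant leading zeros, and the function names are only 
> > for illustration):
> >
> > #include <iostream>
> > #include <string>
> >
> > // Keep only the first n significant digits of a decimal lexical, zeroing
> > // the rest, so values that a double could not tell apart truncate alike.
> > std::string truncateSignificant(const std::string& s, int n)
> > {
> >     std::string out = s;
> >     int seen = 0;
> >     for (std::string::size_type i = 0; i < out.size(); ++i)
> >     {
> >         char& c = out[i];
> >         if (c < '0' || c > '9')
> >             continue;             // skip '.', signs, etc.
> >         if (seen == 0 && c == '0')
> >             continue;             // leading zeros are not significant
> >         if (++seen > n)
> >             c = '0';              // beyond double precision: zero it out
> >     }
> >     return out;
> > }
> >
> > // Drop trailing fractional zeros (and a bare '.') so equal values that
> > // differ only in trailing zeros compare as the same string.
> > std::string canonical(std::string s)
> > {
> >     if (s.find('.') != std::string::npos)
> >     {
> >         s.erase(s.find_last_not_of('0') + 1);
> >         if (!s.empty() && s[s.size() - 1] == '.')
> >             s.erase(s.size() - 1);
> >     }
> >     return s;
> > }
> >
> > // True only when rounding to double could change the outcome, i.e. the
> > // value and the facet are indistinguishable at double precision.
> > bool needsConversion(const std::string& value, const std::string& facet)
> > {
> >     return canonical(truncateSignificant(value, 17)) ==
> >            canonical(truncateSignificant(facet, 17));
> > }
> >
> > int main()
> > {
> >     // Rounding-sensitive: an IEEE double sees both as 1.0.
> >     std::cout << needsConversion("1.00000000000000000000000001", "1.0") << "\n";
> >     // The common case: the lexical comparison alone settles it.
> >     std::cout << needsConversion("1.5", "1.0") << "\n";
> >     return 0;
> > }
> >
> > Only when needsConversion() comes back true (which should be rare) would 
> > the value actually be converted and compared as a double; everywhere else 
> > the lexical comparison is the final answer.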
> >
> > I'll make my lexical comparison code available to the Apache project if 
> > anyone is interested.
