I suspect that converting to a double/float type also does *other* validation that a lexical min/max routine wouldn't, such as detecting characters that are invalid in that context.
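For illustration only, here is a minimal sketch (not the parser's actual code) of how the conversion itself doubles as that validation, using the C library's strtod():

    #include <cstdlib>
    #include <cerrno>

    // Hypothetical helper: returns true only if the whole string is a valid
    // floating-point lexical, using the conversion itself as the validator.
    bool convertAndValidate(const char* lexical, double& out)
    {
        char* end = 0;
        errno = 0;
        out = std::strtod(lexical, &end);
        // Reject if nothing was consumed, trailing junk remains ("1.2.3", "1e"),
        // or the value was out of range for double.
        return end != lexical && *end == '\0' && errno != ERANGE;
    }

A lexical min/max compare gets none of that for free: it either trusts its input or re-implements those checks itself.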
So, a comparison of performance of "lexical min/max" vs. "the full-blown
try to convert it to double/float" is probably an apples-to-oranges
comparison.

My experience with customer feedback so far is that they do not want
lexical types. They want direct mappings to built-in types (double,
float, etc.), and accessors for them.

Mike

"Arnold, Curt" wrote:

> Some background:
>
> I've been lobbying the XML Schema working group to reestablish the "real"
> datatype that had been in Datatypes until the 17 Dec 1999 draft. Prior to
> the 17 Dec draft, there had been a "real" datatype that was an arbitrary
> range and precision floating point value. There was a minAbsoluteValue
> facet that I didn't like (for details, see
> http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999OctDec/0024.html).
>
> In the 17 Dec draft, the minAbsoluteValue facet disappeared, the double
> and float datatypes were added as primitive datatypes (corresponding to
> IEEE), and the "real" datatype was removed. I don't have any problems
> with the first two, but I still think that there is substantial
> justification for an arbitrary range and precision floating point type.
>
> One of the points I tried to make (for others, see
> http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000JanMar/0133.html)
> was that a lexical-based comparison for evaluation of min and max
> constraints could be substantially faster than conversion to double
> followed by an IEEE floating point comparison, and that unless you really
> want to mimic the rounding effects of IEEE, you should not have to pay
> that penalty. Basically, a lexical-based comparison would see
> 1.00000000000000000000000001 as greater than one and would accept it if
> you had a <minExclusive value="1.0"/>. An IEEE-based comparison would
> have to realize that 1.00000000000000000000000001 rounds to 1.0 and that
> 1.0 is not greater than 1.0.
>
> Unfortunately, I could not provide benchmarks to quantify the difference.
> So today, I finally did some benchmarks. Hopefully, these can be useful
> independent of the schema debate. All times are reported in
> GetTickCount() values for a Pentium II 400 running Windows 2000
> Professional, with code compiled using VC6 with the default Release
> settings.
>
> First was a timing of 600000 floating point comparisons against 0 (0 was
> converted or parsed outside the loop). All values were within the range
> and precision of double, and no NaNs, Infinities, or illegal lexicals
> were in the test set.
>
> a) 600000 Unicode strings compared using a home-grown lexical comparison: 300 ticks
> b) Equivalent ASCII strings converted using atof() and compared: 5308 ticks
> c) Equivalent ASCII strings converted using sscanf() and compared: 6379 ticks
> d) Unicode strings converted using VarR8FromStr (COM Automation support routine): 1552 ticks
>
> I wasn't successful in the time I had allotted to benchmark conversion
> using std::basic_istream<XMLCh>.
>
> So conversion was between 5 and 20 times slower than a lexically-based
> comparison. VarR8FromStr is much more efficient than the C RTL's atof()
> function, but at the cost of platform independence.
>
> atof() or sscanf() will also not give you consistent results if C++'s
> double is not IEEE on a particular platform, whereas the lexical
> comparison should give identical results on all platforms.
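For readers curious what a "home-grown lexical comparison" might look like, here is a rough sketch (my own illustration, not the code Curt benchmarked). It assumes non-negative plain decimals with no sign and no exponent:

    #include <string>
    #include <algorithm>

    // Compare two non-negative decimal lexicals without converting them to
    // double.  Returns <0, 0 or >0, like strcmp.  Illustration only: a real
    // version would also handle '-', '+', exponents and malformed input.
    int lexicalCompare(const std::string& a, const std::string& b)
    {
        // Split each lexical into integer and fraction parts at the optional '.'.
        std::string::size_type dotA = a.find('.');
        std::string::size_type dotB = b.find('.');
        std::string intA  = (dotA == std::string::npos) ? a : a.substr(0, dotA);
        std::string intB  = (dotB == std::string::npos) ? b : b.substr(0, dotB);
        std::string fracA = (dotA == std::string::npos) ? "" : a.substr(dotA + 1);
        std::string fracB = (dotB == std::string::npos) ? "" : b.substr(dotB + 1);

        // Normalize: drop leading zeros of the integer part and trailing zeros
        // of the fraction, so "007.50" and "7.5" compare equal.
        intA.erase(0, std::min(intA.find_first_not_of('0'), intA.size()));
        intB.erase(0, std::min(intB.find_first_not_of('0'), intB.size()));
        fracA.erase(fracA.find_last_not_of('0') + 1);
        fracB.erase(fracB.find_last_not_of('0') + 1);

        // A longer normalized integer part means a larger value; equal lengths
        // reduce to an ordinary left-to-right digit comparison.
        if (intA.size() != intB.size())
            return intA.size() < intB.size() ? -1 : 1;
        if (int c = intA.compare(intB))
            return c;

        // Integer parts are equal, so the fraction digits decide.
        return fracA.compare(fracB);
    }

Nothing here touches floating point; the whole check is a couple of passes over the characters.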
> To put this in perspective relative to parsing time, I benchmarked
> reading a data file for one of our programs (262000 numbers compared
> against a preconverted 0) in various modes.
>
> Expat, no numeric comparison: 2610
> Expat, lexical comparison: 2804
> Expat, VarR8FromStr: 3525
> Expat, atof comparison: 5384
>
> Xerces (1_1_0_d05), non-validating, no comparison: 6679
> Xerces, validating, no comparison: 6789
> Xerces, non-validating, lexical range: 8800
> Xerces, validating, lexical: 9000
> Xerces, validating, VarR8: 9900
> Xerces, non-validating, VarR8: 9850
>
> So, using conversion for bounds checking can almost double the parse time
> for a numerically intensive file.
>
> I'd like to see:
>
> 1) Put a real type back in, making float, double and decimal derived
>    from real.
> 2) Use lexical validation for real.
> 3) Use lexical validation as a first (and typically only) phase for
>    validation of float and double types. From the lexical validation, you
>    can detect whether you are potentially in an area where rounding could
>    give you different answers. Since rounding should affect only a tiny
>    fraction of comparisons, a conversion-based comparison would rarely
>    ever be needed. Alternatively, min and max constraints could be
>    explicitly stated in the schema draft to be performed lexically. (A
>    sketch of this two-phase check follows below.)
> 4) Don't depend on the C RTL for conversion of double for type-aware
>    DOMs, etc.
>
> I'll make my lexical comparison code available to the Apache project, if
> anyone is interested.
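A rough sketch of the two-phase check described in point 3, reusing the lexicalCompare() and convertAndValidate() sketched earlier. The nearBoundary() heuristic is hypothetical and deliberately crude; it only has to err on the side of falling back to conversion:

    #include <string>
    #include <algorithm>

    // Crude, hypothetical stand-in for "could IEEE rounding change the
    // answer?": line the digit strings up (decimal points removed, shorter
    // side padded with trailing zeros) and see whether the first differing
    // digit lies beyond the roughly 17 significant digits a double can carry.
    // Assumes the same non-negative, no-exponent lexicals as lexicalCompare().
    static bool nearBoundary(const std::string& a, const std::string& b)
    {
        std::string da, db;
        for (std::string::size_type i = 0; i < a.size(); ++i)
            if (a[i] != '.') da += a[i];
        for (std::string::size_type i = 0; i < b.size(); ++i)
            if (b[i] != '.') db += b[i];
        if (da.size() < db.size()) da.append(db.size() - da.size(), '0');
        if (db.size() < da.size()) db.append(da.size() - db.size(), '0');

        std::string::size_type firstDiff = 0;
        while (firstDiff < da.size() && da[firstDiff] == db[firstDiff])
            ++firstDiff;
        return firstDiff >= 17;
    }

    // Hypothetical two-phase bounds check for <minExclusive value="...">.
    // Phase 1 is purely lexical; phase 2 (conversion plus IEEE comparison) is
    // only reached when the value is so close to the facet that rounding
    // could collapse the two onto the same double.
    bool satisfiesMinExclusive(const std::string& lexical, const std::string& facet)
    {
        if (lexicalCompare(lexical, facet) <= 0)
            return false;                 // lexically <= facet: fails outright

        if (!nearBoundary(lexical, facet))
            return true;                  // the common, cheap case

        double v, f;
        if (!convertAndValidate(lexical.c_str(), v) ||
            !convertAndValidate(facet.c_str(), f))
            return false;                 // malformed lexical: reject
        return v > f;                     // IEEE semantics for the rare case
    }

With this arrangement, 1.00000000000000000000000001 against a minExclusive of 1.0 is one of the rare values that takes the conversion path (and is rejected under IEEE rounding), while most values never leave the lexical path.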