Sorry, make that "user" feedback. ;-) Mike
Mike Pogue wrote:
>
> I suspect that converting to a double/float type also does *other*
> validation that a lexical min/max routine wouldn't, such as detecting
> whether any invalid-in-this-context characters were used.
>
> So, a comparison of the performance of "lexical min/max" vs. "the
> full-blown try to convert it to double/float" is probably an
> apples-to-oranges comparison.
>
> My experience with customer feedback so far is that they do not want
> lexical types.  They want direct mappings to built-in types (double,
> float, etc.), and accessors for them.
>
> Mike
>
> "Arnold, Curt" wrote:
> >
> > Some background:
> >
> > I've been lobbying the XML Schema working group to reestablish the
> > "real" datatype that had been in Datatypes until the 17 Dec 1999
> > draft.  Prior to the 17 Dec draft, there had been a "real" datatype
> > that was an arbitrary range and precision floating point value.
> > There was a minAbsoluteValue facet that I didn't like (for details, see
> > http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999OctDec/0024.html).
> >
> > In the 17 Dec draft, the minAbsoluteValue facet disappeared, the
> > double and float datatypes were added as primitive datatypes
> > (corresponding to IEEE), and the "real" datatype was removed.  I
> > don't have any problems with the first two, but I still think that
> > there is substantial justification for an arbitrary range and
> > precision floating point type.
> >
> > One of the points I tried to make (for others, see
> > http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000JanMar/0133.html)
> > was that a lexical comparison for evaluating min and max constraints
> > could be substantially faster than conversion to double followed by
> > an IEEE floating point comparison, and that unless you really want
> > to mimic the rounding effects of IEEE, you should not have to pay
> > that penalty.  Basically, a lexical comparison would see
> > 1.00000000000000000000000001 as greater than one and would accept it
> > if you had a <minExclusive value="1.0"/>.  An IEEE-based comparison
> > would have to realize that 1.00000000000000000000000001 rounds to
> > 1.0 and that 1.0 is not greater than 1.0.  (A sketch of such a
> > lexical comparison follows the quoted message below.)
> >
> > Unfortunately, I could not provide benchmarks to quantify the
> > difference, so today I finally ran some benchmarks.  Hopefully, these
> > can be useful independent of the schema debate.  All times are
> > reported as GetTickCount() values (milliseconds) for a Pentium II 400
> > running Windows 2000 Professional, with code compiled using VC6 with
> > the default Release settings.
> >
> > First was a timing of 600000 floating point comparisons against 0
> > (0 was converted or parsed outside the loop).  All values were within
> > the range and precision of double, and no NaNs, Infinities, or
> > illegal lexical values were in the test set.
> >
> > a) 600000 Unicode strings compared using a home-grown lexical
> >    comparison: 300 ticks
> > b) Equivalent ASCII strings converted using atof() and compared:
> >    5308 ticks
> > c) Equivalent ASCII strings converted using sscanf() and compared:
> >    6379 ticks
> > d) Unicode strings converted using VarR8FromStr (COM Automation
> >    support routine): 1552 ticks
> >
> > I wasn't successful, in the time I had allotted, in benchmarking
> > conversion using std::basic_istream<XMLCh>.
> >
> > So conversion was between 5 and 20 times slower than a lexical
> > comparison.  VarR8FromStr is much more efficient than the C RTL's
> > atof() function, but at the cost of platform independence.
> >
> > atof() or sscanf() will also not give you consistent results if
> > C++'s double is not IEEE on the particular platform, whereas the
> > lexical comparison should give identical results on all platforms.
> >
> > To put this in perspective relative to parse time, I benchmarked
> > reading a data file for one of our programs (262000 numbers compared
> > against a preconverted 0) in various modes (times in ticks):
> >
> > Expat, no numeric comparison: 2610
> > Expat, lexical comparison: 2804
> > Expat, VarR8FromStr: 3525
> > Expat, atof comparison: 5384
> >
> > Xerces (1_1_0_d05), non-validating, no comparison: 6679
> > Xerces, validating, no comparison: 6789
> > Xerces, non-validating, lexical range: 8800
> > Xerces, validating, lexical: 9000
> > Xerces, validating, VarR8: 9900
> > Xerces, non-validating, VarR8: 9850
> >
> > So, using conversion for bound checking can almost double the parse
> > time for a numerically intensive file.
> >
> > I'd like to see:
> >
> > 1) Put a real type back in, and make float, double, and decimal
> >    derived from real.
> > 2) Use lexical validation for real.
> > 3) Use lexical validation as a first (and typically only) phase for
> >    validation of the float and double types.  From the lexical
> >    validation, you can detect whether you are potentially in an area
> >    where rounding could give you different answers.  Since rounding
> >    should affect only a tiny fraction of comparisons, a
> >    conversion-based comparison would rarely ever be needed.  (A
> >    sketch of this two-phase check also follows below.)
> >    Alternatively, the min and max constraints could be explicitly
> >    stated in the schema draft to be performed lexically.
> > 4) Don't depend on the C RTL for conversion of double for type-aware
> >    DOMs, etc.
> >
> > I'll make my lexical comparison code available to the Apache project,
> > if anyone is interested.
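
To make the lexical comparison Curt describes concrete, here is a minimal
sketch (not his actual code, which hasn't been posted).  It assumes the
value has already been checked against a simple decimal lexical form --
an optional '-' sign, at least one digit, then an optional '.' and
fraction digits, no exponent -- and it uses plain char strings, whereas
the Xerces code would operate on XMLCh.  The name compareDecimal is only
illustrative.

#include <cstddef>
#include <cstring>
#include <cstdio>

// Returns <0, 0 or >0 as the decimal literal a is less than, equal to,
// or greater than the decimal literal b.  No conversion to double is
// done, so "1.00000000000000000000000001" compares greater than "1.0".
int compareDecimal(const char* a, const char* b)
{
    bool negA = (*a == '-'); if (negA) ++a;
    bool negB = (*b == '-'); if (negB) ++b;

    if (negA != negB)
    {
        // Differing signs: the negative value is smaller.  (This sketch
        // treats "-0" as less than "0"; a production version would
        // special-case zero.)
        return negA ? -1 : 1;
    }

    // Skip redundant leading zeros in the integer part.
    while (*a == '0' && a[1] >= '0' && a[1] <= '9') ++a;
    while (*b == '0' && b[1] >= '0' && b[1] <= '9') ++b;

    // A longer integer part means a larger magnitude.
    std::size_t intA = std::strcspn(a, ".");
    std::size_t intB = std::strcspn(b, ".");

    int magnitude = 0;
    if (intA != intB)
        magnitude = (intA > intB) ? 1 : -1;
    else
    {
        // Same integer-part length: walk both strings in lockstep,
        // treating a missing fraction digit as '0', so "1.50" == "1.5".
        std::size_t ia = 0, ib = 0;
        while (magnitude == 0 && (a[ia] != '\0' || b[ib] != '\0'))
        {
            if (a[ia] == '.') ++ia;
            if (b[ib] == '.') ++ib;
            char da = (a[ia] != '\0') ? a[ia] : '0';
            char db = (b[ib] != '\0') ? b[ib] : '0';
            if (da != db) magnitude = (da > db) ? 1 : -1;
            if (a[ia] != '\0') ++ia;
            if (b[ib] != '\0') ++ib;
        }
    }

    // A negative sign reverses the magnitude comparison.
    return negA ? -magnitude : magnitude;
}

int main()
{
    // The motivating case: lexically the value is greater than 1.0, so
    // it would satisfy <minExclusive value="1.0"/>, even though both
    // strings round to the same IEEE double.
    std::printf("%d\n",
        compareDecimal("1.00000000000000000000000001", "1.0"));
    return 0;
}

The digit-by-digit walk touches no floating point hardware and gives the
same answer on every platform, which is presumably where the 5-20x gap in
Curt's numbers comes from.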
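
Point 3 of Curt's wish list can also be sketched, though the version
below is only one possible reading of it.  It relies on the fact that any
decimal literal with at most DBL_DIG (15) significant digits, and within
double's normal range, converts to a distinct double, so the lexical and
IEEE orderings agree for such values; conversion is needed only when
either string carries more precision than that.  satisfiesMinExclusive
and countSignificantDigits are illustrative names, and compareDecimal is
the sketch above.

#include <cstddef>
#include <cstdlib>
#include <cfloat>

int compareDecimal(const char* a, const char* b);  // sketch above

// Conservative count of significant decimal digits: the sign, the
// decimal point, leading zeros, and trailing fraction zeros are not
// counted.  Overcounting is harmless here -- it only forces an extra
// conversion.
static std::size_t countSignificantDigits(const char* s)
{
    if (*s == '-') ++s;
    while (*s == '0') ++s;                            // leading zeros
    if (*s == '.') { ++s; while (*s == '0') ++s; }    // e.g. "0.000123"
    std::size_t count = 0, trailingZeros = 0;
    bool inFraction = false;
    for (; *s != '\0'; ++s)
    {
        if (*s == '.') { inFraction = true; continue; }
        ++count;
        if (inFraction && *s == '0') ++trailingZeros;
        else trailingZeros = 0;
    }
    return count - trailingZeros;
}

// Two-phase minExclusive check with IEEE (double) semantics, assuming
// both literals are within double's normal range (no overflow or
// underflow).
bool satisfiesMinExclusive(const char* value, const char* facet,
                           double facetAsDouble)
{
    // Phase 1: when neither string carries more precision than a double
    // can represent, the cheap lexical comparison is also the exact one.
    if (countSignificantDigits(value) <= DBL_DIG &&
        countSignificantDigits(facet) <= DBL_DIG)
        return compareDecimal(value, facet) > 0;

    // Phase 2 (rare): fall back to conversion, so that a value such as
    // 1.00000000000000000000000001 is seen to round to 1.0 and be
    // rejected, exactly as an all-IEEE implementation would.
    return std::strtod(value, 0) > facetAsDouble;
}

In typical data almost every value stays within 15 significant digits, so
the conversion path should be hit about as rarely as Curt suggests.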