[v8-dev] Re: StringToDouble rewritten not using String::Get and memory allocations.... (issue1096002)

Sergey Ryazanov Mon, 22 Mar 2010 11:53:20 -0700

It seems that strtod rounds a decimal to the closest number
representable as a double ("24414062505131250" parses as
24414062505131248 and "24414062505131250.0.....01" as 24414062505131252).
As far as I can tell from the GNU source code, it uses "multiprecision
numbers" to represent numbers exactly.
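
The two parses above can be checked with a small sketch. ParseDecimal is an illustrative name, not something from the patch, and the second input spells out the elided zeros with an arbitrary count. The expected values hold only for a correctly rounding strtod such as glibc's.

```cpp
#include <cstdlib>

// Illustrative wrapper around the C library's strtod (hypothetical name,
// not part of src/conversions.cc).
double ParseDecimal(const char* text) {
  return std::strtod(text, nullptr);
}

// Doubles between 2^54 and 2^55 are spaced 4 apart, so 24414062505131250
// lies exactly halfway between ...248 and ...252. A correctly rounding
// strtod resolves the tie to the neighbour with an even significand.
bool TieRoundsToEven() {
  return ParseDecimal("24414062505131250") == 24414062505131248.0;
}

// Any nonzero digit after the halfway point, however far to the right,
// pushes the result up to the larger neighbour.
bool NonzeroTailRoundsUp() {
  return ParseDecimal("24414062505131250.0000000000000000000001") ==
         24414062505131252.0;
}
```

Both helpers should return true with glibc; a strtod that inspects only a fixed number of leading digits would fail the second check.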


Any double is representable as d*2^p where 0 <= d < 2^53 and -1074 <= p
< ... (considering subnormal numbers). Any positive double x has an
exact decimal representation:
1) if 0 < x < 1: not more than ~770 significant digits (2^53 * 5^1074)
2) 2^53 < x < DBL_MAX: not more than 309 significant digits since it's
an integer and DBL_MAX ≈ 1.79769 × 10^308.
3) 1 < x < 2^53: not more than 60 significant digits.
(significant digits do not include leading and trailing zeros).
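
A rough way to check the digit counts in cases 1 and 3: a double d*2^-k equals d*5^k / 10^k, so its significant digits are those of the integer d*5^k, which is below 2^53 * 5^k. The sketch below (MaxSignificantDigits is a made-up name) just takes the logarithm of that bound. It assumes the bound itself is not a power of ten, which holds whenever the two exponents differ.

```cpp
#include <cmath>

// Upper bound on the number of decimal digits of any positive integer
// smaller than 2^two_exp * 5^five_exp. Hypothetical helper for
// illustration only.
int MaxSignificantDigits(int two_exp, int five_exp) {
  const double log10_bound =
      two_exp * std::log10(2.0) + five_exp * std::log10(5.0);
  // An integer below 10^t (t not an integer) has at most floor(t) + 1 digits.
  return static_cast<int>(std::floor(log10_bound)) + 1;
}
```

MaxSignificantDigits(53, 1074) gives 767 for the smallest subnormal exponent, and MaxSignificantDigits(53, 52) gives 53 for numbers between 1 and 2^53, so both fit within the ~770 and 60 quoted above.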

Suppose we have a decimal with more than 770 significant digits and we
want to find the double closest to it. Dropping the remaining digits (or
changing them to any other digits) gives the right result unless our
number lies exactly halfway between 2 adjacent doubles. The mean of 2
adjacent doubles has no more than 771 digits (all further digits would
be zeros). If the first digits of our number are equal to the digits of
that mean, the result of rounding depends on whether the rest of the
digits are all zeros.

So the conclusion is the following: if we preserve at least 771
significant digits and replace any nonzero tail with a single '1', we
never change the behavior of strtod.
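
A minimal sketch of that truncation, assuming the significant digits have already been collected into a string with no sign, decimal point or exponent. TruncatedDigits and TruncateSignificantDigits are hypothetical names, not code from the patch.

```cpp
#include <cstddef>
#include <string>

// Result of truncation: the kept digits plus the power of ten the caller
// must add to the decimal exponent. Hypothetical type for illustration.
struct TruncatedDigits {
  std::string digits;   // at most max_digits + 1 characters
  int exponent_adjust;  // value == digits * 10^exponent_adjust (approximately)
};

// Keeps the first max_digits digits. If anything nonzero was dropped, a
// single '1' is appended so the value still compares as strictly above the
// kept prefix, which is all that rounding to nearest needs to know.
TruncatedDigits TruncateSignificantDigits(const std::string& digits,
                                          std::size_t max_digits = 771) {
  if (digits.size() <= max_digits) return {digits, 0};

  std::string kept = digits.substr(0, max_digits);
  int dropped = static_cast<int>(digits.size() - max_digits);

  const bool nonzero_tail =
      digits.find_first_not_of('0', max_digits) != std::string::npos;
  if (nonzero_tail) {
    kept.push_back('1');  // stand-in for the whole nonzero tail
    dropped -= 1;         // the '1' occupies the first dropped position
  }
  return {kept, dropped};
}
```

With max_digits set to 3 for readability, "12345" becomes "1231" with an adjustment of 1, while "12300" becomes "123" with an adjustment of 2. The caller would then hand the shortened digits and the adjusted exponent to strtod.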


On Sat, Mar 20, 2010 at 7:56 PM,  <[email protected]> wrote:
> I will discuss the
> 100000000000000000000000.0000000000000000000000000000000000000000000001
> issue with the V8 team on Monday.
> The way I see it we have two options:
> 1. Follow ECMA-262 and round down, thus being incompatible with older
> versions.
> 2. Fallback to a more expensive reading when there are more than 20 digits.
>
> Pros/Cons for 1:
> Pro: Basically nothing to do. That's what we have now.
> Cons: Incompatible and we might read numbers the "wrong" way. On the
> other hand these numbers have to be written by hand
> (toString/toExponential/toFixed will never produce a number that would
> cause such problems). Therefore they are extremely rare.
>
> Pros/Cons for 2:
> Pro: Compatible with older variants of V8. Reading is correct. Might
> slightly simplify the fast case: the exponent would need to be in range
> -999 to 999.
> Cons: We would need to keep/add a fallback method. Maybe a template taking
> either a fixed-size buffer or a dynamic vector would do the trick, though.
>
>
>
> http://codereview.chromium.org/1096002/diff/3002/4003
> File src/conversions.cc (right):
>
> http://codereview.chromium.org/1096002/diff/3002/4003#newcode109
> src/conversions.cc:109: bool operator != (EndMarker const& m) const {
> return !(*this == m); }
> On 2010/03/19 15:46:12, SeRya wrote:
>> On 2010/03/18 20:34:22, Erik Corry wrote:
>> > Some funky C++ here :-).  return !end_; seems simpler, but perhaps
>> > this is somehow better?
>>
>> Just a canonical form of != which simplifies maintenance (IMHO).
>
> I'm with Erik here.
> I still don't understand how this actually types. (Although I'm by no
> means a C++ expert).
> Also operator-overloading should be rare in Google code.
> Why not Peek(), AtEnd(), etc?
> This said, I'm not very familiar with V8 coding practices.
>
> http://codereview.chromium.org/1096002/diff/3002/4003#newcode496
> src/conversions.cc:496: const int max_exponent = INT_MAX / 2;
> On 2010/03/19 15:46:12, SeRya wrote:
>> On 2010/03/19 13:39:43, Florian Loitsch wrote:
>> > This seems to be too complicated. A decimal number without leading
>> > 0s may only have a decimal exponent of ~-400 to ~+400 before ending
>> > up being infinite or 0.
>>
>> 1<1000 zeros>e-1000 == 1.
>
> Right you are.
>
> http://codereview.chromium.org/1096002/diff/3002/4003#newcode519
> src/conversions.cc:519: if (exponent != 0) {
> On 2010/03/19 15:46:12, SeRya wrote:
>> On 2010/03/19 13:39:43, Florian Loitsch wrote:
>> > not that it really matters, but you could copy the exponent
>> > characters while reading them, and just stop after 4 digits.
>> > This way you could avoid this part here.
>>
>> It would mean another chunk of code that drops leading zeros and
>> checks for a junk tail. I'd prefer to simplify for now and maybe add
>> this optimization later.
>
> my comment was based on the assumption that the read exponent was in
> range -400 to +400. So disregard it.
>
> http://codereview.chromium.org/1096002/diff/21004/27003#newcode298
> src/conversions.cc:298: // 1. currnet == end (other ops are not
> allowed), current != end.
> Are we sure there is at least one character?
> If yes assert it.
> If not, and it is legal to access current[0] of empty string, explain.
>
> http://codereview.chromium.org/1096002/diff/21004/27003#newcode330
> src/conversions.cc:330: buffer[buffer_pos++] = '-';
> It might make sense to move the hexadecimal reading into a separate
> function.
>
> http://codereview.chromium.org/1096002/diff/21004/27003#newcode405
> src/conversions.cc:405: if (current == end) return signed_zero;
> I think it makes more sense to structure as follows:
> if (current == end) {
>  if (significant_digits == 0 && !leading_zero) {
>    // String was ".".
>    return JUNK_STRING_VALUE;
>  } else {
>    goto parsing_done;
>  }
> }
> if (significant_digits == 0) {
>  octal = false;
>  ...
> }
>
> http://codereview.chromium.org/1096002/diff/21004/27003#newcode451
> src/conversions.cc:451: }
> How should "123e" be parsed when trailing junk is enabled?
> As "123" or as JUNK_STRING_VALUE?
> If it's the latter, then this is fine.
>
> http://codereview.chromium.org/1096002/diff/21004/27003#newcode456
> src/conversions.cc:456: ++current;
> As before: should "123e+" be parsed as "123" or as JUNK_STRING_VALUE when
> trailing junk is enabled?
>
> http://codereview.chromium.org/1096002
>

-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev

To unsubscribe from this group, send email to 
v8-dev+unsubscribegooglegroups.com or reply to this email with the words 
"REMOVE ME" as the subject.
