Ok, I think I know what happens here.  We indeed get n^2 due to the
way stringinputbuffer works: it reads data in batches of 1024 bytes
(for this particular case).  In this case we have a long cons string
which branches to the right (build s += c).   So we spend a lot of
time just traveling down the tree to pick new bytes to serialize.  The
proof: if flatten the string before reading data, everything gets back
to normal.

So I am changing those toCString methods to use plain WriteToFlat methods, ok?

yours,
anton.

On Tue, Oct 20, 2009 at 4:49 PM, Anton Muhin <[email protected]> wrote:
> I agree with you, Erik, I just want now to understand why it is so slower.
>
> yours,
> anton.
>
> On Tue, Oct 20, 2009 at 4:47 PM, Erik Corry <[email protected]> wrote:
>> 2009/10/20 Erik Corry <[email protected]>:
>>> 2009/10/20 Anton Muhin <[email protected]>:
>>>>
>>>> On Tue, Oct 20, 2009 at 3:56 PM, William Hesse <[email protected]> wrote:
>>>>> I guess since I am the one interested, I'll look deeper into it when I 
>>>>> have
>>>>> the time.  I guess the bug can be closed.  Are we sure that the 9 second
>>>>> time we have now is comparable to the time they had before the slowdown?
>>>>
>>>> I am looking into it in a background mode.  Actually using
>>>> stringinputbeffer the way ToWideCString uses it is incredibly slow: if
>>>> I change this loop into WriteToFlat time goes from 33 secs to .5 secs
>>>> (sic!).
>>>
>>>
>>> I had stupidly assumed that ToWideCString used WriteToFlat already.
>>> It is almost certainly worth changing it so it does.  ToCString can
>>> also use WriteToFlat if the input is an ASCII string.  If we have to
>>> do UTF-8 comparisons we have to use StringInputBuffer.
>>
>> I meant conversions, not comparisons!
>>
>>>
>>>>
>>>> I suspect that decode char might be very expensive.  Don't think there
>>>> is any n^2 lurking around, but only investigation will show.
>>>>
>>>> And yes, I guess that is the number: I compared against the version
>>>> which flattens string first.
>>>>
>>>> yours,
>>>> anton.
>>>>
>>>>>
>>>>> On Tue, Oct 20, 2009 at 12:35 PM, Anton Muhin <[email protected]> wrote:
>>>>>>
>>>>>> On Tue, Oct 20, 2009 at 2:17 PM, William Hesse <[email protected]>
>>>>>> wrote:
>>>>>> > It looks to me like WriteToFlat is called once, to write to the 
>>>>>> > external
>>>>>> > resource, in the fixed case, and that
>>>>>> > it is called twice, once for the external resource and once for the
>>>>>> > ToCString in the debug assert, in the broken case.
>>>>>> > If that is the dominant cost, then why is the time increasing by a
>>>>>> > factor of
>>>>>> > 4 or more?  The time used should at most double.  I think the memcmp 
>>>>>> > and
>>>>>> > the
>>>>>> > allocation should not be that significant.  So I am wondering if all 
>>>>>> > the
>>>>>> > extra time is used in that call to ToCString, and if so, why is it so
>>>>>> > much
>>>>>> > slower than converting to external, which has to do the same work.  The
>>>>>> > whole thing is only 4 trials, on strings of 500,000 characters, and it
>>>>>> > takes
>>>>>> > 50 seconds in debug mode in the broken case, and only 9 seconds in the
>>>>>> > fixed
>>>>>> > case.  So how can doubling the number of calls to WriteToFlat increase
>>>>>> > the
>>>>>> > time by a factor of 4.5?
>>>>>>
>>>>>> Bill, do you want me to investigate that further?  I definitely would
>>>>>> if you like, just not sure this problem of top priority.
>>>>>>
>>>>>> My thoughts.  memcmp is cheap, but is not free.  And I'd expect
>>>>>> writing the string out to be somewhat more expensive than
>>>>>> WriteToString---additional checks, etc.  That's, for sure, just
>>>>>> speculations, so if you think I should look deeper, I'd do.
>>>>>>
>>>>>> yours,
>>>>>> anton.
>>>>>>
>>>>>> >
>>>>>> > On Tue, Oct 20, 2009 at 11:08 AM, Anton Muhin <[email protected]> 
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> Bill, my stance: if one often converts a string, or randomly access
>>>>>> >> chars, etc., it is a good idea to flatten string.  However that
>>>>>> >> shouldn't be a default behaviour if you only want to serialize a
>>>>>> >> string into external buffer.
>>>>>> >>
>>>>>> >> And I don't think it has n^2 behaviour: you just move/compare tons of
>>>>>> >> chars several times.
>>>>>> >>
>>>>>> >> yours,
>>>>>> >> anton.
>>>>>> >>
>>>>>> >> On Tue, Oct 20, 2009 at 1:04 PM,  <[email protected]> wrote:
>>>>>> >> > LGTM, but is there still a problem with converting cons strings 
>>>>>> >> > using
>>>>>> >> > ToCString?
>>>>>> >> >  This method should not be this slow, so avoiding the call to it 
>>>>>> >> > just
>>>>>> >> > targets
>>>>>> >> > the symptoms, not the disease.  Does this have n^2 behavior?
>>>>>> >> >
>>>>>> >> > http://codereview.chromium.org/294019
>>>>>> >> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > William Hesse
>>>>>> > Software Engineer
>>>>>> > [email protected]
>>>>>> >
>>>>>> > Google Denmark ApS
>>>>>> > Frederiksborggade 20B, 1 sal
>>>>>> > 1360 København K
>>>>>> > Denmark
>>>>>> > CVR nr. 28 86 69 84
>>>>>> >
>>>>>> > If you received this communication by mistake, please don't forward it
>>>>>> > to
>>>>>> > anyone else (it may contain confidential or privileged information),
>>>>>> > please
>>>>>> > erase all copies of it, including all attachments, and please let the
>>>>>> > sender
>>>>>> > know it went to the wrong person. Thanks.
>>>>>> >
>>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> William Hesse
>>>>> Software Engineer
>>>>> [email protected]
>>>>>
>>>>> Google Denmark ApS
>>>>> Frederiksborggade 20B, 1 sal
>>>>> 1360 København K
>>>>> Denmark
>>>>> CVR nr. 28 86 69 84
>>>>>
>>>>> If you received this communication by mistake, please don't forward it to
>>>>> anyone else (it may contain confidential or privileged information), 
>>>>> please
>>>>> erase all copies of it, including all attachments, and please let the 
>>>>> sender
>>>>> know it went to the wrong person. Thanks.
>>>>>
>>>>>
>>>>
>>>> >>>>
>>>>
>>>
>>
>

--~--~---------~--~----~------------~-------~--~----~
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
-~----------~----~----~----~------~----~------~--~---

Reply via email to