Ok, I think I know what happens here. We indeed get n^2 due to the way stringinputbuffer works: it reads data in batches of 1024 bytes (for this particular case). In this case we have a long cons string which branches to the right (build s += c). So we spend a lot of time just traveling down the tree to pick new bytes to serialize. The proof: if flatten the string before reading data, everything gets back to normal.
So I am changing those toCString methods to use plain WriteToFlat methods, ok? yours, anton. On Tue, Oct 20, 2009 at 4:49 PM, Anton Muhin <[email protected]> wrote: > I agree with you, Erik, I just want now to understand why it is so slower. > > yours, > anton. > > On Tue, Oct 20, 2009 at 4:47 PM, Erik Corry <[email protected]> wrote: >> 2009/10/20 Erik Corry <[email protected]>: >>> 2009/10/20 Anton Muhin <[email protected]>: >>>> >>>> On Tue, Oct 20, 2009 at 3:56 PM, William Hesse <[email protected]> wrote: >>>>> I guess since I am the one interested, I'll look deeper into it when I >>>>> have >>>>> the time. I guess the bug can be closed. Are we sure that the 9 second >>>>> time we have now is comparable to the time they had before the slowdown? >>>> >>>> I am looking into it in a background mode. Actually using >>>> stringinputbeffer the way ToWideCString uses it is incredibly slow: if >>>> I change this loop into WriteToFlat time goes from 33 secs to .5 secs >>>> (sic!). >>> >>> >>> I had stupidly assumed that ToWideCString used WriteToFlat already. >>> It is almost certainly worth changing it so it does. ToCString can >>> also use WriteToFlat if the input is an ASCII string. If we have to >>> do UTF-8 comparisons we have to use StringInputBuffer. >> >> I meant conversions, not comparisons! >> >>> >>>> >>>> I suspect that decode char might be very expensive. Don't think there >>>> is any n^2 lurking around, but only investigation will show. >>>> >>>> And yes, I guess that is the number: I compared against the version >>>> which flattens string first. >>>> >>>> yours, >>>> anton. >>>> >>>>> >>>>> On Tue, Oct 20, 2009 at 12:35 PM, Anton Muhin <[email protected]> wrote: >>>>>> >>>>>> On Tue, Oct 20, 2009 at 2:17 PM, William Hesse <[email protected]> >>>>>> wrote: >>>>>> > It looks to me like WriteToFlat is called once, to write to the >>>>>> > external >>>>>> > resource, in the fixed case, and that >>>>>> > it is called twice, once for the external resource and once for the >>>>>> > ToCString in the debug assert, in the broken case. >>>>>> > If that is the dominant cost, then why is the time increasing by a >>>>>> > factor of >>>>>> > 4 or more? The time used should at most double. I think the memcmp >>>>>> > and >>>>>> > the >>>>>> > allocation should not be that significant. So I am wondering if all >>>>>> > the >>>>>> > extra time is used in that call to ToCString, and if so, why is it so >>>>>> > much >>>>>> > slower than converting to external, which has to do the same work. The >>>>>> > whole thing is only 4 trials, on strings of 500,000 characters, and it >>>>>> > takes >>>>>> > 50 seconds in debug mode in the broken case, and only 9 seconds in the >>>>>> > fixed >>>>>> > case. So how can doubling the number of calls to WriteToFlat increase >>>>>> > the >>>>>> > time by a factor of 4.5? >>>>>> >>>>>> Bill, do you want me to investigate that further? I definitely would >>>>>> if you like, just not sure this problem of top priority. >>>>>> >>>>>> My thoughts. memcmp is cheap, but is not free. And I'd expect >>>>>> writing the string out to be somewhat more expensive than >>>>>> WriteToString---additional checks, etc. That's, for sure, just >>>>>> speculations, so if you think I should look deeper, I'd do. >>>>>> >>>>>> yours, >>>>>> anton. >>>>>> >>>>>> > >>>>>> > On Tue, Oct 20, 2009 at 11:08 AM, Anton Muhin <[email protected]> >>>>>> > wrote: >>>>>> >> >>>>>> >> Bill, my stance: if one often converts a string, or randomly access >>>>>> >> chars, etc., it is a good idea to flatten string. However that >>>>>> >> shouldn't be a default behaviour if you only want to serialize a >>>>>> >> string into external buffer. >>>>>> >> >>>>>> >> And I don't think it has n^2 behaviour: you just move/compare tons of >>>>>> >> chars several times. >>>>>> >> >>>>>> >> yours, >>>>>> >> anton. >>>>>> >> >>>>>> >> On Tue, Oct 20, 2009 at 1:04 PM, <[email protected]> wrote: >>>>>> >> > LGTM, but is there still a problem with converting cons strings >>>>>> >> > using >>>>>> >> > ToCString? >>>>>> >> > This method should not be this slow, so avoiding the call to it >>>>>> >> > just >>>>>> >> > targets >>>>>> >> > the symptoms, not the disease. Does this have n^2 behavior? >>>>>> >> > >>>>>> >> > http://codereview.chromium.org/294019 >>>>>> >> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > -- >>>>>> > William Hesse >>>>>> > Software Engineer >>>>>> > [email protected] >>>>>> > >>>>>> > Google Denmark ApS >>>>>> > Frederiksborggade 20B, 1 sal >>>>>> > 1360 København K >>>>>> > Denmark >>>>>> > CVR nr. 28 86 69 84 >>>>>> > >>>>>> > If you received this communication by mistake, please don't forward it >>>>>> > to >>>>>> > anyone else (it may contain confidential or privileged information), >>>>>> > please >>>>>> > erase all copies of it, including all attachments, and please let the >>>>>> > sender >>>>>> > know it went to the wrong person. Thanks. >>>>>> > >>>>>> > >>>>> >>>>> >>>>> >>>>> -- >>>>> William Hesse >>>>> Software Engineer >>>>> [email protected] >>>>> >>>>> Google Denmark ApS >>>>> Frederiksborggade 20B, 1 sal >>>>> 1360 København K >>>>> Denmark >>>>> CVR nr. 28 86 69 84 >>>>> >>>>> If you received this communication by mistake, please don't forward it to >>>>> anyone else (it may contain confidential or privileged information), >>>>> please >>>>> erase all copies of it, including all attachments, and please let the >>>>> sender >>>>> know it went to the wrong person. Thanks. >>>>> >>>>> >>>> >>>> >>>> >>>> >>> >> > --~--~---------~--~----~------------~-------~--~----~ v8-dev mailing list [email protected] http://groups.google.com/group/v8-dev -~----------~----~----~----~------~----~------~--~---
