Re: [v8-dev] Worrying JIT bug - don't know how to report or bisect

sebp . mueller Fri, 22 Jul 2016 03:00:52 -0700

Hi Yang,

thank you for trying to read my description. I know it is very hard to 
understand from the text alone. I have some screenshots and diagrams that 
explain what I tried to describe in my original mail better, and I will 
see whether and how I can share them. 
I know that "callee" isn't in the spec and we are not using it in our 
application code, however Chrome does implement that feature and I was 
using it to debug the state of the stack - for all other stack elements the 
value of "arguments.callee" was consistently the function being called in 
the parent frame, but not for the erroneous stack frame. I'm not saying the 
bug is that "callee" has the wrong value. It's just that "callee" having 
the wrong value backs up my thesis. All the closure variables in the call 
stack have the wrong values in the debugger in this case.


Here is structurally very similar code to the code that fails:

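function wrap(inner, stuff){
  if (true){
    // captured per wrapper: every call to wrap() creates a fresh closure
    var _inner = inner;
    var _stuff = stuff;

    var res = function(){
      // the last argument must equal the value stored in this closure
      var ar = arguments[arguments.length - 1];
      if (_stuff !== ar)
        console.log("arg is wrong: " + ar + " expected " + _stuff);
      _inner.apply(this, arguments);
    };

    return res;
  }
}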

function doTest(count){
  function someFunc(o, nmb){
    nmb !== 42 && console.log("someFunc broken");
    o.foo(nmb + 1);
  }

  var someFunc2 = wrap(someFunc, 42);

  function innerInner(nmb){
    nmb !== 43 && console.log("innerInner broken");
  }

  var o = {foo: wrap(innerInner, 43)};

  for (var i = 0; i < count; i++){
    someFunc2(o, 42);
  }
}

...however, with that code I cannot reproduce the problem and I am sure you 
won't either, so it's probably not worth executing :-). 
The code wraps two functions. Each call to wrap() passes the expected value, 
which is stored in the closure of the returned wrapper function ("res").
Now in my application it sometimes happens that the wrapper of the inner 
function performs the check that belongs to the wrapper of the outer 
function: the invocation itself is correct, however "_stuff" has the wrong 
value at the time of the invocation. Both wrapper functions share the same 
code but have different closure variables, and for some reason the second 
wrapper function sometimes suddenly becomes the first, or the second 
wrapper function uses the wrong closure context.
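
The expectation described above can be made checkable without reading console output by recording mismatches in an array. What follows is only a sketch under assumptions: checkedWrap, runOnce and the failures array are hypothetical names that are not part of the library discussed here, and like the code above it is not expected to reproduce the problem on its own.

```javascript
// Hypothetical harness: same wrapping pattern as above, but mismatches
// are collected in an array instead of being logged to the console.
function checkedWrap(inner, expected, failures) {
  return function wrapper() {
    // The last argument must equal the value captured in this closure.
    var last = arguments[arguments.length - 1];
    if (last !== expected) {
      failures.push({ expected: expected, actual: last });
    }
    return inner.apply(this, arguments);
  };
}

// Builds two wrappers that share code but not closure values, then
// calls the outer one repeatedly so the engine has a chance to optimize it.
function runOnce(iterations) {
  var failures = [];

  function innerInner(nmb) {
    return nmb;
  }
  var o = { foo: checkedWrap(innerInner, 43, failures) };

  function someFunc(obj, nmb) {
    return obj.foo(nmb + 1);
  }
  var outer = checkedWrap(someFunc, 42, failures);

  for (var i = 0; i < iterations; i++) {
    outer(o, 42);
  }
  return failures;
}
```

On an engine that behaves correctly, runOnce(100000) returns an empty array. Any entry in the returned array would record which closure value was seen together with which call-site argument, which is the information needed to tell the two wrappers apart.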

I cannot just upload my larger test-case publicly because it contains code 
parts that I cannot easily publish due to licensing restrictions. Is there 
a way I can non-publicly share that code with the V8 team?

kind regards - Sebastian

On Friday, 22 July 2016 11:11:48 UTC+2, Yang Guo wrote:
>
> Hi Sebastian,
>
> thanks for consulting with us!
>
> It seems that arguments.callee is not even specified in the latest 
> iteration of the spec (https://tc39.github.io/ecma262/). Therefore I 
> would not be surprised if it does not behave as expected. Arguments being 
> an exotic object has its quirks.
>
> That being said, it would be nice if you could provide us with concrete 
> examples or a repro. It's very hard to figure out what you mean by just 
> reading a textual description of what your code is supposed to do.
>
> Cheers,
>
> Yang
>
> On Fri, Jul 22, 2016 at 10:51 AM <[email protected]> wrote:
>
>> Hi all,
>>
>> I am very certain I found an extremely nasty JIT compiler bug in the 
>> current version of Chrome (which is 51.0.2704.103 m for me at the time 
>> of writing):
>> The problem is that I am having a really hard time finding a test-case 
>> that would be acceptable in a CRBug issue. For me the issue only happens in 
>> about 1 of 30 page reloads. With some of my colleagues it happens more often 
>> and customers have reported it happening every 5 to 10 times, but it only 
>> ever happens on a large code base (about 2 to 10 Megabytes of minified 
>> Javascript code) and stripping down the problem is extremely difficult 
>> because of the non-deterministic reproducibility: If I reduce the code and 
>> the problem goes away, I cannot be sure whether this is just bad luck (and 
>> I need to reload 10 or 50 more times) or I actually need to bisect in the 
>> other direction and the problem really went away. How do you guys approach 
>> such a problem? 
>>
>> Here is what I found:
>> In our Javascript library we offer the possibility to perform additional 
>> runtime type checks for all the public API members: For each API member we 
>> dynamically create a wrapper function that accesses separate meta-data that 
>> it uses to perform type checks on the arguments of the function. The result 
>> is that for every function we have defined, there is another function 
>> which ultimately delegates to the original function using "apply(this, 
>> arguments)" but only after it has checked the arguments for validity. The 
>> meta data for each check is found in the closure of the function, so the 
>> source code is shared between all wrapper functions, but each instance of 
>> the function has a different closure context that keeps references to the 
>> types and argument counts. 
>> Now some of the type checks are more complex and it can happen that 
>> during a type check we descend recursively into some parts of our API and 
>> so the same typechecking source code may be entered recursively (but with a 
>> different context) - the typechecking framework itself does in fact 
>> type-check itself. This works perfectly 95% of the time in Chrome and it 
>> works flawlessly in older versions of Chrome (48?) and 
>> all other browsers. However with recent version updates we found that 
>> sometimes we were getting stack overflows (maximum call stack size 
>> exceeded) and the closer analysis showed that the following is happening: 
>> Sometimes when we call a function (which is actually the type check 
>> wrapped variant) and then inside the original function another type check 
>> is performed - the closure context of the inner type check suddenly gets 
>> the values from the outer frame and if the values are fine from the 
>> typecheck point of view this of course results in an endless recursion. 
>> Stack overflows aren't caught by "break on exception", so inspecting the 
>> stack was not possible for us. However we found a second exception that 
>> happened directly *after* the stack overflow when more code was executed 
>> and this time we were able to "break on exception" and inspect the stack 
>> frames and this is what we found:
>> In the stack we can see that one method call results in a type check and 
>> inside the type check another (different) type checked method is called, 
>> however if we look at "arguments.callee" and if we inspect the closure 
>> variables in the inner type check we get the values from the ancestor 
>> stackframes: so even though one method is called, inside that very same 
>> method in the next stackframe "arguments.callee" is different from the 
>> reference at the call site. This should actually be an invariant if I'm not 
>> very much mistaken: when I call a method, inside that method 
>> "arguments.callee" should be referentially the same entity that I am calling, 
>> but this is not the case in my test when the other exception happens. I only 
>> get an exception because my type check complains that it is being invoked 
>> with the wrong arguments (because the type meta data is wrong and the check 
>> is performed in a different method). If I didn't get that exception the 
>> code would happily continue to execute, but working with the wrong 
>> data/closure context. This is what worries me: It's very hard to debug 
>> because it only happens sometimes after a reload (probably when JIT kicks 
>> in for the first time or something like that) and often it will not 
>> result in an exception but you will "just" get data corruption. 
>> That's why I think this bug is a really severe bug that we should try to 
>> fix as soon as possible, because this could cause thousands of hours of 
>> Javascript debugging being wasted for a bug that just isn't in the 
>> javascript, but in the JIT (I believe). And that's why I would like to 
>> report that bug, but my current test-case is several megabytes in size and 
>> executes for one or two seconds before it fails and this only happens every 
>> few dozen times. Do you have an idea how to debug this? How could I trim 
>> down the test-case? Would a heap dump help here? I took a heap dump once 
>> I was in that bad state to inspect the closure/context values of the 
>> functions in question, but I was not able to view the optimized code with 
>> the Chrome developer tools.
>>
>> Any help or advice you might have would be greatly appreciated. I would 
>> hate to know that there is a bug like this in Chrome that is sitting 
>> there and can break applications or corrupt data almost randomly and 
>> knowing that you cannot avoid it and there is no way to fix it.
>>
>> Thanks - Sebastian
>>
>> -- 
>> -- 
>> v8-dev mailing list
>> [email protected]
>> http://groups.google.com/group/v8-dev
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "v8-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>

