Re: Using rr to track down intermittent test failures
The Timelapse project is cool but this thread got derailed. We have a long list of future improvements to make to rr, and improving support for JS debugging is on that list. The point of this thread is that if you're debugging intermittent test failures and you don't need much JS debugging then you should try rr if you can. People may also find that for general Gecko debugging, using rr is better than gdb alone. I generally do. Rob -- Jtehsauts tshaei dS,o n Wohfy Mdaon yhoaus eanuttehrotraiitny eovni le atrhtohu gthot sf oirng iyvoeu rs ihnesa.rt sS?o Whhei csha iids teoa stiheer :p atroa lsyazye,d 'mYaonu,r sGients uapr,e tfaokreg iyvoeunr, 'm aotr atnod sgaoy ,h o'mGee.t uTph eann dt hwea lmka'n? gBoutt uIp waanndt wyeonut thoo mken.o w ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Using rr to track down intermittent test failures
On Wed, Apr 16, 2014 at 1:52 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: On 2014-04-15, 7:14 PM, Gijs Kruitbosch wrote: On 16/04/2014 00:05, Robert O'Callahan wrote: We just released rr 1.2 and I think this would be a good time for people to try to use it for one of the tasks it was designed for: debugging intermittent test failures. This is awesome! Three questions: 1) Is anyone working on something similar that works for frontend code (particularly, chrome JS)? I realize we have a JS debugger, but sometimes activating the debugger at the wrong time makes the bug go away, and then there's timing issues, and that the debugger doesn't stop all the event loops and so stopping at a breakpoint sometimes still has other code execute in the same context... AIUI your post, because the replay will replay the same Firefox actions, firing up the JS debugger is impossible because you can't make the process do anything. I think you want to consult the SpiderMonkey hackers to see how feasible that will be... At the JS work week in Toronto in late March, we discussed this. Unfortunately, that was one of the relatively few sessions for which no minutes were taken. :( The gist of the results was that, sadly, this is incredibly hard to pull off, if it's possible at all. Pretty much any JS program of meaningful size has complex interactions with the DOM and the network. For those interactions, we can't prove whether it's possible to replay instructions without completely changing the outcome. I guess if we were to implement a complete recording/replaying shim of the DOM and network APIs, we could overcome this issue. It's certainly a very different project from rr, though. (Just for the record: I would *love* having this capability, and was very disappointed when the conversation converged on this conclusion.)
Re: Using rr to track down intermittent test failures
On 04/15/2014 07:14 PM, Gijs Kruitbosch wrote: 1) Is anyone working on something similar that works for frontend code (particularly, chrome JS)? I realize we have a JS debugger, but sometimes activating the debugger at the wrong time makes the bug go away, and then there's timing issues, and that the debugger doesn't stop all the event loops and so stopping at a breakpoint sometimes still has other code execute in the same context... AIUI your post, because the replay will replay the same Firefox actions, firing up the JS debugger is impossible because you can't make the process do anything. While clearly unsuitable for debugging Firefox chrome or Firefox-specific issues, since it's based on Safari/WebKit, Brian Burg's Timelapse project is worth calling out as prior art in this field: https://github.com/burg/timelapse/ http://homes.cs.washington.edu/~burg/projects/timelapse/ Andrew
Re: Using rr to track down intermittent test failures
Building a debugger for a high-level language on top of a low-level recording is unexplored territory, but it's definitely possible and it would have some nice features. However, you can't get much leverage from any existing debugging support built into a VM. We could build some JS debugging support on top of rr like this:

1) Add support to extract JS program state from a stopped Firefox process by reading memory and register contents. Basically, completely passive stack walking and value/heap inspection. There are two sub-approaches that can be mixed and matched here:
* Duplicate some functionality of the JS engine in the debugging layer so it can run outside the stopped Firefox process.
* Add a way to interpret JS engine machine code *as if* it were running in the context of the stopped process. Side effects (e.g. memory writes) would be buffered temporarily and thrown away once we've got the results we need. Basically, this would let us support running user code for debugging purposes during an rr replay.

2) Add support for JS breakpoints. Map a JS breakpoint to a conditional breakpoint in the interpreter C++ code, or to one (or more) locations in compiled code. This mapping may be possible just by inspecting VM state, or we may need to monitor VM execution (e.g. compilation) to maintain the mapping during replay. I think if we can accept some heuristics and approximations here, it could work OK. E.g., just setting breakpoints at the beginning of non-inlined functions and at loop heads would go quite a long way.

That's makeshift. Ultimately you want to take a completely different approach to debugging both JS and C++, e.g. by building something like Chronicle + Chronomancer on top of rr. But, baby steps.
Rob
Re: Using rr to track down intermittent test failures
On Tue, Apr 15, 2014 at 4:14 PM, Gijs Kruitbosch gijskruitbo...@gmail.com wrote: On 16/04/2014 00:05, Robert O'Callahan wrote: We just released rr 1.2 and I think this would be a good time for people to try to use it for one of the tasks it was designed for: debugging intermittent test failures. This is awesome! Three questions: 1) Is anyone working on something similar that works for frontend code (particularly, chrome JS)? I realize we have a JS debugger, but sometimes activating the debugger at the wrong time makes the bug go away, and then there's timing issues, and that the debugger doesn't stop all the event loops and so stopping at a breakpoint sometimes still has other code execute in the same context... AIUI your post, because the replay will replay the same Firefox actions, firing up the JS debugger is impossible because you can't make the process do anything. 2) Is anyone working on making this available on our TBPL infra for try pushes? 3) Is support for other platforms than Linux/gdb (thinking Mac/lldb particularly) planned? ~ Gijs I definitely recall attending a brown-bag presentation at the Mountain View office sometime in the last few years, given by a researcher who had built a record-and-replay scheme for JS. - Kyle
Re: Using rr to track down intermittent test failures
On 16/04/2014 00:05, Robert O'Callahan wrote: We just released rr 1.2 and I think this would be a good time for people to try to use it for one of the tasks it was designed for: debugging intermittent test failures. This is awesome! Three questions: 1) Is anyone working on something similar that works for frontend code (particularly, chrome JS)? I realize we have a JS debugger, but sometimes activating the debugger at the wrong time makes the bug go away, then there are timing issues, and the debugger doesn't stop all the event loops, so stopping at a breakpoint sometimes still lets other code execute in the same context... AIUI your post, because the replay will replay the same Firefox actions, firing up the JS debugger is impossible because you can't make the process do anything. 2) Is anyone working on making this available on our TBPL infra for try pushes? 3) Is support for platforms other than Linux/gdb (thinking Mac/lldb particularly) planned? ~ Gijs
Re: Using rr to track down intermittent test failures
On Wed, Apr 16, 2014 at 11:05 AM, Robert O'Callahan rob...@ocallahan.org wrote: 4) Create a script somewhere called rr-record that does exec ~/rr/bin/rr record $* Sorry; this script, of course, needs to exec rr from wherever it was installed. Rob
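Following that correction, here is a minimal sketch of an rr-record script that doesn't hard-code ~/rr/bin. It assumes rr is on your PATH; RR_BIN is a hypothetical override variable introduced only for this sketch (rr itself does not read it). It also uses the quoted "$@" rather than $*, so each argument the test harness passes is forwarded intact instead of being re-split on spaces.

```sh
#!/bin/sh
# rr-record: run the given command under "rr record".
# RR_BIN is a hypothetical override for an rr that isn't on PATH,
# e.g. RR_BIN=$HOME/rr/bin/rr; by default the rr on PATH is used.
RR_BIN="${RR_BIN:-rr}"
# exec replaces this shell, so the caller gets rr's exit status
# directly, with no extra shell process in between.
exec "$RR_BIN" record "$@"
```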
Re: Using rr to track down intermittent test failures
On 2014-04-15, 7:14 PM, Gijs Kruitbosch wrote: On 16/04/2014 00:05, Robert O'Callahan wrote: We just released rr 1.2 and I think this would be a good time for people to try to use it for one of the tasks it was designed for: debugging intermittent test failures. This is awesome! Three questions: 1) Is anyone working on something similar that works for frontend code (particularly, chrome JS)? I realize we have a JS debugger, but sometimes activating the debugger at the wrong time makes the bug go away, and then there's timing issues, and that the debugger doesn't stop all the event loops and so stopping at a breakpoint sometimes still has other code execute in the same context... AIUI your post, because the replay will replay the same Firefox actions, firing up the JS debugger is impossible because you can't make the process do anything. I think you want to consult the SpiderMonkey hackers to see how feasible that will be... 2) Is anyone working on making this available on our TBPL infra for try pushes? Bug 996910. :-) 3) Is support for other platforms than Linux/gdb (thinking Mac/lldb particularly) planned? rr relies heavily on Linux. I think an x86-64 port is feasible, but I don't think we can ever get it to work on a non-FOSS OS. Cheers, Ehsan
Re: Using rr to track down intermittent test failures
On Wed, Apr 16, 2014 at 11:05 AM, Robert O'Callahan rob...@ocallahan.org wrote: 1) Install rr on a Westmere-or-later Linux system (or VM with performance counters virtualized), build 32-bit Firefox (opt or debug) and verify that recording Firefox works for you. If it doesn't, please file a github issue or check https://github.com/mozilla/rr/issues/973 if your system is Haswell-based. As a rule of thumb, anything Core i3/i5/i7 or later will work, and anything Core 2 or older will not work. BTW, our support starts with Nehalem, which is actually older than Westmere. See https://software.intel.com/en-us/articles/intel-architecture-and-processor-identification-with-cpuid-model-and-family-numbers Rob
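To see whether a given machine is Nehalem or later, note that Linux reports the CPU's family and model numbers in /proc/cpuinfo. The sketch below (cpu_id is a hypothetical helper, not part of rr) prints them for the first processor listed so they can be looked up in the Intel table linked above. /proc/cpuinfo gives the model in decimal, so the hex form is printed too. It only reports the numbers and makes no judgment about rr support itself.

```sh
#!/bin/sh
# Print vendor, CPU family and model (decimal, plus hex) for the first
# processor in a cpuinfo-format file (default /proc/cpuinfo), for
# lookup in Intel's CPUID family/model table.
cpu_id() {
    awk -F': *' '
        /^vendor_id/    && v == "" { v = $2 }
        /^cpu family/   && f == "" { f = $2 }
        /^model[ \t]*:/ && m == "" { m = $2 }   # skips "model name"
        END { printf "%s family=%s model=%s (0x%02X)\n", v, f, m, m }
    ' "${1:-/proc/cpuinfo}"
}

cpu_id
```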
Re: Using rr to track down intermittent test failures
On Wed, Apr 16, 2014 at 11:14 AM, Gijs Kruitbosch gijskruitbo...@gmail.com wrote:
> 1) Is anyone working on something similar that works for frontend code (particularly, chrome JS)? I realize we have a JS debugger, but sometimes activating the debugger at the wrong time makes the bug go away, and then there's timing issues, and that the debugger doesn't stop all the event loops and so stopping at a breakpoint sometimes still has other code execute in the same context... AIUI your post, because the replay will replay the same Firefox actions, firing up the JS debugger is impossible because you can't make the process do anything.

Your understanding is correct. Implementing some kind of proper JS debugging on top of rr is technically feasible, and would actually be super awesome for our front-end contributors and also Web developers, but it would be a lot of work. It's possible, but rather painful, to figure out what JS is doing using gdb. More and better gdb helper scripts would improve that situation. So rr isn't a great solution for JS developers at this time ... unless you're really desperate. Looking at browser-chrome, maybe we should be desperate :-). If you are desperate, try using rr to record and replay bugs that matter to you. If that works, it may be worth investing to make JS debugging through rr+gdb a bit more palatable.

> 2) Is anyone working on making this available on our TBPL infra for try pushes?

Having a set of rr-enabled tests running on TBPL would be good, but there are a few issues:
-- Some bugs might not reproduce at all under rr. So it would be unwise to stop running our not-under-rr tests.
-- The test slaves running rr would need to meet rr's specs and be bare-metal or VMs with perf counter virtualization enabled.
-- We would need a story for debugging test failures recorded by rr in our test farm. ssh'ing into the box is technically the easiest. We haven't done any work to make traces portable, and generally that's going to be either fragile or slow to replay.

> 3) Is support for other platforms than Linux/gdb (thinking Mac/lldb particularly) planned?

Mac might be feasible but would be a massive amount of work. Other projects would be more valuable (e.g. x86-64, Android, improved debugging features). So, no.

Rob
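For the bare-metal-or-VM requirement, one quick heuristic is the "hypervisor" flag that Linux lists in /proc/cpuinfo when the CPUID hypervisor bit is set. The sketch below (running_under_hypervisor is a hypothetical helper, not part of rr) only tells you whether you're probably in a VM. It cannot tell whether perf counter virtualization is enabled, and some hypervisors hide the flag, so its absence doesn't prove bare metal.

```sh
#!/bin/sh
# Heuristic only: succeed if the "flags" line of a cpuinfo-format file
# (default /proc/cpuinfo) contains the "hypervisor" flag.
running_under_hypervisor() {
    grep -Eq '^flags[[:space:]]*:.*[[:space:]]hypervisor([[:space:]]|$)' \
        "${1:-/proc/cpuinfo}"
}

if running_under_hypervisor; then
    echo "hypervisor flag present: check that perf counter virtualization is enabled"
else
    echo "no hypervisor flag: probably bare metal (not guaranteed)"
fi
```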
Re: Using rr to track down intermittent test failures
On 4/15/2014 7:05 PM, Robert O'Callahan wrote: The steps to get started doing this are roughly as follows: 1) Install rr on a Westmere-or-later Linux system (or VM with performance counters virtualized), build 32-bit Firefox (opt or debug) and verify that recording Firefox works for you. If it doesn't, please file a github issue or check https://github.com/mozilla/rr/issues/973 if your system is Haswell-based. Do you have any idea of a timeframe for x86-64 support? I have a 64-bit Ubuntu install, and historically it's been a bit of a pain to get 32-bit Firefox running. (Alternatively, if someone wants to figure out instructions for getting a 32-bit Firefox built and running on Ubuntu, that would also be helpful.) -Ted
Re: Using rr to track down intermittent test failures
----- Original Message ----- Do you have any idea on a timeframe for x86-64 support? I have a 64-bit Ubuntu install, and historically it's a bit of a pain to get 32-bit Firefox running. (Alternately, if someone wants to figure out instructions for getting a 32-bit Firefox built and running on Ubuntu, that would also be helpful.) Your wish is granted: https://github.com/padenot/fx-32-on-64.sh -Nathan
Re: Using rr to track down intermittent test failures
On Wed, Apr 16, 2014 at 12:24 PM, Ted Mielczarek t...@mielczarek.org wrote: Do you have any idea on a timeframe for x86-64 support? It's technically not that hard, but it's a reasonably large project, so it's not going to happen right away. Rob