Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
Hi Paul, On Sun, 6 Aug 2017, Paul Wise wrote: On Sun, 2017-08-06 at 11:51 +0200, Yury V. Zaytsev wrote: We have a patch that you could try, it would be great if you could tell whether this fixes the problem for you: http://midnight-commander.org/attachment/ticket/3846/3846_mcview_hook.patch Thanks, that fixes the problem. I appreciate you confirming this, just landed in master. Denis, do you think you could find time to update the package to 4.8.19? I don't know when we'll make a 4.8.20 release, probably in the coming months, but it would be good to get bugfixes in 4.8.19 out... -- Sincerely yours, Yury V. Zaytsev
Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
On Sun, 2017-08-06 at 11:51 +0200, Yury V. Zaytsev wrote: > We have a patch that you could try, it would be great if you could tell > whether this fixes the problem for you: > > http://midnight-commander.org/attachment/ticket/3846/3846_mcview_hook.patch Thanks, that fixes the problem. > The initial report lacked a critical piece of information, and > specifically that you have pressed "Ctrl+x q" during panel initialization > and this caused the crash, and not merely observed a "random crash on > startup". Yeah, initially I thought it was a random crash. It was only after some more investigation that I discovered it was actually reproducible. > This is the reason why I originally assumed that either the pointer wasn't > set to NULL after getting freed, or something else overwrote it to point > to unitialized memory. I see. > Yes, this log has been really useful and now it is more or less clear > what's going on. Good to hear, thanks for the info. > I think that you are misinterpreting your valgrind report. Yeah, that is clear from your description of the problem. > If you run mc under valgrind ... which we routinely do ... > we do use static analysis tools like cppcheck ... > we have been using Coverity for the last 5 years or so Good to hear :) > as fuzzing is concerned ... I see, thanks for the info. -- bye, pabs https://wiki.debian.org/PaulWise signature.asc Description: This is a digitally signed message part
Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
Hi Paul, We have a patch that you could try, it would be great if you could tell whether this fixes the problem for you: http://midnight-commander.org/attachment/ticket/3846/3846_mcview_hook.patch On Sat, 5 Aug 2017, Paul Wise wrote: On Fri, 2017-08-04 at 23:20 +0200, Yury V. Zaytsev wrote: The most obvious reason for the segfault would have been that somebody has freed `view->filename_vpath` without setting it to NULL, but a quick grep for `->filename_vpath` doesn't reveal anything suspicious :-( Discussion on IRC made us think that it was using memory that has been allocated but not initialised, because the MALLOC_PERTURB_=254 MALLOC_CHECK_=2 combination causes glibc to poison new memory allocations with 0xFE and if you then see 0xFE anywhere, that means that the program is buggy and interacting with uninitialised memory. The initial report lacked a critical piece of information, and specifically that you have pressed "Ctrl+x q" during panel initialization and this caused the crash, and not merely observed a "random crash on startup". This is the reason why I originally assumed that either the pointer wasn't set to NULL after getting freed, or something else overwrote it to point to unitialized memory. The issue is easily reproducible for me and I think you could reproduce it by introducing a 5 second sleep call just before the panel setup. So far, I wasn't able reproduce it, at least not that easily. PS: you might be interested in the valgrind output for one of the sessions where I was able to press Ctrl+x q early enough. The log clearly shows that valgrind found a number of invalid reads. https://people.debian.org/~pabs/tmp/valgrind-mc.gz Yes, this log has been really useful and now it is more or less clear what's going on. Specifically, a dialog started processing an event you triggered by pressing "Ctrl+x q" before it was fully initialized, and this lead to the crash you observed. Instead, events should be ignored during the initialization phase, or queued for later processing. A similar issue with another dialog (much easier to reproduce) has been fixed in 4.8.19 as I said earlier. Unfortunately, such issues haven't been consistently considered by past developers, so many more of the same type of problems might be lurking around the widget system. PS: I would suggest running the mc code through some static analysis tools, either via check-all-the-things or by manually running the tools listed in these two configuration files: I think that you are misinterpreting your valgrind report. All of the very many invalid reads you observed are caused by the same underlying cause, and specifically you triggering an event processing before the dialog has been fully initialized. If you run mc under valgrind operating normally which we routinely do, the report should be clean. And yes, we do use static analysis tools like cppcheck, experimental compiler warnings (which have been actually much more useful than the former), Google Sanitizers and the like. If you are fine with using proprietary services to improve free code, you could look at Coverity. You could also get Google to do some fuzzing of the mc codebase. Indeed, we have been using Coverity for the last 5 years or so. In as far as fuzzing is concerned, unfortunately, mc hasn't been programmed to be robust against invalid inputs to begin with and is written in a very unforgiving language, so a lot of work is still to be done for fuzzing to be actually useful instead of simply revealing the truth above. -- Sincerely yours, Yury V. Zaytsev
Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
On Fri, 2017-08-04 at 23:20 +0200, Yury V. Zaytsev wrote: > The most obvious reason for the segfault would have been that somebody has > freed `view->filename_vpath` without setting it to NULL, but a quick grep > for `->filename_vpath` doesn't reveal anything suspicious :-( Discussion on IRC made us think that it was using memory that has been allocated but not initialised, because the MALLOC_PERTURB_=254 MALLOC_CHECK_=2 combination causes glibc to poison new memory allocations with 0xFE and if you then see 0xFE anywhere, that means that the program is buggy and interacting with uninitialised memory. https://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html http://udrepper.livejournal.com/11429.html > So this makes me think that maybe something else corrupted the memory > indirectly, but then I can't see how this can be reasonably investigated > without a sanitizer like report or even better a reproducer... The issue is easily reproducible for me and I think you could reproduce it by introducing a 5 second sleep call just before the panel setup. PS: you might be interested in the valgrind output for one of the sessions where I was able to press Ctrl+x q early enough. The log clearly shows that valgrind found a number of invalid reads. https://people.debian.org/~pabs/tmp/valgrind-mc.gz PS: I would suggest running the mc code through some static analysis tools, either via check-all-the-things or by manually running the tools listed in these two configuration files: https://anonscm.debian.org/cgit/collab-maint/check-all-the-things.git/tree/data/c.ini https://anonscm.debian.org/cgit/collab-maint/check-all-the-things.git/tree/data/elf.ini If you are fine with using proprietary services to improve free code, you could look at Coverity. You could also get Google to do some fuzzing of the mc codebase. https://scan.coverity.com/ https://www.linux.com/blog/2017/7/googles-oss-fuzz-tool-helps-secure-open-source-projects -- bye, pabs https://wiki.debian.org/PaulWise signature.asc Description: This is a digitally signed message part
Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
Oops its ctrl+x q of course :) Message d'origine De: Yury V. Zaytsev Envoyé: samedi 5 août 2017 10:09 À: Denis Briand Cc: bug Objet: Re: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV) On Sat, 5 Aug 2017, Denis Briand wrote: > On Sat, Aug 05, 2017 at 12:07:39AM +0200, Yury V. Zaytsev wrote: >> On Fri, 4 Aug 2017, Denis Briand wrote: >> >>> No I can't reproduce it with git master master and the sid version, >>> because the start is too quick... ^^ My valgrind report is clean. I >>> attached the Paul valgrind report on the upstream bug ticket. >> >> Well, it claims that the memory that viewer filename is pointing to is freed >> in set_display_type(), but I have no idea why this is happening, and why >> viewer is even involved here in the first place. Also, do you have a clue as >> to what Ctrl+x g is supposed to do? > > Yes Ctrl+x g open a quick view panel. Hmmm, it's Ctrl+x q for me, but now at least it start making *some* sense. Are you sure it's really Ctrl+x g, maybe pabs is using some customized keymap layout? Anyways, it sounds very much like http://www.midnight-commander.org/ticket/3700 which has been fixed in 4.8.19, and as you can see, ASan traces a much nicer than what comes out of Valgrind, but whatever. So, would be good to know if this still appears in 4.8.19 and/or git master, since it seems that pabs is able to reproduce it. If this doesn't appear anymore, then it's #3700 and we don't have to do anything. Also, when forwarding bugs, please set version field correctly. By setting it to master, you are claiming that you could reproduce it on master, whereas, in fact, it's happening on a version that is 1 year old. -- Sincerely yours, Yury V. Zaytsev
Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
On Sat, 5 Aug 2017, Denis Briand wrote: On Sat, Aug 05, 2017 at 12:07:39AM +0200, Yury V. Zaytsev wrote: On Fri, 4 Aug 2017, Denis Briand wrote: No I can't reproduce it with git master master and the sid version, because the start is too quick... ^^ My valgrind report is clean. I attached the Paul valgrind report on the upstream bug ticket. Well, it claims that the memory that viewer filename is pointing to is freed in set_display_type(), but I have no idea why this is happening, and why viewer is even involved here in the first place. Also, do you have a clue as to what Ctrl+x g is supposed to do? Yes Ctrl+x g open a quick view panel. Hmmm, it's Ctrl+x q for me, but now at least it start making *some* sense. Are you sure it's really Ctrl+x g, maybe pabs is using some customized keymap layout? Anyways, it sounds very much like http://www.midnight-commander.org/ticket/3700 which has been fixed in 4.8.19, and as you can see, ASan traces a much nicer than what comes out of Valgrind, but whatever. So, would be good to know if this still appears in 4.8.19 and/or git master, since it seems that pabs is able to reproduce it. If this doesn't appear anymore, then it's #3700 and we don't have to do anything. Also, when forwarding bugs, please set version field correctly. By setting it to master, you are claiming that you could reproduce it on master, whereas, in fact, it's happening on a version that is 1 year old. -- Sincerely yours, Yury V. Zaytsev
Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
On Sat, Aug 05, 2017 at 12:07:39AM +0200, Yury V. Zaytsev wrote: > On Fri, 4 Aug 2017, Denis Briand wrote: > > >No I can't reproduce it with git master master and the sid version, > >because the start is too quick... ^^ My valgrind report is clean. I > >attached the Paul valgrind report on the upstream bug ticket. > > Well, it claims that the memory that viewer filename is pointing to is freed > in set_display_type(), but I have no idea why this is happening, and why > viewer is even involved here in the first place. Also, do you have a clue as > to what Ctrl+x g is supposed to do? Yes Ctrl+x g open a quick view panel. pgp4v2mRXtnDN.pgp Description: PGP signature
Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
On Fri, 4 Aug 2017, Denis Briand wrote: No I can't reproduce it with git master master and the sid version, because the start is too quick... ^^ My valgrind report is clean. I attached the Paul valgrind report on the upstream bug ticket. Well, it claims that the memory that viewer filename is pointing to is freed in set_display_type(), but I have no idea why this is happening, and why viewer is even involved here in the first place. Also, do you have a clue as to what Ctrl+x g is supposed to do? Under valgrind, mc starts rather slowly for me in /usr/bin, but I can't reproduce it on 4.8.19... -- Sincerely yours, Yury V. Zaytsev
Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
On Fri, Aug 04, 2017 at 11:29:43PM +0200, Yury V. Zaytsev wrote: > On Fri, 4 Aug 2017, Denis Briand wrote: > > >Step to reproduce: > >$ MALLOC_PERTURB_=254 MALLOC_CHECK_=2 mc > >then during the startup: hit "Ctrl+x g" keys combo > > > >Important: you must hit "Ctrl+x g" keys before the panels are displayed. > >To reproduce this long start, you have got 370 sub-directories in the > >current directory. > > Hi Denis, > > Ah, so you've got a reproducer, great! Are you able to reproduce it on git > master? Can you easily switch ASan on? > > -- > Sincerely yours, > Yury V. Zaytsev Hi Yury, No I can't reproduce it with git master master and the sid version, because the start is too quick... ^^ My valgrind report is clean. I attached the Paul valgrind report on the upstream bug ticket. Denis pgpl30bPOGwRP.pgp Description: PGP signature
Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
On Fri, 4 Aug 2017, Denis Briand wrote: Step to reproduce: $ MALLOC_PERTURB_=254 MALLOC_CHECK_=2 mc then during the startup: hit "Ctrl+x g" keys combo Important: you must hit "Ctrl+x g" keys before the panels are displayed. To reproduce this long start, you have got 370 sub-directories in the current directory. Hi Denis, Ah, so you've got a reproducer, great! Are you able to reproduce it on git master? Can you easily switch ASan on? -- Sincerely yours, Yury V. Zaytsev
Bug#870728: [Pkg-mc-devel] Bug#870728: mc: random crash on startup (SIGSEGV)
Hi Paul, Thanks for the report! To be very honest, I'm not sure how actionable it is though :-/ The most obvious reason for the segfault would have been that somebody has freed `view->filename_vpath` without setting it to NULL, but a quick grep for `->filename_vpath` doesn't reveal anything suspicious :-( So this makes me think that maybe something else corrupted the memory indirectly, but then I can't see how this can be reasonably investigated without a sanitizer like report or even better a reproducer... -- Sincerely yours, Yury V. Zaytsev