Re: cannot destroy faulty zvol
Hi.

On 23.07.2017 0:28, Eugene M. Zheganin wrote:
> Hi,
>
> On 22.07.2017 17:08, Eugene M. Zheganin wrote:
>> Is this weird error "cannot destroy: already exists" related to the fact
>> that the zvol is faulty? Does it indicate that the metadata is probably
>> faulty too? Anyway, is there a way to destroy this dataset?
>
> Follow-up: I sent a similar zvol of exactly the same size into the faulty
> one; the zpool errors are gone, but I still cannot destroy the zvol. Is this
> a ZFS bug?

Seems like it. zdb shows some "invisible" datasets not shown by zfs list -t all; one of them is a child dataset of the one I'm trying to destroy:

# zdb -d zfsroot
Dataset mos [META], ID 0, cr_txg 4, 60.7M, 1078 objects
Dataset zfsroot/usr/src [ZPL], ID 1181, cr_txg 59832, 1.03G, 158964 objects
Dataset zfsroot/usr/home [ZPL], ID 1197, cr_txg 59911, 21.7M, 551 objects
Dataset zfsroot/usr/ports [ZPL], ID 1189, cr_txg 59881, 887M, 335670 objects
Dataset zfsroot/usr [ZPL], ID 1173, cr_txg 59829, 23.0K, 7 objects
Dataset zfsroot/tmp [ZPL], ID 1253, cr_txg 59931, 6.66G, 480610 objects
Dataset zfsroot/userdata/worker226 [ZVOL], ID 940, cr_txg 59580, 12.0K, 2 objects
Dataset zfsroot/userdata/worker251 [ZVOL], ID 948, cr_txg 59583, 132M, 2 objects
Dataset zfsroot/userdata/worker152 [ZVOL], ID 924, cr_txg 59551, 928M, 2 objects
Dataset zfsroot/userdata/worker125 [ZVOL], ID 932, cr_txg 59566, 997M, 2 objects
Dataset zfsroot/userdata/worker158 [ZVOL], ID 916, cr_txg 59536, 498M, 2 objects
Dataset zfsroot/userdata/worker214 [ZVOL], ID 908, cr_txg 59530, 736M, 2 objects
Dataset zfsroot/userdata/worker160 [ZVOL], ID 900, cr_txg 59524, 774M, 2 objects
Dataset zfsroot/userdata/worker184 [ZVOL], ID 892, cr_txg 59518, 609M, 2 objects
Dataset zfsroot/userdata/worker235 [ZVOL], ID 1012, cr_txg 59663, 1.62G, 2 objects
Dataset zfsroot/userdata/worker242 [ZVOL], ID 1021, cr_txg 59674, 96.1M, 2 objects
Dataset zfsroot/userdata/worker248 [ZVOL], ID 1004, cr_txg 59660, 153M, 2 objects
Dataset zfsroot/userdata/worker141 [ZVOL], ID 988, cr_txg 59631, 1014M, 2 objects
Dataset zfsroot/userdata/worker136 [ZVOL], ID 996, cr_txg 59646, 995M, 2 objects
Dataset zfsroot/userdata/worker207 [ZVOL], ID 980, cr_txg 59617, 577M, 2 objects
Dataset zfsroot/userdata/worker179 [ZVOL], ID 972, cr_txg 59602, 801M, 2 objects
Dataset zfsroot/userdata/worker197 [ZVOL], ID 964, cr_txg 59595, 383M, 2 objects
Dataset zfsroot/userdata/worker173 [ZVOL], ID 956, cr_txg 59586, 1.26G, 2 objects
Dataset zfsroot/userdata/worker190 [ZVOL], ID 1085, cr_txg 59757, 236M, 2 objects
Dataset zfsroot/userdata/worker174 [ZVOL], ID 1077, cr_txg 59743, 2.11G, 2 objects
Dataset zfsroot/userdata/worker200 [ZVOL], ID 1069, cr_txg 59732, 260M, 2 objects
Dataset zfsroot/userdata/worker131 [ZVOL], ID 1053, cr_txg 59711, 792M, 2 objects
Dataset zfsroot/userdata/worker146 [ZVOL], ID 1061, cr_txg 59725, 418M, 2 objects
Dataset zfsroot/userdata/worker245 [ZVOL], ID 1037, cr_txg 59692, 208M, 2 objects
Dataset zfsroot/userdata/worker232 [ZVOL], ID 1045, cr_txg 59695, 527M, 2 objects
Dataset zfsroot/userdata/worker238 [ZVOL], ID 1029, cr_txg 59677, 1.94G, 2 objects
Dataset zfsroot/userdata/worker167 [ZVOL], ID 1165, cr_txg 59823, 4.43G, 2 objects
Dataset zfsroot/userdata/worker189 [ZVOL], ID 1157, cr_txg 59817, 326M, 2 objects
Dataset zfsroot/userdata/worker183 [ZVOL], ID 1149, cr_txg 59811, 1.18G, 2 objects
Dataset zfsroot/userdata/worker219 [ZVOL], ID 1141, cr_txg 59808, 12.0K, 2 objects
Dataset zfsroot/userdata/worker213 [ZVOL], ID 1133, cr_txg 59802, 1.04G, 2 objects
Dataset zfsroot/userdata/worker122 [ZVOL], ID 1117, cr_txg 59782, 1.05G, 2 objects
Dataset zfsroot/userdata/worker155 [ZVOL], ID 1125, cr_txg 59790, 963M, 2 objects
Dataset zfsroot/userdata/worker128 [ZVOL], ID 1109, cr_txg 59769, 1.67G, 2 objects
Dataset zfsroot/userdata/worker256 [ZVOL], ID 1093, cr_txg 59763, 602K, 2 objects
Dataset zfsroot/userdata/worker221 [ZVOL], ID 1101, cr_txg 59766, 12.0K, 2 objects
Dataset zfsroot/userdata/worker126 [ZVOL], ID 666, cr_txg 59194, 781M, 2 objects
Dataset zfsroot/userdata/worker151 [ZVOL], ID 674, cr_txg 59205, 435M, 2 objects
Dataset zfsroot/userdata/worker252 [ZVOL], ID 650, cr_txg 59188, 127M, 2 objects
Dataset zfsroot/userdata/worker225 [ZVOL], ID 658, cr_txg 59191, 12.0K, 2 objects
Dataset zfsroot/userdata/worker187 [ZVOL], ID 642, cr_txg 59171, 2.55G, 2 objects
Dataset zfsroot/userdata/worker169 [ZVOL], ID 634, cr_txg 59157, 359M, 2 objects
Dataset zfsroot/userdata/worker163 [ZVOL], ID 626, cr_txg 59139, 2.30G, 2 objects
Dataset zfsroot/userdata/worker217 [ZVOL], ID 618, cr_txg 59136, 12.0K, 2 objects
Dataset zfsroot/userdata/worker135 [ZVOL], ID 731, cr_txg 59301, 1.36G, 2 objects
Dataset zfsroot/userdata/worker142 [ZVOL], ID 739, cr_txg 59315, 468M, 2 objects
Dataset zfsroot/userdata/worker148 [ZVOL], ID 723, cr_txg 59288, 1.21G, 2 objects
Dataset zfsroot/userdata/worker241 [ZVOL], ID 707, cr_txg 59265, 758M, 2 objects
Dataset zfsroot/userdata/worker236 [ZVOL], ID
Re: Trouble with SM961 in SuperMicro X11
On Sun, Jul 23, 2017 at 10:47 PM, Terry Kennedy wrote:
> My offer of a test system with the card is still open, if someone wants
> to pick this up again. I will note that in my case, it only happens in my
> Supermicro systems (but again, Linux works well with it on those boxes).
> The SM961 in the same adapter works fine in a Dell system, and an Optane
> card also works fine in the Supermicro system.

Sounds like the bad old days when I had fun stuff like a video card incompatible with a HD controller.

--
brandon s allbery kf8nh sine nomine associates
allber...@gmail.com ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Trouble with SM961 in SuperMicro X11
> I bought this card but never could get it to fail, despite trying them in a
> number of different systems :(. So lighten up please...

They are obviously working for some people, but enough people are seeing the same problem over and over that there is definitely something wrong.

I spent a good deal of time gathering the requested traces for the developer who was working on it with me, going so far as to purchase a different board (same model, different firmware) and a second adapter card, and tried all of the above in multiple systems. I then sent off the requested info, reiterated my offer of remote access to one of the systems showing the problem, and heard... nothing. A follow-up some months later also got no response, while more and more people are running into the issue and reporting it either on the lists or in the forums.

My offer of a test system with the card is still open, if someone wants to pick this up again. I will note that in my case, it only happens in my Supermicro systems (but again, Linux works well with it on those boxes). The SM961 in the same adapter works fine in a Dell system, and an Optane card also works fine in the Supermicro system.

Terry Kennedy http://www.glaver.org New York, NY USA
Re: Trouble with SM961 in SuperMicro X11
On Jul 23, 2017 7:05 PM, "Terry Kennedy" wrote:
>> It's an SM961, not PM951.
>
> Welcome to the club! 8-{
>
> See PR211723 - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713
> as well as the forums: https://forums.freebsd.org/threads/58170/#post-334061
>
> I (and others) have offered developers remote console access to systems
> that exhibit the problem, as well as confirming it works on the same
> hardware using Linux, gathered requested traces and so on, and then things
> just sort of... died.
>
> You should probably pile onto both the forum discussion and the PR with
> a "me too!" so it becomes more and more obvious that this is affecting a
> larger number of people as time goes on and these modules become more
> popular.

I bought this card but never could get it to fail, despite trying them in a number of different systems :(. So lighten up please...

Warner

> Terry Kennedy http://www.glaver.org New York, NY USA
Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
On Thu, Jul 20, 2017 at 03:45:39PM +0200, Mark Martinec wrote:
> 2017-07-20 02:03, Mark Johnston wrote:
> > One thing to try at this point would be to disable EARLY_AP_STARTUP in
> > the kernel config. That is, take a configuration with which you're able
> > to reproduce the hang during boot, and remove "options
> > EARLY_AP_STARTUP".
>
> Done. And it avoids the problem altogether! Thanks.
> Tried a reboot several times and it succeeds every time.

Thanks. Sorry for the delayed follow-up.

>
> Here is all that I had in a config file for building a kernel,
> i.e. I took away the 'options DDB' which also seemingly avoided
> the problem:
>   include GENERIC
>   ident NELI
>   nooptions EARLY_AP_STARTUP

Could you try re-enabling EARLY_AP_STARTUP, applying the patch at the end of this email, and see if the message "sleeping before eventtimer init" appears in the boot output? If it does, it'll be followed by a backtrace that might be useful for tracking down the hang. It might produce false positives, but we'll see.

>
> > This feature has a fairly large impact on the bootup process and has
> > had a few problems that manifested as hangs during boot. There was at
> > least one other case where an innocuous change to the kernel
> > configuration "fixed" the problem by introducing some second-order
> > effect (causing kernel threads to be scheduled in a different
> > order, for instance).
>
> > Regardless of whether the suggestion above makes a difference, it would
> > be helpful to see verbose dmesgs from both a clean boot and a boot that
> > hangs. If disabling EARLY_AP_STARTUP helps, then we can try adding some
> > assertions that will cause the system to panic when the hang occurs,
> > making it easier to see what's going on.
>
> Hmmm.
> I have now saved a couple of versions of /var/run/dmesg.boot
> (in boot_verbose mode) when EARLY_AP_STARTUP is disabled and
> the boot is successful. However, I don't know how to capture
> such log when booting hangs, as I have no serial interface
> and the boot never completes. All I have is a screen photo
> of the last state when a hang occurs (showing ada disks
> successfully attached, followed immediately by the attempt
> to attach a da disk, which hangs).

Ok, let's not worry about this for now.

Index: sys/kern/kern_clock.c
===
--- sys/kern/kern_clock.c (revision 321401)
+++ sys/kern/kern_clock.c (working copy)
@@ -385,6 +385,8 @@
 static int devpoll_run = 0;
 #endif
 
+bool inited_clocks = false;
+
 /*
  * Initialize clock frequencies and start both clocks running.
  */
@@ -412,6 +414,8 @@
 #ifdef SW_WATCHDOG
     EVENTHANDLER_REGISTER(watchdog_list, watchdog_config, NULL, 0);
 #endif
+
+    inited_clocks = true;
 }
 
 /*
Index: sys/kern/kern_synch.c
===
--- sys/kern/kern_synch.c (revision 321401)
+++ sys/kern/kern_synch.c (working copy)
@@ -298,6 +298,8 @@
     return (rval);
 }
 
+extern bool inited_clocks;
+
 /*
  * pause() delays the calling thread by the given number of system ticks.
  * During cold bootup, pause() uses the DELAY() function instead of
@@ -330,6 +332,10 @@
         DELAY(sbt);
         return (0);
     }
+    if (cold && !inited_clocks) {
+        printf("%s: sleeping before eventtimer init\n", curthread->td_name);
+        kdb_backtrace();
+    }
     return (_sleep(_wchan[curcpu], NULL, 0, wmesg, sbt, pr, flags));
 }
Re: stable/11 debugging kernel unable to produce crashdump again
On Sun, Jul 23, 2017 at 04:26:45PM +0700, Eugene Grosbein wrote:
> On 14.01.2017 18:40, Eugene Grosbein wrote:
> >
> >> I suspect that this is because we only stop the scheduler upon a panic
> >> if SMP is configured. Can you retest with the patch below applied?
> >>
> >> Index: sys/kern/kern_shutdown.c
> >> ===
> >> --- sys/kern/kern_shutdown.c (revision 312082)
> >> +++ sys/kern/kern_shutdown.c (working copy)
> >> @@ -713,6 +713,7 @@
> >>          CPU_CLR(PCPU_GET(cpuid), _cpus);
> >>          stop_cpus_hard(other_cpus);
> >>      }
> >> +#endif
> >>
> >>      /*
> >>       * Ensure that the scheduler is stopped while panicking, even if panic
> >> @@ -719,7 +720,6 @@
> >>       * has been entered from kdb.
> >>       */
> >>      td->td_stopsched = 1;
> >> -#endif
> >>
> >>      bootopt = RB_AUTOBOOT;
> >>      newpanic = 0;
> >>
> >>
> >
> > Indeed, my router is uniprocessor system and your patch really solves the
> > problem.
> > Now kernel generates crashdump just fine in case of panic. Please commit
> > the fix, thanks!
>
> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:

Is this amd64 GENERIC, or something else?

>
> - "call doadump" from DDB prompt works just fine;
> - "shutdown -r now" reboots the system without problems;
> - "sysctl debug.kdb.panic=1" triggers a panic just fine but system hangs just
> after showing uptime
> instead of continuing with crashdump generation; same if "real" panic occurs.
>
> Same for debug.minidump set to 1 or 0. How do I debug this?

I'm not able to reproduce the problem in bhyve using r321401. Looking at the code, the culprits might be cngrab(), or one of the shutdown_post_sync eventhandlers. Since you're apparently able to see the console output at the time of the panic, I guess it's probably the latter. Could you try your test with the patch below applied? It'll print a bunch of "entering post_sync"/"leaving post_sync" messages with addresses that can be resolved using kgdb. That'll help determine where we're getting stuck.

Index: sys/sys/eventhandler.h
===
--- sys/sys/eventhandler.h (revision 321401)
+++ sys/sys/eventhandler.h (working copy)
@@ -85,7 +85,11 @@
             _t = (struct eventhandler_entry_ ## name *)_ep; \
             CTR1(KTR_EVH, "eventhandler_invoke: executing %p", \
                 (void *)_t->eh_func); \
+            if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+                printf("entering post_sync %p\n", (void *)_t->eh_func); \
             _t->eh_func(_ep->ee_arg , ## __VA_ARGS__); \
+            if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+                printf("leaving post_sync %p\n", (void *)_t->eh_func); \
             EHL_LOCK((list)); \
         } \
     } \
Re: Trouble with SM961 in SuperMicro X11
> It's an SM961, not PM951.

Welcome to the club! 8-{

See PR211723 - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713 as well as the forums: https://forums.freebsd.org/threads/58170/#post-334061

I (and others) have offered developers remote console access to systems that exhibit the problem, as well as confirming it works on the same hardware using Linux, gathered requested traces and so on, and then things just sort of... died.

You should probably pile onto both the forum discussion and the PR with a "me too!" so it becomes more and more obvious that this is affecting a larger number of people as time goes on and these modules become more popular.

Terry Kennedy http://www.glaver.org New York, NY USA
Re: 11.1-RELEASE from SVN
Hi Glen,

thanks for the update..

Your work (and the work of others of course) is one of the main reasons why i use and promote the use of FreeBSD. One of the last truly engineered Operating Systems.

Thank you for your work, you probably can guess it already, i truly appreciate it.

I'll store the fireworks till Wednesday then..

Sydney

> On 23. Jul 2017, at 22:38, Glen Barber wrote:
>
> Hi Sydney,
>
> The release date in UPDATING actually was a mistake. (I looked at the
> wrong month...)
>
> Glen
>
> On Sun, Jul 23, 2017 at 10:23:43PM +0200, Sydney Meyer via freebsd-stable wrote:
>> Hi Dimitry,
>>
>> thank you for your reply.
>>
>> Please excuse me if i came across impatiently, that was not my intention.
>>
>> It's just that i find the way the project handles events like these, new
>> releases, security incidents, etc. very interesting and just generally love
>> to hear about it.
>>
>> Glen, e.g., sent a revised RC-Announcement Mail because of an omitted PGP
>> signature. Gotta love this attention to detail.
>>
>> So i saw the commit with the anticipated 11.1-RELEASE date in the UPDATING
>> file and the updated schedule on the website and thought, i just ask..
>>
>> The (seemingly) disappeared 11.1.0-RELEASE was my mistake (old pathrev in
>> the url on svnweb).
>>
>> Anyhow, i am sure everybody is hard at work and i'm looking forward to
>> (another) really awesome dot-release.
>>
>> Have a nice rest-weekend..
>>
>> Sydney
>>
>>> On 23. Jul 2017, at 14:53, Dimitry Andric wrote:
>>>
>>> On 23 Jul 2017, at 14:36, Sydney Meyer via freebsd-stable wrote:
>>>>
>>>> are there any "last-minute" issues/changes with the 11.1-RELEASE build?
>>>>
>>>> The 11.1-RELEASE appears to be gone from SVN after the switch from
>>>> releng/11.1 to -RELEASE and the Press Release Schedule seems to have
>>>> changed without notice.
>>>>
>>>> Not that this is an issue to me, as the releases aren't officially
>>>> released until @re sends the announcement email, i'm just curious..
>>>
>>> Don't worry, the release engineers are furiously working behind the
>>> scenes to get all the correct bits built, verified and uploaded. This
>>> will just take a few days. The schedule is here:
>>>
>>> https://www.freebsd.org/releases/11.1R/schedule.html
>>>
>>> It is also perfectly normal for stable/11 to be renamed -STABLE again,
>>> this is the usual procedure after tagging releases in releng.
>>>
>>> -Dimitry
Re: 11.1-RELEASE from SVN
Hi Sydney,

The release date in UPDATING actually was a mistake. (I looked at the wrong month...)

Glen

On Sun, Jul 23, 2017 at 10:23:43PM +0200, Sydney Meyer via freebsd-stable wrote:
> Hi Dimitry,
>
> thank you for your reply.
>
> Please excuse me if i came across impatiently, that was not my intention.
>
> It's just that i find the way the project handles events like these, new
> releases, security incidents, etc. very interesting and just generally love
> to hear about it.
>
> Glen, e.g., sent a revised RC-Announcement Mail because of an omitted PGP
> signature. Gotta love this attention to detail.
>
> So i saw the commit with the anticipated 11.1-RELEASE date in the UPDATING
> file and the updated schedule on the website and thought, i just ask..
>
> The (seemingly) disappeared 11.1.0-RELEASE was my mistake (old pathrev in the
> url on svnweb).
>
> Anyhow, i am sure everybody is hard at work and i'm looking forward to (another)
> really awesome dot-release.
>
> Have a nice rest-weekend..
>
> Sydney
>
> > On 23. Jul 2017, at 14:53, Dimitry Andric wrote:
> >
> > On 23 Jul 2017, at 14:36, Sydney Meyer via freebsd-stable wrote:
> >>
> >> are there any "last-minute" issues/changes with the 11.1-RELEASE build?
> >>
> >> The 11.1-RELEASE appears to be gone from SVN after the switch from
> >> releng/11.1 to -RELEASE and the Press Release Schedule seems to have
> >> changed without notice.
> >>
> >> Not that this is an issue to me, as the releases aren't officially
> >> released until @re sends the announcement email, i'm just curious..
> >
> > Don't worry, the release engineers are furiously working behind the
> > scenes to get all the correct bits built, verified and uploaded. This
> > will just take a few days. The schedule is here:
> >
> > https://www.freebsd.org/releases/11.1R/schedule.html
> >
> > It is also perfectly normal for stable/11 to be renamed -STABLE again,
> > this is the usual procedure after tagging releases in releng.
> >
> > -Dimitry
Re: 11.1-RELEASE from SVN
Hi Dimitry,

thank you for your reply.

Please excuse me if i came across impatiently, that was not my intention.

It's just that i find the way the project handles events like these, new releases, security incidents, etc. very interesting and just generally love to hear about it.

Glen, e.g., sent a revised RC-Announcement Mail because of an omitted PGP signature. Gotta love this attention to detail.

So i saw the commit with the anticipated 11.1-RELEASE date in the UPDATING file and the updated schedule on the website and thought, i just ask..

The (seemingly) disappeared 11.1.0-RELEASE was my mistake (old pathrev in the url on svnweb).

Anyhow, i am sure everybody is hard at work and i'm looking forward to (another) really awesome dot-release.

Have a nice rest-weekend..

Sydney

> On 23. Jul 2017, at 14:53, Dimitry Andric wrote:
>
> On 23 Jul 2017, at 14:36, Sydney Meyer via freebsd-stable wrote:
>>
>> are there any "last-minute" issues/changes with the 11.1-RELEASE build?
>>
>> The 11.1-RELEASE appears to be gone from SVN after the switch from
>> releng/11.1 to -RELEASE and the Press Release Schedule seems to have changed
>> without notice.
>>
>> Not that this is an issue to me, as the releases aren't officially released
>> until @re sends the announcement email, i'm just curious..
>
> Don't worry, the release engineers are furiously working behind the
> scenes to get all the correct bits built, verified and uploaded. This
> will just take a few days. The schedule is here:
>
> https://www.freebsd.org/releases/11.1R/schedule.html
>
> It is also perfectly normal for stable/11 to be renamed -STABLE again,
> this is the usual procedure after tagging releases in releng.
>
> -Dimitry
Re: stable/11: Kernel page fault with the following non-sleepable locks held: CAM device lock
On 23.07.2017 20:02, Eugene Grosbein wrote:

> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0xa
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0x80e494e1
> stack pointer = 0x28:0xfe04675ff670
> frame pointer = 0x28:0xfe04675ff670
> code segment = base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 1387 (smartd)
> trap number = 12
> panic: page fault
> cpuid = 0

I also have a screenshot of another case of the same panic that notes a lock order reversal (Giant after non-sleepable):

http://www.grosbein.net/freebsd/crash.png
stable/11: Kernel page fault with the following non-sleepable locks held: CAM device lock
Hi!

Long story short: stable/11 r321371 started to panic at the moment of smartd invocation after my SSD died.

I have an Intel motherboard with graid-supported pseudo-RAID. I use it in RAID1 mode with one HDD and one SSD. Yesterday the SSD died: it is detected neither by the BIOS nor by the FreeBSD kernel (timeouts). This went unnoticed by me as graid just disconnected it on the fly:

kernel: ahcich5: Timeout on slot 24 port 0
kernel: ahcich5: is cs ss 0100 rs 0100 tfd 40 serr cmd d817
kernel: (ada1:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 02 ad 12 9e 40 3b 00 00 00 00 00
kernel: (ada1:ahcich5:0:0:0): CAM status: Command timeout
kernel: (ada1:ahcich5:0:0:0): Retrying command
kernel: ahcich5: AHCI reset: device not ready after 31000ms (tfd = 0080)
[skip]
kernel: ada1 at ahcich5 bus 0 scbus2 target 0 lun 0
kernel: ada1: s/n JYKJ550855860139 detached
[skip]
kernel: (ada1:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 02 ad 12 9e 40 3b 00 00 00 00 00
kernel: (ada1:ahcich5:0:0:0): CAM status: Command timeout
kernel: (ada1:ahcich5:0:0:0): Error 5, Periph was invalidated
kernel: GEOM_RAID: Write failed: failing subdisk. ada1[WRITE(offset=269389066240, length=32768)]
kernel: GEOM_RAID: Intel-c291fe96: Disk ada1 state changed from ACTIVE to FAILED.
kernel: GEOM_RAID: Intel-c291fe96: Subdisk r0:1-ada1 state changed from ACTIVE to FAILED.
kernel: GEOM_RAID: Intel-c291fe96: Volume r0 state changed from OPTIMAL to DEGRADED.
kernel: GEOM_RAID: Intel-c291fe96: Disk ada1 state changed from FAILED to OFFLINE.
kernel: GEOM_RAID: Intel-c291fe96: Subdisk r0:1-[unknown] state changed from FAILED to NONE.
kernel: GEOM_RAID: Write failed: failing subdisk. ada1[WRITE(offset=270699851776, length=32768)]
kernel: GEOM_RAID: Intel-c291fe96: Warning! Fail request to a disk in a wrong state (OFFLINE)!

Unaware of that, I performed a standard source upgrade from 11.1-PRERELEASE r318692 to stable/11 r321371, which went smoothly. After the reboot, the BIOS was unable to detect the SSD, reported a degraded state of the mirror and booted FreeBSD using the second mirror component (HDD). After a long timeout, the kernel could not detect the dead SSD either and continued to run with the degraded mirror just fine: the system went to multiuser mode and had almost finished loading when rcNG started smartd. The kernel panics at that moment. This is repeatable: I can cold-boot to single user mode, start smartd and get the same panic.

This is a debugging kernel and I managed to obtain a crashdump. The kgdb session follows:

<118>Starting smartd.
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex CAM device lock (CAM device lock) r = 0 (0xf8000cf71c60) locked @ /home/src/sys/cam/scsi/scsi_pass.c:1766
stack backtrace:
#0 0x80a12620 at witness_debugger+0x70
#1 0x80a13a4e at witness_warn+0x45e
#2 0x80e4b693 at trap_pfault+0x53
#3 0x80e4ae3e at trap+0x29e
#4 0x80e2ed91 at calltrap+0x8
#5 0x8033873a at passsendccb+0x6a
#6 0x80337836 at passdoioctl+0x3c6
#7 0x80337052 at passioctl+0x22
#8 0x80878c78 at devfs_ioctl_f+0x138
#9 0x80a18184 at kern_ioctl+0x2c4
#10 0x80a17e4f at sys_ioctl+0x16f
#11 0x80e4c05a at amd64_syscall+0x53a
#12 0x80e2f07b at Xfast_syscall+0xfb

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xa
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x80e494e1
stack pointer = 0x28:0xfe04675ff670
frame pointer = 0x28:0xfe04675ff670
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1387 (smartd)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe04675ff250
vpanic() at vpanic+0x186/frame 0xfe04675ff2d0
panic() at panic+0x43/frame 0xfe04675ff330
trap_fatal() at trap_fatal+0x322/frame 0xfe04675ff380
trap_pfault() at trap_pfault+0x62/frame 0xfe04675ff3e0
trap() at trap+0x29e/frame 0xfe04675ff5a0
calltrap() at calltrap+0x8/frame 0xfe04675ff5a0
--- trap 0xc, rip = 0x80e494e1, rsp = 0xfe04675ff670, rbp = 0xfe04675ff670 ---
copyin() at copyin+0x41/frame 0xfe04675ff670
passsendccb() at passsendccb+0x6a/frame 0xfe04675ff6f0
passdoioctl() at passdoioctl+0x3c6/frame 0xfe04675ff7a0
passioctl() at passioctl+0x22/frame 0xfe04675ff7e0
devfs_ioctl_f() at devfs_ioctl_f+0x138/frame 0xfe04675ff840
kern_ioctl() at kern_ioctl+0x2c4/frame 0xfe04675ff8a0
sys_ioctl() at sys_ioctl+0x16f/frame 0xfe04675ff980
amd64_syscall() at amd64_syscall+0x53a/frame 0xfe04675ffab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe04675ffab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip
Re: 11.1-RELEASE from SVN
On 23 Jul 2017, at 14:36, Sydney Meyer via freebsd-stable wrote:
>
> are there any "last-minute" issues/changes with the 11.1-RELEASE build?
>
> The 11.1-RELEASE appears to be gone from SVN after the switch from
> releng/11.1 to -RELEASE and the Press Release Schedule seems to have changed
> without notice.
>
> Not that this is an issue to me, as the releases aren't officially released
> until @re sends the announcement email, i'm just curious..

Don't worry, the release engineers are furiously working behind the scenes to get all the correct bits built, verified and uploaded. This will just take a few days. The schedule is here:

https://www.freebsd.org/releases/11.1R/schedule.html

It is also perfectly normal for stable/11 to be renamed -STABLE again, this is the usual procedure after tagging releases in releng.

-Dimitry
11.1-RELEASE from SVN
Hello @re,

are there any "last-minute" issues/changes with the 11.1-RELEASE build?

The 11.1-RELEASE appears to be gone from SVN after the switch from releng/11.1 to -RELEASE and the Press Release Schedule seems to have changed without notice.

Not that this is an issue to me, as the releases aren't officially released until @re sends the announcement email, i'm just curious..

Thanks..

Sydney
stable/11 debugging kernel unable to produce crashdump again
On 14.01.2017 18:40, Eugene Grosbein wrote:
>
>> I suspect that this is because we only stop the scheduler upon a panic
>> if SMP is configured. Can you retest with the patch below applied?
>>
>> Index: sys/kern/kern_shutdown.c
>> ===
>> --- sys/kern/kern_shutdown.c (revision 312082)
>> +++ sys/kern/kern_shutdown.c (working copy)
>> @@ -713,6 +713,7 @@
>>          CPU_CLR(PCPU_GET(cpuid), _cpus);
>>          stop_cpus_hard(other_cpus);
>>      }
>> +#endif
>>
>>      /*
>>       * Ensure that the scheduler is stopped while panicking, even if panic
>> @@ -719,7 +720,6 @@
>>       * has been entered from kdb.
>>       */
>>      td->td_stopsched = 1;
>> -#endif
>>
>>      bootopt = RB_AUTOBOOT;
>>      newpanic = 0;
>>
>>
>
> Indeed, my router is uniprocessor system and your patch really solves the
> problem.
> Now kernel generates crashdump just fine in case of panic. Please commit the
> fix, thanks!

Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:

- "call doadump" from DDB prompt works just fine;
- "shutdown -r now" reboots the system without problems;
- "sysctl debug.kdb.panic=1" triggers a panic just fine but the system hangs just after showing uptime instead of continuing with crashdump generation; same if a "real" panic occurs.

Same for debug.minidump set to 1 or 0. How do I debug this?

Eugene Grosbein
Re: stable/11 r321349 crashing immediately
On Sat, Jul 22, 2017 at 10:51:42PM -0700, Don Lewis wrote:
> > The stack is aligned to a 4096 (0x1000) boundary. The first access to a
> > local variable below 0xfe085cfa5000 is what triggered the trap. The
> > other end of the stack must be at 0xfe085cfa9000 less a bit. I don't
> > know why the first stack pointer value in the trace is
> > 0xfe085cfa8a10. That would seem to indicate that amd64_syscall is
> > using ~1500 bytes of stack space.
>
> Actually there could be quite a bit of CPU context that gets saved. That
> could be sizeable on amd64.

Yes, the usermode trap frame is located on the kernel stack. Also, the pcb and the usermode FPU save area (FPU == all non-general-purpose x86 registers, including XMM/AVX/AVX512 as implemented by the CPU) are on the stack.