Re: cannot destroy faulty zvol

2017-07-23 Thread Eugene M. Zheganin

Hi.

On 23.07.2017 0:28, Eugene M. Zheganin wrote:
> Hi,
>
> On 22.07.2017 17:08, Eugene M. Zheganin wrote:
>> Is this weird error "cannot destroy: already exists" related to the
>> fact that the zvol is faulty? Does it indicate that the metadata is
>> probably faulty too? Anyway, is there a way to destroy this dataset?
>
> Follow-up: I sent a similar zvol of exactly the same size into the
> faulty one; the zpool errors are gone, but I still cannot destroy the
> zvol. Is this a ZFS bug?

Seems like it. zdb shows some "invisible" datasets not shown by zfs list
-t all; one of them is a child dataset of the one I'm trying to destroy:


# zdb -d zfsroot
Dataset mos [META], ID 0, cr_txg 4, 60.7M, 1078 objects
Dataset zfsroot/usr/src [ZPL], ID 1181, cr_txg 59832, 1.03G, 158964 objects
Dataset zfsroot/usr/home [ZPL], ID 1197, cr_txg 59911, 21.7M, 551 objects
Dataset zfsroot/usr/ports [ZPL], ID 1189, cr_txg 59881, 887M, 335670 objects
Dataset zfsroot/usr [ZPL], ID 1173, cr_txg 59829, 23.0K, 7 objects
Dataset zfsroot/tmp [ZPL], ID 1253, cr_txg 59931, 6.66G, 480610 objects
Dataset zfsroot/userdata/worker226 [ZVOL], ID 940, cr_txg 59580, 12.0K, 2 objects
Dataset zfsroot/userdata/worker251 [ZVOL], ID 948, cr_txg 59583, 132M, 2 objects
Dataset zfsroot/userdata/worker152 [ZVOL], ID 924, cr_txg 59551, 928M, 2 objects
Dataset zfsroot/userdata/worker125 [ZVOL], ID 932, cr_txg 59566, 997M, 2 objects
Dataset zfsroot/userdata/worker158 [ZVOL], ID 916, cr_txg 59536, 498M, 2 objects
Dataset zfsroot/userdata/worker214 [ZVOL], ID 908, cr_txg 59530, 736M, 2 objects
Dataset zfsroot/userdata/worker160 [ZVOL], ID 900, cr_txg 59524, 774M, 2 objects
Dataset zfsroot/userdata/worker184 [ZVOL], ID 892, cr_txg 59518, 609M, 2 objects
Dataset zfsroot/userdata/worker235 [ZVOL], ID 1012, cr_txg 59663, 1.62G, 2 objects
Dataset zfsroot/userdata/worker242 [ZVOL], ID 1021, cr_txg 59674, 96.1M, 2 objects
Dataset zfsroot/userdata/worker248 [ZVOL], ID 1004, cr_txg 59660, 153M, 2 objects
Dataset zfsroot/userdata/worker141 [ZVOL], ID 988, cr_txg 59631, 1014M, 2 objects
Dataset zfsroot/userdata/worker136 [ZVOL], ID 996, cr_txg 59646, 995M, 2 objects
Dataset zfsroot/userdata/worker207 [ZVOL], ID 980, cr_txg 59617, 577M, 2 objects
Dataset zfsroot/userdata/worker179 [ZVOL], ID 972, cr_txg 59602, 801M, 2 objects
Dataset zfsroot/userdata/worker197 [ZVOL], ID 964, cr_txg 59595, 383M, 2 objects
Dataset zfsroot/userdata/worker173 [ZVOL], ID 956, cr_txg 59586, 1.26G, 2 objects
Dataset zfsroot/userdata/worker190 [ZVOL], ID 1085, cr_txg 59757, 236M, 2 objects
Dataset zfsroot/userdata/worker174 [ZVOL], ID 1077, cr_txg 59743, 2.11G, 2 objects
Dataset zfsroot/userdata/worker200 [ZVOL], ID 1069, cr_txg 59732, 260M, 2 objects
Dataset zfsroot/userdata/worker131 [ZVOL], ID 1053, cr_txg 59711, 792M, 2 objects
Dataset zfsroot/userdata/worker146 [ZVOL], ID 1061, cr_txg 59725, 418M, 2 objects
Dataset zfsroot/userdata/worker245 [ZVOL], ID 1037, cr_txg 59692, 208M, 2 objects
Dataset zfsroot/userdata/worker232 [ZVOL], ID 1045, cr_txg 59695, 527M, 2 objects
Dataset zfsroot/userdata/worker238 [ZVOL], ID 1029, cr_txg 59677, 1.94G, 2 objects
Dataset zfsroot/userdata/worker167 [ZVOL], ID 1165, cr_txg 59823, 4.43G, 2 objects
Dataset zfsroot/userdata/worker189 [ZVOL], ID 1157, cr_txg 59817, 326M, 2 objects
Dataset zfsroot/userdata/worker183 [ZVOL], ID 1149, cr_txg 59811, 1.18G, 2 objects
Dataset zfsroot/userdata/worker219 [ZVOL], ID 1141, cr_txg 59808, 12.0K, 2 objects
Dataset zfsroot/userdata/worker213 [ZVOL], ID 1133, cr_txg 59802, 1.04G, 2 objects
Dataset zfsroot/userdata/worker122 [ZVOL], ID 1117, cr_txg 59782, 1.05G, 2 objects
Dataset zfsroot/userdata/worker155 [ZVOL], ID 1125, cr_txg 59790, 963M, 2 objects
Dataset zfsroot/userdata/worker128 [ZVOL], ID 1109, cr_txg 59769, 1.67G, 2 objects
Dataset zfsroot/userdata/worker256 [ZVOL], ID 1093, cr_txg 59763, 602K, 2 objects
Dataset zfsroot/userdata/worker221 [ZVOL], ID 1101, cr_txg 59766, 12.0K, 2 objects
Dataset zfsroot/userdata/worker126 [ZVOL], ID 666, cr_txg 59194, 781M, 2 objects
Dataset zfsroot/userdata/worker151 [ZVOL], ID 674, cr_txg 59205, 435M, 2 objects
Dataset zfsroot/userdata/worker252 [ZVOL], ID 650, cr_txg 59188, 127M, 2 objects
Dataset zfsroot/userdata/worker225 [ZVOL], ID 658, cr_txg 59191, 12.0K, 2 objects
Dataset zfsroot/userdata/worker187 [ZVOL], ID 642, cr_txg 59171, 2.55G, 2 objects
Dataset zfsroot/userdata/worker169 [ZVOL], ID 634, cr_txg 59157, 359M, 2 objects
Dataset zfsroot/userdata/worker163 [ZVOL], ID 626, cr_txg 59139, 2.30G, 2 objects
Dataset zfsroot/userdata/worker217 [ZVOL], ID 618, cr_txg 59136, 12.0K, 2 objects
Dataset zfsroot/userdata/worker135 [ZVOL], ID 731, cr_txg 59301, 1.36G, 2 objects
Dataset zfsroot/userdata/worker142 [ZVOL], ID 739, cr_txg 59315, 468M, 2 objects
Dataset zfsroot/userdata/worker148 [ZVOL], ID 723, cr_txg 59288, 1.21G, 2 objects
Dataset zfsroot/userdata/worker241 [ZVOL], ID 707, cr_txg 59265, 758M, 2 objects
Dataset zfsroot/userdata/worker236 [ZVOL], ID 
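
One way to spot datasets that zdb reports but zfs list does not is to pull the name out of each "Dataset ..." line and look it up in the names printed by `zfs list -H -o name -t all`. The following is only a sketch under that assumption: the function names and the `listed` array are hypothetical, and feeding the two command outputs into the program is left out.

```c
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 256

/*
 * Extract the dataset name from one line of `zdb -d` output such as
 *   "Dataset zfsroot/usr [ZPL], ID 1173, cr_txg 59829, 23.0K, 7 objects"
 * Returns 1 and fills name on success, 0 if the line has another shape.
 */
int
zdb_dataset_name(const char *line, char *name, size_t namelen)
{
	char buf[NAME_MAX_LEN];

	/* %255s stops at the first whitespace, i.e. right after the name. */
	if (sscanf(line, "Dataset %255s", buf) != 1)
		return (0);
	if (strlen(buf) >= namelen)
		return (0);
	strcpy(name, buf);
	return (1);
}

/* Returns 1 if name is among the names reported by zfs list. */
int
is_listed(const char *name, const char *const listed[], size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(name, listed[i]) == 0)
			return (1);
	return (0);
}

/* Returns 1 when a zdb line names a dataset missing from the zfs list names. */
int
is_invisible(const char *zdb_line, const char *const listed[], size_t n)
{
	char name[NAME_MAX_LEN];

	if (!zdb_dataset_name(zdb_line, name, sizeof(name)))
		return (0);
	return (!is_listed(name, listed, n));
}
```

Note that the "mos [META]" line would also be flagged by this check, since the meta-object set is never shown by zfs list; a real comparison would have to skip it.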

Re: Trouble with SM961 in SuperMicro X11

2017-07-23 Thread Brandon Allbery
On Sun, Jul 23, 2017 at 10:47 PM, Terry Kennedy  wrote:

>   My offer of a test system with the card is still open, if someone wants
> to pick this up again. I will note that in my case, it only happens in my
> Supermicro systems (but again, Linux works well with it on those boxes).
> The SM961 in the same adapter works fine in a Dell system, and an Optane
> card also works fine in the Supermicro system.


Sounds like the bad old days when I had fun stuff like a video card
incompatible with a HD controller.

-- 
brandon s allbery kf8nh   sine nomine associates
allber...@gmail.com  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad  http://sinenomine.net
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Trouble with SM961 in SuperMicro X11

2017-07-23 Thread Terry Kennedy
> I bought this card but never could get it to fail, despite trying them in a
> number of different systems :(. So lighten up please...

  They are obviously working for some people, but enough people are seeing
the same problem over and over that there is definitely something wrong.

  I spent a good deal of time gathering the requested traces for the
developer who was working on it with me, going so far as to purchase a
different board (same model, different firmware) and a second adapter card,
and tried all of the above in multiple systems. I then sent off the
requested info, reiterated my offer of remote access to one of the systems
showing the problem, and heard... nothing. A follow-up some months later
also got no response, while more and more people are running into the
issue and reporting it either on the lists or in the forums.

  My offer of a test system with the card is still open, if someone wants
to pick this up again. I will note that in my case, it only happens in my
Supermicro systems (but again, Linux works well with it on those boxes).
The SM961 in the same adapter works fine in a Dell system, and an Optane
card also works fine in the Supermicro system. 

Terry Kennedy http://www.glaver.org  New York, NY USA


Re: Trouble with SM961 in SuperMicro X11

2017-07-23 Thread Warner Losh
On Jul 23, 2017 7:05 PM, "Terry Kennedy"  wrote:

>> It's an SM961, not PM951.
>
>   Welcome to the club! 8-{
>
>   See PR211713 - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713
> as well as the forums: https://forums.freebsd.org/threads/58170/#post-334061
>
>   I (and others) have offered developers remote console access to systems
> that exhibit the problem, as well as confirming it works on the same
> hardware using Linux, gathered requested traces and so on, and then things
> just sort of... died.
>
>   You should probably pile onto both the forum discussion and the PR with
> a "me too!" so it becomes more and more obvious that this is affecting a
> larger number of people as time goes on and these modules become more
> popular.


I bought this card but never could get it to fail, despite trying them in a
number of different systems :(. So lighten up please...

Warner




Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2017-07-23 Thread Mark Johnston
On Thu, Jul 20, 2017 at 03:45:39PM +0200, Mark Martinec wrote:
> 2017-07-20 02:03, Mark Johnston wrote:
> > One thing to try at this point would be to disable EARLY_AP_STARTUP in
> > the kernel config. That is, take a configuration with which you're able
> > to reproduce the hang during boot, and remove "options
> > EARLY_AP_STARTUP".
> 
> Done. And it avoids the problem altogether! Thanks.
> Tried a reboot several times and it succeeds every time.

Thanks. Sorry for the delayed follow-up.

> 
> Here is all that I had in a config file for building a kernel,
> i.e. I took away the 'options DDB' which also seemingly avoided
> the problem:
>   include GENERIC
>   ident NELI
>   nooptions EARLY_AP_STARTUP

Could you try re-enabling EARLY_AP_STARTUP, applying the patch at the
end of this email, and see if the message "sleeping before eventtimer
init" appears in the boot output? If it does, it'll be followed by a
backtrace that might be useful for tracking down the hang. It might
produce false positives, but we'll see.

> 
> > This feature has a fairly large impact on the bootup process and has
> > had a few problems that manifested as hangs during boot. There was at
> > least one other case where an innocuous change to the kernel
> > configuration "fixed" the problem by introducing some second-order
> > effect (causing kernel threads to be scheduled in a different
> > order, for instance).
> 
> > Regardless of whether the suggestion above makes a difference, it would
> > be helpful to see verbose dmesgs from both a clean boot and a boot that
> > hangs. If disabling EARLY_AP_STARTUP helps, then we can try adding some
> > assertions that will cause the system to panic when the hang occurs,
> > making it easier to see what's going on.
> 
> Hmmm.
> I have now saved a couple of versions of /var/run/dmesg.boot
> (in boot_verbose mode) when EARLY_AP_STARTUP is disabled and
> the boot is successful. However, I don't know how to capture
> such log when booting hangs, as I have no serial interface
> and the boot never completes. All I have is a screen photo
> of the last state when a hang occurs (showing ada disks
> successfully attached, followed immediately by the attempt
> to attach a da disk, which hangs).

Ok, let's not worry about this for now.

Index: sys/kern/kern_clock.c
===================================================================
--- sys/kern/kern_clock.c	(revision 321401)
+++ sys/kern/kern_clock.c	(working copy)
@@ -385,6 +385,8 @@
 static int devpoll_run = 0;
 #endif
 
+bool inited_clocks = false;
+
 /*
  * Initialize clock frequencies and start both clocks running.
  */
@@ -412,6 +414,8 @@
 #ifdef SW_WATCHDOG
 	EVENTHANDLER_REGISTER(watchdog_list, watchdog_config, NULL, 0);
 #endif
+
+	inited_clocks = true;
 }
 
 /*
Index: sys/kern/kern_synch.c
===================================================================
--- sys/kern/kern_synch.c	(revision 321401)
+++ sys/kern/kern_synch.c	(working copy)
@@ -298,6 +298,8 @@
 	return (rval);
 }
 
+extern bool inited_clocks;
+
 /*
  * pause() delays the calling thread by the given number of system ticks.
  * During cold bootup, pause() uses the DELAY() function instead of
@@ -330,6 +332,10 @@
 		DELAY(sbt);
 		return (0);
 	}
+	if (cold && !inited_clocks) {
+		printf("%s: sleeping before eventtimer init\n", curthread->td_name);
+		kdb_backtrace();
+	}
 	return (_sleep(&pause_wchan[curcpu], NULL, 0, wmesg, sbt, pr, flags));
 }
 


Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-23 Thread Mark Johnston
On Sun, Jul 23, 2017 at 04:26:45PM +0700, Eugene Grosbein wrote:
> On 14.01.2017 18:40, Eugene Grosbein wrote:
> > 
> >> I suspect that this is because we only stop the scheduler upon a panic
> >> if SMP is configured. Can you retest with the patch below applied?
> >>
> >> Index: sys/kern/kern_shutdown.c
> >> ===
> >> --- sys/kern/kern_shutdown.c   (revision 312082)
> >> +++ sys/kern/kern_shutdown.c   (working copy)
> >> @@ -713,6 +713,7 @@
> >>CPU_CLR(PCPU_GET(cpuid), &other_cpus);
> >>stop_cpus_hard(other_cpus);
> >>}
> >> +#endif
> >>  
> >>/*
> >> * Ensure that the scheduler is stopped while panicking, even if panic
> >> @@ -719,7 +720,6 @@
> >> * has been entered from kdb.
> >> */
> >>td->td_stopsched = 1;
> >> -#endif
> >>  
> >>bootopt = RB_AUTOBOOT;
> >>newpanic = 0;
> >>
> >>
> > 
> > Indeed, my router is a uniprocessor system and your patch really solves
> > the problem. Now the kernel generates a crashdump just fine in case of a
> > panic. Please commit the fix, thanks!
> 
> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of producing a crashdump:

Is this amd64 GENERIC, or something else?

> 
> - "call doadump" from DDB prompt works just fine;
> - "shutdown -r now" reboots the system without problems;
> - "sysctl debug.kdb.panic=1" triggers a panic just fine, but the system
>   hangs just after showing uptime instead of continuing with crashdump
>   generation; same if a "real" panic occurs.
> 
> Same for debug.minidump set to 1 or 0. How do I debug this?

I'm not able to reproduce the problem in bhyve using r321401. Looking
at the code, the culprits might be cngrab(), or one of the
shutdown_post_sync eventhandlers. Since you're apparently able to see
the console output at the time of the panic, I guess it's probably the
latter. Could you try your test with the patch below applied? It'll
print a bunch of "entering post_sync"/"leaving post_sync" messages with
addresses that can be resolved using kgdb. That'll help determine where
we're getting stuck.

Index: sys/sys/eventhandler.h
===================================================================
--- sys/sys/eventhandler.h	(revision 321401)
+++ sys/sys/eventhandler.h	(working copy)
@@ -85,7 +85,11 @@
 			_t = (struct eventhandler_entry_ ## name *)_ep;	\
 			CTR1(KTR_EVH, "eventhandler_invoke: executing %p", \
 			    (void *)_t->eh_func);			\
+			if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+				printf("entering post_sync %p\n", (void *)_t->eh_func); \
 			_t->eh_func(_ep->ee_arg , ## __VA_ARGS__);	\
+			if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+				printf("leaving post_sync %p\n", (void *)_t->eh_func); \
 			EHL_LOCK((list));				\
 		}							\
 	}								\
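
The patch depends on the preprocessor turning the macro argument `name` into a string, so that only one handler list gets the extra messages. A minimal user-space sketch of that technique follows. `STRINGIFY`, `INVOKE_TRACED`, `sample_handler` and `run_demo` are hypothetical stand-ins for illustration, not kernel interfaces; the kernel uses `__STRING()` from `sys/cdefs.h`, and the sketch prints the handler's name rather than its address.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's __STRING() macro. */
#define STRINGIFY(x) #x

/* Counters used only to make the behaviour observable. */
int traced_calls;
int handler_runs;

/*
 * Invoke handler fn, printing enter/leave messages only when the list
 * name (a bare token, not a string) is shutdown_post_sync.
 */
#define INVOKE_TRACED(name, fn) do {					\
	int trace_ = strcmp(STRINGIFY(name), "shutdown_post_sync") == 0;\
	if (trace_) {							\
		printf("entering post_sync %s\n", #fn);			\
		traced_calls++;						\
	}								\
	fn();								\
	if (trace_)							\
		printf("leaving post_sync %s\n", #fn);			\
} while (0)

void
sample_handler(void)
{
	handler_runs++;
}

/* Runs one traced and one untraced list; returns the number of traced calls. */
int
run_demo(void)
{
	traced_calls = 0;
	handler_runs = 0;
	INVOKE_TRACED(shutdown_post_sync, sample_handler);
	INVOKE_TRACED(shutdown_final, sample_handler);
	return (traced_calls);
}
```

Both invocations run the handler, but only the first one prints. If an "entering" line ever appears without a matching "leaving" line, the handler named in it is the one that did not return.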


Re: Trouble with SM961 in SuperMicro X11

2017-07-23 Thread Terry Kennedy
> It's an SM961, not PM951.

  Welcome to the club! 8-{

  See PR211713 - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713
as well as the forums: https://forums.freebsd.org/threads/58170/#post-334061

  I (and others) have offered developers remote console access to systems
that exhibit the problem, as well as confirming it works on the same
hardware using Linux, gathered requested traces and so on, and then things
just sort of... died.

  You should probably pile onto both the forum discussion and the PR with
a "me too!" so it becomes more and more obvious that this is affecting a
larger number of people as time goes on and these modules become more
popular.

Terry Kennedy http://www.glaver.org  New York, NY USA


Re: 11.1-RELEASE from SVN

2017-07-23 Thread Sydney Meyer via freebsd-stable
Hi Glen, 

thanks for the update.

Your work (and the work of others, of course) is one of the main reasons why I
use and promote FreeBSD, one of the last truly engineered operating systems.

Thank you for your work. As you can probably guess already, I truly appreciate
it.

I'll save the fireworks until Wednesday, then.

Sydney

> On 23. Jul 2017, at 22:38, Glen Barber  wrote:
> 
> Hi Sydney,
> 
> The release date in UPDATING actually was a mistake.  (I looked at the
> wrong month...)
> 
> Glen
> 
> On Sun, Jul 23, 2017 at 10:23:43PM +0200, Sydney Meyer via freebsd-stable 
> wrote:
>> Hi Dimitry,
>> 
>> thank you for your reply.
>> 
>> Please excuse me if I came across as impatient; that was not my intention.
>> 
>> It's just that I find the way the project handles events like these (new
>> releases, security incidents, etc.) very interesting and just generally love
>> to hear about it.
>> 
>> Glen, e.g., sent a revised RC announcement mail because of an omitted PGP
>> signature. Gotta love this attention to detail.
>> 
>> So I saw the commit with the anticipated 11.1-RELEASE date in the UPDATING
>> file and the updated schedule on the website and thought I'd just ask.
>> 
>> The (seemingly) disappeared 11.1.0-RELEASE was my mistake (old pathrev in
>> the URL on svnweb).
>> 
>> Anyhow, I am sure everybody is hard at work and I'm looking forward to
>> (another) really awesome dot-release.
>> 
>> Have a nice rest of the weekend.
>> 
>> Sydney
>> 
>>> On 23. Jul 2017, at 14:53, Dimitry Andric  wrote:
>>> 
>>> On 23 Jul 2017, at 14:36, Sydney Meyer via freebsd-stable 
>>>  wrote:
 
>>>> are there any "last-minute" issues/changes with the 11.1-RELEASE build?
>>>> 
>>>> The 11.1-RELEASE appears to be gone from SVN after the switch from
>>>> releng/11.1 to -RELEASE and the Press Release Schedule seems to have
>>>> changed without notice.
>>>> 
>>>> Not that this is an issue to me, as the releases aren't officially
>>>> released until @re sends the announcement email, I'm just curious.
>>> 
>>> Don't worry, the release engineers are furiously working behind the
>>> scenes to get all the correct bits built, verified and uploaded.  This
>>> will just take a few days.  The schedule is here:
>>> 
>>> https://www.freebsd.org/releases/11.1R/schedule.html
>>> 
>>> It is also perfectly normal for stable/11 to be renamed -STABLE again,
>>> this is the usual procedure after tagging releases in releng.
>>> 
>>> -Dimitry
>>> 
>> 


Re: 11.1-RELEASE from SVN

2017-07-23 Thread Glen Barber
Hi Sydney,

The release date in UPDATING actually was a mistake.  (I looked at the
wrong month...)

Glen

On Sun, Jul 23, 2017 at 10:23:43PM +0200, Sydney Meyer via freebsd-stable wrote:
> Hi Dimitry,
> 
> thank you for your reply.
> 
> Please excuse me if I came across as impatient; that was not my intention.
> 
> It's just that I find the way the project handles events like these (new
> releases, security incidents, etc.) very interesting and just generally love
> to hear about it.
> 
> Glen, e.g., sent a revised RC announcement mail because of an omitted PGP
> signature. Gotta love this attention to detail.
> 
> So I saw the commit with the anticipated 11.1-RELEASE date in the UPDATING
> file and the updated schedule on the website and thought I'd just ask.
> 
> The (seemingly) disappeared 11.1.0-RELEASE was my mistake (old pathrev in the
> URL on svnweb).
> 
> Anyhow, I am sure everybody is hard at work and I'm looking forward to
> (another) really awesome dot-release.
> 
> Have a nice rest of the weekend.
> 
> Sydney
> 
> > On 23. Jul 2017, at 14:53, Dimitry Andric  wrote:
> > 
> > On 23 Jul 2017, at 14:36, Sydney Meyer via freebsd-stable 
> >  wrote:
> >> 
> >> are there any "last-minute" issues/changes with the 11.1-RELEASE build?
> >> 
> >> The 11.1-RELEASE appears to be gone from SVN after the switch from 
> >> releng/11.1 to -RELEASE and the Press Release Schedule seems to have 
> >> changed without notice.
> >> 
> >> Not that this is an issue to me, as the releases aren't officially
> >> released until @re sends the announcement email, I'm just curious.
> > 
> > Don't worry, the release engineers are furiously working behind the
> > scenes to get all the correct bits built, verified and uploaded.  This
> > will just take a few days.  The schedule is here:
> > 
> > https://www.freebsd.org/releases/11.1R/schedule.html
> > 
> > It is also perfectly normal for stable/11 to be renamed -STABLE again,
> > this is the usual procedure after tagging releases in releng.
> > 
> > -Dimitry
> > 
> 


Re: 11.1-RELEASE from SVN

2017-07-23 Thread Sydney Meyer via freebsd-stable
Hi Dimitry,

thank you for your reply.

Please excuse me if I came across as impatient; that was not my intention.

It's just that I find the way the project handles events like these (new
releases, security incidents, etc.) very interesting and just generally love to
hear about it.

Glen, e.g., sent a revised RC announcement mail because of an omitted PGP
signature. Gotta love this attention to detail.

So I saw the commit with the anticipated 11.1-RELEASE date in the UPDATING file
and the updated schedule on the website and thought I'd just ask.

The (seemingly) disappeared 11.1.0-RELEASE was my mistake (old pathrev in the
URL on svnweb).

Anyhow, I am sure everybody is hard at work and I'm looking forward to (another)
really awesome dot-release.

Have a nice rest of the weekend.

Sydney

> On 23. Jul 2017, at 14:53, Dimitry Andric  wrote:
> 
> On 23 Jul 2017, at 14:36, Sydney Meyer via freebsd-stable 
>  wrote:
>> 
>> are there any "last-minute" issues/changes with the 11.1-RELEASE build?
>> 
>> The 11.1-RELEASE appears to be gone from SVN after the switch from 
>> releng/11.1 to -RELEASE and the Press Release Schedule seems to have changed 
>> without notice.
>> 
>> Not that this is an issue to me, as the releases aren't officially released
>> until @re sends the announcement email, I'm just curious.
> 
> Don't worry, the release engineers are furiously working behind the
> scenes to get all the correct bits built, verified and uploaded.  This
> will just take a few days.  The schedule is here:
> 
> https://www.freebsd.org/releases/11.1R/schedule.html
> 
> It is also perfectly normal for stable/11 to be renamed -STABLE again,
> this is the usual procedure after tagging releases in releng.
> 
> -Dimitry
> 

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: stable/11: Kernel page fault with the following non-sleepable locks held: CAM device lock

2017-07-23 Thread Eugene Grosbein
On 23.07.2017 20:02, Eugene Grosbein wrote:

> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0xa
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x80e494e1
> stack pointer   = 0x28:0xfe04675ff670
> frame pointer   = 0x28:0xfe04675ff670
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 1387 (smartd)
> trap number = 12
> panic: page fault
> cpuid = 0

I also have a screenshot of another instance of the same panic, which
reports a lock order reversal (Giant after non-sleepable):

http://www.grosbein.net/freebsd/crash.png



stable/11: Kernel page fault with the following non-sleepable locks held: CAM device lock

2017-07-23 Thread Eugene Grosbein
Hi!

Long story short: after my SSD died, stable/11 r321371 started to panic
whenever smartd is started.

I have an Intel motherboard with graid-supported pseudo-RAID.
I use it in RAID1 mode with one HDD and one SSD.

Yesterday the SSD died: it is detected by neither the BIOS nor the FreeBSD
kernel (timeouts). This went unnoticed by me, as graid just disconnected it
on the fly:

kernel: ahcich5: Timeout on slot 24 port 0
kernel: ahcich5: is  cs  ss 0100 rs 0100 tfd 40 serr  cmd d817
kernel: (ada1:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 02 ad 12 9e 40 3b 00 00 00 00 00
kernel: (ada1:ahcich5:0:0:0): CAM status: Command timeout
kernel: (ada1:ahcich5:0:0:0): Retrying command
kernel: ahcich5: AHCI reset: device not ready after 31000ms (tfd = 0080)
[skip]
kernel: ada1 at ahcich5 bus 0 scbus2 target 0 lun 0
kernel: ada1:  s/n JYKJ550855860139 detached
[skip]
kernel: (ada1:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 02 ad 12 9e 40 3b 00 00 00 00 00
kernel: (ada1:ahcich5:0:0:0): CAM status: Command timeout
kernel: (ada1:ahcich5:0:0:0): Error 5, Periph was invalidated
kernel: GEOM_RAID: Write failed: failing subdisk. ada1[WRITE(offset=269389066240, length=32768)]
kernel: GEOM_RAID: Intel-c291fe96: Disk ada1 state changed from ACTIVE to FAILED.
kernel: GEOM_RAID: Intel-c291fe96: Subdisk r0:1-ada1 state changed from ACTIVE to FAILED.
kernel: GEOM_RAID: Intel-c291fe96: Volume r0 state changed from OPTIMAL to DEGRADED.
kernel: GEOM_RAID: Intel-c291fe96: Disk ada1 state changed from FAILED to OFFLINE.
kernel: GEOM_RAID: Intel-c291fe96: Subdisk r0:1-[unknown] state changed from FAILED to NONE.
kernel: GEOM_RAID: Write failed: failing subdisk. ada1[WRITE(offset=270699851776, length=32768)]
kernel: GEOM_RAID: Intel-c291fe96: Warning! Fail request to a disk in a wrong state (OFFLINE)!

Unaware of that, I performed a standard source upgrade from 11.1-PRERELEASE
r318692 to stable/11 r321371, which went smoothly. After the reboot, the BIOS
was unable to detect the SSD, reported a degraded mirror and booted FreeBSD
from the second mirror component (HDD).

After a long timeout, the kernel could not detect the dead SSD either and
continued to run with the degraded mirror just fine: the system went to
multiuser mode and had almost finished loading when rcNG started smartd.
The kernel panics at that moment. This is repeatable: I can cold-boot to
single user mode, start smartd and get the same panic. This is a debugging
kernel and I managed to obtain a crashdump.

kgdb session follows:

<118>Starting smartd.
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex CAM device lock (CAM device lock) r = 0 (0xf8000cf71c60) locked @ /home/src/sys/cam/scsi/scsi_pass.c:1766
stack backtrace:
#0 0x80a12620 at witness_debugger+0x70
#1 0x80a13a4e at witness_warn+0x45e
#2 0x80e4b693 at trap_pfault+0x53
#3 0x80e4ae3e at trap+0x29e
#4 0x80e2ed91 at calltrap+0x8
#5 0x8033873a at passsendccb+0x6a
#6 0x80337836 at passdoioctl+0x3c6
#7 0x80337052 at passioctl+0x22
#8 0x80878c78 at devfs_ioctl_f+0x138
#9 0x80a18184 at kern_ioctl+0x2c4
#10 0x80a17e4f at sys_ioctl+0x16f
#11 0x80e4c05a at amd64_syscall+0x53a
#12 0x80e2f07b at Xfast_syscall+0xfb

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xa
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80e494e1
stack pointer   = 0x28:0xfe04675ff670
frame pointer   = 0x28:0xfe04675ff670
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 1387 (smartd)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe04675ff250
vpanic() at vpanic+0x186/frame 0xfe04675ff2d0
panic() at panic+0x43/frame 0xfe04675ff330
trap_fatal() at trap_fatal+0x322/frame 0xfe04675ff380
trap_pfault() at trap_pfault+0x62/frame 0xfe04675ff3e0
trap() at trap+0x29e/frame 0xfe04675ff5a0
calltrap() at calltrap+0x8/frame 0xfe04675ff5a0
--- trap 0xc, rip = 0x80e494e1, rsp = 0xfe04675ff670, rbp = 0xfe04675ff670 ---
copyin() at copyin+0x41/frame 0xfe04675ff670
passsendccb() at passsendccb+0x6a/frame 0xfe04675ff6f0
passdoioctl() at passdoioctl+0x3c6/frame 0xfe04675ff7a0
passioctl() at passioctl+0x22/frame 0xfe04675ff7e0
devfs_ioctl_f() at devfs_ioctl_f+0x138/frame 0xfe04675ff840
kern_ioctl() at kern_ioctl+0x2c4/frame 0xfe04675ff8a0
sys_ioctl() at sys_ioctl+0x16f/frame 0xfe04675ff980
amd64_syscall() at amd64_syscall+0x53a/frame 0xfe04675ffab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe04675ffab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip 

Re: 11.1-RELEASE from SVN

2017-07-23 Thread Dimitry Andric
On 23 Jul 2017, at 14:36, Sydney Meyer via freebsd-stable 
 wrote:
> 
> are there any "last-minute" issues/changes with the 11.1-RELEASE build?
> 
> The 11.1-RELEASE appears to be gone from SVN after the switch from 
> releng/11.1 to -RELEASE and the Press Release Schedule seems to have changed 
> without notice.
> 
> Not that this is an issue to me, as the releases aren't officially released
> until @re sends the announcement email, I'm just curious.

Don't worry, the release engineers are furiously working behind the
scenes to get all the correct bits built, verified and uploaded.  This
will just take a few days.  The schedule is here:

https://www.freebsd.org/releases/11.1R/schedule.html

It is also perfectly normal for stable/11 to be renamed -STABLE again,
this is the usual procedure after tagging releases in releng.

-Dimitry





11.1-RELEASE from SVN

2017-07-23 Thread Sydney Meyer via freebsd-stable
Hello @re,

are there any "last-minute" issues/changes with the 11.1-RELEASE build?

The 11.1-RELEASE appears to be gone from SVN after the switch from releng/11.1 
to -RELEASE and the Press Release Schedule seems to have changed without notice.

Not that this is an issue to me, as the releases aren't officially released
until @re sends the announcement email, I'm just curious.

Thanks..

Sydney


stable/11 debugging kernel unable to produce crashdump again

2017-07-23 Thread Eugene Grosbein
On 14.01.2017 18:40, Eugene Grosbein wrote:
> 
>> I suspect that this is because we only stop the scheduler upon a panic
>> if SMP is configured. Can you retest with the patch below applied?
>>
>> Index: sys/kern/kern_shutdown.c
>> ===
>> --- sys/kern/kern_shutdown.c (revision 312082)
>> +++ sys/kern/kern_shutdown.c (working copy)
>> @@ -713,6 +713,7 @@
>>  CPU_CLR(PCPU_GET(cpuid), &other_cpus);
>>  stop_cpus_hard(other_cpus);
>>  }
>> +#endif
>>  
>>  /*
>>   * Ensure that the scheduler is stopped while panicking, even if panic
>> @@ -719,7 +720,6 @@
>>   * has been entered from kdb.
>>   */
>>  td->td_stopsched = 1;
>> -#endif
>>  
>>  bootopt = RB_AUTOBOOT;
>>  newpanic = 0;
>>
>>
> 
> Indeed, my router is a uniprocessor system and your patch really solves the
> problem. Now the kernel generates a crashdump just fine in case of a panic.
> Please commit the fix, thanks!

Sadly, this time 11.1-STABLE r321371 SMP hangs instead of producing a crashdump:

- "call doadump" from DDB prompt works just fine;
- "shutdown -r now" reboots the system without problems;
- "sysctl debug.kdb.panic=1" triggers a panic just fine, but the system hangs
  just after showing uptime instead of continuing with crashdump generation;
  same if a "real" panic occurs.

Same for debug.minidump set to 1 or 0. How do I debug this?

Eugene Grosbein



Re: stable/11 r321349 crashing immediately

2017-07-23 Thread Konstantin Belousov
On Sat, Jul 22, 2017 at 10:51:42PM -0700, Don Lewis wrote:
> > The stack is aligned to a 4096 (0x1000) boundary.  The first access to a
> > local variable below 0xfe085cfa5000 is what triggered the trap.  The
> > other end of the stack must be at 0xfe085cfa9000 less a bit. I don't
> > know why the first stack pointer value in the trace is
> > 0xfe085cfa8a10. That would seem to indicate that amd64_syscall is
> > using ~1500 bytes of stack space.
> 
> Actually there could be quite a bit of CPU context that gets saved. That
> could be sizeable on amd64.

Yes, the usermode trap frame is located on the kernel stack.  Also, pcb
and usermode FPU save area (FPU == all non-general purpose x86 registers,
including XMM/AVX/AVX512 as implemented by CPU) are on the stack.
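
The arithmetic behind the quoted "~1500 bytes" estimate can be checked with a few lines of C. This is only a sketch: the three addresses are copied as they are printed in the quoted text, and the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Addresses as printed in the quoted message. */
#define STACK_LIMIT	0xfe085cfa5000ULL	/* an access below this trapped */
#define STACK_TOP	0xfe085cfa9000ULL	/* other end of the stack */
#define FIRST_SP	0xfe085cfa8a10ULL	/* first stack pointer in the trace */
#define PAGE_BYTES	4096ULL			/* 0x1000 alignment from the text */

/* Bytes already in use when the stack pointer is at sp (stack grows down). */
uint64_t
stack_bytes_used(uint64_t top, uint64_t sp)
{
	return (top - sp);
}

/* Number of whole pages between the two ends of the stack. */
uint64_t
stack_pages(uint64_t top, uint64_t limit)
{
	return ((top - limit) / PAGE_BYTES);
}
```

With these values, stack_bytes_used(STACK_TOP, FIRST_SP) is 0x5f0 = 1520 bytes, which matches the estimate, and stack_pages(STACK_TOP, STACK_LIMIT) is 4. The 1520 bytes are the space taken above the first frame in the trace, which is where the saved context described above would sit.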