Re: CURRENT r331284: crashing with USB

2018-03-21 Thread Warner Losh
On Wed, Mar 21, 2018 at 11:17 PM, Warner Losh  wrote:

>
>
> On Wed, Mar 21, 2018 at 8:42 PM, Hyun Hwang 
> wrote:
>
>> On Wednesday, March 21, 2018, 12:07 PM (UTC+0100), "Hartmann, O." <
>> ohartm...@walstatt.org> wrote:
>> > Hello.
>> >
>> > Incident: CURRENT r331284 can be brought down reliably with a USB
>> > flash drive plugged in and out without mounting or doing anything with
>> > it.
>> >
>> > [...]
>> >
>> > Does anyone else observe this bug?
>> >
>>
>> Can confirm: whenever I plug my Transcend USB microSD reader into my
>> builder (amd64, r331284), the kernel does attach da0 then immediately
>> panics and falls down to `db>` prompt.
>>
>
> Do you have a traceback?
>

Actually, can you test https://reviews.freebsd.org/D14792 for me please?
The hardware I bought to provoke this wound up in my wife's bags for a trip
she's still on and I won't be able to test until Friday (which is why I've
been slow to fix this). I hesitate to commit another change I'm sure will
fix it on the off chance I'll be wrong again...

Occasionally, we'll send a TUR (TEST UNIT READY) to the device. To make
sure that the periph doesn't go away while that's going on, we acquire a
reference to the device. When the command completes, we release it. The
problem is that there's a race that the new asserts I put in uncovered. If
we've sent a TUR to the device, but it hasn't completed when the
damediapoll timeout fires, it will think that we can send a TUR since we
cleared the TUR work flag. This bumps the count, and bang! we have two TURs
in flight. The Transcend USB reader, at least the one I got, takes a long
time for TUR to return, so this can trigger the race. The above fix simply
says that if a TUR is in flight, don't schedule another one. We'll poll
again later anyway, and we have the TUR in flight already, so we'll
accomplish the goal of TUR even though we chose to omit one we might
otherwise do.
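
To make the race described above concrete, here is a minimal Python sketch.
It is a toy model and not the da(4) code: the class, its field names and the
exact point where the reference is taken are assumptions made purely for
illustration. Only the idea (do not schedule a TUR while one is still in
flight) comes from the description above.

```python
class Periph:
    """Toy model (hypothetical) of a periph that polls media with TURs."""

    def __init__(self, guard_in_flight):
        # guard_in_flight=True models the proposed fix; False models the race.
        self.guard_in_flight = guard_in_flight
        self.refs = 0                  # references held for outstanding TURs
        self.turs_in_flight = 0        # TURs sent but not yet completed
        self.tur_work_pending = False  # "TUR work" flag, cleared when sent

    def media_poll(self):
        """Timer callback: queue a TUR if we think we are allowed to."""
        if self.tur_work_pending:
            return False
        if self.guard_in_flight and self.turs_in_flight > 0:
            return False               # the fix: one TUR at a time
        self.tur_work_pending = True
        self.refs += 1                 # assumed: reference taken when queued
        return True

    def start(self):
        """Send the queued TUR and clear the work flag."""
        if self.tur_work_pending:
            self.tur_work_pending = False
            self.turs_in_flight += 1

    def done(self):
        """Completion handler: the TUR returned, drop its reference."""
        self.turs_in_flight -= 1
        self.refs -= 1


def peak_with_slow_device(guard_in_flight):
    """Fire the poll timer twice before the first TUR completes."""
    p = Periph(guard_in_flight)
    p.media_poll()
    p.start()                          # first TUR goes out, device is slow
    p.media_poll()
    p.start()                          # timer fires again before completion
    peak = p.turs_in_flight
    while p.turs_in_flight:
        p.done()
    return peak, p.refs
```

Without the guard the model ends up with two TURs outstanding at once; with
the guard the second poll is simply skipped and the next timer tick gets
another chance.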

Warner


> Warner
>
>
>> > I can plugin the USB and then unplug it and after two or three times
>> doing this, the box goes down.
>>
>> I did not even have to plug-unplug the reader three times; plug the
>> reader in and bam! immediate panic.
>>
>> AFAIK, r331115 did not have this issue because I was able to update my
>> RPi 2 with the very reader from the very builder.
>> I managed to salvage kernel binary dump; in case the dump is needed,
>> please let me know.
>> --
>> Hyun Hwang
>> ___
>> freebsd-current@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org
>> "
>>
>
>
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: CURRENT r331284: crashing with USB

2018-03-21 Thread Warner Losh
On Wed, Mar 21, 2018 at 8:42 PM, Hyun Hwang  wrote:

> On Wednesday, March 21, 2018, 12:07 PM (UTC+0100), "Hartmann, O." <
> ohartm...@walstatt.org> wrote:
> > Hello.
> >
> > Incident: CURRENT r331284 can be brought down reliably with a USB
> > flash drive plugged in and out without mounting or doing anything with
> > it.
> >
> > [...]
> >
> > Does anyone else observe this bug?
> >
>
> Can confirm: whenever I plug my Transcend USB microSD reader into my
> builder (amd64, r331284), the kernel does attach da0 then immediately
> panics and falls down to `db>` prompt.
>

Do you have a traceback?

Warner


> > I can plugin the USB and then unplug it and after two or three times
> doing this, the box goes down.
>
> I did not even have to plug-unplug the reader three times; plug the reader
> in and bam! immediate panic.
>
> AFAIK, r331115 did not have this issue because I was able to update my RPi
> 2 with the very reader from the very builder.
> I managed to salvage kernel binary dump; in case the dump is needed,
> please let me know.
> --
> Hyun Hwang
> ___
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
>


Re: CURRENT r331284: crashing with USB

2018-03-21 Thread bob prohaska
On Wed, Mar 21, 2018 at 12:07:18PM +0100, Hartmann, O. wrote:
> 
> not the ZFS. I can plugin the USB and then unplug it and after two or
> three times doing this, the box goes down.
> 
> 
> Does anyone else observe this bug?
> 

An RPI2 running r331179 didn't crash, but it did complain about setting
addresses. The same error crops up from time to time when pl2303 serial
adapters have been in use for some time (hours):

login: ugen0.5:  at usbus0 (disconnected)
uftdi0: at uhub1, port 4, addr 5 (disconnected)
uftdi0: detached
ugen0.5:  at usbus0
umass1 on uhub1
umass1:  on usbus0
umass1:  SCSI over Bulk-Only; quirks = 0x0100
umass1:1:1: Attached to scbus1
da1 at umass-sim1 bus 1 scbus1 target 0 lun 0
da1:  Removable Direct Access SPC-4 SCSI device
da1: Serial Number AA010428162242131598
da1: 40.000MB/s transfers
da1: 59836MB (122544516 512 byte sectors)
da1: quirks=0x2


FreeBSD/arm (www.zefox.com) (ttyu0)

login: ugen0.5:  at usbus0 (disconnected)
umass1: at uhub1, port 5, addr 5 (disconnected)
da1 at umass-sim1 bus 1 scbus1 target 0 lun 0
da1:   s/n AA010428162242131598 detached
(da1:umass-sim1:1:0:0): Periph destroyed
umass1: detached
ugen0.5:  at usbus0
umass1 on uhub1
umass1:  on usbus0
umass1:  SCSI over Bulk-Only; quirks = 0x0100
umass1:1:1: Attached to scbus1
da1 at umass-sim1 bus 1 scbus1 target 0 lun 0
da1:  Removable Direct Access SPC-4 SCSI device
da1: Serial Number AA010428162242131598
da1: 40.000MB/s transfers
da1: 59836MB (122544516 512 byte sectors)
da1: quirks=0x2
ugen0.5:  at usbus0 (disconnected)
umass1: at uhub1, port 5, addr 5 (disconnected)
da1 at umass-sim1 bus 1 scbus1 target 0 lun 0
da1:   s/n AA010428162242131598 detached
(da1:umass-sim1:1:0:0): Periph destroyed
umass1: detached
usbd_req_re_enumerate: addr=5, set address failed! (USB_ERR_IOERROR, ignored)
usbd_setup_device_desc: getting device descriptor at addr 5 failed, USB_ERR_IOERROR
usbd_req_re_enumerate: addr=5, set address failed! (USB_ERR_IOERROR, ignored)
usbd_setup_device_desc: getting device descriptor at addr 5 failed, USB_ERR_IOERROR
usb_alloc_device: Failure selecting configuration index 0:USB_ERR_IOERROR, port 5, addr 5 (ignored)
ugen0.5:  at usbus0
ugen0.5:  at usbus0 (disconnected)
ugen0.5:  at usbus0
umass1 on uhub1
umass1:  on usbus0
umass1:  SCSI over Bulk-Only; quirks = 0x0100
umass1:1:1: Attached to scbus1
da1 at umass-sim1 bus 1 scbus1 target 0 lun 0
da1:  Removable Direct Access SPC-4 SCSI device
da1: Serial Number AA010428162242131598
da1: 40.000MB/s transfers
da1: 59836MB (122544516 512 byte sectors)
da1: quirks=0x2

Thanks for reading, I hope it's useful.

bob prohaska



Re: CURRENT r331284: crashing with USB

2018-03-21 Thread Hyun Hwang
On Wednesday, March 21, 2018, 12:07 PM (UTC+0100), "Hartmann, O." 
 wrote:
> Hello.
> 
> Incident: CURRENT r331284 can be brought down reliably with a USB
> flash drive plugged in and out without mounting or doing anything with
> it.
> 
> [...]
> 
> Does anyone else observe this bug?
> 

Can confirm: whenever I plug my Transcend USB microSD reader into my builder 
(amd64, r331284), the kernel does attach da0 then immediately panics and falls 
down to `db>` prompt.

> I can plugin the USB and then unplug it and after two or three times doing 
> this, the box goes down.

I did not even have to plug-unplug the reader three times; plug the reader in 
and bam! immediate panic.

AFAIK, r331115 did not have this issue because I was able to update my RPi 2 
with the very reader from the very builder.
I managed to salvage kernel binary dump; in case the dump is needed, please let 
me know.
-- 
Hyun Hwang


Re: pcm1:virtual:dsp1.vp0: play interrupt timeout, channel dead

2018-03-21 Thread Nikolaj Thygesen

On 03/19/2018 00:41, Cy Schubert wrote:

> The other thing you might want to check out is if you multiboot your
> laptop that any non-FreeBSD operating system may put hardware into an
> inconsistent state. For example, my Acer laptop loses sound if I boot
> Windows then boot FreeBSD. The workaround is either adjust the HDA
> inputs/outputs through sysctl or simply power cycle the laptop.

I fixed that issue by changing the Windows sound driver to the generic
hda driver instead of the Realtek driver.


    N :o)


Call for Testing: UEFI Changes

2018-03-21 Thread Kyle Evans
Hello!

A number of changes have gone in recently pertaining to UEFI booting
and UEFI runtime services. The changes with the most damaging
potential are:

We now put UEFI runtime services into virtual address mode, fixing
runtime services with U-Boot/UEFI as well as the firmware
implementation in many Lenovos. The previously observed behavior was a
kernel panic upon invocation of efibootmgr/efivar, or a kernel panic
just loading efirt.ko or compiling EFIRT into the kernel.

Graphics mode selection is now done differently to avoid regression
caused by r327058 while still achieving the same effect. The observed
regression was that the kernel would usually end up drawing
incorrectly at the old resolution on a subset of the screen, due to
incorrect framebuffer information.

Explicit testing of these changes, the latest of which happened in
r331326, and any feedback from this testing would be greatly
appreciated. Testing should be done with either `options EFIRT` in
your kernel config or efirt.ko loaded along with updated bootloader
bits.

I otherwise plan to MFC commits involved with the above-mentioned
changes by sometime in the first week of April, likely no earlier than
two (2) weeks from now on April 4th.

Thanks,

Kyle Evans


Re: freebsd-update: to a specific patch level - help please?

2018-03-21 Thread Rainer Duffner


> On 21.03.2018 at 22:12, Derek (freebsd lists)
> <48225...@razorfever.net> wrote:
> 
> Hi!
> 
> I was surprised when using freebsd-update, that there was no way to specify a 
> patch level.



AFAIK, the usual answer to these kinds of requests is: „Run your own 
freebsd-update server“.

Mirroring one of the existing ones is AFAIK neither guaranteed to work nor 
desired by the current „administration“.

I’ve contemplated doing both, but never had enough heart-ache to do it and 
never thought the pay-off would be greater than the potential problems.

It’s also a somewhat transient problem now because - AFAIK - FreeBSD will see 
packaged base and you can probably mirror those packages and snapshot the 
directory at any point in time.
And/Or it’s just easier to create these base-packages yourselves vs. running 
your own freebsd-update server.








freebsd-update: to a specific patch level - help please?

2018-03-21 Thread Derek (freebsd lists)

Hi!

I was surprised when using freebsd-update, that there was no way 
to specify a patch level.


In my day to day, I need to ensure security patches are applied.

I also need to assess the impact of patches, and ensure 
consistency (ie. versions) in my environments.  This can take time.


Here's a story for context, please feel free to skip:

  We are planning to cut our 10.3-RELEASE infrastructure over to 
11.1-RELEASE before the end of the month, because it's EoL in 
April.  We updated and cut over our production load balancer 
March 6th (and patted ourselves on the back for being ahead of 
schedule), and within less than 12 hours, updated our backup load 
balancers.  Unfortunately, we're now on ever so slightly 
different versions (-p6/-p7), and we're not affected by the -p7 
problems.  This makes my eye twitch slightly, especially when -p7 
was the first patch of 2018.


  Now we need to upgrade our application servers, that are 
running our trusted code, and -p8 comes out.


  I'm nervous about just applying -p8, but I definitely want to 
upgrade to 11.1-RELEASE asap.


  After assessing the impact of -p8 on our infrastructure, I 
feel the security risk is relatively low in the short term (and 
we've waited this long anyway), but I feel the probability of 
introducing unintended side-effects is high, and want some time 
to test and assess.


/story

It would seem to me, for repeatable environments, that binary 
updates from FreeBSD that can be pinned to a specific version are 
highly desirable.
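
As a rough illustration of the consistency concern above (this is not part
of the patch, and freebsd-update has no such check), here is a small Python
sketch that compares reported versions against a pinned patch level. The
host names and helper functions are hypothetical; the version strings follow
the "11.1-RELEASE-p7" style used in this mail.

```python
import re

# Matches strings such as "11.1-RELEASE" or "11.1-RELEASE-p7".
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)-RELEASE(?:-p(\d+))?$")


def parse_release(version):
    """Return (major, minor, patch); a missing -pN counts as patch 0."""
    match = _RELEASE_RE.match(version.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def hosts_off_pin(pinned, reported):
    """Names of hosts whose reported version differs from the pinned one."""
    want = parse_release(pinned)
    return sorted(
        host for host, version in reported.items()
        if parse_release(version) != want
    )


# Hypothetical inventory mirroring the -p6/-p7 split in the story above.
fleet = {
    "lb-primary": "11.1-RELEASE-p6",
    "lb-backup": "11.1-RELEASE-p7",
}
drifted = hosts_off_pin("11.1-RELEASE-p6", fleet)
```

A check like this only detects drift after the fact; pinning at update time
is what the patch linked below is about.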


I've gone ahead and created a patch for my use here:

https://github.com/derekmarcotte/freebsd/commit/009015a7dda5d1f1c46f4706c222614f17fb535c

(there's a 10.3-specific one here:
https://github.com/derekmarcotte/freebsd/commit/458879f36ae984add0ff525fb6c2765fcf1fba67
)

I'd be happy to open a PR, and to iterate and improve on this 
PoC, but if there's no support from the project, I'll keep it to 
myself.


I guess what I'm asking is, for these reasons, is anyone willing 
to work with me (in mentorship+commit bits) to add this feature 
(maybe not this particular implementation) to freebsd-update?


Thanks!
Derek




problem with [intr{swi4: clock (0)}]

2018-03-21 Thread AN

Hi:

I would appreciate any help with this issue, this is a new machine built 
in the last week and if it is a hardware issue I want to return it.  The 
problem seems to have started in the last 24 hours or so.  I am seeing a 
really high cpu utilization for [intr{swi4: clock (0)}].  I have tried a 
couple things to troubleshoot:


rebuilt world and kernel
turned off Virtualbox ( did not load kernel module)
turned off in BIOS network, audio
installed disk from another similar machine, booted and it shows the exact 
same problem.


Here is what I see in top:
last pid: 56553;  load averages:  0.09,  0.44,  0.26    up 0+00:04:38  11:25:24

472 processes: 14 running, 418 sleeping, 40 waiting
CPU 0:   0.0% user,  0.0% nice,  0.0% system, 27.5% interrupt, 72.5% idle
CPU 1:   0.7% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.3% idle
CPU 2:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 3:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 4:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 5:   0.0% user,  0.0% nice,  0.7% system,  0.0% interrupt, 99.3% idle
CPU 6:   0.8% user,  0.0% nice,  0.8% system,  0.0% interrupt, 98.5% idle
CPU 7:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 8:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 9:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 10:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 11:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 1096M Active, 53M Inact, 300K Laundry, 568M Wired, 290M Buf, 14G Free
Swap: 21G Total, 21G Free

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root       155 ki31     0K   192K CPU1    1   4:32 100.00% [idle{idle: cpu1}]
   11 root       155 ki31     0K   192K CPU8    8   4:31 100.00% [idle{idle: cpu8}]
   11 root       155 ki31     0K   192K CPU9    9   4:30 100.00% [idle{idle: cpu9}]
   11 root       155 ki31     0K   192K CPU2    2   4:30 100.00% [idle{idle: cpu2}]
   11 root       155 ki31     0K   192K CPU10  10   4:30 100.00% [idle{idle: cpu10}]
   11 root       155 ki31     0K   192K CPU5    5   4:27 100.00% [idle{idle: cpu5}]
   11 root       155 ki31     0K   192K RUN    11   4:25  99.82% [idle{idle: cpu11}]
   11 root       155 ki31     0K   192K CPU6    6   4:30  98.93% [idle{idle: cpu6}]
   11 root       155 ki31     0K   192K CPU7    7   4:31  96.83% [idle{idle: cpu7}]
   11 root       155 ki31     0K   192K CPU3    3   4:27  94.94% [idle{idle: cpu3}]
   11 root       155 ki31     0K   192K CPU4    4   4:29  94.11% [idle{idle: cpu4}]
   11 root       155 ki31     0K   192K RUN     0   3:45  71.60% [idle{idle: cpu0}]
   12 root       -60    -     0K   656K CPU0    0   0:53  28.43% [intr{swi4: clock (0)}]



28.20% [intr{swi4: clock (0)}] - the process is using close to 30% cpu 
time.


I have no idea what could be causing this, any advice would be 
appreciated.  Thanks in advance.


   12 root       -60    -     0K   656K WAIT    0   1:27  28.80% [intr{swi4: clock (0)}]


systat shows:

    1 users    Load  0.20  0.16  0.18                  Mar 21 11:35
   Mem usage:  11%Phy  1%Kmem
Mem: KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act 1357104  111928  4267688   193328  14176K  count
All 1357984  112656  4285556   211028          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt      8 ioflt  996k total
  1 314   2.0  296 2213  133  1.0  155                    cow         atkbd0 1
                                                        8 zfod   996k cpu0:timer
 0.1%Sys   1.9%Intr  0.1%User  0.0%Nice 98.0%Idle         ozfod    68 xhci0 259
||||||||||                                               %ozfod       ahci0 260
+                                                         daefr     5 re0 261
                                        4 dtbuf           prcfr       hdac0 262
Namei     Name-cache   Dir-cache    349771 desvn       21 totfr       hdac1 280
   Calls    hits   %    hits   %      3740 numvn          react     4 cpu6:timer
     474     474 100                   958 frevn          pdwak     5 cpu10:time
                                                      456 pdpgs    11 cpu7:timer
Disks  ada0 pass0                                         intrn    10 cpu11:time
KB/t   0.00  0.00                                  469596 wire      3 cpu1:timer
tps       0     0                                 1121780 act       2 cpu8:timer
MB/s   0.00  0.00                                  170492 inact     8 cpu9:timer
%busy     0     0                                     300 laund     5 cpu4:timer
                                                 14516016 free      2 cpu2:timer
                                                   183472 buf       7 cpu5:timer
 

Re: "panic: Unholding 5 with cnt = 0" head/amd64 @r331290

2018-03-21 Thread Warner Losh
On Mar 21, 2018 7:02 AM, "David Wolfskill"  wrote:

On Wed, Mar 21, 2018 at 04:26:57PM +0400, Roman Bogorodskiy wrote:
> ...
> > Anything I can/should do to poke at it before trying to capture a dump?
>
> Looks like it's related to:
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226510#c18
> .

Hmm...  OK; thanks.  I went ahead and grabbed crash dump; it may be
found at  (along
with core.txt).

Also, my laptop (running a kernel based on GENERIC, but with a few bits
snipped out and things like IPFIREWALL_DEFAULT_TO_ACCEPT turned off) did
not exhibit an issue.

I am presently re-building head @r331290 on each machine (on a different
slice); this time, without the Forth loader stuff being built (and with
the Lua loader stuff being built).  I mention this, not because I think
the loader has anything to do with this, but because at this time, I
have but a single failure out of two possible; smoke-testing the current
builds will provide a chance to see if I somehow did something weird to
the build machine earlier.

Anyway, Warner should know how to reach me. :-)



The fix is in r331291... at least the backout for other issues...

Warner

Peace,
david
--
David H. Wolfskill  da...@catwhisker.org
An investigator who doesn't make a perp nervous isn't doing his job.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.


Re: "panic: Unholding 5 with cnt = 0" head/amd64 @r331290

2018-03-21 Thread David Wolfskill
On Wed, Mar 21, 2018 at 04:26:57PM +0400, Roman Bogorodskiy wrote:
> ...
> > Anything I can/should do to poke at it before trying to capture a dump?
> 
> Looks like it's related to:
> 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226510#c18
> .

Hmm...  OK; thanks.  I went ahead and grabbed crash dump; it may be
found at  (along
with core.txt).

Also, my laptop (running a kernel based on GENERIC, but with a few bits
snipped out and things like IPFIREWALL_DEFAULT_TO_ACCEPT turned off) did
not exhibit an issue.

I am presently re-building head @r331290 on each machine (on a different
slice); this time, without the Forth loader stuff being built (and with
the Lua loader stuff being built).  I mention this, not because I think
the loader has anything to do with this, but because at this time, I
have but a single failure out of two possible; smoke-testing the current
builds will provide a chance to see if I somehow did something weird to
the build machine earlier.

Anyway, Warner should know how to reach me. :-)

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
An investigator who doesn't make a perp nervous isn't doing his job.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: "panic: Unholding 5 with cnt = 0" head/amd64 @r331290

2018-03-21 Thread Warner Losh
I've reverted r331273. In trying to fix the too many refs problem, I
created the too few refs problem.

Warner

On Wed, Mar 21, 2018 at 6:26 AM, Roman Bogorodskiy 
wrote:

>   David Wolfskill wrote:
>
> > This is on my build machine, running a GENERIC kernel.  (Laptop is still
> > building lib32 shim libraries as I type).
> >
> > Here's a copy/paste from the serial console:
> >
> > da3: Attempt to query device size failed: NOT READY, Medium not present
> > da3: quirks=0x2
> > da3: Delete methods: 
> > GEOM: new disk da1
> > GEOM: new disk da2
> > GEOM: new disk da3
> > (da1:umass-sim0:0:0:1): PREVENT ALLOW MEDIUM REMOVAL not supported.
> > ugen0.4:  at usbus0
> > (dpanic: Unholding 5 with cnt = 0
> > cpuid = 3
> > time = 1521633742
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe4913c0
> > vpanic() at vpanic+0x18d/frame 0xfe491420
> > panic() at panic+0x43/frame 0xfe491480
> > dadone() at dadone+0x1cc9/frame 0xfe4919e0
> > xpt_done_process() at xpt_done_process+0x390/frame 0xfe491a20
> > xpt_done_td() at xpt_done_td+0xf6/frame 0xfe491a70
> > fork_exit() at fork_exit+0x84/frame 0xfe491ab0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe491ab0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> > [ thread pid 15 tid 100065 ]
> > Stopped at  kdb_enter+0x3b: movq    $0,kdb_why
> > db>
> >
> >
> > Anything I can/should do to poke at it before trying to capture a dump?
>
> Looks like it's related to:
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226510#c18
>
> > Peace,
> > david
> > --
> > David H. Wolfskill  da...@catwhisker.org
> > An investigator who doesn't make a perp nervous isn't doing his job.
> >
> > See http://www.catwhisker.org/~david/publickey.gpg for my public key.
>
>
>
> Roman Bogorodskiy
>


Re: "panic: Unholding 5 with cnt = 0" head/amd64 @r331290

2018-03-21 Thread Roman Bogorodskiy
  David Wolfskill wrote:

> This is on my build machine, running a GENERIC kernel.  (Laptop is still
> building lib32 shim libraries as I type).
> 
> Here's a copy/paste from the serial console:
> 
> da3: Attempt to query device size failed: NOT READY, Medium not present
> da3: quirks=0x2
> da3: Delete methods: 
> GEOM: new disk da1
> GEOM: new disk da2
> GEOM: new disk da3
> (da1:umass-sim0:0:0:1): PREVENT ALLOW MEDIUM REMOVAL not supported.
> ugen0.4:  at usbus0
> (dpanic: Unholding 5 with cnt = 0
> cpuid = 3
> time = 1521633742
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe4913c0
> vpanic() at vpanic+0x18d/frame 0xfe491420
> panic() at panic+0x43/frame 0xfe491480
> dadone() at dadone+0x1cc9/frame 0xfe4919e0
> xpt_done_process() at xpt_done_process+0x390/frame 0xfe491a20
> xpt_done_td() at xpt_done_td+0xf6/frame 0xfe491a70
> fork_exit() at fork_exit+0x84/frame 0xfe491ab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe491ab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 15 tid 100065 ]
> Stopped at  kdb_enter+0x3b: movq    $0,kdb_why
> db> 
> 
> 
> Anything I can/should do to poke at it before trying to capture a dump?

Looks like it's related to:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226510#c18

> Peace,
> david
> -- 
> David H. Wolfskill  da...@catwhisker.org
> An investigator who doesn't make a perp nervous isn't doing his job.
> 
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.



Roman Bogorodskiy




"panic: Unholding 5 with cnt = 0" head/amd64 @r331290

2018-03-21 Thread David Wolfskill
This is on my build machine, running a GENERIC kernel.  (Laptop is still
building lib32 shim libraries as I type).

Here's a copy/paste from the serial console:

da3: Attempt to query device size failed: NOT READY, Medium not present
da3: quirks=0x2
da3: Delete methods: 
GEOM: new disk da1
GEOM: new disk da2
GEOM: new disk da3
(da1:umass-sim0:0:0:1): PREVENT ALLOW MEDIUM REMOVAL not supported.
ugen0.4:  at usbus0
(dpanic: Unholding 5 with cnt = 0
cpuid = 3
time = 1521633742
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe4913c0
vpanic() at vpanic+0x18d/frame 0xfe491420
panic() at panic+0x43/frame 0xfe491480
dadone() at dadone+0x1cc9/frame 0xfe4919e0
xpt_done_process() at xpt_done_process+0x390/frame 0xfe491a20
xpt_done_td() at xpt_done_td+0xf6/frame 0xfe491a70
fork_exit() at fork_exit+0x84/frame 0xfe491ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe491ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 15 tid 100065 ]
Stopped at  kdb_enter+0x3b: movq    $0,kdb_why
db> 


Anything I can/should do to poke at it before trying to capture a dump?

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
An investigator who doesn't make a perp nervous isn't doing his job.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: CURRENT r331284: crashing with USB

2018-03-21 Thread Hans Petter Selasky

On 03/21/18 12:07, Hartmann, O. wrote:

> Hello.
>
> Incident: CURRENT r331284 can be brought down reliably with a USB
> flash drive plugged in and out without mounting or doing anything with
> it.
>
> I first noticed the incident with ZFS on a SanDisk 32GB USB 3.0
> flash drive. After plugging in the USB flash drive, the first
> "zpool import" I typed showed the ZFS filesystem. Usually, I then
> import this USB drive for maintenance purposes. Now, typing
> "zpool import" a second time shows nothing at all. I see that umass0
> has been destroyed - although the USB drive is still plugged in.
>
> Pulling the USB flash drive without having actually imported the
> ZFS makes CURRENT crash and reboot.
>
> I tried different USB flash drives, 3.0, 2.0, different boxes running
> CURRENT, different hardware (Notebooks, Fujitsu workstations, HP
> servers). It seems that the USB subsystem does have a serious problem -
> not the ZFS. I can plug in the USB and then unplug it and after two or
> three times doing this, the box goes down.
>
> Does anyone else observe this bug?
>
> By the way: all ZFS USB drives I use or all other USB flash drives
> cause no problem on FreeBSD 11.1-RELENG-p7!



I've seen something similar, but I thought the issue was fixed by:

https://reviews.freebsd.org/D14456

--HPS



CURRENT r331284: crashing with USB

2018-03-21 Thread Hartmann, O.
Hello.

Incident: CURRENT r331284 can be brought down reliably with a USB
flash drive plugged in and out without mounting or doing anything with
it.

I first noticed the incident with ZFS on a SanDisk 32GB USB 3.0
flash drive. After plugging in the USB flash drive, the first
"zpool import" I typed showed the ZFS filesystem. Usually, I then
import this USB drive for maintenance purposes. Now, typing
"zpool import" a second time shows nothing at all. I see that umass0
has been destroyed - although the USB drive is still plugged in.

Pulling the USB flash drive without having actually imported the
ZFS makes CURRENT crash and reboot.

I tried different USB flash drives, 3.0, 2.0, different boxes running
CURRENT, different hardware (Notebooks, Fujitsu workstations, HP
servers). It seems that the USB subsystem does have a serious problem -
not the ZFS. I can plug in the USB and then unplug it and after two or
three times doing this, the box goes down.


Does anyone else observe this bug?

By the way: all ZFS USB drives I use or all other USB flash drives
cause no problem on FreeBSD 11.1-RELENG-p7!


Kind regards,

Oliver 


Re: ZFS i/o error in recent 12.0

2018-03-21 Thread Markus Wild
Hello Thomas,

> > I had faced the exact same issue on a HP Microserver G8 with 8TB disks and 
> > a 16TB zpool on FreeBSD 11 about a year
> > ago.  
> I will ask you the same question as I asked the OP:
> 
> Has this pool had new vdevs added to it since the server was installed?

No. This is a microserver with only 4 (not even hotplug) trays. It was set up 
using the freebsd installer 
originally. I had to apply the (then patch, don't know whether it's included 
standard now) btx loader fix to retry
a failed read to get around BIOS bugs with that server, but after that, the 
server booted fine. It's only after
a bit of use and a kernel update that things went south. I tried many different 
things at that time, but the only
approach that worked for me was to steal 2 of the 4 swap partitions which I 
placed on every disk initially, and 
build a mirrored boot zpool from those. The loader had no problem loading the 
kernel from that, and when the kernel
took over, it had no problem using the original root pool (that the boot loader 
wasn't able to find/load). Whence my
conclusion that the 2nd stage boot loader has a problem (probably due to yet 
another bios bug on that server) loading
blocks beyond a certain limit, which could be 2TB or 4TB.
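
As a side note on where a 2TB figure could come from: a 32-bit sector number
with 512-byte sectors covers exactly 2 TiB. Whether that is really the limit
this loader or BIOS runs into is an assumption; the Python sketch below only
does the arithmetic, and the function names are made up for illustration.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def addressable_bytes(lba_bits, sector_bytes=SECTOR_BYTES):
    """Bytes reachable when sector numbers are limited to lba_bits bits."""
    return (1 << lba_bits) * sector_bytes


def reachable(byte_offset, lba_bits, sector_bytes=SECTOR_BYTES):
    """True if a block starting at byte_offset can still be addressed."""
    return byte_offset < addressable_bytes(lba_bits, sector_bytes)


TIB = 1 << 40
limit_32bit = addressable_bytes(32)         # 2 TiB with 512-byte sectors
kernel_at_1tib_ok = reachable(1 * TIB, 32)  # below the limit
kernel_at_3tib_ok = reachable(3 * TIB, 32)  # beyond the limit
```

Under that assumption, a kernel rewritten to a location past the 2 TiB mark
on an 8TB disk would be unreadable for the loader even though the running
system sees it fine.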

> What does a "zpool status" look like when the pool is imported?

$ zpool status
  pool: zboot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Mar 21 03:58:36 2018
config:

        NAME               STATE     READ WRITE CKSUM
        zboot              ONLINE       0     0     0
          mirror-0         ONLINE       0     0     0
            gpt/zfs-boot0  ONLINE       0     0     0
            gpt/zfs-boot1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 6h49m with 0 errors on Sat Mar 10 10:17:49 2018
config:

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/zfs0  ONLINE       0     0     0
            gpt/zfs1  ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            gpt/zfs2  ONLINE       0     0     0
            gpt/zfs3  ONLINE       0     0     0

errors: No known data errors

Please note: this server is in use at a customer now; it's working fine with
this workaround. I just brought it up to give a possible explanation for the
problem observed by the original poster: it _might_ have nothing to do with a
newer version of the current kernel, but rather be due to the updated kernel
being written to a new location on disk, which can't be read properly by the
boot loader.

Cheers,
Markus


Re: ZFS i/o error in recent 12.0

2018-03-21 Thread Thomas Steen Rasmussen
On 03/20/2018 08:50 AM, Markus Wild wrote:
>
> I had faced the exact same issue on a HP Microserver G8 with 8TB disks and a 
> 16TB zpool on FreeBSD 11 about a year ago.
Hello,

I will ask you the same question as I asked the OP:

Has this pool had new vdevs added to it since the server was installed?
What does a "zpool status" look like when the pool is imported?

Explanation: Some controllers only make a small fixed number of devices
visible to the BIOS during boot. Imagine a zpool was booted with, say, 4
disks in a pool, and 4 more were added. If the HBA only shows 4 drives to
the BIOS during boot, you see this error.

If you think this might be relevant you need to chase down a setting
called "maximum int13 devices for this adapter" or something like that.
See page 3-4 in this documentation:
https://supermicro.com/manuals/other/LSI_HostRAID_2308.pdf

The setting has been set to 4 on a bunch of servers I've bought over the
last years. Then you install the server with 4 disks, later add new
disks, reboot one day and nothing works until you set it high enough
that the bootloader can see the whole pool, and you're good again.
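
The failure mode above can be sketched in a few lines of Python. This is a
toy model under stated assumptions: the pool consists of mirror vdevs, the
loader needs at least one visible disk from every vdev, and the controller
exposes disks in attach order. The device names and function names are
hypothetical.

```python
def visible_disks(attached, max_int13_devices):
    """Disks the BIOS gets to see: the first N in attach order (assumed)."""
    return set(attached[:max_int13_devices])


def loader_can_read(mirror_vdevs, attached, max_int13_devices):
    """True if every mirror vdev has at least one BIOS-visible member."""
    seen = visible_disks(attached, max_int13_devices)
    return all(any(disk in seen for disk in vdev) for vdev in mirror_vdevs)


attached = ["da0", "da1", "da2", "da3", "da4", "da5", "da6", "da7"]
original_pool = [["da0", "da1"], ["da2", "da3"]]
grown_pool = original_pool + [["da4", "da5"], ["da6", "da7"]]

boots_as_installed = loader_can_read(original_pool, attached, 4)
boots_after_growth = loader_can_read(grown_pool, attached, 4)
boots_after_raising_limit = loader_can_read(grown_pool, attached, 8)
```

In this model the grown pool stops booting with the limit at 4 because two
whole vdevs are invisible to the loader, and it boots again once the limit
covers all eight disks.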

/Thomas



Re: ZFS i/o error in recent 12.0

2018-03-21 Thread Thomas Steen Rasmussen
On 03/20/2018 12:00 AM, KIRIYAMA Kazuhiko wrote:
> Hi,
>
> I've encountered a sudden death on a ZFS full-volume
> machine (r330434) about 10 days after installation[1]:
>
> ZFS: i/o error - all block copies unavailable
> ZFS: can't read MOS of pool zroot
> gptzfsboot: failed to mount default pool zroot
>
> FreeBSD/x86 boot
> ZFS: i/o error - all block copies unavailable
> ZFS: can't find dataset u
> Default: zroot/<0x0>:
> boot: 

Has this pool had new vdevs added to it since the server was installed?
What does a "zpool status" look like when the pool is imported?

/Thomas
