Re: nvme timeout issues with hardware and bhyve vm's

2023-10-15 Thread void
On Sun, 15 Oct 2023, at 15:53, Warner Losh wrote:

> The one with the uboot traceback? I can't help you there. The report is 
> confusing. I don't know the error / problem being reported to even know 
> what to look at.  Or is it a different thing? I'm so confused at this 
> point. I also think we need to recreate it on as clean a system as 
> possible. Weird problems in the boot chain prior to loader.efi, I have 
> little interest in and no time to look at...

The problem reported on the 30th Sept is a panic on booting zroot
because the disk couldn't be found by the loader, presumably because
usb3 needed time to settle: making it wait a while resulted in
a bootable system. But the waiting is only a workaround. I don't
know how to fix it permanently. The boot gets as far as the point
(I guess it's called stage1 boot) where I can select which kernel to boot.
At that point, I thought u-boot had completed.

I'm mentioning this in this thread because the OP is reporting an issue
that looks, on the face of it, very similar to the one I saw, and I thought
mentioning it here might be helpful.



Re: nvme timeout issues with hardware and bhyve vm's

2023-10-15 Thread Warner Losh
On Sun, Oct 15, 2023, 9:47 AM void wrote:

> On Sun, 15 Oct 2023, at 15:35, Warner Losh wrote:
>
> > I've fixed all known nvme issues in current that aren't caused by other
> > parts of the system. If it isn't a very recent 15 or 14,  then there
> > are known issues and you'll need to try those first.
>
> The problem manifested with a source upgrade on the 30th September.
>
> > On arm it could be a lot of things. I keep seeing problems in other
> > areas that are hard to track down without the systems in hand and
> > time to look at them. Complex problems can take a while to track down.
> >
> > I'll need a simple reproducer to look at things...
>
> If you can tell me what to do, i'm happy to test.
>
> Additionally, I can provide a connected instance to you for destructive
> testing on arm64. It would take me about 24hrs to set up as I'd
> need to back up the disk. Let me know if you need it.
>

The one with the uboot traceback? I can't help you there. The report is
confusing. I don't know the error / problem being reported to even know
what to look at.  Or is it a different thing? I'm so confused at this
point. I also think we need to recreate it on as clean a system as
possible. Weird problems in the boot chain prior to loader.efi, I have
little interest in and no time to look at...

Warner



Re: nvme timeout issues with hardware and bhyve vm's

2023-10-15 Thread void
On Sun, 15 Oct 2023, at 15:35, Warner Losh wrote:

> I've fixed all known nvme issues in current that aren't caused by other 
> parts of the system. If it isn't a very recent 15 or 14,  then there 
> are known issues and you'll need to try those first.

The problem manifested with a source upgrade on the 30th September. 

> On arm it could be a lot of things. I keep seeing problems in other 
> areas that are hard to track down without the systems in hand and 
> time to look at them. Complex problems can take a while to track down.
>
> I'll need a simple reproducer to look at things... 

If you can tell me what to do, i'm happy to test.

Additionally, I can provide a connected instance to you for destructive 
testing on arm64. It would take me about 24hrs to set up as I'd
need to back up the disk. Let me know if you need it.



Re: nvme timeout issues with hardware and bhyve vm's

2023-10-15 Thread Warner Losh
On Sun, Oct 15, 2023, 9:28 AM void wrote:

> Hi,
>
> On Fri, 13 Oct 2023, at 03:40, Pete Wright wrote:
> > I had similar issues on my workstation as well.  Scrubbing the NVMe
> > device on my real-hardware workstation hasn't turned up any issues, but
> > the system has locked up a handful of times.
> >
> > Just curious if others have seen the same, or if someone could point me
> > in the right direction...
>
> I've seen similar issues (zpool timeout) in a quite different context
> (arm64, usb3-connected disk, not-vm) which i posted about here:
>
> https://lists.freebsd.org/archives/freebsd-arm/2023-September/003122.html
>
> Unsure if these have been fixed yet?
>

I've fixed all known nvme issues in current that aren't caused by other
parts of the system. If it isn't a very recent 15 or 14,  then there are
known issues and you'll need to try those first.

On arm it could be a lot of things. I keep seeing problems in other areas
that are hard to track down without the systems in hand and time to
look at them. Complex problems can take a while to track down.

I'll need a simple reproducer to look at things...

Warner



Re: nvme timeout issues with hardware and bhyve vm's

2023-10-15 Thread void
Hi,

On Fri, 13 Oct 2023, at 03:40, Pete Wright wrote:
> I had similar issues on my workstation as well.  Scrubbing the NVMe 
> device on my real-hardware workstation hasn't turned up any issues, but 
> the system has locked up a handful of times.
>
> Just curious if others have seen the same, or if someone could point me 
> in the right direction...

I've seen similar issues (zpool timeout) in a quite different context 
(arm64, usb3-connected disk, not-vm) which i posted about here:

https://lists.freebsd.org/archives/freebsd-arm/2023-September/003122.html

Unsure if these have been fixed yet?