Re: STABLE-12 system does not boot

2019-04-13 Thread Warner Losh
What did you upgrade from?

Warner

On Sat, Apr 13, 2019, 11:21 AM Filippo Moretti via freebsd-stable <
freebsd-stable@freebsd.org> wrote:

> I upgraded my amd64 box to today's stable, but the system no longer boots;
> it hangs or reboots after "Loading kernel modules". Any help appreciated.
> I can still access the system via kernel.old. I did not get any errors
> during the update.
> Sincerely, Filippo
> ___
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
>


STABLE-12 system does not boot

2019-04-13 Thread Filippo Moretti via freebsd-stable
I upgraded my amd64 box to today's stable, but the system no longer boots;
it hangs or reboots after "Loading kernel modules". Any help appreciated.
I can still access the system via kernel.old. I did not get any errors
during the update.
Sincerely, Filippo
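For readers in the same situation, the fallback mentioned above (booting the
previous kernel) can be selected ahead of time with nextboot(8). A hedged
sketch follows; the `run` wrapper only prints the command, so it is safe to
paste anywhere:

```shell
# Sketch of the kernel.old fallback; 'run' only echoes the command.
# Drop the wrapper to actually arm a one-shot boot of /boot/kernel.old.
run() { echo "+ $*"; }
run nextboot -k kernel.old   # one-shot; reverts to the default kernel afterwards
# Or interactively at the loader prompt: unload, then: boot kernel.old
```

This does not fix the hang itself, but it keeps the box reachable while the
bad kernel is investigated.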


Re: Replicable file-system corruption due to fsck/ufs

2019-04-13 Thread Kirk McKusick
> Date: Sat, 13 Apr 2019 14:32:45 +0200
> From: Peter Holm 
> To: Kirk McKusick 
> Cc: Jamie Landeg-Jones , ja...@catflap.dyslexicfish.net,
> Warner Losh , freebsd-stable@freebsd.org
> Subject: Re: Replicable file-system corruption due to fsck/ufs
> 
> On Fri, Apr 12, 2019 at 04:13:00PM -0700, Kirk McKusick wrote:
> 
>> This is indeed a bug in the calculation of the location of the last
>> block of a file. I believe that the following patch to head will
>> fix it.
>> 
>> Peter, can you please test and let me know.
>> 
>> If Peter confirms that it fixes the bug, I will check it into head
>> and MFC it to 12-stable and 11-stable after a 2-week settle-in time.
>> 
>>  Kirk McKusick
> 
> Yes, this patch works for me.
> 
> -- 
> Peter

Great, thanks for the quick test. Now committed to head as -r346185.

Kirk


Re: Replicable file-system corruption due to fsck/ufs

2019-04-13 Thread Peter Holm
On Fri, Apr 12, 2019 at 04:13:00PM -0700, Kirk McKusick wrote:
> > Peter Holm  wrote:
> > 
> >> I see this even with a single truncate on HEAD.
> >>
> >> $ ./truncate10.sh
> >> 96 -rw-r--r--  1 root  wheel  1073741824 11 apr. 06:33 test
> >> ** /dev/md10a
> >> ** Last Mounted on /mnt
> >> ** Phase 1 - Check Blocks and Sizes
> >> INODE 3: FILE SIZE 1073741824 BEYOND END OF ALLOCATED FILE, SIZE SHOULD BE 
> >> 268435456
> >> ADJUST? yes
> > 
> > Thanks.. I should have tested that myself.. doh! I was trying to
> > closer replicate my real file that triggered the problem which
> > contained a number of sparse areas.
> > 
> > And thanks for adding Kirk to the discussion. I wanted to first be
> > sure it wasn't just me :-)
> > 
> > Cheers, Jamie
> 
> This is indeed a bug in the calculation of the location of the last
> block of a file. I believe that the following patch to head will
> fix it.
> 
> Peter, can you please test and let me know.
> 
> If Peter confirms that it fixes the bug, I will check it into head
> and MFC it to 12-stable and 11-stable after a 2-week settle-in time.
> 
>   Kirk McKusick
> 

Yes, this patch works for me.

-- 
Peter
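For reference, the file shape that trips the fsck_ffs size check can be seen
without building a memory disk. This is an illustrative sketch only, not
Peter's truncate10.sh (which creates a UFS filesystem on an md(4) device and
runs fsck against it); paths here are temporary:

```shell
# Illustrative: truncate(1) produces a file with a large apparent size
# but almost no allocated data blocks -- the sparse layout behind the
# "FILE SIZE ... BEYOND END OF ALLOCATED FILE" complaint.
tmp=$(mktemp -d)
truncate -s 1G "$tmp/test"                            # sparse 1 GB file
apparent=$(wc -c < "$tmp/test")                       # apparent length in bytes
allocated_kb=$(du -k "$tmp/test" | awk '{print $1}')  # space actually allocated
echo "apparent=$apparent allocated_kb=$allocated_kb"
rm -rf "$tmp"
```

The original report then ran fsck over a UFS image holding such a file, at
which point the last-block calculation went wrong.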


Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)

2019-04-13 Thread Karl Denninger
On 4/11/2019 13:57, Karl Denninger wrote:
> On 4/11/2019 13:52, Zaphod Beeblebrox wrote:
>> On Wed, Apr 10, 2019 at 10:41 AM Karl Denninger  wrote:
>>
>>
>>> In this specific case the adapter in question is...
>>>
>>> mps0:  port 0xc000-0xc0ff mem
>>> 0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3
>>> mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
>>> mps0: IOCCapabilities: 1285c
>>>
>>> Which is indeed a "dumb" HBA (in IT mode), and Zaphod says he connects
>>> his drives via dumb on-MoBo direct SATA connections.
>>>
>> Maybe I'm in good company.  My current setup has 8 of the disks connected
>> to:
>>
>> mps0:  port 0xb000-0xb0ff mem
>> 0xfe24-0xfe24,0xfe20-0xfe23 irq 32 at device 0.0 on pci6
>> mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd
>> mps0: IOCCapabilities: 5a85c
>>
>> ... just with a cable that breaks out each of the 2 connectors into 4
>> SATA-style connectors, and the other 8 disks (plus boot disks and SSD
>> cache/log) connected to ports on...
>>
>> - ahci0:  port
>> 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem
>> 0xfe90-0xfe9001ff irq 44 at device 0.0 on pci2
>> - ahci2:  port
>> 0xa050-0xa057,0xa040-0xa043,0xa030-0xa037,0xa020-0xa023,0xa000-0xa01f mem
>> 0xfe61-0xfe6107ff irq 40 at device 0.0 on pci7
>> - ahci3:  port
>> 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem
>> 0xfea07000-0xfea073ff irq 19 at device 17.0 on pci0
>>
>> ... each drive connected to a single port.
>>
>> I can actually reproduce this at will.  Because I have 16 drives, when one
>> fails, I need to find it.  I pull the sata cable for a drive, determine if
>> it's the drive in question, if not, reconnect, "ONLINE" it and wait for
>> resilver to stop... usually only a minute or two.
>>
>> ... if I do this 4 to 6 odd times to find a drive (I can tell, in general,
>> that a drive is part of the SAS controller or the SATA controllers... so
>> I'm only looking among 8, ever) ... then I "REPLACE" the problem drive.
>> More often than not, a scrub will find a few problems.  In fact, it
>> appears that the most recent scrub is an example:
>>
>> [1:7:306]dgilbert@vr:~> zpool status
>>   pool: vr1
>>  state: ONLINE
>>   scan: scrub repaired 32K in 47h16m with 0 errors on Mon Apr  1 23:12:03
>> 2019
>> config:
>>
>> NAMESTATE READ WRITE CKSUM
>> vr1 ONLINE   0 0 0
>>   raidz2-0  ONLINE   0 0 0
>> gpt/v1-d0   ONLINE   0 0 0
>> gpt/v1-d1   ONLINE   0 0 0
>> gpt/v1-d2   ONLINE   0 0 0
>> gpt/v1-d3   ONLINE   0 0 0
>> gpt/v1-d4   ONLINE   0 0 0
>> gpt/v1-d5   ONLINE   0 0 0
>> gpt/v1-d6   ONLINE   0 0 0
>> gpt/v1-d7   ONLINE   0 0 0
>>   raidz2-2  ONLINE   0 0 0
>> gpt/v1-e0c  ONLINE   0 0 0
>> gpt/v1-e1b  ONLINE   0 0 0
>> gpt/v1-e2b  ONLINE   0 0 0
>> gpt/v1-e3b  ONLINE   0 0 0
>> gpt/v1-e4b  ONLINE   0 0 0
>> gpt/v1-e5a  ONLINE   0 0 0
>> gpt/v1-e6a  ONLINE   0 0 0
>> gpt/v1-e7c  ONLINE   0 0 0
>> logs
>>   gpt/vr1logONLINE   0 0 0
>> cache
>>   gpt/vr1cache  ONLINE   0 0 0
>>
>> errors: No known data errors
>>
>> ... it doesn't say it now, but there were 5 CKSUM errors on one of the
>> drives that I had trial-removed (and not on the one replaced).
> That is EXACTLY what I'm seeing; the "OFFLINE'd" drive is the one that,
> after a scrub, comes up with the checksum errors.  It does *not* flag
> any errors during the resilver and the drives *not* taken offline do not
> (ever) show checksum errors either.
>
> Interestingly enough you have 19.00.00.00 firmware on your card as well
> -- which is what was on mine.
>
> I have flashed my card forward to 20.00.07.00 -- we'll see if it still
> does it when I do the next swap of the backup set.
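The trial-removal cycle quoted above maps onto a short zpool sequence. A
dry-run sketch (the pool and vdev names come from the quoted status output;
the `run` wrapper only prints each command, so remove it to execute for real):

```shell
# Dry-run sketch of the offline/online/scrub cycle described above.
run() { echo "+ $*"; }
POOL=vr1
DEV=gpt/v1-e3b                      # one of the vdevs in the status output
run zpool offline "$POOL" "$DEV"    # pull the cable, check the suspect drive
run zpool online  "$POOL" "$DEV"    # wrong drive: reconnect and resilver
run zpool status  "$POOL"           # wait for the resilver to finish
run zpool scrub   "$POOL"           # the step that surfaces the CKSUM errors
run zpool clear   "$POOL"           # reset error counters once investigated
```
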

Very interesting.

This drive was last written/read under 19.00.00.00.  Yesterday I swapped
it back in.  Note that right now I am running:

mps0:  port 0xc000-0xc0ff mem
0xfbb3c000-0xfbb3,0xfbb4-0xfbb7 irq 30 at device 0.0 on pci3
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c

And, after the scrub completed overnight

[karl@NewFS ~]$ zpool status backup
  pool: backup
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using