Re: ZFS cache devs UNAVAIL
On 23.10.12 22:23, Andriy Gapon wrote:
> on 23/10/2012 23:08 Andriy Gapon said the following:
>> on 23/10/2012 20:56 Michael Schmiedgen said the following:
>>> FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
>>> root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
>> ...
>>> vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1:
>>> 5267967234359339128 != 0.
>>
>> Thank you for this valuable information.
>>
>> Do you have a rough estimate of when you started to experience this issue?
>>
>> Could you please also provide output of the following command captured right
>> after a reboot and then after you re-add the cache disks?
>> $ zdb -lll /dev/ada0p
>>
>
> I still would like to get the above information if possible.
> But here is a patch that you can try:
>
> I think that I introduced this bug because I used some old OpenSolaris code as
> an inspiration and completely missed the new states.

My NAS experienced the same problem, I thought the old IDE SSD had just died of old age, that's why I didn't investigate further yet. :)

With the patch the cache device is back.

Thanks,
Florian
Re: ZFS cache devs UNAVAIL
On 10/23/12 23:57, Florian Smeets wrote:
> My NAS experienced the same problem, I thought the old IDE SSD had just died
> of old age, that's why I didn't investigate further yet. :)

I have 2 physical SSDs, with both first partitions striped as cache for my main zpool (cache devs gone UNAVAIL) and both second partitions used for a mirrored temp zpool (ONLINE). So I saw a good chance *not* to blame the hardware. ;)

> With the patch the cache device is back.

Works here, too.

Michael
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ZFS cache devs UNAVAIL
On 10/23/12 22:23, Andriy Gapon wrote:
> on 23/10/2012 23:08 Andriy Gapon said the following:
>> on 23/10/2012 20:56 Michael Schmiedgen said the following:
>> ...
>>> vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1:
>>> 5267967234359339128 != 0.
>>
>> Thank you for this valuable information.
>>
>> Do you have a rough estimate of when you started to experience this issue?
>>
>> Could you please also provide output of the following command captured right
>> after a reboot and then after you re-add the cache disks?
>> $ zdb -lll /dev/ada0p
>>
>
> I still would like to get the above information if possible.
> But here is a patch that you can try:
>
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> @@ -270,14 +270,16 @@ vdev_geom_read_config(struct g_consumer *cp, nvlist_t **config)
>  			continue;
>  
>  		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_STATE,
> -		    &state) != 0 || state >= POOL_STATE_DESTROYED) {
> +		    &state) != 0 || state == POOL_STATE_DESTROYED ||
> +		    state > POOL_STATE_L2CACHE) {
>  			nvlist_free(*config);
>  			*config = NULL;
>  			continue;
>  		}
>  
> -		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
> -		    &txg) != 0 || txg == 0) {
> +		if (state != POOL_STATE_SPARE && state != POOL_STATE_L2CACHE &&
> +		    (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
> +		    &txg) != 0 || txg == 0)) {
>  			nvlist_free(*config);
>  			*config = NULL;
>  			continue;

This works for me. Thank you very much! :)

For the zdb data see below; it has not changed since patch-apply/re-add/reboot.

Michael

# zdb -lll /dev/ada0p1
LABEL 0
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 1
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 2
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 3
    version: 5000
    state: 4
    guid: 13019058935211054376

# zdb -lll /dev/ada1p1
LABEL 0
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 1
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 2
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 3
    version: 5000
    state: 4
    guid: 1347428618237802818
Re: ZFS cache devs UNAVAIL
Hi Andy,

thank you for your reply. I will test your patch right now and give you feedback.

On 10/23/12 22:08, Andriy Gapon wrote:
> on 23/10/2012 20:56 Michael Schmiedgen said the following:
> ...
>> vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1:
>> 5267967234359339128 != 0.
>
> Thank you for this valuable information.
>
> Do you have a rough estimate of when you started to experience this issue?

I have experienced this since my 2012-10-17 build. I build every 3-5 weeks.

> Could you please also provide output of the following command captured right
> after a reboot and then after you re-add the cache disks?
> $ zdb -lll /dev/ada0p

Here is the data in UNAVAIL state:

# zdb -lll /dev/ada0p1
LABEL 0
    version: 5000
    state: 4
    guid: 5267967234359339128
LABEL 1
    version: 5000
    state: 4
    guid: 5267967234359339128
LABEL 2
    version: 5000
    state: 4
    guid: 5267967234359339128
LABEL 3
    version: 5000
    state: 4
    guid: 5267967234359339128

# zdb -lll /dev/ada1p1
LABEL 0
    version: 5000
    state: 4
    guid: 5693315451104805234
LABEL 1
    version: 5000
    state: 4
    guid: 5693315451104805234
LABEL 2
    version: 5000
    state: 4
    guid: 5693315451104805234
LABEL 3
    version: 5000
    state: 4
    guid: 5693315451104805234

Here is the data after re-adding the two devs:

# zdb -lll /dev/ada0p1
LABEL 0
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 1
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 2
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 3
    version: 5000
    state: 4
    guid: 13019058935211054376

# zdb -lll /dev/ada1p1
LABEL 0
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 1
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 2
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 3
    version: 5000
    state: 4
    guid: 1347428618237802818

I will post the data after build/install/reboot soon.

Thanks,
Michael
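The `state: 4` field in the labels above is a numeric pool state. As a reading aid, here is a minimal sketch that maps the number to a name. The enum values are, to the best of my knowledge, those of `pool_state_t` in the ZFS headers; the helper `pool_state_name()` is hypothetical and not part of ZFS or zdb.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pool states as I understand them from the ZFS headers (assumption). */
enum pool_state {
	POOL_STATE_ACTIVE = 0,
	POOL_STATE_EXPORTED,
	POOL_STATE_DESTROYED,
	POOL_STATE_SPARE,
	POOL_STATE_L2CACHE,
	POOL_STATE_UNINITIALIZED,
	POOL_STATE_UNAVAIL,
	POOL_STATE_POTENTIALLY_ACTIVE
};

/* Hypothetical helper: name the "state:" number printed in a label dump. */
const char *
pool_state_name(uint64_t state)
{
	switch (state) {
	case POOL_STATE_ACTIVE:			return ("ACTIVE");
	case POOL_STATE_EXPORTED:		return ("EXPORTED");
	case POOL_STATE_DESTROYED:		return ("DESTROYED");
	case POOL_STATE_SPARE:			return ("SPARE");
	case POOL_STATE_L2CACHE:		return ("L2CACHE");
	case POOL_STATE_UNINITIALIZED:		return ("UNINITIALIZED");
	case POOL_STATE_UNAVAIL:		return ("UNAVAIL");
	case POOL_STATE_POTENTIALLY_ACTIVE:	return ("POTENTIALLY_ACTIVE");
	default:				return ("UNKNOWN");
	}
}
```

Under that mapping, state 4 is L2CACHE, so the labels describe cache devices both before and after the re-add; only the guid differs between the two dumps.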
Re: ZFS cache devs UNAVAIL
on 23/10/2012 23:08 Andriy Gapon said the following:
> on 23/10/2012 20:56 Michael Schmiedgen said the following:
>> FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
>> root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
> ...
>> vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1:
>> 5267967234359339128 != 0.
>
> Thank you for this valuable information.
>
> Do you have a rough estimate of when you started to experience this issue?
>
> Could you please also provide output of the following command captured right
> after a reboot and then after you re-add the cache disks?
> $ zdb -lll /dev/ada0p
>

I still would like to get the above information if possible.
But here is a patch that you can try:

--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
@@ -270,14 +270,16 @@ vdev_geom_read_config(struct g_consumer *cp, nvlist_t **config)
 			continue;
 
 		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_STATE,
-		    &state) != 0 || state >= POOL_STATE_DESTROYED) {
+		    &state) != 0 || state == POOL_STATE_DESTROYED ||
+		    state > POOL_STATE_L2CACHE) {
 			nvlist_free(*config);
 			*config = NULL;
 			continue;
 		}
 
-		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
-		    &txg) != 0 || txg == 0) {
+		if (state != POOL_STATE_SPARE && state != POOL_STATE_L2CACHE &&
+		    (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
+		    &txg) != 0 || txg == 0)) {
 			nvlist_free(*config);
 			*config = NULL;
 			continue;

I think that I introduced this bug because I used some old OpenSolaris code as an inspiration and completely missed the new states.

-- 
Andriy Gapon
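To make the effect of the two changed conditions easier to see, here is a small standalone sketch that mirrors them outside the kernel. It is an illustration only: `struct label`, `label_ok_before()` and `label_ok_after()` are hypothetical names, the `has_*` flags stand in for a successful `nvlist_lookup_uint64()`, and the enum values are assumed to match `pool_state_t` in the ZFS headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match pool_state_t in the ZFS headers. */
enum pool_state {
	POOL_STATE_ACTIVE = 0,
	POOL_STATE_EXPORTED,
	POOL_STATE_DESTROYED,
	POOL_STATE_SPARE,
	POOL_STATE_L2CACHE,
	POOL_STATE_UNINITIALIZED,
	POOL_STATE_UNAVAIL,
	POOL_STATE_POTENTIALLY_ACTIVE
};

/* Hypothetical stand-in for the two fields looked up in the label. */
struct label {
	bool		has_state;	/* lookup of the pool state succeeded */
	uint64_t	state;
	bool		has_txg;	/* lookup of the pool txg succeeded */
	uint64_t	txg;
};

/* Conditions as they were before the patch; true means "label accepted". */
bool
label_ok_before(const struct label *l)
{
	/* Rejects DESTROYED and everything above it, SPARE and L2CACHE included. */
	if (!l->has_state || l->state >= POOL_STATE_DESTROYED)
		return (false);
	if (!l->has_txg || l->txg == 0)
		return (false);
	return (true);
}

/* Conditions as they are after the patch. */
bool
label_ok_after(const struct label *l)
{
	/* Rejects only DESTROYED and states above L2CACHE. */
	if (!l->has_state || l->state == POOL_STATE_DESTROYED ||
	    l->state > POOL_STATE_L2CACHE)
		return (false);
	/* The txg requirement is skipped for spare and cache labels. */
	if (l->state != POOL_STATE_SPARE && l->state != POOL_STATE_L2CACHE &&
	    (!l->has_txg || l->txg == 0))
		return (false);
	return (true);
}
```

A label like the ones reported in this thread (state 4, no txg shown in the dump) fails the pre-patch conditions twice over and passes the patched ones, while a destroyed pool's label is rejected by both.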
Re: ZFS cache devs UNAVAIL
On Tue, Oct 23, 2012 at 11:08:55PM +0300 I heard the voice of
Andriy Gapon, and lo! it spake thus:
>
> Do you have a rough estimate of when you started to experience this issue?

I saw it with r241541 and not with my previous kernel (strings says it was r238937; July 31). So not a very narrow range for me.

The major ZFS changes in that interval that I saw at a glance through the log were TRIM and the tasting-for-root-pool. But I don't have any reason to suspect them other than "hey, these are high-profile".

-- 
Matthew Fuller (MF4839) | fulle...@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
Re: ZFS cache devs UNAVAIL
on 23/10/2012 20:56 Michael Schmiedgen said the following:
> FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
> root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
...
> vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1:
> 5267967234359339128 != 0.

Thank you for this valuable information.

Do you have a rough estimate of when you started to experience this issue?

Could you please also provide output of the following command captured right after a reboot and then after you re-add the cache disks?
$ zdb -lll /dev/ada0p

-- 
Andriy Gapon
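One way to read the debug line quoted above: the first number is the guid ZFS expects for the cache vdev, and the second is the guid obtained from the provider's label. A value of 0 would be consistent with the label having been rejected before any guid was extracted from it. The sketch below only illustrates that reading; it is an assumption about the behaviour, and `read_label_guid()` and `guid_matches()` are hypothetical names, not the functions in vdev_geom.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a label that passed validation. */
struct label {
	uint64_t	guid;
};

/*
 * Return the guid stored in the label, or 0 when no usable label
 * was found (modelled here as a NULL pointer).
 */
uint64_t
read_label_guid(const struct label *l)
{
	return (l != NULL ? l->guid : 0);
}

/* Compare the guid expected for the vdev with the one read from disk. */
bool
guid_matches(uint64_t expected, const struct label *l)
{
	return (read_label_guid(l) == expected);
}
```

In this model a rejected label always produces a mismatch against any real guid, which would explain why the device is reported UNAVAIL even though the data on disk is intact.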
Re: ZFS cache devs UNAVAIL
Hi Andriy,

my dmesg is listed below.

Thanks,
Michael

FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
    root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
CPU: Intel(R) Xeon(R) CPU E3110 @ 3.00GHz (2992.57-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x10676 Family = 0x6 Model = 0x17 Stepping = 6
  Features=0xbfebfbff
  Features2=0x8e3fd
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant, performance statistics
real memory = 6442450944 (6144 MB)
avail memory = 6145687552 (5860 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
cpu0: on acpi0
cpu1: on acpi0
hpet0: iomem 0xfed0-0xfed003ff irq 0,8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 450
Event timer "HPET1" frequency 14318180 Hz quality 440
Event timer "HPET2" frequency 14318180 Hz quality 440
Event timer "HPET3" frequency 14318180 Hz quality 440
atrtc0: port 0x70-0x71 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: port 0x40-0x43,0x50-0x53 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: irq 16 at device 1.0 on pci0
pci1: on pcib1
vgapci0: port 0x2000-0x207f mem 0xd200-0xd2ff,0xc000-0xcfff,0xd000-0xd1ff irq 16 at device 0.0 on pci1
nvidia0: on vgapci0
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: child nvidia0 requested pci_enable_io
em0: port 0x1820-0x183f mem 0xd330-0xd331,0xd3324000-0xd3324fff irq 16 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 00:30:48:93:f0:06
uhci0: port 0x1840-0x185f irq 16 at device 26.0 on pci0
usbus0 on uhci0
uhci1: port 0x1860-0x187f irq 17 at device 26.1 on pci0
usbus1 on uhci1
uhci2: port 0x1880-0x189f irq 18 at device 26.2 on pci0
usbus2 on uhci2
ehci0: mem 0xd3326800-0xd3326bff irq 18 at device 26.7 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci0
hdac0: mem 0xd332-0xd3323fff irq 16 at device 27.0 on pci0
pcib2: irq 16 at device 28.0 on pci0
pci5: on pcib2
pcib3: irq 16 at device 28.4 on pci0
pci13: on pcib3
ahci0: port 0x3030-0x3037,0x3024-0x3027,0x3028-0x302f,0x3020-0x3023,0x3000-0x300f mem 0xd300-0xd30007ff irq 16 at device 0.0 on pci13
ahci0: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ahcich2: at channel 2 on ahci0
ahcich3: at channel 3 on ahci0
ahcich4: at channel 4 on ahci0
ahcich5: at channel 5 on ahci0
ahcich6: at channel 6 on ahci0
ahcich7: at channel 7 on ahci0
atapci0: port 0x3048-0x304f,0x303c-0x303f,0x3040-0x3047,0x3038-0x303b,0x3010-0x301f mem 0xd3000800-0xd300080f irq 17 at device 0.1 on pci13
uhci3: port 0x18a0-0x18bf irq 23 at device 29.0 on pci0
usbus4 on uhci3
uhci4: port 0x18c0-0x18df irq 22 at device 29.1 on pci0
usbus5 on uhci4
uhci5: port 0x18e0-0x18ff irq 18 at device 29.2 on pci0
usbus6 on uhci5
ehci1: mem 0xd3326c00-0xd3326fff irq 23 at device 29.7 on pci0
usbus7: EHCI version 1.0
usbus7 on ehci1
pcib4: at device 30.0 on pci0
pci17: on pcib4
atapci1: port 0x4020-0x4027,0x4014-0x4017,0x4018-0x401f,0x4010-0x4013,0x4000-0x400f irq 23 at device 4.0 on pci17
ata2: at channel 0 on atapci1
ata3: at channel 1 on atapci1
isab0: at device 31.0 on pci0
isa0: on isab0
ahci1: port 0x1c70-0x1c77,0x1c64-0x1c67,0x1c68-0x1c6f,0x1c60-0x1c63,0x1c00-0x1c1f mem 0xd3326000-0xd33267ff irq 17 at device 31.2 on pci0
ahci1: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported
ahcich8: at channel 0 on ahci1
ahcich9: at channel 1 on ahci1
ahcich10: at channel 2 on ahci1
ahcich11: at channel 3 on ahci1
ahcich12: at channel 4 on ahci1
ahcich13: at channel 5 on ahci1
ahciem0: on ahci1
pci0: at device 31.3 (no driver attached)
pci0: at device 31.6 (no driver attached)
acpi_button0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa-0xb on isa0
ctl: CAM Target Layer loaded
coretemp0: on cpu0
est0: on cpu0
p4tcc0: on cpu0
coretemp1: on cpu1
est1: on cpu1
p4tcc1: on cpu1
ZFS filesystem version: 5
zvol_init:1743[1]: ZVOL Initialized.
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
hdacc0: at cad 2 on hdac0
hdaa0: at nid 1 on hdacc0
pcm0: at nid 20,22,21,23,27 and 24,26,25 on hdaa0
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB
Re: ZFS cache devs UNAVAIL
on 23/10/2012 05:24 Matthew D. Fuller said the following:
> On Tue, Oct 23, 2012 at 12:57:34AM +0200 I heard the voice of
> Michael Schmiedgen, and lo! it spake thus:
>>
>> after an update to CURRENT 2012-10-17 my ZFS cache devs are marked
>> UNAVAIL after boot. These two devs are SSD partitions that are listed
>> with some weird numbers (see below). Before that they were listed
>> fine as ada0p1 and ada1p1.
>
> I saw this after my update to 10.0-CURRENT #0 r241541: Sun Oct 14.
>
> In my case, it's ada2p2 which is the cache that comes up unavail on
> boot. One notable thing may be that p1 is used for ZIL, and comes up
> fine.
>
>         NAME        STATE     READ WRITE CKSUM
>         d           ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             ada1p3  ONLINE       0     0     0
>             ada0p3  ONLINE       0     0     0
>         logs
>           ada2p1    ONLINE       0     0     0
>         cache
>           ada2p2    ONLINE       0     0     0
>
>
> I notice that you also have a second partition on your drives that's
> part of another pool. Maybe it's related to something giving up after
> assigning one partition from the drive to zpool somewhere? Though in
> your case it's p2 that's working and p1 that's wandered off, so maybe
> that's not it...
>
>

Guys, could you please reproduce the problem with vfs.zfs.debug=1 in loader.conf and share the dmesg?
Thank you.

-- 
Andriy Gapon
Re: ZFS cache devs UNAVAIL
On Tue, Oct 23, 2012 at 12:57:34AM +0200 I heard the voice of
Michael Schmiedgen, and lo! it spake thus:
>
> after an update to CURRENT 2012-10-17 my ZFS cache devs are marked
> UNAVAIL after boot. These two devs are SSD partitions that are listed
> with some weird numbers (see below). Before that they were listed
> fine as ada0p1 and ada1p1.

I saw this after my update to 10.0-CURRENT #0 r241541: Sun Oct 14.

In my case, it's ada2p2 which is the cache that comes up unavail on boot. One notable thing may be that p1 is used for ZIL, and comes up fine.

        NAME        STATE     READ WRITE CKSUM
        d           ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
        logs
          ada2p1    ONLINE       0     0     0
        cache
          ada2p2    ONLINE       0     0     0

I notice that you also have a second partition on your drives that's part of another pool. Maybe it's related to something giving up after assigning one partition from the drive to zpool somewhere? Though in your case it's p2 that's working and p1 that's wandered off, so maybe that's not it...

-- 
Matthew Fuller (MF4839) | fulle...@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
ZFS cache devs UNAVAIL
Hi,

after an update to CURRENT 2012-10-17 my ZFS cache devs are marked UNAVAIL after boot. These two devs are SSD partitions that are now listed with some weird numbers (see below). Before that they were listed fine as ada0p1 and ada1p1.

Switching TRIM to 'on', installing today's world and installing the latest bootcode all did not help.

Typing

# zpool remove tank 5986966123239256412
# zpool remove tank 17739462735593715930
# zpool add tank cache ada0p1 ada1p1

after every boot works around it, but gets a little annoying over time.

Am I doing something wrong here? Or is this a bug?

Below is some information; please let me know if more is required.

Thanks,
Michael

root@gizeh:/root # zpool status
  pool: tank
 state: ONLINE
config:

        NAME                   STATE     READ WRITE CKSUM
        tank                   ONLINE       0     0     0
          mirror-0             ONLINE       0     0     0
            ada3p3             ONLINE       0     0     0
            ada4p3             ONLINE       0     0     0
        cache
          5267967234359339128  UNAVAIL      0     0     0  was /dev/ada0p1
          5693315451104805234  UNAVAIL      0     0     0  was /dev/ada1p1

  pool: temp
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        temp        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0

root@gizeh:/root # gpart show
=>       34  234441581  ada0  GPT  (111G)
         34   33554432     1  freebsd-zfs  (16G)
   33554466  197132288     2  freebsd-zfs  (94G)
  230686754    3754861        - free -  (1.8G)

=>       34  234441581  ada1  GPT  (111G)
         34   33554432     1  freebsd-zfs  (16G)
   33554466  197132288     2  freebsd-zfs  (94G)
  230686754    3754861        - free -  (1.8G)

=>       63  156301425  ada2  MBR  (74G)
         63   41943006     1  freebsd  (20G)
   41943069       2019        - free -  (1M)
   41945088     204800     2  ntfs  [active]  (100M)
   42149888  114149376     3  ntfs  (54G)
  156299264       2224        - free -  (1.1M)

=>        34  5860533101  ada3  GPT  (2.7T)
          34           6        - free -  (3.0k)
          40         192     1  freebsd-boot  (96k)
         232     8388608     2  freebsd-swap  (4.0G)
     8388840  5767168000     3  freebsd-zfs  (2.7T)
  5775556840    84976295        - free -  (40G)

=>        34  5860533101  ada4  GPT  (2.7T)
          34           6        - free -  (3.0k)
          40         192     1  freebsd-boot  (96k)
         232     8388608     2  freebsd-swap  (4.0G)
     8388840  5767168000     3  freebsd-zfs  (2.7T)
  5775556840    84976295        - free -  (40G)

=>       0  41943006  ada2s1  BSD  (20G)
         0  39845888       1  freebsd-ufs  (19G)
  39845888   2097117       2  freebsd-swap  (1G)
  41943005         1          - free -  (512B)