Well, this has happened again, exactly a week later, same time too… So
the SSD ZILs didn't do the trick. I think I am going to turn off the
ZFS auto-snapshot service.
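Presumably that's just a matter of disabling the time-slider SMF
instances, something like (untested, instance names from memory):

# list the auto-snapshot instances and their state
svcs -a | grep auto-snapshot

# disable each interval
svcadm disable svc:/system/filesystem/zfs/auto-snapshot:frequent
svcadm disable svc:/system/filesystem/zfs/auto-snapshot:hourly
svcadm disable svc:/system/filesystem/zfs/auto-snapshot:daily
svcadm disable svc:/system/filesystem/zfs/auto-snapshot:weekly
svcadm disable svc:/system/filesystem/zfs/auto-snapshot:monthly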
This time dmesg again shows only the LINK UP notices:

Jun 7 15:50:22 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
Jun 7 15:50:23 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G

On 31 May 2012, at 17:37, Adrian Carpenter wrote:

> A quick update for those who might be following this thread: I started
> to collect zilstat output, and what I have found is that about once
> every four days a transaction takes over half an hour:
>
> TIME                   txg    N-Bytes  N-Bytes/s  N-Max-Rate    B-Bytes  B-Bytes/s  B-Max-Rate   ops  <=4kB  4-32kB  >=32kB
> ..
> ..
> 2012 May 31 15:21:36  475232   2044232     60124      390888   16531456     486219     2985984   175      0       0     175
> 2012 May 31 15:22:39  475233   2762416     43847      293064   19734528     313246     2244608   266      0      10     256
> 2012 May 31 16:00:06  475234  29059896     12927     3198840  148652032      66126    12713984  1825      0     181    1644
> 2012 May 31 16:08:05  475235   2544016      5311      657384   13819904      28851     3575808   182      0       2     180
>
> At 15:32 the Xen pool master tried resetting the fibre channel HBAs;
> however, since the volume was (I presume) still blocked, the pool
> master became very unhappy…
>
> I then see the following in dmesg:
>
> May 31 16:00:08 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
> May 31 16:00:09 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G
>
> I've just taken delivery of some SSDs and will add them as mirrored
> ZIL log devices; hopefully this will help.
>
> On 21 May 2012, at 16:47, Mike La Spina wrote:
>
>> Hi Adrian,
>>
>> The SanBoxes? - Nexsan: nothing in their logs
>> OK
>>
>> Dmesg?:
>>> May 17 17:33:47 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, portid 20300, topology Fabric Pt-to-Pt,speed 8G
>>> May 17 17:33:48 hagrid fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, portid 10400, topology Fabric Pt-to-Pt,speed 8G
>>
>> Seeing a single LINKUP notice would normally only occur on init. I
>> would say it's just that; otherwise you would have a LINKDOWN before
>> the LINKUP, meaning an event on the fabric is your root issue.
>>
>> Stmf service? Nothing at all in the logs
>> OK
>>
>> Are you running snapshots? Yes, I am running the auto snapshot
>> service; in addition I'm running a script (hourly) that snapshots the
>> volume and sends it over ssh to another machine.
>>
>> I suspect an issue here. The snapshot service runs on fixed time
>> intervals, e.g. 15 min, 1 hour, 24 hour, 1 month; if you're also
>> adding a snapshot that runs hourly to do a ZFS send/recv, they will
>> overlap. The overlap may cause excessive blocking of stmf sbd access
>> and result in a timeout for the XEN host initiators. I suggest you
>> use the existing hourly auto snaps and simply send them over to the
>> remote host or file system using a script @ 15 minutes after the
>> hour.
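>> Something along these lines would do it - a rough, untested sketch;
>> the dataset, snapshot prefix, remote host and target file system are
>> placeholders:
>>
>> # crontab entry: 15 minutes past every hour
>> # 15 * * * * /usr/local/bin/send-hourly-snap.sh
>>
>> #!/usr/bin/sh
>> DS=tank/vol        # the sbd backing dataset
>> RHOST=backuphost   # remote ssh target
>> # hourly auto snaps of $DS, oldest first
>> SNAPS=`zfs list -H -t snapshot -o name -s creation | grep "^$DS@zfs-auto-snap_hourly"`
>> PREV=`echo "$SNAPS" | tail -2 | head -1`
>> LAST=`echo "$SNAPS" | tail -1`
>> # incremental send of the newest auto snap; the first run needs a
>> # full send of the oldest snap to seed backup/vol on the remote side
>> zfs send -i "$PREV" "$LAST" | ssh $RHOST zfs receive -F backup/vol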
>> Dedup?
>> Off - OK
>>
>> Compression?
>> lzjb - OK
>>
>> IRQ sharing?
>> echo ::interrupts | mdb -k
>>
>> IRQ  Vect IPL Bus  Trg Type   CPU Share APIC/INT#  ISR(s)
>> 1    0x41 5   ISA  Edg Fixed  3   1     0x0/0x1    i8042_intr
>> 3    0xb1 12  ISA  Edg Fixed  39  1     0x0/0x3    asyintr
>> 4    0xb0 12  ISA  Edg Fixed  38  1     0x0/0x4    asyintr
>> 5    0xb2 12  ISA  Edg Fixed  40  1     0x0/0x5    asyintr
>> 9    0x80 9   PCI  Lvl Fixed  1   1     0x0/0x9    acpi_wrapper_isr
>> 12   0x42 5   ISA  Edg Fixed  4   1     0x0/0xc    i8042_intr
>> 16   0x83 9   PCI  Lvl Fixed  7   2     0x0/0x10   ohci_intr, ohci_intr
>> 17   0x81 9   PCI  Lvl Fixed  5   1     0x0/0x11   ehci_intr
>> 18   0x84 9   PCI  Lvl Fixed  8   3     0x0/0x12   ohci_intr, ohci_intr, ohci_intr
>> 19   0x82 9   PCI  Lvl Fixed  6   1     0x0/0x13   ehci_intr
>> 22   0x40 5   PCI  Lvl Fixed  2   2     0x0/0x16   ata_intr, ata_intr
>> 88   0x43 5   PCI  Edg MSI-X  9   1     -          ql_isr_aif
>> 89   0x44 5   PCI  Edg MSI-X  10  1     -          ql_isr_aif
>> 90   0x45 5   PCI  Edg MSI-X  11  1     -          ql_isr_aif
>> 91   0x46 5   PCI  Edg MSI-X  12  1     -          ql_isr_aif
>> 92   0x60 6   PCI  Edg MSI-X  13  1     -          igb_intr_tx_other
>> 93   0x61 6   PCI  Edg MSI-X  14  1     -          igb_intr_rx
>> 94   0x62 6   PCI  Edg MSI-X  15  1     -          igb_intr_tx_other
>> 95   0x63 6   PCI  Edg MSI-X  16  1     -          igb_intr_rx
>> 96   0x64 6   PCI  Edg MSI-X  36  1     -          igb_intr_tx_other
>> 97   0x65 6   PCI  Edg MSI-X  37  1     -          igb_intr_rx
>> 98   0x66 6   PCI  Edg MSI-X  41  1     -          igb_intr_tx_other
>> 99   0x67 6   PCI  Edg MSI-X  42  1     -          igb_intr_rx
>> 100  0x68 6   PCI  Edg MSI-X  43  1     -          igb_intr_tx_other
>> 101  0x69 6   PCI  Edg MSI-X  44  1     -          igb_intr_rx
>> 102  0x6a 6   PCI  Edg MSI-X  45  1     -          igb_intr_tx_other
>> 103  0x6b 6   PCI  Edg MSI-X  46  1     -          igb_intr_rx
>> 104  0x47 5   PCI  Edg MSI    30  1     -          qlt_isr
>> 105  0x48 5   PCI  Edg MSI    31  1     -          qlt_isr
>> 160  0xa0 0        Edg IPI    all 0     -          poke_cpu
>> 208  0xd0 14       Edg IPI    all 1     -          kcpc_hw_overflow_intr
>> 209  0xd1 14       Edg IPI    all 1     -          cbe_fire
>> 210  0xd3 14       Edg IPI    all 1     -          cbe_fire
>> 240  0xe0 15       Edg IPI    all 1     -          xc_serv
>> 241  0xe1 15       Edg IPI    all 1     -          apic_error_intr
>>
>> OK
>>
>> Dr T Adrian Carpenter
>> Reader in Imaging Sciences
>> Wolfson Brain Imaging Centre

_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss