Re: 10.0-RELEASE BTX halted on DELL R900
--- Original message ---
From: wsk w...@gddsn.org.cn
Date: 4 July 2014, 15:21:48

lists,
I ran into a "BTX halted" problem while upgrading FreeBSD 9.0-RC3 to 10.0-RELEASE via freebsd-update. Please check the link below:
http://sw.gddsn.org.cn/jopens/test/btx.jpg
BTW: I can boot 10.0-R from DVD-ROM as expected, but I get the same error message when booting from a flash drive.
Any ideas?
--
wsk

Look at the ACPI settings in the BIOS. I think it should be set to ACPI v1.

--
Vladislav V. Prodan
System Network Administrator
support.od.ua
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Tell me how to increase the virtual disk with ZFS?
I have a Debian server running virtual machines under Proxmox. One of the VMs runs FreeBSD 9.1 on ZFS with a single 100G disk. It is running out of free space: how can I extend the virtual disk without losing data?

Adding another virtual disk and striping (RAID0) is not an option: it is not clear how the data would be redistributed from the old virtual disk to the new one.

The Proxmox manual (http://pve.proxmox.com/wiki/Resizing_disks) does not mention FreeBSD :(

Should I perhaps switch the Proxmox host to native ZFS for Linux, so that resizing the VMs' virtual disks becomes easier?

--
Vladislav V. Prodan
System Network Administrator
http://support.od.ua
+380 67 4584408, +380 99 4060508
VVP88-RIPE
Re[2]: Tell me how to increase the virtual disk with ZFS?
On May 11, 2013, at 8:59 AM, Vladislav Prodan univers...@ukr.net wrote:
Adding another virtual disk and striping (RAID0) is not an option: it is not clear how the data would be redistributed from the old virtual disk to the new one.

The other option would be to add an additional disk, as large as you want, to the VM and attach it to the zpool as a mirror. The mirror vdev will only be as large as the original device, but once the mirror has finished resilvering, you can remove the old device and grow the remaining device to its full size (it may do that anyway, depending on the zpool's autoexpand property). The default under 9.1 is NOT to autoexpand:

root@FreeBSD2:/root # zpool get autoexpand rootpool
NAME      PROPERTY    VALUE  SOURCE
rootpool  autoexpand  off    default
root@FreeBSD2:/root #

Thanks. I did not realize there was such an interesting and useful option :)

# zpool get autoexpand tank
NAME  PROPERTY    VALUE  SOURCE
tank  autoexpand  off    default

--
Vladislav V. Prodan
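The attach, resilver, detach and expand procedure described above can be sketched as a small sh(1) script. This is a hedged sketch, not a tested recipe: the pool name `tank` and the device names `vtbd0p3`/`vtbd1p3` are hypothetical placeholders (a FreeBSD guest on Proxmox may see VirtIO disks as vtbd*, or ada* with SATA emulation), and the `DRY_RUN` switch and function names are illustrative additions. By default it only prints the commands. If the pool is the boot pool, the new disk also needs a partition table and boot code written with gpart(8) before the attach; that step is not shown.

```shell
#!/bin/sh
# Sketch: move a single-disk pool onto a larger disk via a temporary mirror.
# Pool and device names used below are hypothetical placeholders.
set -e

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

wait_resilver() {
    # Nothing to wait for when the commands are only printed.
    [ "$DRY_RUN" = 1 ] && return 0
    # Poll until zpool status no longer reports a running resilver
    # (assumes the "resilver in progress" wording of zpool status).
    while zpool status "$1" | grep -q 'resilver in progress'; do
        sleep 60
    done
}

grow_via_mirror() {
    pool=$1 old=$2 new=$3
    # 1. Attach the larger disk, turning the single-disk vdev into a mirror.
    run zpool attach "$pool" "$old" "$new"
    # 2. Let ZFS copy everything onto the new disk.
    wait_resilver "$pool"
    # 3. Drop the old, smaller disk from the mirror.
    run zpool detach "$pool" "$old"
    # 4. Grow into the new disk's full size (autoexpand is off by default on 9.1).
    run zpool online -e "$pool" "$new"
}

# Example: print the commands for a pool named "tank".
grow_via_mirror tank vtbd0p3 vtbd1p3
```

Running it with `DRY_RUN=0` executes the commands; because of `set -e`, a failed `zpool attach` stops the script before the old disk is ever detached.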
Re[2]: Re[2]: AHCI timeout when using ZFS + AIO + NCQ
I once ran into a very severe AHCI timeout problem. After months of trying to figure it out, and insanely high Hardware_ECC_Recovered values, I found that the fault was in the power connector plug / SATA HDD interface. All errors disappeared after replacing that cable.

Since you have errors on more than one HDD, I suggest:
1. Check the smartctl output for each and every HDD.
2. Check whether your power supply unit is still healthy or is supplying inconsistent power.
3. Check the mains power line for voltage fluctuations, or for a new heavy consumer of amps on the same line the server is plugged into.

I deliberately picked a different server, with a different chipset and no history of HDD problems, and added kernel support:

device ahci # AHCI-compatible SATA controllers

And now, after 2.5 days, one HDD has dropped off.

[3:14]beastie:root-/root# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        tank                     DEGRADED     0     0     0
          mirror-0               ONLINE       0     0     0
            gpt/disk0            ONLINE       0     0     0
            gpt/disk2            ONLINE       0     0     0
          mirror-1               DEGRADED     0     0     0
            gpt/disk1            ONLINE       0     0     0
            4931885954389536913  REMOVED      0     0     0  was /dev/gpt/disk3

errors: No known data errors

Jan 30 09:49:28 beastie kernel: ahcich3: Timeout on slot 29 port 0
Jan 30 09:49:28 beastie kernel: ahcich3: is cs 2000 ss rs 2000 tfd c0 serr cmd 0004dd17
Jan 30 09:49:28 beastie kernel: (ada3:ahcich3:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jan 30 09:49:28 beastie kernel: (ada3:ahcich3:0:0:0): CAM status: Command timeout
Jan 30 09:49:28 beastie kernel: (ada3:ahcich3:0:0:0): Retrying command
Jan 30 09:51:31 beastie kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
Jan 30 09:51:31 beastie kernel: ahcich3: Timeout on slot 29 port 0
Jan 30 09:51:31 beastie kernel: ahcich3: is cs 2000 ss rs 2000 tfd 80 serr cmd 0004dd17
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
Jan 30 09:51:31 beastie kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
Jan 30 09:51:31 beastie kernel: ahcich3: Timeout on slot 29 port 0
Jan 30 09:51:31 beastie kernel: ahcich3: is cs ss rs 2000 tfd 58 serr cmd 0004dd17
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
Jan 30 09:51:31 beastie kernel: (ada3:ahcich3:0:0:0): lost device
Jan 30 09:51:31 beastie kernel: (pass3:ahcich3:0:0:0): passdevgonecb: devfs entry is gone

--
Vladislav V. Prodan
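Step 1 of the advice above (check smartctl for each and every HDD) can be scripted. A minimal sketch, assuming smartmontools' `smartctl` is installed and that the pool disks appear as ada* in the `kern.disks` sysctl. The function name is hypothetical, and picking out `Hardware_ECC_Recovered` (mentioned above) plus `UDMA_CRC_Error_Count` (a counter of interface CRC errors, which often points at cabling) is an illustrative choice, not an exhaustive health check.

```shell
#!/bin/sh
# Sketch: print the SMART verdict and two cabling-related attributes
# for every ada(4) disk the kernel knows about.

smart_summary() {
    for disk in $(sysctl -n kern.disks); do
        case $disk in
            ada*) ;;          # SATA disks
            *) continue ;;    # skip cd0, da*, etc.
        esac
        echo "== /dev/$disk"
        # Overall health verdict, e.g. "... test result: PASSED".
        smartctl -H "/dev/$disk" | grep -i 'overall-health'
        # Raw counters worth comparing between disks and over time.
        smartctl -A "/dev/$disk" | grep -E 'Hardware_ECC_Recovered|UDMA_CRC_Error_Count'
    done
}
```

Run as root (`smart_summary`), ideally before and after a drop, so the raw values of the same attribute can be compared across disks.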
Re[2]: AHCI timeout when using ZFS + AIO + NCQ
Is it always the same disk? If so, replace it. SMART helps identify issues, but it doesn't tell you with 100% certainty that there is no problem.

Now a different HDD, ada0, has dropped off. I'm 99% sure that MHDD will not find any problems with ada0 or ada2. I have three more servers with similar chipsets that suffer from the same kind of AHCI timeouts.

- Original Message -
From: Vladislav Prodan univers...@ukr.net
To: f...@freebsd.org
Cc: curr...@freebsd.org
Sent: Thursday, January 24, 2013 12:19 PM
Subject: AHCI timeout when using ZFS + AIO + NCQ
[...]

--
Vladislav V. Prodan
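To answer "is it always the same disk?" from the logs rather than from memory, the kernel messages can be tallied per AHCI channel. A minimal awk-based sketch, assuming the `ahcichN: Timeout on slot` message format seen in this thread; the function name is hypothetical. It counts only the timeout lines, because console output can interleave (as in the garbled "lost device" lines quoted in this thread).

```shell
#!/bin/sh
# Sketch: count "ahcichN: Timeout on slot" messages per AHCI channel.
# Reads the files given as arguments, or standard input.

count_ahci_timeouts() {
    awk '/ahcich[0-9]+: Timeout on slot/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^ahcich[0-9]+:$/) {
                ch = $i
                sub(/:$/, "", ch)     # "ahcich3:" -> "ahcich3"
                n[ch]++
            }
        }
    }
    END { for (ch in n) print ch, n[ch] }' "$@" | sort
}
```

On FreeBSD, `count_ahci_timeouts /var/log/messages` prints one `ahcichN count` line per channel; the `adaN at ahcichN` lines in dmesg then map each channel back to a disk.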
Re[2]: Re[2]: AHCI timeout when using ZFS + AIO + NCQ
- Original Message -
From: Vladislav Prodan univers...@ukr.net

Is it always the same disk? If so, replace it. SMART helps identify issues, but it doesn't tell you with 100% certainty that there is no problem.

Now a different HDD, ada0, has dropped off. I'm 99% sure that MHDD will not find any problems with ada0 or ada2. I have three more servers with similar chipsets that suffer from the same kind of AHCI timeouts.

I notice your disks are connecting at SATA 3.x, which rings bells. We had a very similar issue on a new Supermicro machine here, and after much testing we proved to our satisfaction that the problem was the hardware.

I have an ASUS M5A97 PRO motherboard:
http://www.asus.com/Motherboard/M5A97_PRO/#specifications
The SATA data cables have been replaced.
Installing a hardware RAID controller does not guarantee data recovery if the controller itself dies.

Essentially, the combination of SATA 3 speeds and the midplane / backplane degraded the connection between the MB and the HDDs enough to cause the disks to drop randomly under load. When we connected the disks directly to the MB with SATA cables, the problem went away. In the end we had the midplanes changed from AHCI pass-through to an active LSI controller.

So if you have any sort of midplane / backplane connecting your disks, try connecting them directly to the MB / controller via known SATA 3.x compliant cables and see if that stops the drops.

Another test you can do is to force the disks to connect at SATA 2.x. This also fixed it in our case, but it wasn't something we wanted to put into production, hence the controller swap. To force SATA 2 speeds you can use the following in /boot/loader.conf, where 'X' is the disk identifier, e.g. for ada0 X = 0:

hint.ahcich.X.sata_rev=2

Hope this helps.

Regards
Steve

--
Vladislav V. Prodan
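Following the suggestion above, the hint can be generated for every channel that carries a pool disk. A small sketch, assuming four pool disks on channels 0 to 3; the function name is hypothetical. Note that in ahci(4) the `X` in `hint.ahcich.X.sata_rev` is the AHCI channel number (ahcichX), which matches the adaN number only when dmesg shows `adaN at ahcichN`, as it does for `ada2 at ahcich2` in the dmesg output in this thread. Per ahci(4), the values 1, 2 and 3 correspond to 1.5, 3 and 6 Gbps.

```shell
#!/bin/sh
# Sketch: emit loader.conf(5) lines that cap the SATA revision per AHCI channel.
# Usage: sata_rev_hints REV CHANNEL...

sata_rev_hints() {
    rev=$1
    shift
    for ch in "$@"; do
        echo "hint.ahcich.$ch.sata_rev=\"$rev\""
    done
}

# Print the hints for channels 0-3, limited to SATA 2 (3 Gbps).
# To apply them: sata_rev_hints 2 0 1 2 3 >> /boot/loader.conf
sata_rev_hints 2 0 1 2 3
```

After a reboot, the affected disks should report `300.000MB/s transfers (SATA 2.x, ...)` in dmesg instead of `600.000MB/s`.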
Re[2]: AHCI timeout when using ZFS + AIO + NCQ
Essentially, the combination of SATA 3 speeds and the midplane / backplane degraded the connection between the MB and the HDDs enough to cause the disks to drop randomly under load. When we connected the disks directly to the MB with SATA cables, the problem went away. In the end we had the midplanes changed from AHCI pass-through to an active LSI controller.

So if you have any sort of midplane / backplane connecting your disks, try connecting them directly to the MB / controller via known SATA 3.x compliant cables and see if that stops the drops.

Another test you can do is to force the disks to connect at SATA 2.x. This also fixed it in our case, but it wasn't something we wanted to put into production, hence the controller swap. To force SATA 2 speeds you can use the following in /boot/loader.conf, where 'X' is the disk identifier, e.g. for ada0 X = 0:

hint.ahcich.X.sata_rev=2

This is still worth trying, as it could still indicate a problem with your controller, cables or disks.

Or should I simply disable the ahci kernel module and use only ata?

--
Vladislav V. Prodan
AHCI timeout when using ZFS + AIO + NCQ
I have a server running:
FreeBSD 9.1-PRERELEASE #0: Wed Jul 25 01:40:56 EEST 2012

Jan 24 12:53:01 vesuvius kernel: atapci0: JMicron ATA controller port 0xc040-0xc047,0xc030-0xc033,0xc020-0xc027,0xc010-0xc013,0xc000-0xc00f mem 0xfe21-0xfe2101ff irq 51 at device 0.0 on pci3
...
Jan 24 12:53:01 vesuvius kernel: ahci0: ATI IXP700 AHCI SATA controller port 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem 0xfe307000-0xfe3073ff irq 19 at device 17.0 on pci0
Jan 24 12:53:01 vesuvius kernel: ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
...
Jan 24 12:53:01 vesuvius kernel: ada2 at ahcich2 bus 0 scbus4 target 0 lun 0
Jan 24 12:53:01 vesuvius kernel: ada2: ST3000DM001-9YN166 CC4C ATA-8 SATA 3.x device
Jan 24 12:53:01 vesuvius kernel: ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Jan 24 12:53:01 vesuvius kernel: ada2: Command Queueing enabled
Jan 24 12:53:01 vesuvius kernel: ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
Jan 24 12:53:01 vesuvius kernel: ada2: Previously was known as ad12
...

I use 4 HDDs in RAID10 via ZFS. At very irregular intervals, HDDs drop off, and as a result the server stops.

Jan 24 06:48:06 vesuvius kernel: ahcich2: Timeout on slot 6 port 0
Jan 24 06:48:06 vesuvius kernel: ahcich2: is cs ss 00c0 rs 00c0 tfd 40 serr cmd e817
Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 4c 4e 1e 40 68 00 00 01 00 00
Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): Retrying command
Jan 24 06:51:11 vesuvius kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 0080)
Jan 24 06:51:11 vesuvius kernel: ahcich2: Timeout on slot 8 port 0
Jan 24 06:51:11 vesuvius kernel: ahcich2: is cs 0100 ss rs 0100 tfd 00 serr cmd e817
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
Jan 24 06:51:11 vesuvius kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 0080)
Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
Jan 24 06:51:11 vesuvius kernel: ahcich2: Timeout on slot 8 port 0
Jan 24 06:51:11 vesuvius kernel: ahcich2: is cs 0100 ss rs 0100 tfd 00 serr cmd e817
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
Jan 24 06:51:11 vesuvius kernel: swap_pager: I/O error - pagein failed; blkno 4227133,size 8192, error 6
Jan 24 06:51:11 vesuvius kernel: (ada2:(pass2:vm_fault: pager read error, pid 1943 (named)
Jan 24 06:51:11 vesuvius kernel: ahcich2:0:ahcich2:0:0:0:0): lost device
Jan 24 06:51:11 vesuvius kernel: 0): passdevgonecb: devfs entry is gone
Jan 24 06:51:11 vesuvius kernel: pid 1943 (named), uid 53: exited on signal 11
...

Only a hard reset with the power button helps.
Judging by SMART, the HDDs have no problems. The SATA data cable has been replaced.

I found a similar problem:
http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
PR: amd64/165547: NVIDIA MCP67 AHCI SATA controller timeout

--
Vladislav V. Prodan
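Two figures in the ada2 dmesg lines above can be checked by hand. A small sh(1) arithmetic sketch (variable names are illustrative): ada(4)'s "MB" works out to 2^20 bytes, so the nominal 3 TB (decimal) ST3000DM001 shows up as 2861588MB, and the 600.000MB/s figure is the 6 Gbps SATA 3 line rate divided by 10, because SATA uses 8b/10b encoding (10 line bits per data byte).

```shell
#!/bin/sh
# Sketch: reproduce two figures from the ada2 dmesg lines.

sectors=5860533168        # "5860533168 512 byte sectors"
sector_size=512

bytes=$((sectors * sector_size))
mib=$((bytes / 1048576))  # 1 MiB = 2^20 bytes
echo "capacity: $bytes bytes = ${mib}MB (2^20-byte units)"

# SATA 3 line rate is 6 Gbit/s; 8b/10b encoding sends 10 bits per data byte.
line_rate=6000000000
echo "link: $((line_rate / 10 / 1000000)) MB/s"
```

The same arithmetic gives 300 MB/s for a SATA 2 (3 Gbps) link.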
[ZFS] Server periodically becomes unavailable
FreeBSD 9.1-PRERELEASE #0: Wed Jul 25 01:40:56 EEST 2012

I have a server with 8 AMD cores, 16GB RAM and 4x3TB HDDs in RAID10 for ZFS. From time to time the server becomes unresponsive and drops off the network. Could this be caused by memory being reclaimed from the ZFS cache?

I attach graphs from the monitoring system covering the periods of lag:
http://imageshack.us/a/img341/9643/memoryusage.png
http://imageshack.us/a/img22/6935/nginxclientstat.png
http://imageshack.us/a/img19/8817/realmemory.png

# cat /etc/sysctl.conf
kern.ipc.somaxconn=65535
kern.ipc.maxsockets=204800
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535
kern.ipc.shmmax=67108864
kern.ipc.shmall=67108864
net.inet.tcp.rfc3465=0
net.graph.maxdgram=8388608
net.graph.recvspace=8388608
net.route.netisr_maxqlen=4096
kern.ipc.nmbclusters=40
kern.ipc.maxsockbuf=83886080
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_inc=524288
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.sendspace=65536
net.inet.tcp.keepidle=30
net.inet.tcp.keepintvl=6
net.inet.ip.fw.dyn_max=65535
net.inet.ip.fw.dyn_buckets=65536
net.inet.ip.fw.dyn_ack_lifetime=120
net.inet.ip.fw.dyn_syn_lifetime=10
net.inet.tcp.nolocaltimewait=1
security.bsd.hardlink_check_uid=1
security.bsd.hardlink_check_gid=1
security.bsd.see_other_uids=0
security.bsd.see_other_gids=0

--
Vladislav V. Prodan
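One way to test the ZFS-cache theory is to log the ARC size and free memory next to a timestamp, then compare that log with the monitoring graphs around each lag. A hedged sketch: `kstat.zfs.misc.arcstats.size`, `vm.stats.vm.v_free_count` and `hw.pagesize` are standard FreeBSD sysctls, but the function names, the 10-second interval and the log path are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: log ARC size and free memory, to correlate with the lag periods.

arc_sample() {
    arc=$(sysctl -n kstat.zfs.misc.arcstats.size)      # bytes
    free_pages=$(sysctl -n vm.stats.vm.v_free_count)   # pages
    page_size=$(sysctl -n hw.pagesize)                 # bytes per page
    echo "$(date '+%Y-%m-%d %H:%M:%S') arc=$((arc / 1048576))MiB free=$((free_pages * page_size / 1048576))MiB"
}

arc_monitor() {
    # Hypothetical defaults: sample every 10 seconds into /var/log/arc.log.
    interval=${1:-10}
    log=${2:-/var/log/arc.log}
    while :; do
        arc_sample >> "$log"
        sleep "$interval"
    done
}
```

Started with e.g. `arc_monitor 10 /var/log/arc.log &`, a sharp drop in `arc=` just before each lag would support the theory, while a flat ARC size would point elsewhere.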