Re: Regression in 8.2-STABLE bge code (from 7.4-STABLE)
A patch which allows FreeBSD 8.3-STABLE to use the Broadcom GigE bge ports on the Tyan S4882 quad Opteron motherboard (and almost certainly on the S4881 motherboard, which had the same problem with 7.4-STABLE) has been developed by YongHyeon Pyun. The problem involves a bug in the PCI bridge which connects the Broadcom bge Ethernet ports. He will shortly be committing the patch to HEAD.

Thank you!

Mike Squires
mikes at siralan.org
UN*X at home since 1986
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Issue with hast replication
Mikolaj Golub (trociny) writes:

> PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
> PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
> PR> Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).
>
> 31642091520 looks like rather large offset for 10Gb volume...

Sorry, that should have been 100G - I typed from memory instead of copy-pasting.

> Just to be more confident that this is a HAST issue could you please try the
> following experiment?
>
> 1) Stop hastd on h2.
>
> 2) On h1 run something like below:
>
>    dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/dev/zvol/zfs/hvol
>
> (copy hvol from h1 to h2 without hastd to see if it will succeed).
>
> Note: you will need to recreate HAST provider on secondary after this.

Ok this is interesting. (For debugging purposes I've renamed the target zvol as "junk", you'll see why below).

1) As you suggested:

    h1# dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/dev/zvol/zfs/junk
    dd: /dev/zvol/zfs/junk: Invalid argument
    0+6 records in
    0+5 records out
    131072 bytes transferred in 0.002344 secs (55920640 bytes/sec)

To be certain which dd was complaining, I renamed the target zvol.

2) Tried repeatedly, sometimes the number of bytes is a bit different:

    0+7 records in
    0+6 records out
    147456 bytes transferred in 0.002448 secs (60233277 bytes/sec)

And yes, hastd is stopped on h2.

3) I tried dd'ing zero to the zvol locally on h2:

    h2# dd if=/dev/zero of=/dev/zvol/zfs/junk bs=131072
    ^C1817+0 records in
    1816+0 records out
    238026752 bytes transferred in 1.582006 secs (150458820 bytes/sec)

That works, until I ^C it.

4) I tried redirecting the output of the dd | ssh to a file on the h2 side:

    h1# dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/tmp/x
    ^C653+0 records in
    652+0 records out
    85458944 bytes transferred in 2.408074 secs (35488506 bytes/sec)

That works too, until I ^C it.

5) Things get even weirder - if I then go over to h2 and dd the "/tmp/x" test file over to the zvol:

    h2# dd if=x bs=131072 of=/dev/zvol/zfs/junk
    dd: /dev/zvol/zfs/junk: Invalid argument
    652+1 records in
    652+0 records out
    85458944 bytes transferred in 0.444571 secs (192227879 bytes/sec)

Note that the file /tmp/x is 86917120 bytes long.

6) I try to copy more data into /tmp/x - it's now 291946496 (~280 MB):

    h2# dd if=x bs=131072 of=/dev/zvol/zfs/junk
    2227+1 records in
    2227+1 records out
    291946496 bytes transferred in 3.564129 secs (81912441 bytes/sec)

No more "invalid argument"...

7) ktrace on the destination dd:

    [...]
    \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\ \0"
    5807 dd RET read 17992/0x4648
    5807 dd CALL write(0x3,0x800c09000,0x4648)
    5807 dd RET write -1 errno 22 Invalid argument
    5807 dd CALL write(0x2,0x7fffd300,0x4)
    5807 dd GIO fd 2 wrote 4 bytes "dd: "
    5807 dd RET write 4
    5807 dd CALL write(0x2,0x7fffd3e0,0x12)
    5807 dd GIO fd 2 wrote 18 bytes "/dev/zvol/zfs/junk"

truss is a bit more informative:

    fstat(0,{ mode=p- ,inode=5,size=16384,blksize=4096 }) = 0 (0x0)
    lseek(0,0x0,SEEK_CUR) ERR#29 'Illegal seek'

Illegal seek, eh? Any clues?

The boxes are identical (HP DL380 G6), though the RAM config is different.

Summary:

- ssh works fine
- h1 zvol to h2 zvol over ssh fails
- h1 zvol to h2 /tmp/x over ssh is fine
- h2 /dev/zero locally to h2 zvol is fine
- h2 /tmp/x locally to h2 zvol fails at first, but works afterwards...
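One possible reading of the ktrace output, offered as an assumption and not as a diagnosis confirmed in this thread: the failing write is 0x4648 = 17992 bytes, which is not a multiple of 512, and with bs= set dd passes each short read from the ssh pipe straight through as a short write. The Python sketch below only illustrates the reblocking idea (accumulate short reads and issue full-sized writes). The names `reblock` and `ShortReader` and the 512-byte sector size are hypothetical and not taken from dd or from the thread.

```python
import io

SECTOR = 512  # assumed sector size; the zvol's real block size may differ


def reblock(src, dst, block_size=131072):
    """Copy src to dst, issuing only full block_size writes until EOF.

    Short reads (normal on a pipe or socket) are accumulated instead of
    being passed through as short writes. Returns the write sizes issued.
    """
    writes = []
    buf = bytearray()
    while True:
        # Never ask for more than what is missing from the current block.
        chunk = src.read(block_size - len(buf))
        if not chunk:
            break
        buf += chunk
        if len(buf) == block_size:
            dst.write(bytes(buf))
            writes.append(len(buf))
            buf.clear()
    if buf:
        # Final partial block is written as-is; a raw device could still
        # reject it if its length is not a multiple of the sector size.
        dst.write(bytes(buf))
        writes.append(len(buf))
    return writes


class ShortReader:
    """Hypothetical stand-in for a pipe: returns at most `limit` bytes per read."""

    def __init__(self, data, limit):
        self._data = io.BytesIO(data)
        self._limit = limit

    def read(self, n):
        return self._data.read(min(n, self._limit))


if __name__ == "__main__":
    # The write size seen in the ktrace output, checked against the assumed sector size.
    print(0x4648, 0x4648 % SECTOR)  # prints "17992 72"
```

If the alignment assumption holds, the same effect might be obtained on the receiving side by letting dd aggregate its output (separate ibs=/obs= operands instead of bs=), but that was not tested in this thread.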
Re: Issue with hast replication
On Sun, 11 Mar 2012 19:54:57 +0100 Phil Regnauld wrote:

PR> Hi,

PR> I've got a fairly simple setup: two hosts running 9.0-R (will upgrade to stable
PR> if told to, but want to check here first), ZFS and HAST. HAST is configured to
PR> run on top of zvols configured on each host, as illustrated:

PR>       FS                      FS
PR>    +------+               +------+
PR>    | hvol | <-- hastd --> | hvol |
PR>    +------+               +------+
PR>    | zvol |               | zvol |
PR>    +------+               +------+
PR>    | zfs  |               | zfs  |
PR>    +------+               +------+
PR>       h1                     h2

PR> Connection is gigabit to the same switch. No issues with large TCP
PR> transfers such as SCP/FTP.

PR> Config is vanilla:

PR> # zfs create -V 10G zfs/hvol

PR> hast.conf:

PR> resource hvol {
PR>         on h1 {
PR>                 local /dev/zvol/zfs/hvol
PR>                 remote tcp4://192.168.1.100
PR>         }
PR>         on h2 {
PR>                 local /dev/zvol/zfs/hvol
PR>                 remote tcp4://192.168.1.200
PR>         }
PR> }

PR> h1 is behaving fine as primary, either with h2 turned off or in init -
PR> but as soon as I set the role to secondary for h2, the receiver
PR> repeatedly crashes and restarts - see the traces below.

PR> Primary:

PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
PR> Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).

31642091520 looks like rather large offset for 10Gb volume...

Just to be more confident that this is a HAST issue could you please try the following experiment?

1) Stop hastd on h2.

2) On h1 run something like below:

    dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/dev/zvol/zfs/hvol

(copy hvol from h1 to h2 without hastd to see if it will succeed).

Note: you will need to recreate HAST provider on secondary after this.

--
Mikolaj Golub
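The offset remark can be checked with a few lines of arithmetic. The sketch below is illustrative only: `describe_offset` is a hypothetical helper, and it assumes that "10G" and "100G" mean binary gibibytes. It shows that the logged WRITE offset lies about 29.47 GiB into the volume, which cannot fit in a 10G zvol but does fit in the 100G size given in the follow-up.

```python
GIB = 1024 ** 3  # assumption: "10G"/"100G" are binary gibibytes


def describe_offset(offset, length, volume_bytes):
    """Summarise where a WRITE(offset, length) request falls in a volume."""
    return {
        "offset_gib": offset / GIB,
        "fits": offset + length <= volume_bytes,  # request ends inside the volume
        "aligned": offset % length == 0,          # offset is a multiple of the request size
    }


# Values taken from the hastd log line: WRITE(31642091520, 131072).
info_10g = describe_offset(31642091520, 131072, 10 * GIB)
info_100g = describe_offset(31642091520, 131072, 100 * GIB)

print(round(info_10g["offset_gib"], 2), info_10g["fits"], info_100g["fits"])
# prints "29.47 False True"
```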
Re: What ZFS version will be in 8.3?
ZFS v28. It was merged into RELENG_8 from CURRENT in May last year.

Alonso
Re: What ZFS version will be in 8.3?
On 11.03.2012 at 20:43, Steven Hartland wrote:

> Hi guys which version of ZFS support will be included in 8.3?

V28, AFAIK. It has been available as a back-port for 8.2 for some time. Hopefully, it's stable. ;-)

Rainer
What ZFS version will be in 8.3?
Hi guys, which version of ZFS support will be included in 8.3?

Regards
Steve

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.
Re: Trouble with SSD
----- Original Message -----
From: "Willem Jan Withagen"

> Just as a followup. I reported the above problem
> Today it occurred again.
> But this time I was able to find a firmware upgrade for the Corsair
> Force GT from 1.2 to 1.3.3
> (Need Win7 to be able to upgrade)
>
> Hopefully that helps, and it does not disconnect about every 4 weeks.

SandForce-based SSDs are known for this issue; the later firmware updates do indeed help with the problem. We've found the 1.3.3 Corsair firmware on non-GT versions to be nice and stable :)

Regards
Steve
Issue with hast replication
Hi,

I've got a fairly simple setup: two hosts running 9.0-R (will upgrade to stable if told to, but want to check here first), ZFS and HAST. HAST is configured to run on top of zvols configured on each host, as illustrated:

      FS                      FS
   +------+               +------+
   | hvol | <-- hastd --> | hvol |
   +------+               +------+
   | zvol |               | zvol |
   +------+               +------+
   | zfs  |               | zfs  |
   +------+               +------+
      h1                     h2

Connection is gigabit to the same switch. No issues with large TCP transfers such as SCP/FTP.

Config is vanilla:

    # zfs create -V 10G zfs/hvol

hast.conf:

    resource hvol {
            on h1 {
                    local /dev/zvol/zfs/hvol
                    remote tcp4://192.168.1.100
            }
            on h2 {
                    local /dev/zvol/zfs/hvol
                    remote tcp4://192.168.1.200
            }
    }

h1 is behaving fine as primary, either with h2 turned off or in init - but as soon as I set the role to secondary for h2, the receiver repeatedly crashes and restarts - see the traces below.

I've seen

    http://lists.freebsd.org/pipermail/freebsd-current/2011-May/024871.html
    http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2012-01/msg00510.html

... but in the first case the fix is in 9 since last year, and the second is referring to async replication - I'm using the default (fullsync).

hastctl status on the primary shows the dirty size diminishing slowly, but obviously this isn't optimal (and causes freezes on I/O to the primary hvol, causing all kinds of issues with the consumers of the hvol).

Any idea? Am I doing something wrong?

Primary:

Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31649693696, 131072).
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31691243520, 131072).
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31783256064, 131072).
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31782731776, 131072).
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31803441152, 131072).
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31881953280, 131072).
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.

Secondary:

Mar 11 01:01:30 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2874, exitcode=75).
Mar 11 01:01:38 h2 hastd[2875]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:01:44 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2875, exitcode=75).
Mar 11 01:01:45 h2 hastd[2876]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:01:50 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2876, exitcode=75).
Mar 11 01:01:56 h2 hastd[2877]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:01 h2 hastd[2506]: [hvol] (secondary) Worker p
Re: devd(8) based AUTOMOUNTER (version 1.3)
Thanks for sharing! Reads awesome. Even with exFAT integration, my new go-to "external disk fs" :)

best regards

On Sun, Mar 4, 2012 at 10:49 AM, vermaden wrote:
> Already at 1.3.1 ...
>
> Fixed the 'detach' section (s/PREFIX/MNTPREFIX/g).
> Fixed removing directories of manually (properly) unmounted filesystems.
>
> "vermaden" wrote:
>> Hi,
>>
>> after some 'fun' with MP3 players I have made some modifications and fixes.
>>
>> Here is a list of what's changed:
>>
>> Fixed bug about improper exFAT detection, now mounts fine.
>> Fixed bug about creating mount dirs for all attached devices no matter if
>> needed or not.
>> Revised 'detach' section, now removes only directory that is unmounted (if
>> enabled of course).
>> Simplified FAT/NTFS sections, removed additional check as it breaks some MP3
>> players' default filesystems automount.
>>
>> The latest 1.3 version can be found here as usual:
>> https://github.com/vermaden/automount/
>>
>> Regards,
>> vermaden
Re: Trouble with SSD
On 2012-02-01 14:40, Willem Jan Withagen wrote:
> Hi,
>
> I have this ZFS server up for about 27 days, and about 3 weeks ago (was
> not really paying attention) it turns out it lost its SSD that I'm using
> for log and cache. There is also a poor and lonely memory stick for log.
> So the box did not really suffer file loss.
>
> system is running:
> FreeBSD zfs.digiware.nl 8.2-STABLE FreeBSD 8.2-STABLE #58: Thu Nov 17
> 09:43:46 CET 2011
> r...@zfs.digiware.nl:/home/obj/usr/src/src8/src/sys/ZFS amd64
>
> more info like dmesg, pciconf, kernconf, zpool iostat at:
> http://www.tegenbosch28.nl/FreeBSD/systems/ZFS/
>
> But it is weird to just lose a SSD from the bus. And it has happened
> before. And you can see that AHCI really banged on the frontdoor...
>
> The device is a Corsair 60Gb Force GT. And thusfar I have not found any
> suggestions that that serie of devices is prone to doing this.
>
> It was a real dead device, the only way to get it back:
> powercycle the device by pulling it, and stick it back
> then camcontrol rescan
>
> I've now upgrade it to a 120Gb Corsair, to see if that has the same problem.
>
> Other FreeBSD-ers have like problems?
>
> Regards,
> --WjW
>
> Jan 7 10:04:24 zfs kernel: ahcich3: Timeout on slot 27 port 0
> Jan 7 10:04:24 zfs kernel: ahcich3: is cs 2000 ss 3800 rs 3800 tfd c0 serr cmd 0004dd17
> Jan 7 10:04:56 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
> Jan 7 10:05:26 zfs kernel: ahcich3: Timeout on slot 29 port 0
> Jan 7 10:05:26 zfs kernel: ahcich3: is cs 2000 ss rs 2000 tfd 80 serr cmd 0004dd17
> Jan 7 10:05:57 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
> Jan 7 10:06:27 zfs kernel: ahcich3: Timeout on slot 29 port 0
> Jan 7 10:06:27 zfs kernel: ahcich3: is cs 2000 ss rs 2000 tfd 80 serr cmd 0004dd17
> Jan 7 10:06:27 zfs kernel: (ada2:ahcich3:0:0:0): lost device
> Jan 7 10:06:58 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
> Jan 7 10:07:28 zfs kernel: ahcich3: Timeout on slot 29 port 0
> Jan 7 10:07:28 zfs kernel: ahcich3: is cs e000 ss e000 rs e000 tfd 80 serr cmd 0004dd17
> Jan 7 10:08:16 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
> Jan 7 10:08:16 zfs kernel: ahcich3: Poll timeout on slot 31 port 0
> Jan 7 10:08:16 zfs kernel: ahcich3: is cs 8000 ss rs 8000 tfd 80 serr cmd 0004df17
> Jan 7 10:08:46 zfs kernel: ahcich3: Timeout on slot 31 port 0
> Jan 7 10:08:46 zfs kernel: ahcich3: is cs 8000 ss rs 8000 tfd 80 serr cmd 0004df17
> Jan 7 10:08:48 zfs kernel: (ada2:ahcich3:0:0:0): removing device entry
> Jan 7 10:09:33 zfs kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 0080)
> Jan 7 10:09:33 zfs kernel: ahcich3: Poll timeout on slot 31 port 0
> Jan 7 10:09:33 zfs kernel: ahcich3: is cs 8000 ss rs 8000 tfd 80 serr cmd 0004df17

Just as a followup. I reported the above problem.
Today it occurred again.
But this time I was able to find a firmware upgrade for the Corsair Force GT from 1.2 to 1.3.3
(Need Win7 to be able to upgrade)

Hopefully that helps, and it does not disconnect about every 4 weeks.

Ciao,
--WjW
Re: Time Clock Stops in FreeBSD 9.0 guest running under ESXi 5.0
On Sat, 2012-03-10 at 15:07 +0700, Adam Strohl wrote:
> I've now seen this on two different VMs on two different ESXi servers
> (Xeon based hosts but different hardware otherwise and at different
> facilities):
>
> Everything runs fine for weeks then (seemingly) suddenly/randomly the
> clock STOPS. In the first case I saw a jump backwards of about 15
> minutes (and then a 'freeze' of the clock). The second time just 'time
> standing still' with no backwards jump. Logging accuracy is of course
> questionable given the nature of the issue, but nothing really jumps out
> (ie; I don't see NTPd adjusting the time just before this happens or
> anything like that).
>
> Naturally the clock stopping causes major issues, but the machine does
> technically stay running. My open sessions respond, but anything that
> relies on time moving forward hangs. I can't even gracefully reboot it
> because shutdown/etc all rely on time moving forward (heh).
>
> So I'm not sure if this is a VMWare/ESXi issue or a FreeBSD issue, or
> some kind of interaction between the two. I manage lots of VMWare
> based FreeBSD VMs, but these are the only ESXi 5.0 servers and the only
> FreeBSD 9.0 VMs. I have never seen anything quite like this before, and
> last night as I mentioned above I had it happen for the second time on a
> different VM + ESXi server combo so I'm not thinking its a fluke
> anymore. I've looked for other reports of this both in VMWare and
> FreeBSD contexts and not seeing anything.
>
> What is interesting is that the 2 servers that have shown this issue
> perform similar tasks, which are different from the other VMs which have
> not shown this issue (yet). This is 2 VMs out of a dozen VMs spread
> over two ESXi servers on different coasts. This might be a coincidence
> but seems suspicious. These two VMs run these services (where as the
> other VMs don't):
>
> - BIND
> - CouchDB
> - MySQL
> - NFS server
> - Dovecot 2.x
>
> I would also say that these two VMs probably are the most active, have
> the most RAM and consume the most CPU because of what they do (vs. the
> others).
>
> I have disabled NTPd since I am running the OpenVM Tools (which I
> believe should be keeping the time in sync with the ESXi host, which
> itself uses NTP), my only guess is maybe there is some kind of collision
> where NTPd and OpenVMTools were adjusting the time at the same time.
> I'm playing the waiting game now to see what this brings (again though I
> am running NTPd and OpenVMTools on all the other VMs which have yet to
> show this issue).
>
> Anyone seen anything like this? Ring any bells?
>

I've run into the "time standing still" problem, but only on bringing up FreeBSD on new hardware (usually industrial single-board computers). In those cases time never advances beyond the time obtained from the RTC hardware at boot. I've never seen it happen that time runs normally for a while then stops advancing, but I have almost no experience with FreeBSD as a VM guest OS.

When I have seen the problem, it's always been due to interrupt problems, such as the timer tick handler getting hung or the selected timer hardware not generating interrupts.

It seems unlikely to me that ntpd and the vm tools would be fighting in a way that caused this symptom. The way ntpd affects timing is to step the clock (which gets logged), or to numerically steer the kernel's timekeeping routines. The steering is clamped at 500 ppm; to make the clock appear to stop it would have to steer at 1e6 ppm. I've always assumed that VM guest services daemons that handle timekeeping use the same ntp_adjtime() interface to the kernel timekeeping that ntpd itself uses, so the same steering limits would apply.

If it happens again, interesting data might be found in the output of:

    sysctl kern.timecounter
    sysctl kern.eventtimer
    vmstat -i
    ntpdc -c kerninfo

-- Ian
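The steering argument above can be put into numbers. The sketch below is only back-of-the-envelope arithmetic using the 500 ppm clamp quoted in the message. The helper names `drift_seconds` and `apparent_rate` are hypothetical and do not correspond to any kernel or ntpd interface.

```python
MAX_SLEW_PPM = 500  # steering clamp quoted in the message above


def drift_seconds(ppm, elapsed_seconds):
    """Seconds gained or lost after elapsed_seconds when steering at ppm."""
    return ppm / 1e6 * elapsed_seconds


def apparent_rate(ppm):
    """Fraction of real time by which a clock slowed by ppm still advances."""
    return 1.0 - ppm / 1e6


if __name__ == "__main__":
    # At the clamp the clock is off by well under a minute per day...
    print(drift_seconds(MAX_SLEW_PPM, 86400))
    # ...whereas a rate of zero (a stopped clock) needs 1e6 ppm.
    print(apparent_rate(1_000_000))  # prints "0.0"
```

At the clamp the clock still advances at about 99.95% of real time, so steering alone would not account for a clock that stops.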