Re: after latest patches i386 not fully patched
On Thu, Sep 17, 2020, at 6:28 PM, Dan Langille wrote:

> Hello,
>
> After running 'freebsd-update fetch install' on an i386 server, I have
> this situation:
>
> [dan@gelt:~] $ freebsd-version -u
> 12.1-RELEASE-p10
> [dan@gelt:~] $ freebsd-version -k
> 12.1-RELEASE-p9
> [dan@gelt:~] $
>
> Why did this not get a new kernel?
>
> I ask because:
>
> [dan@gelt:~] $ sudo /usr/local/etc/periodic/security/405.pkg-base-audit
>
> Checking for security vulnerabilities in base (userland & kernel):
> Host system:
> Database fetched: Wed Sep 16 07:06:52 UTC 2020
> FreeBSD-kernel-12.1_9 is vulnerable:
>   FreeBSD -- bhyve SVM guest escape
>   CVE: CVE-2020-7467
>   WWW: https://vuxml.FreeBSD.org/freebsd/e73c688b-f7e6-11ea-88f8-901b0ef719ab.html
>
> FreeBSD-kernel-12.1_9 is vulnerable:
>   FreeBSD -- bhyve privilege escalation via VMCS access
>   CVE: CVE-2020-24718
>   WWW: https://vuxml.FreeBSD.org/freebsd/2c5b9cd7-f7e6-11ea-88f8-901b0ef719ab.html
>
> FreeBSD-kernel-12.1_9 is vulnerable:
>   FreeBSD -- ure device driver susceptible to packet-in-packet attack
>   CVE: CVE-2020-7464
>   WWW: https://vuxml.FreeBSD.org/freebsd/bb53af7b-f7e4-11ea-88f8-901b0ef719ab.html
>
> 3 problem(s) in 1 installed package(s) found.
> 0 problem(s) in 0 installed package(s) found.
>
> Oh, let's try again:
>
> [dan@slocum:~] $ sudo freebsd-update fetch install
> Looking up update.FreeBSD.org mirrors... 3 mirrors found.
> Fetching metadata signature for 12.1-RELEASE from update4.freebsd.org... done.
> Fetching metadata index... done.
> Inspecting system... done.
> Preparing to download files... done.
>
> No updates needed to update system to 12.1-RELEASE-p10.
> No updates are available to install.
> [dan@slocum:~] $
>
> I've done everything I can.
>
> How do I properly patch this i386 server?
>
> For those wondering what I just ran:
>
> [dan@gelt:~] $ pkg which /usr/local/etc/periodic/security/405.pkg-base-audit
> /usr/local/etc/periodic/security/405.pkg-base-audit was installed by
> package base-audit-0.4
> [dan@gelt:~] $
>
> On an amd64 host I have:
>
> [dan@slocum:~] $ freebsd-version -u
> 12.1-RELEASE-p10
> [dan@slocum:~] $ freebsd-version -k
> 12.1-RELEASE-p10

I understand why this occurs. I have reported it before:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=245878 -- Status: Closed, Works As Intended

What steps can we take to improve this? vuxml will continue to report all i386 hosts as vulnerable until the next kernel version bump. Users have no choice but to ignore the reports. False positives lead to alert fatigue.

Is there a way to avoid this situation, so that properly patched hosts are not incorrectly labelled as vulnerable?

-- 
Dan Langille
d...@langille.org
after latest patches i386 not fully patched
Hello,

After running 'freebsd-update fetch install' on an i386 server, I have this situation:

[dan@gelt:~] $ freebsd-version -u
12.1-RELEASE-p10
[dan@gelt:~] $ freebsd-version -k
12.1-RELEASE-p9
[dan@gelt:~] $

Why did this not get a new kernel?

I ask because:

[dan@gelt:~] $ sudo /usr/local/etc/periodic/security/405.pkg-base-audit

Checking for security vulnerabilities in base (userland & kernel):
Host system:
Database fetched: Wed Sep 16 07:06:52 UTC 2020
FreeBSD-kernel-12.1_9 is vulnerable:
  FreeBSD -- bhyve SVM guest escape
  CVE: CVE-2020-7467
  WWW: https://vuxml.FreeBSD.org/freebsd/e73c688b-f7e6-11ea-88f8-901b0ef719ab.html

FreeBSD-kernel-12.1_9 is vulnerable:
  FreeBSD -- bhyve privilege escalation via VMCS access
  CVE: CVE-2020-24718
  WWW: https://vuxml.FreeBSD.org/freebsd/2c5b9cd7-f7e6-11ea-88f8-901b0ef719ab.html

FreeBSD-kernel-12.1_9 is vulnerable:
  FreeBSD -- ure device driver susceptible to packet-in-packet attack
  CVE: CVE-2020-7464
  WWW: https://vuxml.FreeBSD.org/freebsd/bb53af7b-f7e4-11ea-88f8-901b0ef719ab.html

3 problem(s) in 1 installed package(s) found.
0 problem(s) in 0 installed package(s) found.

Oh, let's try again:

[dan@slocum:~] $ sudo freebsd-update fetch install
Looking up update.FreeBSD.org mirrors... 3 mirrors found.
Fetching metadata signature for 12.1-RELEASE from update4.freebsd.org... done.
Fetching metadata index... done.
Inspecting system... done.
Preparing to download files... done.

No updates needed to update system to 12.1-RELEASE-p10.
No updates are available to install.
[dan@slocum:~] $

I've done everything I can.

How do I properly patch this i386 server?

For those wondering what I just ran:

[dan@gelt:~] $ pkg which /usr/local/etc/periodic/security/405.pkg-base-audit
/usr/local/etc/periodic/security/405.pkg-base-audit was installed by package base-audit-0.4
[dan@gelt:~] $

On an amd64 host I have:

[dan@slocum:~] $ freebsd-version -u
12.1-RELEASE-p10
[dan@slocum:~] $ freebsd-version -k
12.1-RELEASE-p10

-- 
Dan Langille
http://langille.org/
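An aside for readers hitting the same alert: the installed kernel and userland patch levels can be compared in one command, and on a patch level whose fixes did not touch the kernel on a given architecture, -k legitimately lags -u. A minimal sketch; uname shows the running kernel, which can lag further until the next reboot:

# Installed kernel and userland patch levels, respectively:
freebsd-version -ku

# The *running* kernel, which may differ again until a reboot:
uname -r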
Re: 12.0-RELEASE-p4 kernel panic on i386 boot
> Hi all,
>
> Wanted to make you aware of an issue I have encountered; sorry if this
> is the wrong list.
>
> I upgraded from FreeBSD 12.0-RELEASE-p3 to p4 using:
>
> freebsd-update fetch
> freebsd-update install
>
> and use the GENERIC kernel. Upon reboot, the system kernel panics when
> attempting to mount the filesystem read-write. This also happens in
> single-user mode if selected at boot.
>
> Selecting kernel.old from the boot menu boots the system with 12-p3 and
> all works fine. I have uploaded a screenshot here:
>
> https://imagebin.ca/v/4hCc2Kk5YqCX
>
> The computer is an i386 system.

I also upgraded using "freebsd-update fetch install". I also went from -p3 to -p4 on an i386. My screen shot is here:

https://twitter.com/DLangille/status/1128734141569208320

Hope this helps.
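An aside for readers recovering from a panic like this one: the fallback both posters used can be made deliberate. A minimal sketch, assuming the previous kernel is still in /boot/kernel.old; the kernel.good name is an illustrative choice, not from the thread:

# Boot the previous kernel once, without changing the default:
nextboot -k kernel.old

# Preserve a known-good kernel so a later installkernel cannot
# overwrite the only working copy:
cp -Rp /boot/kernel.old /boot/kernel.good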
patch which implements ZFS LZ4 compression
Here is a patch against FreeBSD 9.1-STABLE which implements ZFS LZ4 compression.

https://plus.google.com/106386350930626759085/posts/PLbkNfndPiM

short link: http://bpaste.net/show/76095

HTH

-- 
Dan Langille - http://langille.org
Re: patch which implements ZFS LZ4 compression
On Feb 8, 2013, at 5:52 PM, Xin Li wrote:

> On 02/08/13 14:29, Dan Langille wrote:
>> Here is a patch against FreeBSD 9.1-STABLE which implements ZFS LZ4
>> compression.
>>
>> https://plus.google.com/106386350930626759085/posts/PLbkNfndPiM
>>
>> short link: http://bpaste.net/show/76095
>
> Please DO NOT use this patch! It will ruin your data silently.

I'm sorry, I will remove the post.

> As I already posted on Ivan's Google+ post, I'm doing final universe
> builds to make sure that there is no regression and will merge my
> changes to -HEAD later today.

My apologies. :) Thank you.

-- 
Dan Langille - http://langille.org
Re: bad sector in gmirror HDD
On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:

> On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
>> On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:
>>> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
>>>> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
>>>>
>>>> After a recent power failure, I'm seeing this in my logs:
>>>>
>>>> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently
>>>> unreadable (pending) sectors
>>>
>>> I doubt this is related to a power failure.
>>>
>>>> Searching on that error message, I was led to believe that identifying
>>>> the bad sector and running dd to read it would cause the HDD to
>>>> reallocate that bad block.
>>>>
>>>> http://smartmontools.sourceforge.net/badblockhowto.html
>>>
>>> This is incorrect (meaning you've misunderstood what's written there).
>>> Unreadable LBAs can be a result of the LBA being actually bad (as in
>>> uncorrectable), or the LBA being marked suspect. In either case the
>>> LBA will return an I/O error when read.
>>>
>>> If the LBAs are marked suspect, the drive will perform re-analysis of
>>> the LBA (to determine if the LBA can be read and the data re-mapped,
>>> or, if it cannot, the LBA is marked uncorrectable) when you **write**
>>> to the LBA.
>>>
>>> The above smartd output doesn't tell me much. Providing actual SMART
>>> attribute data (smartctl -a) for the drive would help. The brand of
>>> the drive, the firmware version, and the model all matter -- every
>>> drive behaves a little differently.
>>
>> Information such as this?
>>
>> http://beta.freebsddiary.org/smart-fixing-bad-sector.php
>
> Yes, perfect. Thank you.
>
> First things first: upgrade smartmontools to 5.41. Your attributes will
> be the same after you do this (the drive is already in smartmontools'
> internal drive DB), but I often have to remind people that they really
> need to keep smartmontools updated as often as possible. The changes
> between versions are vast; this is especially important for people with
> SSDs (I'm responsible for submitting some recent improvements for Intel
> 320 and 510 SSDs).

Done.

> Anyway, the drive (albeit an old PATA Maxtor) appears to have three
> anomalies:
>
> 1) One confirmed reallocated LBA (SMART attribute 5)
> 2) One suspect LBA (SMART attribute 197)
> 3) A very high temperature of 51C (SMART attribute 194). If this drive
> is in an enclosure or in a system with no fans this would be
> understandable, otherwise this is a bit high. My home workstation,
> which has only one case fan, has a drive with more platters than your
> Maxtor, and it idles at ~38C. Possibly this drive has been undergoing
> constant I/O recently (which does greatly increase drive temperature)?

Not sure. I'm not going to focus too much on this one. This is an older
system. I suspect insufficient ventilation. I'll look at getting a new
case fan, if not some HDD fans.

> The SMART error log also indicates an LBA failure at the 26000-hour
> mark (which is 16 hours prior to when you did smartctl -a /dev/ad2).
> Whether that LBA is the remapped one or the suspect one is unknown.
> The LBA was 5566440.
>
> The SMART tests you did didn't really amount to anything; no surprise.
> Short and long tests usually do not test the surface of the disk.
> There are some drives which do it on a long test, but as I said before,
> everything varies from drive to drive. Furthermore, on this model of
> drive, you cannot do a surface scan via SMART.

Bummer.

> That's indicated in the "Offline data collection capabilities" section
> at the top, where it reads: "No Selective Self-test supported." So
> you'll have to use the dd method. This takes longer than if surface
> scanning was supported by the drive, but is acceptable. I'll get to how
> to go about that in a moment.

FWIW, I've done a dd read of the entire suspect disk already. Just two
errors. From the URL mentioned above:

[root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
dd: /dev/ad2: Input/output error
2717+0 records in
2717+0 records out
2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
dd: /dev/ad2: Input/output error
38170+1 records in
38170+1 records out
40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
[root@bast:~] #

That seems to indicate two problems. Are those the values I should be
using with dd?

I did some more precise testing:

# time dd of=/dev/null if=/dev/ad2 bs=512 iseek=5566440
dd: /dev/ad2: Input/output error
9+0 records in
9+0 records out
4608 bytes transferred in 5.368668 secs (858 bytes/sec)

real    0m5.429s
user    0m0.000s
sys     0m0.010s

NOTE: that's 9 blocks later than mentioned in smartctl.

The above generated this in /var/log/messages:

Aug 20 17:29:25 bast kernel: ad2: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=5566449

[stuff snipped]

> That said: http://jdc.parodius.com/freebsd/bad_block_scan
>
> If you run this on your ad2 drive, I'm hoping what you'll find are two
> LBAs which can't be read -- one will be the remapped LBA
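An aside: the write step Jeremy alludes to (forcing the drive to re-evaluate the suspect sector) would look roughly like this. This is a sketch, not a command from the thread; it destroys the contents of that sector, which on a gmirror member would then be repaired by resynchronizing from the good disk:

# DESTRUCTIVE: zero the suspect sector reported in the log line above
dd if=/dev/zero of=/dev/ad2 bs=512 oseek=5566449 count=1

# Check whether the pending count dropped and/or the reallocated
# count rose:
smartctl -A /dev/ad2 | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector'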
Re: bad sector in gmirror HDD
On Aug 20, 2011, at 1:54 PM, Alex Samorukov wrote:

>> [root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
>> dd: /dev/ad2: Input/output error
>> 2717+0 records in
>> 2717+0 records out
>> 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
>> dd: /dev/ad2: Input/output error
>> 38170+1 records in
>> 38170+1 records out
>> 40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
>> [root@bast:~] #
>>
>> That seems to indicate two problems. Are those the values I should be
>> using with dd?
>
> You can run a long self-test in smartmontools (-t long). Then you can
> get the failed sector number from smartmontools (-l selftest), and then
> you can use dd to write zero to the specific sector.

Already done: http://beta.freebsddiary.org/smart-fixing-bad-sector.php

Search for 786767. Or did you mean something else? That doesn't seem to
map to a particular sector, though... I ran it for a while:

# time dd of=/dev/null if=/dev/ad2 bs=512 iseek=786767
^C4301949+0 records in
4301949+0 records out
2202597888 bytes transferred in 780.245828 secs (2822954 bytes/sec)

real    13m0.256s
user    0m22.087s
sys     3m24.215s

> Also, I highly recommend setting up smartd as a daemon and monitoring
> the number of reallocated sectors. If they grow again, it is a good
> time to retire this disk.

It is running, but with nothing custom in the .conf file.

-- 
Dan Langille - http://langille.org
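An aside: since the thread leaves smartd on a stock configuration, a minimal smartd.conf entry for this drive might look like the following; the mail recipient and temperature thresholds are illustrative assumptions, not values from the thread:

# /usr/local/etc/smartd.conf (sketch)
# -a: monitor all SMART attributes, including pending/reallocated sectors
# -W 4,45,50: log temperature changes >= 4C, warn at 45C, alarm at 50C
# -m root: mail warnings to root
/dev/ad2 -a -W 4,45,50 -m root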
Re: bad sector in gmirror HDD
On Aug 20, 2011, at 2:04 PM, Diane Bruce wrote:

> On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote:
>> On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:
>>> On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
>> ...
>>>> Information such as this?
>>>> http://beta.freebsddiary.org/smart-fixing-bad-sector.php
>> ...
>>> 3) A very high temperature of 51C (SMART attribute 194). If this
>>> drive is in an enclosure or in a system with no fans this would be
>> ...
>
> eh? What's the temperature of the second drive?

Roughly the same:

[root@bast:/home/dan/tmp] # smartctl -a /dev/ad2 | grep -i temp
194 Temperature_Celsius 0x0022 080 076 042 Old_age Always - 51
[root@bast:/home/dan/tmp] # smartctl -a /dev/ad0 | grep -i temp
194 Temperature_Celsius 0x0022 081 074 042 Old_age Always - 49
[root@bast:/home/dan/tmp] #

FYI, when I first set up smartd, I questioned those values. The HDD in
question, at the time, did not feel hot to the touch.

>> ... This is an older system. I suspect insufficient ventilation. I'll
>> look at getting a new case fan, if not some HDD fans. ...
>>> I still suggest you replace the drive, although given its age I doubt
>>> you'll be able to find a suitable replacement. I tend to keep disks
>>> like this around for testing/experimental purposes and not for actual
>>> use.
>
> Older drive and errors starting to happen, replace ASAP.

I have several unused 80GB HDDs I can place into this system. I think
that's what I'll wind up doing. But I'd like to follow this process
through and get it documented for future reference.

> If the data is valuable, the sooner the better. It's actually somewhat
> saner if the two drives are not from the same lot.

Noted.

-- 
Dan Langille - http://langille.org
Re: bad sector in gmirror HDD
On Aug 20, 2011, at 2:36 PM, Jeremy Chadwick wrote:

> Dan, I will respond to your reply sometime tomorrow. I do not have time
> to review the email today (~7.7KBytes), but will have time tomorrow.

No worries. Thank you.

-- 
Dan Langille - http://langille.org
Re: bad sector in gmirror HDD
On Aug 20, 2011, at 3:57 PM, Jeremy Chadwick wrote:

>>> I still suggest you replace the drive, although given its age I doubt
>>> you'll be able to find a suitable replacement. I tend to keep disks
>>> like this around for testing/experimental purposes and not for actual
>>> use.
>>
>> I have several unused 80GB HDDs I can place into this system. I think
>> that's what I'll wind up doing. But I'd like to follow this process
>> through and get it documented for future reference.
>
> Yes, given the behaviour of the drive I would recommend you simply
> replace it at this point in time. What concerns me the most is
> Current_Pending_Sector incrementing, but it's impossible for me to
> determine if that incrementing means there are other LBAs which are
> bad, or if the drive is behaving how its firmware is designed.
>
> Keep the drive around for further experiments/tinkering if you're
> interested. Stuff like this is always interesting/fun as long as your
> data isn't at risk, so doing the replacement first would be best
> (especially if both drives in your mirror were bought at the same time
> from the same place and have similar manufacturing plants/dates on
> them).

I'm happy to send you this drive for your experimentation pleasure. If
so, please email me an address offline. You don't have a disk with
errors, and it seems you should have one.

After I wipe it, that is. I'm sure I have a destroyer CD here somewhere.

-- 
Dan Langille - http://langille.org
bad sector in gmirror HDD
System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011

After a recent power failure, I'm seeing this in my logs:

Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

And gmirror reports:

# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad0 (100%)
                      ad2

I think the solution is:

gmirror rebuild

Comments?

Searching on that error message, I was led to believe that identifying the bad sector and running dd to read it would cause the HDD to reallocate that bad block.

http://smartmontools.sourceforge.net/badblockhowto.html

However, since ad2 is one half of a gmirror, I don't think this is the best approach. Comments?

More information: smartd, gpart, dh, diskinfo, and fdisk output at http://beta.freebsddiary.org/smart-fixing-bad-sector.php

also:

# gmirror list
Geom name: gm0
State: DEGRADED
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 3362720654
Providers:
1. Name: mirror/gm0
   Mediasize: 40027028992 (37G)
   Sectorsize: 512
   Mode: r6w5e14
Consumers:
1. Name: ad0
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: SYNCHRONIZING
   Priority: 0
   Flags: DIRTY, SYNCHRONIZING
   GenID: 0
   SyncID: 1
   Synchronized: 100%
   ID: 949692477
2. Name: ad2
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY, BROKEN
   GenID: 0
   SyncID: 1
   ID: 3585934016

-- 
Dan Langille - http://langille.org
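An aside on the rebuild the poster is considering: a sketch of the gmirror commands, using the provider names from the output above. The forget/insert path applies only if the member has been dropped from the mirror entirely; this is illustration, not a transcript from the thread:

# Resynchronize a still-connected member from the good disk:
gmirror rebuild gm0 ad2

# If the member was lost from the mirror, re-add it instead:
gmirror forget gm0          # drop components gmirror considers gone
gmirror insert gm0 ad2      # add ad2 back; a full resync follows

# Watch progress:
gmirror status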
Re: bad sector in gmirror HDD
On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:

> On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
>> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
>>
>> After a recent power failure, I'm seeing this in my logs:
>>
>> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently
>> unreadable (pending) sectors
>
> I doubt this is related to a power failure.
>
>> Searching on that error message, I was led to believe that identifying
>> the bad sector and running dd to read it would cause the HDD to
>> reallocate that bad block.
>>
>> http://smartmontools.sourceforge.net/badblockhowto.html
>
> This is incorrect (meaning you've misunderstood what's written there).
> Unreadable LBAs can be a result of the LBA being actually bad (as in
> uncorrectable), or the LBA being marked suspect. In either case the LBA
> will return an I/O error when read.
>
> If the LBAs are marked suspect, the drive will perform re-analysis of
> the LBA (to determine if the LBA can be read and the data re-mapped,
> or, if it cannot, the LBA is marked uncorrectable) when you **write**
> to the LBA.
>
> The above smartd output doesn't tell me much. Providing actual SMART
> attribute data (smartctl -a) for the drive would help. The brand of the
> drive, the firmware version, and the model all matter -- every drive
> behaves a little differently.

Information such as this?

http://beta.freebsddiary.org/smart-fixing-bad-sector.php

-- 
Dan Langille - http://langille.org
Re: ZFS - hot spares : automatic or not?
On 1/11/2011 11:10 AM, John Hawkes-Reed wrote:

> On 11/01/2011 03:38, Dan Langille wrote:
>> On 1/4/2011 11:52 AM, John Hawkes-Reed wrote:
>>> On 04/01/2011 03:08, Dan Langille wrote:
>>>> Hello folks,
>>>>
>>>> I'm trying to discover if ZFS under FreeBSD will automatically pull
>>>> in a hot spare if one is required.
>>>>
>>>> The issue was raised back in March 2010, in a thread which refers to
>>>> a PR opened in May 2009:
>>>>
>>>> * http://lists.freebsd.org/pipermail/freebsd-fs/2010-March/007943.html
>>>> * http://www.freebsd.org/cgi/query-pr.cgi?pr=134491
>>>>
>>>> In turn, the PR refers to this March 2010 post about using devd to
>>>> accomplish this task:
>>>>
>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-March/055686.html
>>>>
>>>> Does the above represent the current state? I ask because I just
>>>> ordered two more HDD to use as spares. Whether they sit on the shelf
>>>> or in the box is open to discussion.
>>>
>>> As far as our testing could discover, it's not automatic. I wrote
>>> some Ugly Perl that's called by devd when it spots a drive-fail
>>> event, which seemed to DTRT when simulating a failure by pulling a
>>> drive.
>>
>> Without such a script, what is the value in creating hot spares?
>
> We went through that loop in the office. We're used to the way the
> Netapps work here, where often one's first notice of a failed disk is a
> visit from the courier with a replacement. (I'm only half joking.)
>
> In the end, writing enough perl to swap in the spare disk made much
> more sense than paging the relevant admin on disk-fail and expecting
> them to be able to type straight at 4AM.
>
> Our thinking is that having a hot spare allows us to do the physical
> disk-swap in office hours, rather than (for instance) running in a
> degraded state over a long weekend.
>
> If it's of interest, I'll see if I can share the code.

I think this is very much of interest. :)

-- 
Dan Langille - http://langille.org/
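An aside for readers wanting to wire this up themselves: a minimal sketch of the devd half. The event type matches the ZFS notifications FreeBSD delivers through devctl; the script path is hypothetical, and the $pool/$vdev_guid variable names are assumptions to verify against devd output on your release:

# /usr/local/etc/devd/zfs-spare.conf (sketch)
# Run a script whenever ZFS reports that a vdev has been removed.
notify 10 {
    match "system"  "ZFS";
    match "type"    "resource.fs.zfs.removed";
    # zfs-attach-spare is a hypothetical script that would run
    # "zpool replace <pool> <failed-vdev> <spare>".
    action "/usr/local/sbin/zfs-attach-spare $pool $vdev_guid";
};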
Re: ZFS - hot spares : automatic or not?
On 1/4/2011 11:52 AM, John Hawkes-Reed wrote:

> On 04/01/2011 03:08, Dan Langille wrote:
>> Hello folks,
>>
>> I'm trying to discover if ZFS under FreeBSD will automatically pull in
>> a hot spare if one is required.
>>
>> The issue was raised back in March 2010, in a thread which refers to a
>> PR opened in May 2009:
>>
>> * http://lists.freebsd.org/pipermail/freebsd-fs/2010-March/007943.html
>> * http://www.freebsd.org/cgi/query-pr.cgi?pr=134491
>>
>> In turn, the PR refers to this March 2010 post about using devd to
>> accomplish this task:
>>
>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-March/055686.html
>>
>> Does the above represent the current state? I ask because I just
>> ordered two more HDD to use as spares. Whether they sit on the shelf
>> or in the box is open to discussion.
>
> As far as our testing could discover, it's not automatic. I wrote some
> Ugly Perl that's called by devd when it spots a drive-fail event, which
> seemed to DTRT when simulating a failure by pulling a drive.

Without such a script, what is the value in creating hot spares?

-- 
Dan Langille - http://langille.org/
ZFS - benchmark tuning before and after doubling RAM
I've been running a ZFS array for about 10 months on a system with 4GB of RAM. I'm about to add another 4GB of RAM. I think this might be an opportune time to run some simple benchmarks and do some tuning.

Getting more out of the system is not a priority for me. It does what I need now. However, I do see some merit in writing something up for others to see/follow/learn.

The system is running FreeBSD 8.2-PRERELEASE #1: Tue Nov 30 22:07:59 EST 2010 on a 64-bit box. The ZFS array consists of 7x2TB commodity drives on two SiI3124 SATA controllers. The OS runs off a gmirror RAID-1. More details here:

http://www.freebsddiary.org/zfs-benchmark.php

First up, I've done a simple bonnie++ benchmark before I add more RAM. I ran this on two different datasets; one with compression enabled, one without.

If anyone has suggestions for various tests, option settings, etc., I'm happy to run them and include the results. We have lots of time to play with this.

-- 
Dan Langille - http://langille.org/
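An aside for anyone reproducing this kind of run: a minimal bonnie++ invocation might look like the following. The dataset names and the 16g size are assumptions; the usual advice is a working set of about twice RAM so the ARC cannot absorb it:

# -d: target directory, -s: total file size, -u: user to run as when root
bonnie++ -d /storage/test -s 16g -u root

# Repeat on a compressed dataset for comparison
# ("testcomp" is an illustrative name):
zfs create -o compression=on storage/testcomp
bonnie++ -d /storage/testcomp -s 16g -u root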
Re: ZFS - benchmark tuning before and after doubling RAM
On 1/8/2011 4:33 PM, Mehmet Erol Sanliturk wrote:

> On Sat, Jan 8, 2011 at 3:37 PM, Dan Langille <d...@langille.org> wrote:
>
>> I've been running a ZFS array for about 10 months on a system with 4GB
>> of RAM. I'm about to add another 4GB of RAM. I think this might be an
>> opportune time to run some simple benchmarks and do some tuning.
>>
>> Getting more out of the system is not a priority for me. It does what
>> I need now. However, I do see some merit in writing something up for
>> others to see/follow/learn.
>>
>> The system is running FreeBSD 8.2-PRERELEASE #1: Tue Nov 30 22:07:59
>> EST 2010 on a 64-bit box. The ZFS array consists of 7x2TB commodity
>> drives on two SiI3124 SATA controllers. The OS runs off a gmirror
>> RAID-1. More details here:
>> http://www.freebsddiary.org/zfs-benchmark.php
>>
>> First up, I've done a simple bonnie++ benchmark before I add more RAM.
>> I ran this on two different datasets; one with compression enabled,
>> one without.
>>
>> If anyone has suggestions for various tests, option settings, etc.,
>> I'm happy to run them and include the results. We have lots of time to
>> play with this.
>
> I think you know the following pages:
>
> http://hub.opensolaris.org/bin/view/Community+Group+zfs/zfstestsuite
> http://dlc.sun.com/osol/test/downloads/current/
> http://hub.opensolaris.org/bin/view/Community+Group+testing/testsuites
> http://hub.opensolaris.org/bin/view/Community+Group+testing/zones
>
> Some of the links may disappear spontaneously because of restructuring
> of their respective sites.

Looking briefly, they seem to be more aimed at regression testing than benchmarking. They all seem to be the same thing (just different instances). Perhaps I am mistaken, but I will look closer.

-- 
Dan Langille - http://langille.org/
ZFS - hot spares : automatic or not?
Hello folks,

I'm trying to discover if ZFS under FreeBSD will automatically pull in a hot spare if one is required.

The issue was raised back in March 2010, in a thread which refers to a PR opened in May 2009:

* http://lists.freebsd.org/pipermail/freebsd-fs/2010-March/007943.html
* http://www.freebsd.org/cgi/query-pr.cgi?pr=134491

In turn, the PR refers to this March 2010 post about using devd to accomplish this task:

http://lists.freebsd.org/pipermail/freebsd-stable/2010-March/055686.html

Does the above represent the current state? I ask because I just ordered two more HDD to use as spares. Whether they sit on the shelf or in the box is open to discussion.

-- 
Dan Langille - http://langille.org/
Re: slow ZFS on FreeBSD 8.1
On 12/31/2010 6:47 PM, Jeremy Chadwick wrote:

> On Sat, Jan 01, 2011 at 10:33:43AM +1100, Peter Jeremy wrote:
>> Based on my experiences at home, I converted my desktop at work to
>> pure ZFS. The only issues I've run into have been programs that
>> extensively use mmap(2) - which is a known issue with ZFS.
>
> Is your ZFS root filesystem associated with a pool that's mirrored or
> using raidzX? What about mismatched /boot content (ZFS vs. UFS)? What
> about booting into single-user mode?
>
> http://wiki.freebsd.org/ZFSOnRoot indirectly hints at these problems
> but doesn't outright admit them (yet should), so I'm curious to know
> how people have solved them. Remembering manual one-offs for a system
> configured this way is not acceptable (read: highly prone to
> error/mistake). Is it worth the risk? Most administrators don't have
> the tolerance for stuff like that in the middle of a system upgrade or
> what not; they should be able to follow exactly what's in the handbook,
> to a tee.
>
> There's a link to www.dan.me.uk at the bottom of the above Wiki page
> that outlines the madness that's required to configure the setup, all
> of which has to be done by hand. I don't know many administrators who
> are going to tolerate this when deploying numerous machines, especially
> when compounded by the complexities mentioned above.

This basically outlines the reason why I do not use ZFS on root.

-- 
Dan Langille - http://langille.org/
Re: slow ZFS on FreeBSD 8.1
On 12/30/2010 4:00 AM, Matthew Seaman wrote:

> On 30/12/2010 00:56, Dan Langille wrote:
>>> them. You'll need to run both 'zpool update -a' and 'zfs update -a' --
>
> As Jean-Yves pointed out: upgrade, not update. Some word beginning with
> 'up' anyhow.
>
>>> # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0
>>
>> This part applies only if you're booting from ZFS drives?
>
> Yes, if you're booting from ZFS and you're using gpt to partition the
> disks, as described at http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/
> et seq. That's probably the most common way of installing FreeBSD+ZFS
> in use today.

The reason I've not installed ZFS on root is because of the added complications. I run the OS on UFS (with gmirror) and my data is on ZFS.

We must be hanging out with different groups. Most of the people I know don't have ZFS on root.

-- 
Dan Langille - http://langille.org/
Re: slow ZFS on FreeBSD 8.1
On 12/29/2010 12:47 PM, Matthew Seaman wrote:

> On 29/12/2010 08:57, Freek van Hemert wrote:
>> The hacks sound good until 8.2 is released. However, regarding this
>> upgrade, is the on-disk format the same? Can I just add the pool to a
>> new install (of 8.2-RC) and expect higher performance? Or do I need to
>> recreate the pool with the newer versions of the utilities?
>
> No -- the on-disk format is different. ZFS will run fine with the older
> on-disk formats, but you won't get the full benefits without updating
> them. You'll need to run both 'zpool update -a' and 'zfs update -a' --
> this is a non-reversible step, so be certain you will never need to
> downgrade before doing it.
>
> Also, you *will* need to update gptzfsboot on your drives, or you will
> come to a sticky end if you try and reboot. Something like this:
>
> # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0
>
> Cheers,
>
> Matthew

This part applies only if you're booting from ZFS drives?

-- 
Dan Langille - http://langille.org/
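An aside: as the follow-up in this thread notes, the actual commands are 'upgrade', not 'update'. A sketch of the full sequence on a system that boots from the pool; repeat the bootcode line for every disk that carries boot blocks:

# See which pools and filesystems are below the current version:
zpool upgrade
zfs upgrade

# Upgrade everything (one-way; no downgrade afterwards):
zpool upgrade -a
zfs upgrade -a

# Reinstall the boot code before rebooting:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad0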
Re: MCA messages after upgrade to 8.2-BEAT1
On 12/22/2010 9:57 AM, John Baldwin wrote:

> On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
>> Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
>> Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105, Status 0x
>> Dec 21 12:42:26 kavkaz kernel: MCA: Vendor AuthenticAMD, ID 0x40f33, APIC ID 0
>> Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
>> Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
>
> You are getting corrected ECC errors in your RAM. You see them once an
> hour because we poll the machine check registers once an hour. If this
> happens constantly you might have a DIMM that is dying?

John: I take it these ECC errors *may* have been happening for some time. What has changed is the OS now polls for the errors and reports them.

-- 
Dan Langille - http://langille.org/
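An aside for readers wanting to tell a dying DIMM from a constant background rate: a sketch. The exact sysctl name of the polling interval varies by release, so it is looked up here rather than hard-coded:

# List the machine-check knobs this kernel exposes
# (the polling interval lives under the hw.mca tree):
sysctl hw.mca

# Count corrected-error reports per day:
grep 'MCA: Bank' /var/log/messages | awk '{print $1, $2}' | uniq -c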
Re: ZFS panic after replacing log device
On 11/16/2010 8:41 PM, Terry Kennedy wrote:

>> I would say it is definitely very odd that writes are a problem.
>> Sounds like it might be a hardware problem. Is it possible to export
>> the pool, remove the ZIL and re-import it? I myself would be pretty
>> nervous trying that, but it would help isolate the problem? If you can
>> risk it.
>
> I think it is unlikely to be a hardware problem. While I haven't run
> any destructive testing on the ZFS pool, the fact that it can be read
> without error, combined with ECC throughout the system and the panic
> always happening on the first write, makes me think that it is a
> software issue in ZFS.
>
> When I do:
>
> zpool export data; zpool remove data da0
>
> I get a "No such pool: data". I then re-imported the pool and did:
>
> zpool offline data da0; zpool export data; zpool import data
>
> After doing that, I can write to the pool without a panic. But once I
> online the log device and do any writes, I get the panic again.
>
> As I mentioned, I have this data replicated elsewhere, so I can
> experiment with the pool if it will help track down this issue.

Any more news on this?

-- 
Dan Langille - http://langille.org/
'zfs send -i': destination has been modified
I am trying to do a 'zfs send -i' and failing. This is my simple proof of concept test:

Create the data:

# zfs create storage/a
# touch /storage/a/1
# touch /storage/a/2
# touch /storage/a/3

Snapshot:

# zfs snapshot storage/a@2010.10.19

Send:

# zfs send storage/a@2010.10.19 | zfs receive -v storage/compressed/a
receiving full stream of storage/a@2010.10.19 into storage/compressed/a@2010.10.19
received 252KB stream in 2 seconds (126KB/sec)
#

Create one more file and snapshot that:

# touch /storage/a/4
# zfs snapshot storage/a@2010.10.20

Send it:

# zfs send -i storage/a@2010.10.19 storage/a@2010.10.20 | zfs receive -v storage/compressed/a
receiving incremental stream of storage/a@2010.10.20 into storage/compressed/a@2010.10.20
received 250KB stream in 3 seconds (83.4KB/sec)

What do we have?

# find /storage/compressed/a
/storage/compressed/a
/storage/compressed/a/1
/storage/compressed/a/2
/storage/compressed/a/3
/storage/compressed/a/4

Of note:

* FreeBSD 8.1-STABLE
* ZFS filesystem version 4
* ZFS pool version 15
* the zfs send source dataset has compression off
* the zfs receive destination dataset has compression on

What I actually want to do, and what fails:

# zfs snapshot storage/bacula@2010.10.19
# zfs send storage/bacula@2010.10.19 | zfs receive -v storage/compressed/bacula
receiving full stream of storage/bacula@2010.10.19 into storage/compressed/bacula@2010.10.19
received 4.38TB stream in 42490 seconds (108MB/sec)

# zfs snapshot storage/bacula@2010.10.20
# zfs send -i storage/bacula@2010.10.19 storage/bacula@2010.10.20 | zfs receive -v storage/compressed/bacula
receiving incremental stream of storage/bacula@2010.10.20 into storage/compressed/bacula@2010.10.20
cannot receive incremental stream: destination storage/compressed/bacula has been modified since most recent snapshot
warning: cannot send 'storage/bacula@2010.10.20': Broken pipe

I have no idea why this fails. Clues please? To my knowledge, the destination has not been written to.

-- 
Dan Langille -- http://langille.org/
Re: 'zfs send -i': destination has been modified
On Wed, October 20, 2010 8:44 am, Ruben de Groot wrote:

> On Wed, Oct 20, 2010 at 08:32:33AM -0400, Dan Langille typed:
>> I am trying to do a 'zfs send -i' and failing. This is my simple proof
>> of concept test:
>>
>> [snip: the proof-of-concept and the failing incremental send, quoted
>> in full in the previous message]
>>
>> I have no idea why this fails. Clues please? To my knowledge, the
>> destination has not been written to.
>
> Has any read operation been done on the destination (ie: updated atime)?

Not that I know of. But I do think that is the issue. Thank you.

Adding a -F option to the receive helps:

# zfs send -i storage/bacula@2010.10.19 storage/bacula@2010.10.20 | zfs receive -vF storage/compressed/bacula
receiving incremental stream of storage/bacula@2010.10.20 into storage/compressed/bacula@2010.10.20
received 20.0GB stream in 303 seconds (67.5MB/sec)

Just after I sent my email, I found this post:

http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075774.html

Problem solved. :)

-- 
Dan Langille -- http://langille.org/
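An aside: besides receiving with -F, the usual way to keep this from recurring is to prevent the destination from being modified at all, since even a read updates atime. A sketch using the dataset names from this thread:

# Stop atime updates (even a find(1) over the tree counts as a change):
zfs set atime=off storage/compressed/bacula

# Or make the destination explicitly read-only between receives:
zfs set readonly=on storage/compressed/bacula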
Re: out of HDD space - zfs degraded
On 10/4/2010 7:19 AM, Alexander Leidinger wrote:

> Quoting Dan Langille <d...@langille.org> (from Sun, 03 Oct 2010 08:08:19 -0400):
>
>> Overnight, the following appeared in /var/log/messages:
>>
>> Oct 2 21:56:46 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103157760 size=1024
>> Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103159808 size=1024
>> [...]
>>
>> Given the outage from yesterday when ada0 was offline for several
>> hours, I'm guessing that checksum mismatches on that drive are
>> expected. Yes, /dev/gpt/disk06-live == ada0.
>
> If you have the possibility to run a scrub of the pool, there should be
> no additional checksum errors occurring *after* the scrub is
> *finished*. If checksum errors still appear on this disk after the
> scrub is finished, you should have a look at the hardware (cable/disk)
> and take appropriate replacement actions.

For the record, there have been no further checksum errors. :)

-- 
Dan Langille - http://langille.org/
Re: zfs send/receive: is this slow?
On Mon, October 4, 2010 3:27 am, Martin Matuska wrote:

> Try using zfs receive with the -v flag (gives you some stats at the end):
>
> # zfs send storage/bacula@transfer | zfs receive -v storage/compressed/bacula
>
> And use the following sysctl (you may set that in /boot/loader.conf, too):
>
> # sysctl vfs.zfs.txg.write_limit_override=805306368
>
> I have good results with the 768MB write limit on systems with at least
> 8GB RAM. With 4GB RAM, you might want to try to set the TXG write limit
> to a lower threshold (e.g. 256MB):
>
> # sysctl vfs.zfs.txg.write_limit_override=268435456
>
> You can experiment with that setting to get the best results on your
> system. A value of 0 means using the calculated default (which is very
> high).

I will experiment with the above. In the meantime:

> During the operation you can observe what your disks actually do:
>
> a) via ZFS pool I/O statistics:
> # zpool iostat -v 1
>
> b) via GEOM:
> # gstat -a

The following output was produced while the original copy was underway.

$ sudo gstat -a -b -I 20s
dT: 20.002s  w: 20.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    7    452    387  24801    9.5     64   2128    7.1    79.4 ada0
    7    452    387  24801    9.5     64   2128    7.2    79.4 ada0p1
    4    492    427  24655    6.7     64   2128    6.6    63.0 ada1
    4    494    428  24691    6.9     65   2127    6.6    66.9 ada2
    8    379    313  24798   13.5     65   2127    7.5    78.6 ada3
    5    372    306  24774   14.2     64   2127    7.5    77.6 ada4
   10    355    291  24741   15.9     63   2127    7.4    79.6 ada5
    4    380    316  24807   13.2     64   2128    7.7    77.0 ada6
    7    452    387  24801    9.5     64   2128    7.4    79.7 gpt/disk06-live
    4    492    427  24655    6.7     64   2128    6.7    63.1 ada1p1
    4    494    428  24691    6.9     65   2127    6.6    66.9 ada2p1
    8    379    313  24798   13.5     65   2127    7.6    78.6 ada3p1
    5    372    306  24774   14.2     64   2127    7.6    77.6 ada4p1
   10    355    291  24741   15.9     63   2127    7.5    79.6 ada5p1
    4    380    316  24807   13.2     64   2128    7.8    77.0 ada6p1
    4    492    427  24655    6.8     64   2128    6.9    63.4 gpt/disk01-live
    4    494    428  24691    6.9     65   2127    6.8    67.2 gpt/disk02-live
    8    379    313  24798   13.5     65   2127    7.7    78.8 gpt/disk03-live
    5    372    306  24774   14.2     64   2127    7.8    77.8 gpt/disk04-live
   10    355    291  24741   15.9     63   2127    7.7    79.8 gpt/disk05-live
    4    380    316  24807   13.2     64   2128    8.0    77.2 gpt/disk07-live

$ zpool iostat 10
           capacity     operations    bandwidth
pool     used  avail   read  write   read  write
------  -----  -----  -----  -----  -----  -----
storage  8.08T  4.60T    364    161  41.7M  7.94M
storage  8.08T  4.60T    926    133   112M  5.91M
storage  8.08T  4.60T    738    164  89.0M  9.75M
storage  8.08T  4.60T  1.18K    179   146M  8.10M
storage  8.08T  4.60T  1.09K    193   135M  9.94M
storage  8.08T  4.60T   1010    185   122M  8.68M
storage  8.08T  4.60T  1.06K    184   131M  9.65M
storage  8.08T  4.60T    867    178   105M  11.8M
storage  8.08T  4.60T  1.06K    198   131M  12.0M
storage  8.08T  4.60T  1.06K    185   131M  12.4M

Yesterday's write bandwidth was more like 80-90M. It's down, a lot. I'll look closer this evening.

> mm
>
> Dňa 4. 10. 2010 4:06, Artem Belevich wrote / napísal(a):
>> On Sun, Oct 3, 2010 at 6:11 PM, Dan Langille <d...@langille.org> wrote:
>>> I'm rerunning my test after I had a drive go offline[1]. But I'm not
>>> getting anything like the previous test:
>>>
>>> time zfs send storage/bacula@transfer | mbuffer | zfs receive storage/compressed/bacula-buffer
>>>
>>> $ zpool iostat 10 10
>>>            capacity     operations    bandwidth
>>> pool     used  avail   read  write   read  write
>>> ------  -----  -----  -----  -----  -----  -----
>>> storage  6.83T  5.86T      8     31  1.00M  2.11M
>>> storage  6.83T  5.86T    207    481  25.7M  17.8M
>>
>> It may be worth checking individual disk activity using gstat -f 'da.$'
>>
>> Some time back I had one drive that was noticeably slower than the
>> rest of the drives in a RAID-Z2 vdev and was holding everything back.
>> SMART looked OK, there were no obvious errors and yet performance was
>> much worse than what I'd expect. gstat clearly showed that one drive
>> was almost constantly busy with a much lower number of reads and
>> writes per second than its peers.
>>
>> Perhaps previously fast transfer rates were due to caching effects.
>> I.e. if all metadata already made it into ARC, subsequent zfs send
>> commands would avoid a lot of random seeks and would show much better
>> throughput.
>>
>> --Artem
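An aside: to make Martin's suggestion persistent across reboots, the tunable can go in loader.conf, as he notes above. The 256MB value follows his 4GB-RAM advice; 0 returns to the calculated default:

# /boot/loader.conf (sketch)
vfs.zfs.txg.write_limit_override="268435456"

# Or adjust on a running system while experimenting:
sysctl vfs.zfs.txg.write_limit_override=268435456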
Re: out of HDD space - zfs degraded
On 10/4/2010 7:19 AM, Alexander Leidinger wrote:

> Quoting Dan Langille <d...@langille.org> (from Sun, 03 Oct 2010 08:08:19 -0400):
>
>> Overnight, the following appeared in /var/log/messages:
>>
>> Oct 2 21:56:46 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103157760 size=1024
>> Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103159808 size=1024
>> [...]
>>
>> Given the outage from yesterday when ada0 was offline for several
>> hours, I'm guessing that checksum mismatches on that drive are
>> expected. Yes, /dev/gpt/disk06-live == ada0.
>
> If you have the possibility to run a scrub of the pool, there should be
> no additional checksum errors occurring *after* the scrub is
> *finished*. If checksum errors still appear on this disk after the
> scrub is finished, you should have a look at the hardware (cable/disk)
> and take appropriate replacement actions.

For the record, I just finished a scrub:

$ zpool status
  pool: storage
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 13h48m with 0 errors on Mon Oct 4 16:54:15 2010
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            gpt/disk06-live  ONLINE       0     0  141K  3.47G repaired
            gpt/disk07-live  ONLINE       0     0     0

errors: No known data errors

-- 
Dan Langille - http://langille.org/
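An aside: as the status output itself suggests, once the cause of the errors is understood, the counters can be reset so future problems stand out. A sketch using this pool's names:

# Clear the accumulated error counters on the repaired member:
zpool clear storage gpt/disk06-live

# Confirm the CKSUM column is back to zero:
zpool status storage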
Re: zfs send/receive: is this slow?
On 10/4/2010 2:10 PM, Jeremy Chadwick wrote:

> On Mon, Oct 04, 2010 at 01:31:07PM -0400, Dan Langille wrote:
>> [snip: full quote of the previous message, including Martin Matuska's
>> tuning advice, the gstat and zpool iostat output, and Artem Belevich's
>> comments -- all reproduced above]
>
> Please read all of the following items before responding in-line. Some
> are just informational items for other people reading
Re: out of HDD space - zfs degraded
On 10/2/2010 10:04 PM, Dan Langille wrote:

> After a 'shutdown -p now', it was about 20 minutes before I went and
> powered it up (I was on minecraft). The box came back with the missing
> HDD:
>
> $ zpool status storage
>   pool: storage
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
>
>         NAME                 STATE     READ WRITE CKSUM
>         storage              ONLINE       0     0     0
>           raidz2             ONLINE       0     0     0
>             gpt/disk01-live  ONLINE       0     0     0
>             gpt/disk02-live  ONLINE       0     0     0
>             gpt/disk03-live  ONLINE       0     0     0
>             gpt/disk04-live  ONLINE       0     0     0
>             gpt/disk05-live  ONLINE       0     0     0
>             gpt/disk06-live  ONLINE       0     0    12
>             gpt/disk07-live  ONLINE       0     0     0

Overnight, the following appeared in /var/log/messages:

Oct 2 21:56:46 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103157760 size=1024
Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103159808 size=1024
Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103164416 size=512
Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=123103162880 size=512
Oct 2 23:00:58 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1875352305152 size=1024
Oct 3 02:44:55 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1914424351744 size=512
Oct 3 03:01:01 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1875175041536 size=512
Oct 3 03:01:02 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1886724290048 size=1024
Oct 3 04:05:44 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1953680806912 size=512
Oct 3 04:05:44 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1953680807424 size=512
Oct 3 04:05:44 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1953680807936 size=512
Oct 3 04:05:44 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=1953680808448 size=512
Oct 3 04:59:38 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=98172631552 size=512
Oct 3 04:59:38 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=98172729856 size=512
Oct 3 04:59:38 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=98172730368 size=512
Oct 3 04:59:38 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=98172730880 size=512
Oct 3 04:59:38 kraken root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/disk06-live offset=98172731392 size=512

Given the outage from yesterday when ada0 was offline for several hours, I'm guessing that checksum mismatches on that drive are expected. Yes, /dev/gpt/disk06-live == ada0.

The current zpool status is:

$ zpool status
  pool: storage
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h1m with 0 errors on Sun Oct 3 00:01:17 2010
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            gpt/disk01-live  ONLINE       0     0     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            gpt/disk06-live  ONLINE       0     0    25  778M resilvered
            gpt/disk07-live  ONLINE       0     0     0

errors: No known data errors

-- 
Dan Langille - http://langille.org/
Re: zfs send/receive: is this slow?
On 10/1/2010 9:32 PM, Dan Langille wrote:

> On 10/1/2010 7:00 PM, Artem Belevich wrote:
>> On Fri, Oct 1, 2010 at 3:49 PM, Dan Langille <d...@langille.org> wrote:
>>> FYI: this is all on the same box.
>>
>> In one of the previous emails you've used this command line:
>>
>> # mbuffer -s 128k -m 1G -I 9090 | zfs receive
>>
>> You've used mbuffer in network client mode. I assumed that you did
>> your transfer over the network. If you're running send/receive
>> locally, just pipe the data through mbuffer -- zfs send|mbuffer|zfs receive
>
> As soon as I opened this email I knew what it would say.
>
> # time zfs send storage/bacula@transfer | mbuffer | zfs receive storage/compressed/bacula-mbuffer
> in @ 197 MB/s, out @ 205 MB/s, 1749 MB total, buffer 0% full
>
> $ zpool iostat 10
>            capacity     operations    bandwidth
> pool     used  avail   read  write   read  write
> ------  -----  -----  -----  -----  -----  -----
> storage  9.78T  2.91T  1.11K    336  92.0M  17.3M
> storage  9.78T  2.91T    769    436  95.5M  30.5M
> storage  9.78T  2.91T    797    853  98.9M  78.5M
> storage  9.78T  2.91T    865    962   107M  78.0M
> storage  9.78T  2.91T    828    881   103M  82.6M
> storage  9.78T  2.90T   1023  1.12K   127M  91.0M
> storage  9.78T  2.90T  1.01K  1.01K   128M  89.3M
> storage  9.79T  2.90T    962  1.08K   119M  89.1M
> storage  9.79T  2.90T  1.09K  1.25K   139M  67.8M
>
> Big difference. :)

I'm rerunning my test after I had a drive go offline[1]. But I'm not getting anything like the previous test:

time zfs send storage/bacula@transfer | mbuffer | zfs receive storage/compressed/bacula-buffer

$ zpool iostat 10 10
           capacity     operations    bandwidth
pool     used  avail   read  write   read  write
------  -----  -----  -----  -----  -----  -----
storage  6.83T  5.86T      8     31  1.00M  2.11M
storage  6.83T  5.86T    207    481  25.7M  17.8M
storage  6.83T  5.86T    220    516  27.4M  17.2M
storage  6.83T  5.86T    221    523  27.5M  21.0M
storage  6.83T  5.86T    198    430  24.5M  20.4M
storage  6.83T  5.86T    248    528  30.8M  26.7M
storage  6.83T  5.86T    273    508  33.9M  22.6M
storage  6.83T  5.86T    331    499  41.1M  22.7M
storage  6.83T  5.86T    424    662  52.6M  34.7M
storage  6.83T  5.86T    413    605  51.3M  36.7M

[1] - http://docs.freebsd.org/cgi/mid.cgi?4CA73702.5080203

-- 
Dan Langille - http://langille.org/
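An aside: the bare mbuffer above runs with its default sizes. The explicit values from the network invocation quoted earlier can be reused for the local pipe; they are carried over for illustration, not benchmarked in this thread:

# Local send/receive through a 1GB buffer with 128k blocks:
zfs send storage/bacula@transfer | \
    mbuffer -s 128k -m 1G | \
    zfs receive -v storage/compressed/bacula-buffer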
out of HDD space - zfs degraded
online and the zpool stabilized? -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: out of HDD space - zfs degraded
On 10/2/2010 10:19 AM, Jeremy Chadwick wrote:
On Sat, Oct 02, 2010 at 09:43:30AM -0400, Dan Langille wrote:

Overnight I was running a zfs send | zfs receive (both within the same
system / zpool).  The system ran out of space, a drive went off line,
and the system is degraded.  This is a raidz2 array running on FreeBSD
8.1-STABLE #0: Sat Sep 18 23:43:48 EDT 2010.

The following logs are also available at
http://www.langille.org/tmp/zfs-space.txt - no line wrapping

This is what was running:

# time zfs send storage/bac...@transfer | mbuffer | zfs receive storage/compressed/bacula-mbuffer
in @  0.0 kB/s, out @  0.0 kB/s, 3670 GB total, buffer 100% full
cannot receive new filesystem stream: out of space
mbuffer: error: outputThread: error writing to stdout at offset 0x395917c4000: Broken pipe
summary: 3670 GByte in 10 h 40 min 97.8 MB/s
mbuffer: warning: error during output to stdout: Broken pipe
warning: cannot send 'storage/bac...@transfer': Broken pipe

real    640m48.423s
user    8m52.660s
sys     211m40.862s

Looking in the logs, I see this:

Oct  2 00:50:53 kraken kernel: (ada0:siisch0:0:0:0): lost device
Oct  2 00:50:54 kraken kernel: siisch0: Timeout on slot 30
Oct  2 00:50:54 kraken kernel: siisch0: siis_timeout is 0004 ss 4000 rs 4000 es sts 801f0040 serr
Oct  2 00:50:54 kraken kernel: siisch0: Error while READ LOG EXT
Oct  2 00:50:55 kraken kernel: siisch0: Timeout on slot 30
Oct  2 00:50:55 kraken kernel: siisch0: siis_timeout is 0004 ss 4000 rs 4000 es sts 801f0040 serr
Oct  2 00:50:55 kraken kernel: siisch0: Error while READ LOG EXT
Oct  2 00:50:56 kraken kernel: siisch0: Timeout on slot 30
Oct  2 00:50:56 kraken kernel: siisch0: siis_timeout is 0004 ss 4000 rs 4000 es sts 801f0040 serr
Oct  2 00:50:56 kraken kernel: siisch0: Error while READ LOG EXT
Oct  2 00:50:57 kraken kernel: siisch0: Timeout on slot 30
Oct  2 00:50:57 kraken kernel: siisch0: siis_timeout is 0004 ss 4000 rs 4000 es sts 801f0040 serr
Oct  2 00:50:57 kraken kernel: siisch0: Error while READ LOG EXT
Oct  2 00:50:58 kraken kernel: siisch0: Timeout on slot 30
Oct  2 00:50:58 kraken kernel: siisch0: siis_timeout is 0004 ss 4000 rs 4000 es sts 801f0040 serr
Oct  2 00:50:58 kraken kernel: siisch0: Error while READ LOG EXT
Oct  2 00:50:59 kraken root: ZFS: vdev I/O failure, zpool=storage path=/dev/gpt/disk06-live offset=270336 size=8192 error=6
Oct  2 00:50:59 kraken kernel: (ada0:siisch0:0:0:0): Synchronize cache failed
Oct  2 00:50:59 kraken kernel: (ada0:siisch0:0:0:0): removing device entry
Oct  2 00:50:59 kraken root: ZFS: vdev I/O failure, zpool=storage path=/dev/gpt/disk06-live offset=2000187564032 size=8192 error=6
Oct  2 00:50:59 kraken root: ZFS: vdev I/O failure, zpool=storage path=/dev/gpt/disk06-live offset=2000187826176 size=8192 error=6

$ zpool status
  pool: storage
 state: DEGRADED
 scrub: scrub in progress for 5h32m, 17.16% done, 26h44m to go
config:

        NAME                 STATE     READ WRITE CKSUM
        storage              DEGRADED     0     0     0
          raidz2             DEGRADED     0     0     0
            gpt/disk01-live  ONLINE       0     0     0
            gpt/disk02-live  ONLINE       0     0     0
            gpt/disk03-live  ONLINE       0     0     0
            gpt/disk04-live  ONLINE       0     0     0
            gpt/disk05-live  ONLINE       0     0     0
            gpt/disk06-live  REMOVED      0     0     0
            gpt/disk07-live  ONLINE       0     0     0

$ zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
storage                    6.97T  1.91T  1.75G  /storage
storage/bacula             4.72T  1.91T  4.29T  /storage/bacula
storage/compressed         2.25T  1.91T  46.9K  /storage/compressed
storage/compressed/bacula  2.25T  1.91T  42.7K  /storage/compressed/bacula
storage/pgsql              5.50G  1.91T  5.50G  /storage/pgsql

$ sudo camcontrol devlist
Password:
<Hitachi HDS722020ALA330 JKAOA28A>  at scbus2 target 0 lun 0 (pass1,ada1)
<Hitachi HDS722020ALA330 JKAOA28A>  at scbus3 target 0 lun 0 (pass2,ada2)
<Hitachi HDS722020ALA330 JKAOA28A>  at scbus4 target 0 lun 0 (pass3,ada3)
<Hitachi HDS722020ALA330 JKAOA28A>  at scbus5 target 0 lun 0 (pass4,ada4)
<Hitachi HDS722020ALA330 JKAOA28A>  at scbus6 target 0 lun 0 (pass5,ada5)
<Hitachi HDS722020ALA330 JKAOA28A>  at scbus7 target 0 lun 0 (pass6,ada6)
<ST380815AS 4.AAB>                  at scbus8 target 0 lun 0 (pass7,ada7)
<TSSTcorp CDDVDW SH-S223C SB01>     at scbus9 target 0 lun 0 (cd0,pass8)
<WDC WD1600AAJS-75M0A0 02.03E02>    at scbus10 target 0 lun 0 (pass9,ada8)

I'm not yet sure if the drive is fully dead or not.  This is not a
hot-swap box.

It looks to me like the disk labelled gpt/disk06-live literally stopped
responding to commands.  The errors you see are coming from the OS and
the siis(4) controller, and both indicate the actual hard
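If the drive itself turns out to be healthy (loose cable, controller
hiccup), a REMOVED vdev can often be brought back without a full replace.
A sketch, assuming the disk re-attaches as ada0 with its GPT label intact:

# camcontrol rescan all                  # re-probe the buses for the device
# zpool online storage gpt/disk06-live
# zpool status storage                   # a resilver should start

If the device never comes back, 'zpool replace storage gpt/disk06-live
<new-device>' (new-device being whatever replacement disk is partitioned
and labelled for the job) is the fallback.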
Re: out of HDD space - zfs degraded
On 10/2/2010 6:36 PM, Jeremy Chadwick wrote:
On Sat, Oct 02, 2010 at 06:09:25PM -0400, Dan Langille wrote:
On 10/2/2010 10:19 AM, Jeremy Chadwick wrote:
On Sat, Oct 02, 2010 at 09:43:30AM -0400, Dan Langille wrote:
[...]
It looks to me like the disk labelled gpt/disk06-live literally stopped
responding
Re: out of HDD space - zfs degraded
On 10/2/2010 7:50 PM, Jeremy Chadwick wrote:
On Sat, Oct 02, 2010 at 07:23:16PM -0400, Dan Langille wrote:
On 10/2/2010 6:36 PM, Jeremy Chadwick wrote:
On Sat, Oct 02, 2010 at 06:09:25PM -0400, Dan Langille wrote:
On 10/2/2010 10:19 AM, Jeremy Chadwick wrote:
On Sat, Oct 02, 2010 at 09:43:30AM -0400, Dan Langille wrote:
[...]
I'm not yet sure if the drive is fully dead
Re: zfs send/receive: is this slow?
On Wed, September 29, 2010 3:57 pm, Artem Belevich wrote:
On Wed, Sep 29, 2010 at 11:04 AM, Dan Langille <d...@langille.org> wrote:

It's taken about 15 hours to copy 800GB.  I'm sure there's some tuning
I can do.  The system is now running:

# zfs send storage/bac...@transfer | zfs receive storage/compressed/bacula

Try piping zfs data through mbuffer (misc/mbuffer in ports).  I've
found that it does help a lot to smooth out data flow and increase
send/receive throughput even when send/receive happens on the same
host.  Run it with a buffer large enough to accommodate a few seconds
worth of write throughput for your target disks.

Here's an example:
http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/

I'm failing.  In one session:

# mbuffer -s 128k -m 1G -I 9090 | zfs receive storage/compressed/bacula-mbuffer
Assertion failed: ((err == 0) && (bsize == sizeof(rcvsize))), function openNetworkInput, file mbuffer.c, line 1358.
cannot receive: failed to read from stream

In the other session:

# time zfs send storage/bac...@transfer | mbuffer -s 128k -m 1G -O 10.55.0.44:9090
Assertion failed: ((err == 0) && (bsize == sizeof(sndsize))), function openNetworkOutput, file mbuffer.c, line 897.
warning: cannot send 'storage/bac...@transfer': Broken pipe
Abort trap: 6 (core dumped)

real    0m17.709s
user    0m0.000s
sys     0m2.502s

--
Dan Langille -- http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
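Those assertions abort mbuffer before any data moves, which points at the
installed binary rather than the command line.  One way to check whether
the installed mbuffer lags the ports tree, a sketch using the pkg_* tools
of that era:

# pkg_info | grep mbuffer                             # version actually installed
# cd /usr/ports/misc/mbuffer && make -V PORTVERSION   # version ports would build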
Re: zfs send/receive: is this slow?
On Fri, October 1, 2010 11:45 am, Dan Langille wrote:
On Wed, September 29, 2010 3:57 pm, Artem Belevich wrote:
[...]

My installed mbuffer was out of date.  After an upgrade:

# mbuffer -s 128k -m 1G -I 9090 | zfs receive storage/compressed/bacula-mbuffer
mbuffer: warning: unable to set socket buffer size: No buffer space available
in @  0.0 kB/s, out @  0.0 kB/s, 1897 MB total, buffer 100% full

# time zfs send storage/bac...@transfer | mbuffer -s 128k -m 1G -O ::1:9090
mbuffer: warning: unable to set socket buffer size: No buffer space available
in @ 4343 kB/s, out @ 2299 kB/s, 3104 MB total, buffer  85% full

--
Dan Langille -- http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
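For reference, the upgrade path on a ports-based system of that vintage
would look roughly like this sketch; the leftover "unable to set socket
buffer size" warning is mbuffer asking for a socket buffer larger than the
kernel cap, which kern.ipc.maxsockbuf controls:

# portsnap fetch update
# cd /usr/ports/misc/mbuffer && make deinstall reinstall clean
# sysctl kern.ipc.maxsockbuf     # the cap behind "No buffer space available"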
Re: zfs send/receive: is this slow?
On Wed, September 29, 2010 2:04 pm, Dan Langille wrote:

$ zpool iostat 10
             capacity     operations    bandwidth
pool       used  avail   read  write   read  write
--------  -----  -----  -----  -----  -----  -----
storage   7.67T  5.02T    358     38  43.1M  1.96M
storage   7.67T  5.02T    317    475  39.4M  30.9M
storage   7.67T  5.02T    357    533  44.3M  34.4M
storage   7.67T  5.02T    371    556  46.0M  35.8M
storage   7.67T  5.02T    313    521  38.9M  28.7M
storage   7.67T  5.02T    309    457  38.4M  30.4M
storage   7.67T  5.02T    388    589  48.2M  37.8M
storage   7.67T  5.02T    377    581  46.8M  36.5M
storage   7.67T  5.02T    310    559  38.4M  30.4M
storage   7.67T  5.02T    430    611  53.4M  41.3M

Now that I'm using mbuffer:

$ zpool iostat 10
             capacity     operations    bandwidth
pool       used  avail   read  write   read  write
--------  -----  -----  -----  -----  -----  -----
storage   9.96T  2.73T  2.01K    131   151M  6.72M
storage   9.96T  2.73T    615    515  76.3M  33.5M
storage   9.96T  2.73T    360    492  44.7M  33.7M
storage   9.96T  2.73T    388    554  48.3M  38.4M
storage   9.96T  2.73T    403    562  50.1M  39.6M
storage   9.96T  2.73T    313    468  38.9M  28.0M
storage   9.96T  2.73T    462    677  57.3M  22.4M
storage   9.96T  2.73T    383    581  47.5M  21.6M
storage   9.96T  2.72T    142    571  17.7M  15.4M
storage   9.96T  2.72T     80    598  10.0M  18.8M
storage   9.96T  2.72T    718    503  89.1M  13.6M
storage   9.96T  2.72T    594    517  73.8M  14.1M
storage   9.96T  2.72T    367    528  45.6M  15.1M
storage   9.96T  2.72T    338    520  41.9M  16.4M
storage   9.96T  2.72T    348    499  43.3M  21.5M
storage   9.96T  2.72T    398    553  49.4M  14.4M
storage   9.96T  2.72T    346    481  43.0M  6.78M

If anything, it's slower.  The above was without -s 128.  The following
used that setting:

$ zpool iostat 10
             capacity     operations    bandwidth
pool       used  avail   read  write   read  write
--------  -----  -----  -----  -----  -----  -----
storage   9.78T  2.91T  1.98K    137   149M  6.92M
storage   9.78T  2.91T    761    577  94.4M  42.6M
storage   9.78T  2.91T    462    411  57.4M  24.6M
storage   9.78T  2.91T    492    497  61.1M  27.6M
storage   9.78T  2.91T    632    446  78.5M  22.5M
storage   9.78T  2.91T    554    414  68.7M  21.8M
storage   9.78T  2.91T    459    434  57.0M  31.4M
storage   9.78T  2.91T    398    570  49.4M  32.7M
storage   9.78T  2.91T    338    495  41.9M  26.5M
storage   9.78T  2.91T    358    526  44.5M  33.3M
storage   9.78T  2.91T    385    555  47.8M  39.8M
storage   9.78T  2.91T    271    453  33.6M  23.3M
storage   9.78T  2.91T    270    456  33.5M  28.8M
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: zfs send/receive: is this slow?
FYI: this is all on the same box.

--
Dan Langille
http://langille.org/

On Oct 1, 2010, at 5:56 PM, Artem Belevich <fbsdl...@src.cx> wrote:

Hmm.  It did help me a lot when I was replicating ~2TB worth of data
over GigE.  Without mbuffer things were roughly in the ballpark of your
numbers.  With mbuffer I've got around 100MB/s.

Assuming that you have two boxes connected via ethernet, it would be
good to check that nobody generates PAUSE frames.  Some time back I've
discovered that el-cheapo switch I've been using for some reason could
not keep up with traffic bursts and generated tons of PAUSE frames that
severely limited throughput.

If you're using Intel adapters, check xon/xoff counters in sysctl
dev.em.0.mac_stats.  If you see them increasing, that may explain slow
speed.  If you have a switch between your boxes, try bypassing it and
connect boxes directly.

--Artem

On Fri, Oct 1, 2010 at 11:51 AM, Dan Langille <d...@langille.org> wrote:
[...]
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
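To act on Artem's suggestion, the em(4) driver of that era exposes the
flow-control counters under dev.em.N.mac_stats.  A sketch, assuming an
Intel em(4) NIC; other drivers name their counters differently:

# sysctl dev.em.0.mac_stats | grep -i xo    # xon/xoff received and transmitted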
Re: zfs send/receive: is this slow?
On 10/1/2010 7:00 PM, Artem Belevich wrote:
On Fri, Oct 1, 2010 at 3:49 PM, Dan Langille <d...@langille.org> wrote:

FYI: this is all on the same box.

In one of the previous emails you've used this command line:
# mbuffer -s 128k -m 1G -I 9090 | zfs receive

You've used mbuffer in network client mode.  I assumed that you did do
your transfer over network.  If you're running send/receive locally,
just pipe the data through mbuffer -- zfs send|mbuffer|zfs receive

As soon as I opened this email I knew what it would say.

# time zfs send storage/bac...@transfer | mbuffer | zfs receive storage/compressed/bacula-mbuffer
in @  197 MB/s, out @  205 MB/s, 1749 MB total, buffer   0% full

$ zpool iostat 10 10
             capacity     operations    bandwidth
pool       used  avail   read  write   read  write
--------  -----  -----  -----  -----  -----  -----
storage   9.78T  2.91T  1.11K    336  92.0M  17.3M
storage   9.78T  2.91T    769    436  95.5M  30.5M
storage   9.78T  2.91T    797    853  98.9M  78.5M
storage   9.78T  2.91T    865    962   107M  78.0M
storage   9.78T  2.91T    828    881   103M  82.6M
storage   9.78T  2.90T   1023  1.12K   127M  91.0M
storage   9.78T  2.90T  1.01K  1.01K   128M  89.3M
storage   9.79T  2.90T    962  1.08K   119M  89.1M
storage   9.79T  2.90T  1.09K  1.25K   139M  67.8M

Big difference.  :)

--Artem

--
Dan Langille
http://langille.org/

On Oct 1, 2010, at 5:56 PM, Artem Belevich <fbsdl...@src.cx> wrote:
[...]
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
zfs send/receive: is this slow?
byte sectors: 16H 63S/T 16383C)
ada1 at siisch2 bus 0 scbus2 target 0 lun 0
ada1: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2 at siisch3 bus 0 scbus3 target 0 lun 0
ada2: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3 at siisch4 bus 0 scbus4 target 0 lun 0
ada3: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4 at siisch5 bus 0 scbus5 target 0 lun 0
ada4: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5 at siisch6 bus 0 scbus6 target 0 lun 0
ada5: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada6 at siisch7 bus 0 scbus7 target 0 lun 0
ada6: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
ada6: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada7 at ahcich0 bus 0 scbus8 target 0 lun 0
ada7: <ST380815AS 4.AAB> ATA-7 SATA 2.x device
ada7: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada7: Command Queueing enabled
ada7: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
ada8 at ahcich2 bus 0 scbus10 target 0 lun 0
ada8: <WDC WD1600AAJS-75M0A0 02.03E02> ATA-8 SATA 2.x device
ada8: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada8: Command Queueing enabled
ada8: 152587MB (31250 512 byte sectors: 16H 63S/T 16383C)
SMP: AP CPU #3 Launched!
cd0 at ahcich1 bus 0 scbus9 target 0 lun 0
SMP: AP CPU #1 Launched!
cd0: <TSSTcorp CDDVDW SH-S223C SB01> Removable CD-ROM SCSI-0 device
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
SMP: AP CPU #2 Launched!
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
GEOM_MIRROR: Device mirror/gm0 launched (1/2).
GEOM_MIRROR: Device gm0: rebuilding provider ada7.
GEOM: mirror/gm0s1: geometry does not match label (16h,63s != 255h,63s).
Trying to mount root from ufs:/dev/mirror/gm0s1a
WARNING: / was not properly dismounted
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present; to enable, add vfs.zfs.prefetch_disable=0 to /boot/loader.conf.
ZFS filesystem version 4
ZFS storage pool version 15
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted

--
Dan Langille -- http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: zfs send/receive: is this slow?
On 9/29/2010 3:57 PM, Artem Belevich wrote:
On Wed, Sep 29, 2010 at 11:04 AM, Dan Langille <d...@langille.org> wrote:

It's taken about 15 hours to copy 800GB.  I'm sure there's some tuning
I can do.  The system is now running:

# zfs send storage/bac...@transfer | zfs receive storage/compressed/bacula

Try piping zfs data through mbuffer (misc/mbuffer in ports).  I've
found that it does help a lot to smooth out data flow and increase
send/receive throughput even when send/receive happens on the same
host.  Run it with a buffer large enough to accommodate few seconds
worth of write throughput for your target disks.

Thanks.  I just installed it.  I'll use it next time.  I don't want to
interrupt this one.  I'd like to see how long it takes.  Then compare.

Here's an example:
http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/

That looks really good.  Thank you.

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
ACPI Warning: Optional field Pm2ControlBlock has zero address
Is this something to be concerned about:

ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0x/0x1 (20100331/tbfadt-655)

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
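The warning comes from ACPICA parsing the FADT (FACP) table, where the BIOS
appears to have declared a PM2 control block with a length but no address.
To see what the BIOS actually shipped, the table can be dumped; a sketch,
assuming acpidump(8) as shipped with 8.x:

# acpidump -t | less     # search for FACP; the PM2 control block fields live there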
Re: ACPI Warning: Optional field Pm2ControlBlock has zero address
On 8/28/2010 8:30 PM, Jeremy Chadwick wrote:
On Sat, Aug 28, 2010 at 04:35:58PM -0400, Dan Langille wrote:

Is this something to be concerned about:

ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0x/0x1 (20100331/tbfadt-655)

CC'ing freebsd-acpi.  OS version is unknown.

FreeBSD-Stable 8.1

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: kernel MCA messages
On 8/25/2010 3:11 AM, Andriy Gapon wrote:

Have you read the decoded message?  Please re-read it.  I still
recommend reading at least the summary of the RAM ECC research article
to make your own judgment about the need to replace DRAM.

Andriy: What is your interpretation of the decoded message?  What is
your view on replacing DRAM?  What do you conclude from the summary?

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: kernel MCA messages
On 8/22/2010 9:18 PM, Dan Langille wrote:

What does this mean?

kernel: MCA: Bank 4, Status 0x940c4001fe080813
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Source RD Memory
kernel: MCA: Address 0x7ff6b0

FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43

FYI, these are occurring every hour, almost to the second, e.g.
xx:56:yy, where yy is 09, 10, or 11.

Checking logs, I don't see anything that correlates with this point in
the hour (i.e. 56 minutes past) that doesn't also occur at other times.
It seems very odd to occur so regularly.

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
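Given how tightly the errors track the wall clock, it's worth ruling out a
scheduled job before blaming a BIOS-timed memory scrub (discussed in the
follow-up below).  A sketch of where to look:

# grep -v '^#' /etc/crontab     # system crontab entries
# crontab -l                    # root's per-user crontab
# ls /etc/periodic/*/           # periodic(8) jobs, normally daily/weekly/monthly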
Re: kernel MCA messages
On 8/24/2010 7:38 PM, Jeremy Chadwick wrote:
On Tue, Aug 24, 2010 at 07:13:23PM -0400, Dan Langille wrote:
On 8/22/2010 9:18 PM, Dan Langille wrote:

What does this mean?

kernel: MCA: Bank 4, Status 0x940c4001fe080813
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Source RD Memory
kernel: MCA: Address 0x7ff6b0

FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43

FYI, these are occurring every hour, almost to the second, e.g.
xx:56:yy, where yy is 09, 10, or 11.  Checking logs, I don't see
anything that correlates with this point in the hour (i.e. 56 minutes
past) that doesn't also occur at other times.  It seems very odd to
occur so regularly.

1) Why haven't you replaced the DIMM in Bank 4 -- or better yet, all
the DIMMs just to be sure?  Do this and see if the problem goes away.
If not, no harm done, and you've narrowed it down.

For good reason: time and distance.  I've not had the time or
opportunity to buy new RAM.  Today is Tuesday.  The problem appeared
about 48 hours ago after upgrading to 8.1-stable from 7.x.  The box is
in Austin.  I'm in Philadelphia.  You know the math.  ;)  When I can
get the time to fly to Austin, I will if required.

I'm sorry, I'm not meaning to be flippant.  I'm just glad I documented
as much as I could 4 years ago.

2) What exact manufacturer and model of motherboard is this?  If you
can provide a link to a User Manual that would be great.

This is a box from iXsystems that I obtained back when 6.1-RELEASE was
the latest.  I know it has four sticks of 2GB.

http://www.freebsddiary.org/dual-opteron.php

Sadly, many of the links are now invalid.  The board is an AccelerTech
ATO2161-DC, also known as a RioWorks HDAMA-G.  See also:

http://www.freebsddiary.org/dual-opteron-dmidecode.txt

And we have a close up of the RAM and the m/b:

http://www.freebsddiary.org/showpicture.php?id=85
http://www.freebsddiary.org/showpicture.php?id=84

I am quite sure it's very close to this:

http://www.accelertech.com/2007/amd_mb/opteron/ato2161i-dc_pic.php

With the manual here:

http://www.accelertech.com/2007/amd_mb/opteron/ato2161i-dc_manual.php

3) Please go into your system BIOS and find where ECC and ChipKill
options are available (likely under a Memory, Chipset, or Northbridge
section).  Please write down and provide here all of the options and
what their currently selected values are.

4) Please make sure you're running the latest system BIOS.  I've seen
on certain Rackable AMD-based systems where Northbridge-related
features don't work quite right (at least with Solaris), resulting in
atrocious memory performance on the system.  A BIOS upgrade solved the
problem.

3 and 4 are just as hard as #1 at the moment.

There's a ChipKill feature called ECC BG Scrubbing that's vague in
definition, given that it's a background memory scrub that happens at
intervals which are unknown to me.  Maybe 60 minutes?  I don't know.
This is why I ask question #3.

For John and other devs: I assume the decoded MCA messages indicate
with absolute certainty that the ECC error is coming from external
DRAM and not, say, bad L1 or L2 cache?

Nice question.

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: kernel MCA messages
On 8/22/2010 10:05 PM, Dan Langille wrote:
On 8/22/2010 9:18 PM, Dan Langille wrote:

What does this mean?

kernel: MCA: Bank 4, Status 0x940c4001fe080813
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Source RD Memory
kernel: MCA: Address 0x7ff6b0

FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43

And another one:

kernel: MCA: Bank 4, Status 0x9459c0014a080813
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Source RD Memory
kernel: MCA: Address 0x7ff670

kernel: MCA: Bank 4, Status 0x947ec000d8080a13
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Responder RD Memory
kernel: MCA: Address 0xbfa9930

Another one.  These errors started appearing after upgrading to
8.1-STABLE from 7.2.. something.  I suspect the functionality was
added about then.

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: kernel MCA messages
On 8/23/2010 7:47 PM, Andriy Gapon wrote:
on 24/08/2010 02:43 Dan Langille said the following:
On 8/22/2010 10:05 PM, Dan Langille wrote:
On 8/22/2010 9:18 PM, Dan Langille wrote:

What does this mean?

kernel: MCA: Bank 4, Status 0x940c4001fe080813
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Source RD Memory
kernel: MCA: Address 0x7ff6b0

FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43

And another one:

kernel: MCA: Bank 4, Status 0x9459c0014a080813
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Source RD Memory
kernel: MCA: Address 0x7ff670

kernel: MCA: Bank 4, Status 0x947ec000d8080a13
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Responder RD Memory
kernel: MCA: Address 0xbfa9930

Another one.  These errors started appearing after upgrading to
8.1-STABLE from 7.2.. something.  I suspect the functionality was
added about then.

Please stop the flood :-)

Sure.  Three emails is hardly a flood.  :)

Depending on hardware there could be hundreds of such errors per day.
Either replace memory modules or learn to live with these messages.

I was just posting a remark.  Thought I'd include one more that I
noticed.  Surely you can cope.  :)

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
kernel MCA messages
What does this mean?

kernel: MCA: Bank 4, Status 0x940c4001fe080813
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Source RD Memory
kernel: MCA: Address 0x7ff6b0

FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: kernel MCA messages
On 8/22/2010 9:18 PM, Dan Langille wrote:

What does this mean?

kernel: MCA: Bank 4, Status 0x940c4001fe080813
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Source RD Memory
kernel: MCA: Address 0x7ff6b0

FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43

And another one:

kernel: MCA: Bank 4, Status 0x9459c0014a080813
kernel: MCA: Global Cap 0x0105, Status 0x
kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0
kernel: MCA: CPU 0 COR BUSLG Source RD Memory
kernel: MCA: Address 0x7ff670

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Where's the space? raidz2
: 14680063 start: 2097152
3. Name: mirror/gm0s1d  Mediasize: 4294967296 (4.0G)  Sectorsize: 512  Mode: r1w1e1
   rawtype: 7  length: 4294967296  offset: 7516192768  type: freebsd-ufs  index: 4  end: 23068671  start: 14680064
4. Name: mirror/gm0s1e  Mediasize: 4294967296 (4.0G)  Sectorsize: 512  Mode: r1w1e1
   rawtype: 7  length: 4294967296  offset: 11811160064  type: freebsd-ufs  index: 5  end: 31457279  start: 23068672
5. Name: mirror/gm0s1f  Mediasize: 63920202240 (60G)  Sectorsize: 512  Mode: r1w1e1
   rawtype: 7  length: 63920202240  offset: 16106127360  type: freebsd-ufs  index: 6  end: 156301424  start: 31457280
Consumers:
1. Name: mirror/gm0s1  Mediasize: 80026329600 (75G)  Sectorsize: 512  Mode: r5w5e9

Geom name: ada0
fwheads: 16  fwsectors: 63  last: 3907029134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: ada0p1  Mediasize: 2000188135936 (1.8T)  Sectorsize: 512  Mode: r1w1e2
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: disk06-live
   length: 2000188135936  offset: 1048576  type: freebsd-zfs  index: 1  end: 3906619500  start: 2048
Consumers:
1. Name: ada0  Mediasize: 2000398934016 (1.8T)  Sectorsize: 512  Mode: r1w1e3

Geom name: ada6
fwheads: 16  fwsectors: 63  last: 3907029134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: ada6p1  Mediasize: 2000188135936 (1.8T)  Sectorsize: 512  Mode: r1w1e2
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: disk07-live
   length: 2000188135936  offset: 1048576  type: freebsd-zfs  index: 1  end: 3906619500  start: 2048
Consumers:
1. Name: ada6  Mediasize: 2000398934016 (1.8T)  Sectorsize: 512  Mode: r1w1e3

Geom name: ada1
fwheads: 16  fwsectors: 63  last: 3907029134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: ada1p1  Mediasize: 2000188135936 (1.8T)  Sectorsize: 512  Mode: r1w1e2
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: disk01-live
   length: 2000188135936  offset: 1048576  type: freebsd-zfs  index: 1  end: 3906619500  start: 2048
Consumers:
1. Name: ada1  Mediasize: 2000398934016 (1.8T)  Sectorsize: 512  Mode: r1w1e3

Geom name: ada3
fwheads: 16  fwsectors: 63  last: 3907029134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: ada3p1  Mediasize: 2000188135936 (1.8T)  Sectorsize: 512  Mode: r1w1e2
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: disk03-live
   length: 2000188135936  offset: 1048576  type: freebsd-zfs  index: 1  end: 3906619500  start: 2048
Consumers:
1. Name: ada3  Mediasize: 2000398934016 (1.8T)  Sectorsize: 512  Mode: r1w1e3

Geom name: ada4
fwheads: 16  fwsectors: 63  last: 3907029134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: ada4p1  Mediasize: 2000188135936 (1.8T)  Sectorsize: 512  Mode: r1w1e2
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: disk04-live
   length: 2000188135936  offset: 1048576  type: freebsd-zfs  index: 1  end: 3906619500  start: 2048
Consumers:
1. Name: ada4  Mediasize: 2000398934016 (1.8T)  Sectorsize: 512  Mode: r1w1e3

Geom name: ada5
fwheads: 16  fwsectors: 63  last: 3907029134  first: 34  entries: 128  scheme: GPT
Providers:
1. Name: ada5p1  Mediasize: 2000188135936 (1.8T)  Sectorsize: 512  Mode: r1w1e2
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b  label: disk05-live
   length: 2000188135936  offset: 1048576  type: freebsd-zfs  index: 1  end: 3906619500  start: 2048
Consumers:
1. Name: ada5  Mediasize: 2000398934016 (1.8T)  Sectorsize: 512  Mode: r1w1e3

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Where's the space? raidz2
On 8/2/2010 7:11 PM, Dan Langille wrote:

I recently altered an existing raidz2 pool from using 7 vdevs of about
931G to 1.81TB.  In fact, the existing pool used half of each HDD.  I
then wanted to go to using [almost] all of each HDD.  I offline'd each
vdev, adjusted the HDD partitions using gpart, then replaced the vdev.
After letting the resilver occur, I did the next vdev.

The space available after this process did not go up as I expected.  I
have about 4TB in the pool, not the 8 or 9TB I expected.

This fixed it:

# df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    989M    508M    402M    56%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/mirror/gm0s1e    3.9G    500K    3.6G     0%    /tmp
/dev/mirror/gm0s1f     58G    4.6G     48G     9%    /usr
/dev/mirror/gm0s1d    3.9G    156M    3.4G     4%    /var
storage               512G    1.7G    510G     0%    /storage
storage/pgsql         512G    1.7G    510G     0%    /storage/pgsql
storage/bacula        3.7T    3.2T    510G    87%    /storage/bacula
storage/Retored       510G     39K    510G     0%    /storage/Retored

# zpool export storage
# zpool import storage

# df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    989M    508M    402M    56%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/mirror/gm0s1e    3.9G    500K    3.6G     0%    /tmp
/dev/mirror/gm0s1f     58G    4.6G     48G     9%    /usr
/dev/mirror/gm0s1d    3.9G    156M    3.4G     4%    /var
storage               5.0T    1.7G    5.0T     0%    /storage
storage/Retored       5.0T     39K    5.0T     0%    /storage/Retored
storage/bacula        8.2T    3.2T    5.0T    39%    /storage/bacula
storage/pgsql         5.0T    1.7G    5.0T     0%    /storage/pgsql

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
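For anyone repeating this, the per-disk grow cycle plus the final
export/import looks roughly like the following sketch.  It is destructive:
it rewrites one disk's partition table at a time and relies on raidz2
parity to resilver that member; device names and the -l labels here are
taken from the gpart output earlier in the thread:

# zpool offline storage gpt/disk01-live
# gpart delete -i 1 ada1
# gpart add -t freebsd-zfs -l disk01-live ada1
# zpool replace storage gpt/disk01-live

(wait for the resilver, repeat for each remaining disk, then:)

# zpool export storage
# zpool import storage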
zpool destroy causes panic
I'm trying to destroy a zfs array which I recently created.  It contains
nothing of value.

# zpool status
  pool: storage
 state: ONLINE
status: One or more devices could not be used because the label is
        missing or invalid.  Sufficient replicas exist for the pool to
        continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   ONLINE       0     0     0
          raidz2                  ONLINE       0     0     0
            gpt/disk01            ONLINE       0     0     0
            gpt/disk02            ONLINE       0     0     0
            gpt/disk03            ONLINE       0     0     0
            gpt/disk04            ONLINE       0     0     0
            gpt/disk05            ONLINE       0     0     0
            /tmp/sparsefile1.img  UNAVAIL      0     0     0  corrupted data
            /tmp/sparsefile2.img  UNAVAIL      0     0     0  corrupted data

errors: No known data errors

Why sparse files?  See this post:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1007077+0+archive/2010/freebsd-stable/20100725.freebsd-stable

The two tmp files were created via:

dd if=/dev/zero of=/tmp/sparsefile1.img bs=1 count=0 oseek=1862g
dd if=/dev/zero of=/tmp/sparsefile2.img bs=1 count=0 oseek=1862g

And the array created with:

zpool create -f storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 \
    gpt/disk04 gpt/disk05 /tmp/sparsefile1.img /tmp/sparsefile2.img

The -f flag was required to avoid this message:

invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: raidz contains both files and devices

I tried to offline one of the sparse files:

zpool offline storage /tmp/sparsefile2.img

That caused a panic:
http://www.langille.org/tmp/zpool-offline-panic.jpg

After rebooting, I rm'd both /tmp/sparsefile1.img and
/tmp/sparsefile2.img without thinking they were still in the zpool.
Now I am unable to destroy the pool.  The system panics.

I disabled ZFS via /etc/rc.conf, rebooted, recreated the two sparse
files, then did a forcestart of zfs.  Then I saw:

# zpool status
[same status as above: both sparse files UNAVAIL, corrupted data]

Another attempt to destroy the array created a panic.

Suggestions as to how to remove this array and get started again?

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: zpool destroy causes panic
On 7/25/2010 1:58 PM, Dan Langille wrote:
I'm trying to destroy a zfs array which I recently created.  It contains
nothing of value.

Oh... I left this out:

FreeBSD kraken.unixathome.org 8.0-STABLE FreeBSD 8.0-STABLE #0: Fri Mar
5 00:46:11 EST 2010  d...@kraken.example.org:/usr/obj/usr/src/sys/KRAKEN  amd64

[...]

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: zpool destroy causes panic
On 7/25/2010 4:37 PM, Volodymyr Kostyrko wrote:
25.07.2010 23:18, Jeremy Chadwick wrote:

Footnote: can someone explain to me how ZFS would, upon reboot, know
that /tmp/sparsefile[12].img are part of the pool?  How would ZFS taste
metadata in this situation?

Just checking it.  Each ZFS device which is part of the pool tracks all
other devices which are part of the pool, with their sizes, device ids,
and last known points.  It doesn't know that /tmp/sparsefile[12].img is
part of the pool, yet it does know that the pool has had some
/tmp/sparsefile[12].img before, and now they can't be found or the
current contents doesn't look like a ZFS device.

Can you try moving the current files to /tmp/sparsefile[34].img and
then re-adding them to the pool with zpool replace?  One by one please.

I do not know what the above paragraph means.

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: zpool destroy causes panic
On 7/25/2010 4:49 PM, Volodymyr Kostyrko wrote:
25.07.2010 20:58, Dan Langille wrote:

        NAME                      STATE     READ WRITE CKSUM
        storage                   ONLINE       0     0     0
          raidz2                  ONLINE       0     0     0
            gpt/disk01            ONLINE       0     0     0
            gpt/disk02            ONLINE       0     0     0
            gpt/disk03            ONLINE       0     0     0
            gpt/disk04            ONLINE       0     0     0
            gpt/disk05            ONLINE       0     0     0
            /tmp/sparsefile1.img  UNAVAIL      0     0     0  corrupted data
            /tmp/sparsefile2.img  UNAVAIL      0     0     0  corrupted data

Ok, I'll try it from here.  UNAVAIL means ZFS can't locate the correct
vdev for this pool member.  Even if this file exists it's not used by
ZFS because it lacks ZFS headers/footers.  You can (I think so) reinsert
an empty file into the pool with:

# zpool replace storage /tmp/sparsefile1.img /tmp/sparsefile1.img
                ^- pool ^- ZFS old vdev name ^- current file

If you replace both files you can theoretically bring the pool to a
fully consistent state.

Also you can use md to convert files to devices:

# mdconfig -a -t vnode -f /tmp/sparsefile1.img
md0

And you can use md0 with your pool.

FYI, tried this, got a panic:

errors: No known data errors
# mdconfig -a -t vnode -f /tmp/sparsefile1.img
md0
# mdconfig -a -t vnode -f /tmp/sparsefile2.img
md1
# zpool replace storage /tmp/sparsefile1.img /dev/md0

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: zpool destroy causes panic
On 7/25/2010 1:58 PM, Dan Langille wrote:
I'm trying to destroy a zfs array which I recently created.  It contains
nothing of value.
[...]
Suggestions as to how to remove this array and get started again?

I fixed this by:

* rebooting with zfs_enable="NO" in /etc/rc.conf
* rm /boot/zfs/zpool.cache
* wiping the first and last 16KB of each partition involved in the array

Now I'm trying mdconfig instead of sparse files.  Making progress, but
not all the way there yet.  :)

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
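Putting the pieces from this thread together, the wipe plus the md-backed
variant he moved to might look like this sketch.  The 1862g size matches
the earlier dd oseek value, and the 16KB head/tail wipe follows his own
description; all names are the ones used in the thread:

# dd if=/dev/zero of=/dev/gpt/disk01 bs=1k count=16              # first 16KB
# kb=$(( $(diskinfo /dev/gpt/disk01 | awk '{print $3}') / 1024 ))
# dd if=/dev/zero of=/dev/gpt/disk01 bs=1k oseek=$((kb - 16)) count=16   # last 16KB
# truncate -s 1862g /tmp/sparsefile1.img
# truncate -s 1862g /tmp/sparsefile2.img
# mdconfig -a -t vnode -f /tmp/sparsefile1.img                   # prints md0
# mdconfig -a -t vnode -f /tmp/sparsefile2.img                   # prints md1
# zpool create storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05 md0 md1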
Re: Using GPT and glabel for ZFS arrays
On 7/24/2010 7:56 AM, Pawel Tyll wrote:

Easiest way to create sparse, eg 20 GB, assuming test.img doesn't exist already

You trim posts too much... there is no way to compare without opening another email. Adam wrote:

truncate -s 20g test.img
ls -sk test.img
1 test.img

No no no. Easiest way to do what you want to do:

mdconfig -a -t malloc -s 3t -u 0
mdconfig -a -t malloc -s 3t -u 1

In what way is that easier? Now I have /dev/md0 and /dev/md1 as opposed to two sparse files.

Just make sure to offline and delete the mds ASAP, unless you have 6TB of RAM waiting to be filled ;) - note that with RAIDZ2 you have no redundancy with the two fake disks gone, and with RAIDZ1 this won't work at all. I can't figure out a safe way (data redundancy all the way) of doing things with only 2 free disks and 3.5TB of data - a third disk would make things easier, a fourth would make them trivial; note that temporary disks 3 and 4 don't have to be 2TB, 1.5TB will do.

The lack of redundancy is noted and accepted. Thanks. :)

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 4:11 AM, Dan Langille wrote:
On 7/22/2010 4:03 AM, Charles Sprickman wrote:
On Thu, 22 Jul 2010, Dan Langille wrote:
On 7/22/2010 3:30 AM, Charles Sprickman wrote:
On Thu, 22 Jul 2010, Dan Langille wrote:
On 7/22/2010 2:59 AM, Andrey V. Elsukov wrote:
On 22.07.2010 10:32, Dan Langille wrote:

I'm not sure of the criteria, but this is what I'm running:

atapci0: SiI 3124 SATA300 controller port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef-0xfbef7fff irq 17 at device 4.0 on pci7
atapci1: SiI 3124 SATA300 controller port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf-0xfbbf7fff irq 19 at device 4.0 on pci3

I added ahci_load=YES to loader.conf and rebooted. Now I see:

You can add siis_load=YES to loader.conf for SiI 3124.

Ahh, thank you. I'm afraid to do that now, before I label my ZFS drives, for fear that the ZFS array will be messed up. But I do plan to do that for the system after my plan is implemented. Thank you. :)

You may even get hotplug support if you're lucky. :) I just built a box and gave it a spin with the old ata stuff and then with the new (AHCI) stuff. It does perform a bit better, and my BIOS claims it supports hotplug with ahci enabled as well... Still have to test that.

Well, I don't have anything to support hotplug. All my stuff is internal.
http://sphotos.ak.fbcdn.net/hphotos-ak-ash1/hs430.ash1/23778_106837706002537_10289239443_171753_3508473_n.jpg

The frankenbox I'm testing on is a retrofitted 1U (it had a scsi backplane, now has none). I am not certain, but I think with 8.1 (which it's running) and all the cam integration stuff, hotplug is possible. Is a special backplane required? I seriously don't know... I'm going to give it a shot though.

Oh, you also might get NCQ. Try:

[r...@h21 /tmp]# camcontrol tags ada0
(pass0:ahcich0:0:0:0): device openings: 32

# camcontrol tags ada0
(pass0:siisch2:0:0:0): device openings: 31

resending with this: ada{0..4} give the above.

# camcontrol tags ada5
(pass5:ahcich0:0:0:0): device openings: 32

That's part of the gmirror array for the OS, along with ad6 which has similar output.

And again with this output from one of the ZFS drives:

# camcontrol identify ada0
pass0: Hitachi HDS722020ALA330 JKAOA28A ATA-8 SATA 2.x device
pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-8 SATA 2.x
device model          Hitachi HDS722020ALA330
firmware revision     JKAOA28A
serial number         JK1130YAH531ST
WWN                   5000cca221d068d5
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         268435455 sectors
LBA48 supported       3907029168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             7200

Feature                        Support  Enable   Value         Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      yes      no       0/0x00
automatic acoustic management  yes      no       254/0xFE      128/0x80
media status notification      no       no
power-up in Standby            yes      no
write-read-verify              no       no       0/0x0
unload                         no       no
free-fall                      no       no
data set management (TRIM)     no

Does this support NCQ?

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
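To answer the question the message ends with: yes - the identify output reports "Native Command Queuing (NCQ) yes, 32 tags", and the siisch/ahcich lines show the new CAM-based drivers are in charge of the disks, which is what actually puts NCQ to use (the old ata(4) driver did not). A quick way to check both on any drive (a sketch; the device name is an example):

    camcontrol identify ada0 | grep -i queuing   # what the drive supports
    camcontrol tags ada0 -v                      # what the driver negotiated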
Re: Using GPT and glabel for ZFS arrays
On 7/23/2010 7:42 AM, John Hawkes-Reed wrote:
Dan Langille wrote:

Thank you to all for the helpful discussion. It's been very helpful and educational. Based on the advice and suggestions, I'm going to adjust my original plan as follows.

[ ... ]

Since I still have the medium-sized ZFS array on the bench, testing this GPT setup seemed like a good idea.

bonnie -s 5

The hardware's a Supermicro X8DTL-iF m/b + 12Gb memory, 2x 5502 Xeons, 3x Supermicro USASLP-L8I 3G SAS controllers and 24x Hitachi 2Tb drives. Partitioning the drives with the command-line:

gpart add -s 1800G -t freebsd-zfs -l disk00 da0[1]

gave the following results with bonnie-64: (Bonnie -r -s 5000|2|5)[2]

What test is this? I just installed benchmarks/bonnie and I see no -r option. Right now, I'm trying this:

bonnie -s 5

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
gpart -b 34 versus gpart -b 1024
You may have seen my cunning plan:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=883310+0+current/freebsd-stable

I've been doing some testing today. The first of my tests comparing partitions aligned on a 4KB boundary are in.

I created a 5x2TB zpool, each of which was set up like this:

gpart add -b 1024 -s 3906824301 -t freebsd-zfs -l disk01 ada1

or

gpart add -b 34 -s 3906824301 -t freebsd-zfs -l disk01 ada1

Repeat for all 5 HDD. And then:

zpool create storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05

Two Bonnie-64 tests. First, with -b 34:

# ~dan/bonnie-64-read-only/Bonnie -s 5000
File './Bonnie.12315', size: 524288
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...

    ---Sequential Output--- ---Sequential Input-- --Random--
    -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 GB M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU  /sec %CPU
  5 110.6 80.5 115.3 15.1  60.9  8.5  68.8 46.2 326.7 15.3   469  1.4

And then with -b 1024:

# ~dan/bonnie-64-read-only/Bonnie -s 5000
File './Bonnie.21095', size: 524288
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...

    ---Sequential Output--- ---Sequential Input-- --Random--
    -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 GB M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU  /sec %CPU
  5 130.9 94.2 118.3 15.6  61.1  8.5  70.1 46.8 241.2 12.7   473  1.4

My reading of this: all M/sec rates are faster except sequential input. Comments?

I'll run -s 2 and -s 5 tests overnight and will post them in the morning. Sunday, I'll try creating a 7x2TB array consisting of 5 HDD and two sparse files and see how that goes. Here's hoping.

Full logs here, including a number of panics:
http://beta.freebsddiary.org/zfs-with-gpart.php

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
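Whether an offset is 4KB-aligned is just modular arithmetic over the 512-byte sector size; a quick way to check (plain sh):

    echo $(( 34 * 512 % 4096 ))     # prints 1024 -> sector 34 is not 4KB-aligned
    echo $(( 1024 * 512 % 4096 ))   # prints 0    -> sector 1024 is 4KB-aligned

Worth noting: the camcontrol identify output quoted elsewhere in these messages reports "sector size logical 512, physical 512" for these Hitachi drives. On true 512-byte-sector disks, alignment should make little difference, which would explain why the two runs benchmark so close together.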
Re: gpart -b 34 versus gpart -b 1024
On 7/24/2010 10:44 PM, Dan Langille wrote:

I'll run -s 2 and -s 5 tests overnight and will post them in the morning.

The -s 2 results are in:

-b 34:

    ---Sequential Output--- ---Sequential Input-- --Random--
    -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 GB M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU  /sec %CPU
 20 114.1 82.7 110.9 14.1  62.5  8.9  73.1 48.8 153.6  9.9   195  0.9

-b 1024:

    ---Sequential Output--- ---Sequential Input-- --Random--
    -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 GB M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU  /sec %CPU
 20 111.0 81.2 114.7 15.1  62.6  8.9  71.9 47.9 135.3  8.7   180  1.1

Hmmm, seems like the first test was better...

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: gpart -b 34 versus gpart -b 1024
On 7/24/2010 10:44 PM, Dan Langille wrote:

You may have seen my cunning plan:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=883310+0+current/freebsd-stable

I've been doing some testing today. The first of my tests comparing partitions aligned on a 4KB boundary are in.

I created a 5x2TB zpool, each of which was set up like this:

gpart add -b 1024 -s 3906824301 -t freebsd-zfs -l disk01 ada1

or

gpart add -b 34 -s 3906824301 -t freebsd-zfs -l disk01 ada1

Repeat for all 5 HDD. And then:

zpool create storage raidz2 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05

Two Bonnie-64 tests. First, with -b 34:

# ~dan/bonnie-64-read-only/Bonnie -s 5000
File './Bonnie.12315', size: 524288
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...

    ---Sequential Output--- ---Sequential Input-- --Random--
    -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 GB M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU  /sec %CPU
  5 110.6 80.5 115.3 15.1  60.9  8.5  68.8 46.2 326.7 15.3   469  1.4

And then with -b 1024:

# ~dan/bonnie-64-read-only/Bonnie -s 5000
File './Bonnie.21095', size: 524288
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...

    ---Sequential Output--- ---Sequential Input-- --Random--
    -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 GB M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU  /sec %CPU
  5 130.9 94.2 118.3 15.6  61.1  8.5  70.1 46.8 241.2 12.7   473  1.4

My reading of this: all M/sec rates are faster except sequential input. Comments?

I'll run -s 2 and -s 5 tests overnight and will post them in the morning.

Well, it seems I'm not sleeping yet, so:

-b 34:

    ---Sequential Output--- ---Sequential Input-- --Random--
    -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 GB M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU  /sec %CPU
 50 113.1 82.4 114.6 15.2  63.4  8.9  72.7 48.2 142.2  9.5   126  0.7

-b 1024:

    ---Sequential Output--- ---Sequential Input-- --Random--
    -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 GB M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU M/sec %CPU  /sec %CPU
 50 110.5 81.0 112.8 15.0  62.8  9.0  72.9 48.5 139.7  9.5   144  0.9

Here, the results aren't much better either... am I not aligning this partition correctly? Missing something else? Or... are they both 4K block aligned?

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 9:51 PM, Pawel Tyll wrote:

So... the smaller size won't mess things up...

If by smaller size you mean smaller size of existing drives/partitions, then growing zpools by replacing smaller vdevs with larger ones is supported and works. What isn't supported is basically everything else:

- you can't change the number of raid columns (add/remove vdevs from raid)
- you can't change the number of parity columns (raidz1-2 or 3)
- you can't change vdevs to smaller ones, even if the pool's free space would permit that.

Isn't what I'm doing breaking the last one?

Good news is these features are planned/being worked on. If you can attach more drives to your system without disconnecting existing drives, then you can grow your pool pretty much risk-free.

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
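The replace-and-resilver cycle described above would look something like this per disk (a sketch; the device name, the gpt label, and SOMEVALUE are placeholders in the spirit of this thread, and on pool versions of this era the extra space typically appears once the last replace completes or the pool is re-opened):

    zpool offline storage ada1                    # only when reusing the same disk
    dd if=/dev/zero of=/dev/ada1 bs=16k count=1   # wipe the front magic
    # (wipe the tail of the disk the same way, then repartition)
    gpart create -s GPT ada1
    gpart add -b 1024 -s SOMEVALUE -t freebsd-zfs -l disk01 ada1
    zpool replace storage ada1 gpt/disk01
    zpool status storage              # wait for the resilver before the next disk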
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 8:47 PM, Dan Langille wrote:

Thank you to all for the helpful discussion. It's been very helpful and educational. Based on the advice and suggestions, I'm going to adjust my original plan as follows. NOTE: glabel will not be used.

First, create a new GUID Partition Table partition scheme on the HDD:

gpart create -s GPT ad0

Let's see how much space we have. This output will be used to determine SOMEVALUE in the next command.

gpart show

Create a new partition within that scheme:

gpart add -b 1024 -s SOMEVALUE -t freebsd-zfs -l disk00 ad0

The -b 1024 ensures alignment on a 4KB boundary. SOMEVALUE will be set so approximately 200MB is left empty at the end of the HDD. That's more than necessary to accommodate the differing actual sizes of 2TB HDD.

Repeat the above with ad1 to get disk01. Repeat for all other HDD... Then create your zpool:

zpool create bigtank gpt/disk00 gpt/disk01 ... etc

This plan will be applied to an existing 5 HDD ZFS pool. I have two new empty HDD which will be added to this new array (giving me 7 x 2TB HDD). The array is raidz1 and I'm wondering if I want to go to raidz2. That would be about 10TB and I'm only using 3.1TB at present. That represents about 4 months of backups.

I do not think I can adjust the existing zpool on the fly. I think I need to copy everything elsewhere (i.e. the 2 empty drives), then start the new zpool from scratch. The risk: while the data is on the 2 spare HDD, there is no redundancy. I wonder if my friend Jerry has a spare 2TB HDD I could borrow for the evening.

The work is in progress. Updates are at http://beta.freebsddiary.org/zfs-with-gpart.php which will be updated frequently as the work continues.

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 9:22 PM, Pawel Tyll wrote:

I do not think I can adjust the existing zpool on the fly. I think I need to copy everything elsewhere (i.e. the 2 empty drives). Then start the new zpool from scratch.

You can, and you should (for educational purposes if not for fun :), unless you wish to change raidz1 to raidz2. Replace, wait for resilver; if redoing a used disk then offline it, wipe the magic with dd (16KB at the beginning and end of the disk/partition will do), carry on with GPT, rinse and repeat with the next disk. When the last vdev's replace finishes, your pool will grow automagically.

Pawel and I had an online chat about part of my strategy. To be clear:

- I have a 5x2TB raidz1 array.
- I have 2x2TB empty HDD.

My goal was to go to raidz2 by:

- copy data to the empty HDD
- redo the zpool to be raidz2
- copy back the data
- add in the two previously empty HDD to the zpool

I now understand that after a raidz array has been created, you can't add a new HDD to it. I'd like to, but it sounds like you cannot:

"It is not possible to add a disk as a column to a RAID-Z, RAID-Z2, or RAID-Z3 vdev."
http://en.wikipedia.org/wiki/ZFS#Limitations

So, it seems I have a 5-HDD zpool and it's going to stay that way.

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/23/2010 10:25 PM, Freddie Cash wrote:
On Fri, Jul 23, 2010 at 6:33 PM, Dan Langille d...@langille.org wrote:

Pawel and I had an online chat about part of my strategy. To be clear:

- I have a 5x2TB raidz1 array.
- I have 2x2TB empty HDD.

My goal was to go to raidz2 by:

- copy data to the empty HDD
- redo the zpool to be raidz2
- copy back the data
- add in the two previously empty HDD to the zpool

I now understand that after a raidz array has been created, you can't add a new HDD to it. I'd like to, but it sounds like you cannot:

"It is not possible to add a disk as a column to a RAID-Z, RAID-Z2, or RAID-Z3 vdev."
http://en.wikipedia.org/wiki/ZFS#Limitations

So, it seems I have a 5-HDD zpool and it's going to stay that way.

You can fake it out by using sparse files for members of the new raidz2 vdev (when creating the vdev), then offline the file-based members so that you are running a degraded pool, copy the data to the pool, then replace the file-based members with physical hard drives.

So I'm creating a 7-drive pool, with 5 real-drive members and two file-based members.

I've posted a theoretical method for doing so here:
http://forums.freebsd.org/showpost.php?p=93889&postcount=7

It's theoretical as I have not investigated how to create sparse files on FreeBSD, nor have I done this. It's based on several posts to the zfs-discuss mailing list where several people have done this on OpenSolaris.

I see no downside. There is no risk that it won't work and I'll lose all the data.

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
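Spelled out, the sparse-file trick being proposed would run roughly like this (a sketch, reusing names from this thread; "newtank" and the gpt labels are placeholders, and the files must be at least as large as the real members):

    dd if=/dev/zero of=/tmp/sparsefile1.img bs=1 count=0 oseek=1862g
    dd if=/dev/zero of=/tmp/sparsefile2.img bs=1 count=0 oseek=1862g

    # 7-way raidz2: five real providers plus the two files
    zpool create -f newtank raidz2 gpt/disk01 gpt/disk02 gpt/disk03 \
        gpt/disk04 gpt/disk05 /tmp/sparsefile1.img /tmp/sparsefile2.img

    # degrade it immediately so the files never hold real data
    zpool offline newtank /tmp/sparsefile1.img
    zpool offline newtank /tmp/sparsefile2.img

    # copy the data in, and once real disks are free:
    zpool replace newtank /tmp/sparsefile1.img gpt/disk06
    zpool replace newtank /tmp/sparsefile2.img gpt/disk07

As the "zpool destroy causes panic" messages above show, the zpool offline step is exactly what panicked the box here, so treat this as the theory rather than a guaranteed recipe.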
Re: Using GPT and glabel for ZFS arrays
On 7/23/2010 10:42 PM, Daniel O'Connor wrote:
On 24/07/2010, at 11:55, Freddie Cash wrote:

It's theoretical as I have not investigated how to create sparse files on FreeBSD, nor have I done this. It's based on several posts to the zfs-discuss mailing list where several people have done this on OpenSolaris.

FYI you would do..

truncate -s 1T /tmp/fake-disk1
mdconfig -a -t vnode -f /tmp/fake-disk1

etc.. Although you'd want to determine the exact size of your real disks from geom and use that.

$ dd if=/dev/zero of=/tmp/sparsefile1.img bs=1 count=0 oseek=2000G
0+0 records in
0+0 records out
0 bytes transferred in 0.25 secs (0 bytes/sec)
$ ls -l /tmp/sparsefile1.img
-rw-r--r--  1 dan  wheel  2147483648000 Jul 23 22:49 /tmp/sparsefile1.img
$ ls -lh /tmp/sparsefile1.img
-rw-r--r--  1 dan  wheel  2.0T Jul 23 22:49 /tmp/sparsefile1.img

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/23/2010 10:51 PM, Dan Langille wrote:
On 7/23/2010 10:42 PM, Daniel O'Connor wrote:
On 24/07/2010, at 11:55, Freddie Cash wrote:

It's theoretical as I have not investigated how to create sparse files on FreeBSD, nor have I done this. It's based on several posts to the zfs-discuss mailing list where several people have done this on OpenSolaris.

FYI you would do..

truncate -s 1T /tmp/fake-disk1
mdconfig -a -t vnode -f /tmp/fake-disk1

etc.. Although you'd want to determine the exact size of your real disks from geom and use that.

$ dd if=/dev/zero of=/tmp/sparsefile1.img bs=1 count=0 oseek=2000G
0+0 records in
0+0 records out
0 bytes transferred in 0.25 secs (0 bytes/sec)
$ ls -l /tmp/sparsefile1.img
-rw-r--r--  1 dan  wheel  2147483648000 Jul 23 22:49 /tmp/sparsefile1.img
$ ls -lh /tmp/sparsefile1.img
-rw-r--r--  1 dan  wheel  2.0T Jul 23 22:49 /tmp/sparsefile1.img

Going a bit further, and actually putting 30MB of data in there:

$ rm sparsefile1.img
$ dd if=/dev/zero of=/tmp/sparsefile1.img bs=1 count=0 oseek=2000G
0+0 records in
0+0 records out
0 bytes transferred in 0.30 secs (0 bytes/sec)
$ ls -lh /tmp/sparsefile1.img
-rw-r--r--  1 dan  wheel  2.0T Jul 23 22:59 /tmp/sparsefile1.img
$ dd if=/dev/zero of=sparsefile1.img bs=1M count=30 conv=notrunc
30+0 records in
30+0 records out
31457280 bytes transferred in 0.396570 secs (79323405 bytes/sec)
$ ls -l sparsefile1.img
-rw-r--r--  1 dan  wheel  2147483648000 Jul 23 23:00 sparsefile1.img
$ ls -lh sparsefile1.img
-rw-r--r--  1 dan  wheel  2.0T Jul 23 23:00 sparsefile1.img
$

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
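One more check worth doing when playing with sparse files: ls -l shows the apparent size while du shows the blocks actually allocated, so the two together confirm the file really is sparse and show how much of /tmp it is genuinely consuming (a sketch, following the session above):

    ls -lh /tmp/sparsefile1.img   # apparent size: 2.0T
    du -h /tmp/sparsefile1.img    # allocated: roughly 30M after the 30MB dd
    df -h /tmp                    # /tmp needs room for whatever ZFS really writes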
Re: Using GPT and glabel for ZFS arrays
On 7/21/2010 11:39 PM, Adam Vande More wrote: On Wed, Jul 21, 2010 at 10:34 PM, Adam Vande More amvandem...@gmail.com mailto:amvandem...@gmail.com wrote: Also if you have an applicable SATA controller, running the ahci module with give you more speed. Only change one thing a time though. Virtualbox makes a great testbed for this, you don't need to allocate the VM a lot of RAM just make sure it boots and such. I'm not sure of the criteria, but this is what I'm running: atapci0: SiI 3124 SATA300 controller port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef-0xfbef7fff irq 17 at device 4.0 on pci7 atapci1: SiI 3124 SATA300 controller port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf-0xfbbf7fff irq 19 at device 4.0 on pci3 I added ahci_load=YES to loader.conf and rebooted. Now I see: ahci0: ATI IXP700 AHCI SATA controller port 0x8000-0x8007,0x7000-0x7003,0x6000-0x6007,0x5000-0x5003,0x4000-0x400f mem 0xfb3fe400-0xfb3fe7ff irq 22 at device 17.0 on pci0 Which is the onboard SATA from what I can tell, not the controllers I installed to handle the ZFS array. The onboard SATA runs a gmirror array which handles /, /tmp, /usr, and /var (i.e. the OS). ZFS runs only on on my /storage mount point. -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 2:59 AM, Andrey V. Elsukov wrote: On 22.07.2010 10:32, Dan Langille wrote: I'm not sure of the criteria, but this is what I'm running: atapci0:SiI 3124 SATA300 controller port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef-0xfbef7fff irq 17 at device 4.0 on pci7 atapci1:SiI 3124 SATA300 controller port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf-0xfbbf7fff irq 19 at device 4.0 on pci3 I added ahci_load=YES to loader.conf and rebooted. Now I see: You can add siis_load=YES to loader.conf for SiI 3124. Ahh, thank you. I'm afraid to do that now, before I label my ZFS drives for fear that the ZFS array will be messed up. But I do plan to do that for the system after my plan is implemented. Thank you. :) -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 3:08 AM, Jeremy Chadwick wrote:
On Thu, Jul 22, 2010 at 03:02:33AM -0400, Dan Langille wrote:
On 7/22/2010 2:59 AM, Andrey V. Elsukov wrote:
On 22.07.2010 10:32, Dan Langille wrote:

I'm not sure of the criteria, but this is what I'm running:

atapci0: SiI 3124 SATA300 controller port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef-0xfbef7fff irq 17 at device 4.0 on pci7
atapci1: SiI 3124 SATA300 controller port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf-0xfbbf7fff irq 19 at device 4.0 on pci3

I added ahci_load=YES to loader.conf and rebooted. Now I see:

You can add siis_load=YES to loader.conf for SiI 3124.

Ahh, thank you. I'm afraid to do that now, before I label my ZFS drives, for fear that the ZFS array will be messed up. But I do plan to do that for the system after my plan is implemented. Thank you. :)

They won't be messed up. ZFS will figure out, using its metadata, which drive is part of what pool despite the device name changing.

I now have:

siis0: SiI3124 SATA controller port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef-0xfbef7fff irq 17 at device 4.0 on pci7
siis1: SiI3124 SATA controller port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf-0xfbbf7fff irq 19 at device 4.0 on pci3

And my zpool is now:

$ zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0

Whereas previously, it was ad devices (see http://docs.freebsd.org/cgi/getmsg.cgi?fetch=399538+0+current/freebsd-stable).

Thank you (and to Andrey V. Elsukov, who posted the same suggestion at the same time you did). I appreciate it.

I don't use glabel or GPT so I can't comment on whether or not those work reliably in this situation (I imagine they would, but I keep seeing problem reports on the lists when people have them in use...)

Really? The whole basis of the action plan I'm highlighting in this post is to avoid ZFS-related problems when devices get renumbered and ZFS is using device names (e.g. /dev/ad0) instead of labels (e.g. gpt/disk00).

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 3:30 AM, Charles Sprickman wrote: On Thu, 22 Jul 2010, Dan Langille wrote: On 7/22/2010 2:59 AM, Andrey V. Elsukov wrote: On 22.07.2010 10:32, Dan Langille wrote: I'm not sure of the criteria, but this is what I'm running: atapci0:SiI 3124 SATA300 controller port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef-0xfbef7fff irq 17 at device 4.0 on pci7 atapci1:SiI 3124 SATA300 controller port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf-0xfbbf7fff irq 19 at device 4.0 on pci3 I added ahci_load=YES to loader.conf and rebooted. Now I see: You can add siis_load=YES to loader.conf for SiI 3124. Ahh, thank you. I'm afraid to do that now, before I label my ZFS drives for fear that the ZFS array will be messed up. But I do plan to do that for the system after my plan is implemented. Thank you. :) You may even get hotplug support if you're lucky. :) I just built a box and gave it a spin with the old ata stuff and then with the new (AHCI) stuff. It does perform a bit better and my BIOS claims it supports hotplug with ahci enabled as well... Still have to test that. Well, I don't have anything to support hotplug. All my stuff is internal. http://sphotos.ak.fbcdn.net/hphotos-ak-ash1/hs430.ash1/23778_106837706002537_10289239443_171753_3508473_n.jpg -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 4:03 AM, Charles Sprickman wrote: On Thu, 22 Jul 2010, Dan Langille wrote: On 7/22/2010 3:30 AM, Charles Sprickman wrote: On Thu, 22 Jul 2010, Dan Langille wrote: On 7/22/2010 2:59 AM, Andrey V. Elsukov wrote: On 22.07.2010 10:32, Dan Langille wrote: I'm not sure of the criteria, but this is what I'm running: atapci0:SiI 3124 SATA300 controller port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef-0xfbef7fff irq 17 at device 4.0 on pci7 atapci1:SiI 3124 SATA300 controller port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf-0xfbbf7fff irq 19 at device 4.0 on pci3 I added ahci_load=YES to loader.conf and rebooted. Now I see: You can add siis_load=YES to loader.conf for SiI 3124. Ahh, thank you. I'm afraid to do that now, before I label my ZFS drives for fear that the ZFS array will be messed up. But I do plan to do that for the system after my plan is implemented. Thank you. :) You may even get hotplug support if you're lucky. :) I just built a box and gave it a spin with the old ata stuff and then with the new (AHCI) stuff. It does perform a bit better and my BIOS claims it supports hotplug with ahci enabled as well... Still have to test that. Well, I don't have anything to support hotplug. All my stuff is internal. http://sphotos.ak.fbcdn.net/hphotos-ak-ash1/hs430.ash1/23778_106837706002537_10289239443_171753_3508473_n.jpg The frankenbox I'm testing on is a retrofitted 1U (it had a scsi backplane, now has none). I am not certain, but I think with 8.1 (which it's running) and all the cam integration stuff, hotplug is possible. Is a special backplane required? I seriously don't know... I'm going to give it a shot though. Oh, you also might get NCQ. Try: [r...@h21 /tmp]# camcontrol tags ada0 (pass0:ahcich0:0:0:0): device openings: 32 # camcontrol tags ada0 (pass0:siisch2:0:0:0): device openings: 31 -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 4:03 AM, Charles Sprickman wrote: On Thu, 22 Jul 2010, Dan Langille wrote: On 7/22/2010 3:30 AM, Charles Sprickman wrote: On Thu, 22 Jul 2010, Dan Langille wrote: On 7/22/2010 2:59 AM, Andrey V. Elsukov wrote: On 22.07.2010 10:32, Dan Langille wrote: I'm not sure of the criteria, but this is what I'm running: atapci0:SiI 3124 SATA300 controller port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef-0xfbef7fff irq 17 at device 4.0 on pci7 atapci1:SiI 3124 SATA300 controller port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf-0xfbbf7fff irq 19 at device 4.0 on pci3 I added ahci_load=YES to loader.conf and rebooted. Now I see: You can add siis_load=YES to loader.conf for SiI 3124. Ahh, thank you. I'm afraid to do that now, before I label my ZFS drives for fear that the ZFS array will be messed up. But I do plan to do that for the system after my plan is implemented. Thank you. :) You may even get hotplug support if you're lucky. :) I just built a box and gave it a spin with the old ata stuff and then with the new (AHCI) stuff. It does perform a bit better and my BIOS claims it supports hotplug with ahci enabled as well... Still have to test that. Well, I don't have anything to support hotplug. All my stuff is internal. http://sphotos.ak.fbcdn.net/hphotos-ak-ash1/hs430.ash1/23778_106837706002537_10289239443_171753_3508473_n.jpg The frankenbox I'm testing on is a retrofitted 1U (it had a scsi backplane, now has none). I am not certain, but I think with 8.1 (which it's running) and all the cam integration stuff, hotplug is possible. Is a special backplane required? I seriously don't know... I'm going to give it a shot though. Oh, you also might get NCQ. Try: [r...@h21 /tmp]# camcontrol tags ada0 (pass0:ahcich0:0:0:0): device openings: 32 # camcontrol tags ada0 (pass0:siisch2:0:0:0): device openings: 31 resending with this: ada{0..4} give the above. # camcontrol tags ada5 (pass5:ahcich0:0:0:0): device openings: 32 That's part of the gmirror array for the OS, along with ad6 which has similar output. -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 4:03 AM, Charles Sprickman wrote: On Thu, 22 Jul 2010, Dan Langille wrote: On 7/22/2010 3:30 AM, Charles Sprickman wrote: On Thu, 22 Jul 2010, Dan Langille wrote: On 7/22/2010 2:59 AM, Andrey V. Elsukov wrote: On 22.07.2010 10:32, Dan Langille wrote: I'm not sure of the criteria, but this is what I'm running: atapci0:SiI 3124 SATA300 controller port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef-0xfbef7fff irq 17 at device 4.0 on pci7 atapci1:SiI 3124 SATA300 controller port 0xac00-0xac0f mem 0xfbbffc00-0xfbbffc7f,0xfbbf-0xfbbf7fff irq 19 at device 4.0 on pci3 I added ahci_load=YES to loader.conf and rebooted. Now I see: You can add siis_load=YES to loader.conf for SiI 3124. Ahh, thank you. I'm afraid to do that now, before I label my ZFS drives for fear that the ZFS array will be messed up. But I do plan to do that for the system after my plan is implemented. Thank you. :) You may even get hotplug support if you're lucky. :) I just built a box and gave it a spin with the old ata stuff and then with the new (AHCI) stuff. It does perform a bit better and my BIOS claims it supports hotplug with ahci enabled as well... Still have to test that. Well, I don't have anything to support hotplug. All my stuff is internal. http://sphotos.ak.fbcdn.net/hphotos-ak-ash1/hs430.ash1/23778_106837706002537_10289239443_171753_3508473_n.jpg The frankenbox I'm testing on is a retrofitted 1U (it had a scsi backplane, now has none). I am not certain, but I think with 8.1 (which it's running) and all the cam integration stuff, hotplug is possible. Is a special backplane required? I seriously don't know... I'm going to give it a shot though. Oh, you also might get NCQ. Try: [r...@h21 /tmp]# camcontrol tags ada0 (pass0:ahcich0:0:0:0): device openings: 32 # camcontrol tags ada0 (pass0:siisch2:0:0:0): device openings: 31 resending with this: ada{0..4} give the above. # camcontrol tags ada5 (pass5:ahcich0:0:0:0): device openings: 32 That's part of the gmirror array for the OS, along with ad6 which has similar output. And again with this output from one of the ZFS drives: # camcontrol identify ada0 pass0: Hitachi HDS722020ALA330 JKAOA28A ATA-8 SATA 2.x device pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) protocol ATA/ATAPI-8 SATA 2.x device model Hitachi HDS722020ALA330 firmware revision JKAOA28A serial number JK1130YAH531ST WWN 5000cca221d068d5 cylinders 16383 heads 16 sectors/track 63 sector size logical 512, physical 512, offset 0 LBA supported 268435455 sectors LBA48 supported 3907029168 sectors PIO supported PIO4 DMA supported WDMA2 UDMA6 media RPM 7200 Feature Support EnableValue Vendor read ahead yes yes write cacheyes yes flush cacheyes yes overlapno Tagged Command Queuing (TCQ) no no Native Command Queuing (NCQ) yes 32 tags SMART yes yes microcode download yes yes security yes no power management yes yes advanced power management yes no 0/0x00 automatic acoustic management yes no 254/0xFE128/0x80 media status notification no no power-up in Standbyyes no write-read-verify no no 0/0x0 unload no no free-fall no no data set management (TRIM) no -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
Thank you to all for the helpful discussion. It's been very helpful and educational. Based on the advice and suggestions, I'm going to adjust my original plan as follows. NOTE: glabel will not be used.

First, create a new GUID Partition Table partition scheme on the HDD:

gpart create -s GPT ad0

Let's see how much space we have. This output will be used to determine SOMEVALUE in the next command.

gpart show

Create a new partition within that scheme:

gpart add -b 1024 -s SOMEVALUE -t freebsd-zfs -l disk00 ad0

The -b 1024 ensures alignment on a 4KB boundary. SOMEVALUE will be set so approximately 200MB is left empty at the end of the HDD. That's more than necessary to accommodate the differing actual sizes of 2TB HDD.

Repeat the above with ad1 to get disk01. Repeat for all other HDD... Then create your zpool:

zpool create bigtank gpt/disk00 gpt/disk01 ... etc

This plan will be applied to an existing 5 HDD ZFS pool. I have two new empty HDD which will be added to this new array (giving me 7 x 2TB HDD). The array is raidz1 and I'm wondering if I want to go to raidz2. That would be about 10TB and I'm only using 3.1TB at present. That represents about 4 months of backups.

I do not think I can adjust the existing zpool on the fly. I think I need to copy everything elsewhere (i.e. the 2 empty drives), then start the new zpool from scratch. The risk: while the data is on the 2 spare HDD, there is no redundancy. I wonder if my friend Jerry has a spare 2TB HDD I could borrow for the evening.

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
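SOMEVALUE can be computed from gpart show rather than guessed; a sketch, assuming 512-byte sectors, so 200MB is 409600 sectors (the free-range figure below is an illustrative value derived from the 3907029168-sector drives in this thread, not one taken from its logs):

    # gpart show ad0 reports the free range, e.g.:  34  3907029101  - free -
    total=3907029101                   # free sectors reported by gpart show
    size=$(( total - 1024 - 409600 ))  # start at sector 1024, keep ~200MB spare
    gpart add -b 1024 -s $size -t freebsd-zfs -l disk00 ad0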
Re: Using GPT and glabel for ZFS arrays
On 7/22/2010 9:22 PM, Pawel Tyll wrote: I do not think I can adjust the existing zpool on the fly. I think I need to copy everything elsewhere (i.e the 2 empty drives). Then start the new zpool from scratch. You can, and you should (for educational purposes if not for fun :), unless you wish to change raidz1 to raidz2. Replace, wait for resilver, if redoing used disk then offline it, wipe magic with dd (16KB at the beginning and end of disk/partition will do), carry on with GPT, rinse and repeat with next disk. When last vdev's replace finishes, your pool will grow automagically. So... the smaller size won't mess things up... -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Problems replacing failing drive in ZFS pool
On 7/21/2010 2:54 AM, Charles Sprickman wrote: On Wed, 21 Jul 2010, Charles Sprickman wrote: On Tue, 20 Jul 2010, alan bryan wrote: --- On Mon, 7/19/10, Dan Langille d...@langille.org wrote: From: Dan Langille d...@langille.org Subject: Re: Problems replacing failing drive in ZFS pool To: Freddie Cash fjwc...@gmail.com Cc: freebsd-stable freebsd-stable@freebsd.org Date: Monday, July 19, 2010, 7:07 PM On 7/19/2010 12:15 PM, Freddie Cash wrote: On Mon, Jul 19, 2010 at 8:56 AM, Garrett Mooregarrettmo...@gmail.com wrote: So you think it's because when I switch from the old disk to the new disk, ZFS doesn't realize the disk has changed, and thinks the data is just corrupt now? Even if that happens, shouldn't the pool still be available, since it's RAIDZ1 and only one disk has gone away? I think it's because you pull the old drive, boot with the new drive, the controller re-numbers all the devices (ie da3 is now da2, da2 is now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all the drives have changed, thus corrupting the pool. I've had this happen on our storage servers a couple of times before I started using glabel(8) on all our drives (dead drive on RAID controller, remove drive, reboot for whatever reason, all device nodes are renumbered, everything goes kablooey). Can you explain a bit about how you use glabel(8) in conjunction with ZFS? If I can retrofit this into an exist ZFS array to make things easier in the future... 8.0-STABLE #0: Fri Mar 5 00:46:11 EST 2010 ]# zpool status pool: storage state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz1 ONLINE 0 0 0 ad8 ONLINE 0 0 0 ad10 ONLINE 0 0 0 ad12 ONLINE 0 0 0 ad14 ONLINE 0 0 0 ad16 ONLINE 0 0 0 Of course, always have good backups. ;) In my case, this ZFS array is the backup. ;) But I'm setting up a tape library, real soon now -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org Dan, Here's how to do it after the fact: http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-07/msg00623.html Two things: -What's the preferred labelling method for disks that will be used with zfs these days? geom_label or gpt labels? I've been using the latter and I find them a little simpler. -I think that if you already are using gpt partitioning, you can add a gpt label after the fact (ie: gpart -i index# -l your_label adaX). gpart list will give you a list of index numbers. Oops. That should be gpart modify -i index# -l your_label adax. I'm not using gpt partitioning. I think I'd like to try that. To do just that, I've ordered two more HDD. They'll be arriving today. -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
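For the record, the retrofit described above (adding a GPT label to an existing partition without touching its contents) comes down to this (a sketch; the index, device, and label names are examples):

    gpart list ada0 | grep index      # find the index of the partition to label
    gpart modify -i 1 -l disk00 ada0  # set the GPT label on partition index 1
    ls /dev/gpt/                      # the new label node appears here

One caveat: the /dev/gpt/ node may not show up while the raw partition device is still held open (e.g. by a live pool), so a reboot or an export/import may be needed before ZFS can be pointed at the label.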
Re: Problems replacing failing drive in ZFS pool
On 7/19/2010 10:50 PM, Adam Vande More wrote: On Mon, Jul 19, 2010 at 9:07 PM, Dan Langilled...@langille.org wrote: I think it's because you pull the old drive, boot with the new drive, the controller re-numbers all the devices (ie da3 is now da2, da2 is now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all the drives have changed, thus corrupting the pool. I've had this happen on our storage servers a couple of times before I started using glabel(8) on all our drives (dead drive on RAID controller, remove drive, reboot for whatever reason, all device nodes are renumbered, everything goes kablooey). Can you explain a bit about how you use glabel(8) in conjunction with ZFS? If I can retrofit this into an exist ZFS array to make things easier in the future... If you've used whole disks in ZFS, you can't retrofit it if by retrofit you mean an almost painless method of resolving this. GEOM setup stuff generally should happen BEFORE the file system is on it. You would create your partition(s) slightly smaller than the disk, label it, then use the resulting device as your zfs device when creating the pool. If you have an existing full disk install, that means restoring the data after you've done those steps. It works just as well with MBR style partitioning, there's nothing saying you have to use GPT. GPT is just better though in terms of ease of use IMO among other things. FYI, this is exactly what I'm doing to do. I have obtained addition HDD to serve as temporary storage. I will also use them for practicing the commands before destroying the original array. I'll post my plan to the list for review. -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Using GPT and glabel for ZFS arrays
I hope my terminology is correct.

I have a ZFS array which uses raw devices. I'd rather it use glabel and supply the GEOM devices to ZFS instead. In addition, I'll also partition the HDD to avoid using the entire HDD: leave a little bit of space at the start and end.

Why use glabel?

* So ZFS can find and use the correct HDD should the HDD device ever get renumbered for whatever reason, e.g. /dev/da0 becomes /dev/da6 when you move it to another controller.

Why use partitions?

* Primarily: two HDD of a given size, say 2TB, do not always provide the same amount of available space. If you use a slightly smaller partition instead of the entire physical HDD, you're much more likely to have a happier experience when it comes time to replace an HDD.
* There seems to be a consensus amongst some that you should leave the start and end of your HDD empty. Give the rest to ZFS.

Things I've read that led me to the above reasons:

* http://docs.freebsd.org/cgi/getmsg.cgi?fetch=399538+0+current/freebsd-stable
* http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055008.html
* http://lists.freebsd.org/pipermail/freebsd-geom/2009-July/003620.html

The plan: for this plan, I'm going to play with just two HDD, because that's what I have available. Let's assume these two HDD are ad0 and ad1. I am not planning to boot from these HDD; they are for storage only.

First, create a new GUID Partition Table partition scheme on the HDD:

gpart create -s GPT ad0

Let's see how much space we have. This output will be used to determine SOMEVALUE in the next command.

gpart show

Create a new partition within that scheme:

gpart add -b 34 -s SOMEVALUE -t freebsd-zfs ad0

Why '-b 34'? Randi pointed me to http://en.wikipedia.org/wiki/GUID_Partition_Table where it explains what the first 33 LBA are used for. It's not for us to use here. SOMEVALUE is the number of blocks to use. I plan not to use all the available blocks but to leave a few hundred MB free at the end. That'll allow for the variance in HDD size.

Now, label the thing:

glabel label -v disk00 /dev/ad0

Repeat the above with ad1 to get disk01. Repeat for all other HDD... Then create your zpool:

zpool create bigtank disk00 disk01 ... etc

Any suggestions/comments? Is there any advantage to using the -l option on 'gpart add' instead of the glabel above?

Thanks

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Using GPT and glabel for ZFS arrays
On 7/21/2010 11:05 PM, Dan Langille wrote (something close to this):

First, create a new GUID Partition Table partition scheme on the HDD:

gpart create -s GPT ad0

Let's see how much space we have. This output will be used to determine SOMEVALUE in the next command.

gpart show

Create a new partition within that scheme:

gpart add -b 34 -s SOMEVALUE -t freebsd-zfs ad0

Now, label the thing:

glabel label -v disk00 /dev/ad0

Or, is this more appropriate?

glabel label -v disk00 /dev/ad0s1

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
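On the question above: with GPT the partition device would be ad0p1 rather than ad0s1 (sN is MBR slice naming, pN is GPT), and if glabel is used at all it should be pointed at the partition, not the whole disk, since glabel stores its metadata in the last sector of whatever provider it is given - on a whole disk that last sector is where the backup GPT header lives. A sketch:

    glabel label -v disk00 /dev/ad0p1   # metadata goes in the partition's last sector
    ls /dev/label/                      # the labelled provider appears as label/disk00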
Re: Problems replacing failing drive in ZFS pool
On 7/19/2010 12:15 PM, Freddie Cash wrote:
On Mon, Jul 19, 2010 at 8:56 AM, Garrett Moore garrettmo...@gmail.com wrote:

So you think it's because when I switch from the old disk to the new disk, ZFS doesn't realize the disk has changed, and thinks the data is just corrupt now? Even if that happens, shouldn't the pool still be available, since it's RAIDZ1 and only one disk has gone away?

I think it's because you pull the old drive, boot with the new drive, the controller re-numbers all the devices (ie da3 is now da2, da2 is now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all the drives have changed, thus corrupting the pool. I've had this happen on our storage servers a couple of times before I started using glabel(8) on all our drives (dead drive on RAID controller, remove drive, reboot for whatever reason, all device nodes are renumbered, everything goes kablooey).

Can you explain a bit about how you use glabel(8) in conjunction with ZFS? If I can retrofit this into an existing ZFS array to make things easier in the future...

8.0-STABLE #0: Fri Mar 5 00:46:11 EST 2010

# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

Of course, always have good backups. ;)

In my case, this ZFS array is the backup. ;) But I'm setting up a tape library, real soon now.

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Authentication tried for XXX with correct key but not from a permitted host
This is more for the record than asking a specific question.

Today I upgraded a system to FreeBSD 8.1-PRERELEASE. Then I started seeing these messages when I ssh to said box with an ssh-agent enabled connection:

Jul 11 03:43:06 ngaio sshd[30290]: Authentication tried for dan with correct key but not from a permitted host (host=laptop.example.org, ip=10.0.0.100).
Jul 11 03:43:07 ngaio sshd[30290]: Authentication tried for dan with correct key but not from a permitted host (host=laptop.example.org, ip=10.0.0.100).
Jul 11 03:43:07 ngaio sshd[30290]: Accepted publickey for dan from 10.0.0.100 port 53525 ssh2

My questions were:

1 - how do I set a permitted host?
2 - why is the message logged twice?

That asked, I know if I move the key to the top of the ~/.ssh/authorized_keys file, the message is no longer logged. Further investigation reveals that if a line of the form:

from=10..etc

appears before the key being used to log in, the message will appear.

Solution: move the from= line to the bottom of the file. Ugly, but it works.

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
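What the fix looks like in practice (a sketch of an ~/.ssh/authorized_keys layout; the key material and network below are placeholders): sshd appears to scan the file top to bottom and logs the "not from a permitted host" warning for a matching from=-restricted entry it passes over, even when a later entry ultimately accepts the login, so unrestricted entries go first.

    # ~/.ssh/authorized_keys
    # everyday key first: matches cleanly, nothing logged
    ssh-rsa AAAAB3...laptopkey... dan@laptop

    # host-restricted entries last, so their from= check is not hit first
    from="10.0.0.0/24" ssh-rsa AAAAB3...otherkey... dan@backuphost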
FreeBSD 7.3-stable fails to boot under KVM
A bunch of us have colo'd a server at an ISP. We each run our own KVM (I have no other details at present). I've been running FreeBSD 7.3 inside my KVM (everyone else is running Linux).

I encountered a problem when I tried to upgrade the install from 7.2-stable to 7.3-stable. The boot process hangs. The last thing shown is the memory in the system. I'll copy/paste from /var/log/messages to demonstrate the last thing I can see on the VNC screen:

Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-STABLE #0: Thu Dec 3 20:37:29 UTC 2009
    d...@latens.example.org:/usr/obj/usr/src/sys/LATENS i386
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: QEMU Virtual CPU version 0.9.1 (3008.69-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x623  Stepping = 3
  Features=0x78bfbfd<FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  Features2=0x8001<SSE3,b31>
  AMD Features=0x20100800<SYSCALL,NX,LM>
real memory  = 268369920 (255 MB)
avail memory = 248524800 (237 MB)

^^^ last thing I see. I can get to the boot loader screen (which is how I booted from kernel.old).

Suggestions?

-- 
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
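When a boot dies this early, a verbose boot is one cheap way to get more than the avail memory line out of the kernel before it hangs (a sketch; nothing KVM-specific assumed):

    OK boot -v          # one-off, from the loader prompt

    # or persistently, in /boot/loader.conf:
    boot_verbose="YES"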
Re: hardware for home use large storage
On 2/16/2010 6:28 AM, Miroslav Lachman wrote: Dan Langille wrote: Daniel O'Connor wrote: [...] Why even bother with the LSI card at all? That board already has 6 SATA slots - depends how many disks you want to use of course. (5 HDs + 1 DVD drive?) Plus two SATA drives in a gmirror for the base OS, and one optical. I want a minimum of 8 slots. I think that 2 HDDs in gmirror just for base OS is an overkill if you want this machine as home storage. You will be fine with booting the base OS from CF card or USB stick. (and you can put two USB flash disks in gmirror if you want redundancy) This way you will save some money, SATA ports/cards and if you will use some kind of fast and big USB stick, you can use part of it as L2ARC for speeding up read performance of ZFS http://www.leidinger.net/blog/2010/02/10/making-zfs-faster/ I have my backup storage machine booted from USB stick (as read-only UFS) with 4x 1TB HDDs in RAIDZ. It is running one and half year without problem. I agree. However, the machine will be primarily storage, but it will also be running PostgreSQL and Bacula. I already have smaller unused SATA drives laying around here. Thank you -- Dan Langille - http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: hardware for home use large storage
On Tue, February 16, 2010 2:05 pm, Alexander Motin wrote: Dan Langille wrote: On Wed, February 10, 2010 10:00 pm, Bruce Simpson wrote: On 02/10/10 19:40, Steve Polyack wrote: I haven't had such bad experience as the above, but it is certainly a concern. Using ZFS we simply 'offline' the device, pull, replace with a new one, glabel, and zfs replace. It seems to work fine as long as nothing is accessing the device you are replacing (otherwise you will get a kernel panic a few minutes down the line). m...@freebsd.org has also committed a large patch set to 9-CURRENT which implements proper SATA/AHCI hot-plug support and error-recovery through CAM. I've been running with this patch in 8-STABLE for well over a week now on my desktop w/o issues; I am using main disk for dev, and eSATA disk pack for light multimedia use. MFC to 8.x? Merged. Thank you. :) -- Dan Langille -- http://langille.org/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: hardware for home use large storage
Ulf Zimmermann wrote: On Sun, Feb 14, 2010 at 07:33:07PM -0500, Dan Langille wrote: Get a dock for holding 2 x 2,5 disks in a single 5,25 slot and put it at the top, in the only 5,25 bay of the case. That sounds very interesting. I just looking around for such a thing, and could not find it. Is there a more specific name? URL? I had an Addonics 5.25 frame for 4x 2.5 SAS/SATA but the small fans in it are unfortunatly of the cheap kind. I ended up using the 2x2.5 to 3.5 frame from Silverstone (for the small Silverstone case I got). Ahh, something like this: http://silverstonetek.com/products/p_contents.php?pno=SDP08area=usa I understand now. Thank you. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: hardware for home use large storage
Dan Naumov wrote:
On Sun, Feb 14, 2010 at 11:38 PM, Dan Langille d...@langille.org wrote:
Dan Naumov wrote:
On Sun, 14 Feb 2010, Dan Langille wrote:

After creating three different system configurations (Athena, Supermicro, and HP), my configuration of choice is this Supermicro setup:

1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping)
2. SuperMicro 5046A $750 (+$43 shipping)
3. LSI SAS 3081E-R $235
4. SATA cables $60
5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping)
6. Xeon W3520 $310

You do realise how much of a massive overkill this is and how much you are overspending?

I appreciate the comments and feedback. I'd also appreciate alternative suggestions in addition to what you have contributed so far. Spec out the box you would build.

==
Case: Fractal Design Define R2 - 89 euro:
http://www.fractal-design.com/?view=product&prod=32
Mobo/CPU: Supermicro X7SPA-H / Atom D510 - 180-220 euro:
http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H
PSU: Corsair 400CX 80+ - 59 euro:
http://www.corsair.com/products/cx/default.aspx
RAM: Corsair 2x2GB, DDR2 800MHz SO-DIMM, CL5 - 85 euro
==
Total: ~435 euro

The motherboard has 6 native AHCI-capable ports on the ICH9R controller and you have a PCI-E slot free if you want to add an additional controller card. Feel free to blow the money you've saved on crazy fast SATA disks, and if your system workload is going to have a lot of random reads, then spend 200 euro on an 80gb Intel X25-M for use as a dedicated L2ARC device for your pool.

Based on the Fractal Design case mentioned above, I was told about Lian Li cases, which I think are great. As a result, I've gone with a tower case without hot-swap. The parts are listed at and reproduced below:

http://dan.langille.org/2010/02/15/a-full-tower-case/

1. LIAN LI PC-A71F Black Aluminum ATX Full Tower Computer Case $240 (from mwave)
2. Antec EarthWatts EA650 650W PSU $80
3. Samsung SATA CD/DVD Burner $20 (+ $8 shipping)
4. Intel S3200SHV LGA 775 Intel 3200 m/b $200
5. Intel Core2 Quad Q9400 CPU $190
6. SATA cables $22
7. Supermicro LSI MegaRAID 8 Port SAS RAID Controller $118
8. Kingston ValueRAM 4GB (2 x 2GB) 240-Pin DDR2 SDRAM ECC $97

Total cost is about $1020 with shipping. Plus HDD. No purchases yet, but the above is what appeals to me now.

Thank you.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: hardware for home use large storage
Dan Naumov wrote: On Mon, Feb 15, 2010 at 7:14 PM, Dan Langille d...@langille.org wrote: Dan Naumov wrote: On Sun, Feb 14, 2010 at 11:38 PM, Dan Langille d...@langille.org wrote: Dan Naumov wrote: On Sun, 14 Feb 2010, Dan Langille wrote: After creating three different system configurations (Athena, Supermicro, and HP), my configuration of choice is this Supermicro setup: 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) 2. SuperMicro 5046A $750 (+$43 shipping) 3. LSI SAS 3081E-R $235 4. SATA cables $60 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping) 6. Xeon W3520 $310 You do realise how much of a massive overkill this is and how much you are overspending? I appreciate the comments and feedback. I'd also appreciate alternative suggestions in addition to what you have contributed so far. Spec out the box you would build. == Case: Fractal Design Define R2 - 89 euro: http://www.fractal-design.com/?view=productprod=32 Mobo/CPU: Supermicro X7SPA-H / Atom D510 - 180-220 euro: http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H PSU: Corsair 400CX 80+ - 59 euro: http://www.corsair.com/products/cx/default.aspx RAM: Corsair 2x2GB, DDR2 800MHz SO-DIMM, CL5 - 85 euro == Total: ~435 euro The motherboard has 6 native AHCI-capable ports on ICH9R controller and you have a PCI-E slot free if you want to add an additional controller card. Feel free to blow the money you've saved on crazy fast SATA disks and if your system workload is going to have a lot of random reads, then spend 200 euro on a 80gb Intel X25-M for use as a dedicated L2ARC device for your pool. Based on the Fractal Design case mentioned above, I was told about Lian Lia cases, which I think are great. As a result, I've gone with a tower case without hot-swap. The parts are listed at and reproduced below: http://dan.langille.org/2010/02/15/a-full-tower-case/ 1. LIAN LI PC-A71F Black Aluminum ATX Full Tower Computer Case $240 (from mwave) 2. Antec EarthWatts EA650 650W PSU $80 3. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) 4. Intel S3200SHV LGA 775 Intel 3200 m/b $200 5. Intel Core2 Quad Q9400 CPU $190 6. SATA cables $22 7. Supermicro LSI MegaRAID 8 Port SAS RAID Controller $118 8. Kingston ValueRAM 4GB (2 x 2GB) 240-Pin DDR2 SDRAM ECC $97 Total cost is about $1020 with shipping. Plus HDD. No purchases yet, but the above is what appeals to me now. A C2Q CPU makes little sense right now from a performance POV. For the price of that C2Q CPU + LGA775 board you can get an i5 750 CPU and a 1156 socket motherboard that will run circles around that C2Q. You would lose the ECC though, since that requires the more expensive 1366 socket CPUs and boards. ECC RAM appeals and yes, that comes with a cost. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: hardware for home use large storage
Steve Polyack wrote:
> On 02/15/10 12:14, Dan Langille wrote:
>> 7. Supermicro LSI MegaRAID 8 Port SAS RAID Controller $118
>
> Dan, I'm not sure about that particular card, but we've never seen
> that great of performance out of the LSI MegaRAID cards that ship with
> Dell servers as the PERC. The newest incarnations are better, but I
> would try to get an Areca. The ones we have tested have displayed
> fantastic performance. They are fairly expensive in comparison,
> though. If you're using ZFS in place of the RAID on the LSI MegaRAID,
> I'd instead recommend other simpler SAS cards which are known to have
> good driver support.

Yes, the card will be used as a straight-through and not used for RAID.
ZFS will be running raidz for me, possibly raidz2. Given that, I'm not
sure if you're suggesting the Areca or something else. In addition, I'm
not sure what makes a SAS card simpler and well-supported.
Recommendation?

Other cards I have considered include:

LSI SAS3041E-R 4 port $120
http://www.google.com/products/catalog?q=lsi+sas+pcie&hl=en&cid=1824913543877548833&sa=title#p

SYBA SY-PEX40008 PCI Express SATA II 4 port $60
http://www.newegg.com/Product/Product.aspx?Item=N82E16816124027

LSISAS1064 chipset - SAS3042e
http://www.lsi.com/DistributionSystem/AssetDocument/PCIe_3GSAS_UG.pdf

SUPERMICRO AOC-SAT2-MV8 64-bit PCI-X 133MHz SATA Controller Card $99
http://www.newegg.com/Product/Product.aspx?Item=N82E16815121009
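As an illustration of the straight-through approach being discussed:
once the controller exposes the disks individually, creating the pool
is a one-liner. A minimal sketch, assuming eight disks at ada0 through
ada7 and a pool name of "storage" (both assumptions, not from the
thread):

  # ZFS provides the redundancy, so the controller does no RAID of its
  # own; raidz2 survives the loss of any two of the eight drives
  zpool create storage raidz2 ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7

  # verify layout and health
  zpool status storage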
Re: hardware for home use large storage
Daniel O'Connor wrote:
> On Tue, 16 Feb 2010, Steve Polyack wrote:
>> I'm not sure about that particular card, but we've never seen that
>> great of performance out of the LSI MegaRAID cards that ship with
>> Dell servers as the PERC. The newest incarnations are better, but I
>> would try to get an Areca. The ones we have tested have displayed
>> fantastic performance. They are fairly expensive in comparison,
>> though. If you're using ZFS in place of the RAID on the LSI MegaRAID,
>> I'd instead recommend other simpler SAS cards which are known to have
>> good driver support.
>
> Why even bother with the LSI card at all? That board already has 6
> SATA ports - it depends on how many disks you want to use, of course.
> (5 HDs + 1 DVD drive?)

Plus two SATA drives in a gmirror for the base OS, and one optical.
I want a minimum of 8 ports.
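For completeness, a minimal sketch of the two-disk gmirror for the base
OS, assuming the OS disks attach as ada8 and ada9 (hypothetical names;
see gmirror(8) for the full procedure):

  # load the module and create a mirror named gm0 from the two OS disks
  gmirror load
  gmirror label -v gm0 ada8 ada9

  # load the module at every boot so /dev/mirror/gm0 is available
  # early enough to hold the root filesystem
  echo 'geom_mirror_load="YES"' >> /boot/loader.conf

  # the mirrored provider then appears as /dev/mirror/gm0
  gmirror status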
Re: hardware for home use large storage
Daniel O'Connor wrote:
> On Sun, 14 Feb 2010, Dan Langille wrote:
>> After creating three different system configurations (Athena,
>> Supermicro, and HP), my configuration of choice is this Supermicro
>> setup:
>>
>> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping)
>> 2. SuperMicro 5046A $750 (+ $43 shipping)
>> 3. LSI SAS 3081E-R $235
>> 4. SATA cables $60
>> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping)
>> 6. Xeon W3520 $310
>>
>> Total price with shipping: $1560
>>
>> Details and links at http://dan.langille.org/2010/02/14/supermicro/
>>
>> I'll probably start with 5 HDD in the ZFS array, 2x gmirror'd drives
>> for the boot, and 1 optical drive (so 8 SATA ports).
>
> That is f**king expensive for a home setup :) I priced a decent ZFS PC
> for a small business and it was AUD$2500 including the disks
> (5x750Gb), case, PSU, etc.

Yes, and this one doesn't yet have HDD. Can you supply details of your
system?
Re: hardware for home use large storage
Dan Naumov wrote:
> On Sun, 14 Feb 2010, Dan Langille wrote:
>> After creating three different system configurations (Athena,
>> Supermicro, and HP), my configuration of choice is this Supermicro
>> setup:
>>
>> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping)
>> 2. SuperMicro 5046A $750 (+ $43 shipping)
>> 3. LSI SAS 3081E-R $235
>> 4. SATA cables $60
>> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping)
>> 6. Xeon W3520 $310
>
> You do realise how much of a massive overkill this is and how much you
> are overspending?

I appreciate the comments and feedback. I'd also appreciate alternative
suggestions in addition to what you have contributed so far. Spec out
the box you would build.
Re: hardware for home use large storage
Dmitry Morozovsky wrote:
> On Wed, 10 Feb 2010, Dmitry Morozovsky wrote:
>
> DM> other parts are regular SocketAM2+ motherboard, Athlon X4, 8G RAM,
> DM> FreeBSD/amd64
>
> well, not exactly regular - it's an ASUS M2N-LR-SATA with 10 SATA
> channels, but I suppose there are comparable boards on the workstation
> mobo market now...

I couldn't find this one for sale, FWIW. But it looks interesting.
Thanks.
Re: hardware for home use large storage
Alexander Motin wrote:
> Steve Polyack wrote:
>> On 2/10/2010 12:02 AM, Dan Langille wrote:
>>>> Don't use a port multiplier and this goes away.
>>>
>>> I was hoping to avoid a PM, and using something like the Syba PCI
>>> Express SATA II 4 x Ports RAID Controller seems to be the best
>>> solution so far.
>>> http://www.amazon.com/Syba-Express-Ports-Controller-SY-PEX40008/dp/B002R0DZWQ/ref=sr_1_22?ie=UTF8&s=electronics&qid=1258452902&sr=1-22
>>
>> Dan, I can personally vouch for these cards under FreeBSD. We have 3
>> of them in one system, with almost every port connected to a port
>> multiplier (SiI5xxx PMs). Using the siis(4) driver on 8.0-RELEASE
>> provides very good performance, and supports both NCQ and FIS-based
>> switching (essential for decent port-multiplier performance). One
>> thing to consider, however, is that the card is only single-lane
>> PCI-Express. The bandwidth available is only 2.5Gb/s (~312MB/sec,
>> slightly less than that of the SATA-2 link spec), so if you have 4
>> high-performance drives connected, you may hit a bottleneck at the
>> bus. I'd be particularly interested if anyone can find any similar
>> Silicon Image SATA controllers with a PCI-E 4x or 8x interface ;)
>
> Here is a SiI3124-based card with a built-in PCIe x8 bridge:
> http://www.addonics.com/products/host_controller/adsa3gpx8-4em.asp
>
> It is not so cheap, but with 12 disks connected via 4 port multipliers
> it can give up to 1GB/s (4x250MB/s) of bandwidth. The cheaper PCIe x1
> version mentioned above gave me up to 200MB/s, which is the maximum of
> what I've seen from PCIe 1.0 x1 controllers. Looking at NCQ and FBS
> support, it can be enough for many real-world applications that don't
> need such high linear speeds, but have many concurrent I/Os.

Is that the URL you meant to post? It is a 4 Port eSATA PCI-E 8x
Controller for Mac Pro. I'd rather use internal connections.
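The bandwidth figures above follow from simple arithmetic; a quick
sketch using bc(1), on raw line rates only (8b/10b encoding and
protocol overhead shave the usable numbers down further):

  # PCIe 1.0 x1: 2.5 Gbit/s divided by 8 bits per byte = ~312 MB/s raw
  $ echo "scale=1; 2.5 * 1000 / 8" | bc
  312.5

  # an x8 link scales linearly, so four port multipliers at 250 MB/s
  # each (1 GB/s total) fit with room to spare
  $ echo "8 * 312.5" | bc
  2500.0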
Re: hardware for home use large storage
Wes Morgan wrote:
> On Sun, 14 Feb 2010, Dan Langille wrote:
>> Dan Langille wrote:
>>> Hi,
>>>
>>> I'm looking at creating a large home use storage machine. Budget is
>>> a concern, but size and reliability are also a priority. Noise is
>>> also a concern, since this will be at home, in the basement. That,
>>> and cost, pretty much rules out a commercial case, such as a 3U
>>> case. It would be nice, but it greatly inflates the budget. This
>>> pretty much restricts me to a tower case.
>>>
>>> The primary use of this machine will be a backup server [1]. Other
>>> secondary uses will include minor tasks such as samba, CIFS, cvsup,
>>> etc.
>>>
>>> I'm thinking of 8x1TB (or larger) SATA drives. I've found a case [2]
>>> with hot-swap bays [3] that seems interesting. I haven't looked at
>>> power supplies, but given that number of drives, I expect something
>>> beefy with a decent reputation is called for.
>>>
>>> Whether I use hardware or software RAID is undecided. I think I am
>>> leaning towards software RAID, probably ZFS under FreeBSD 8.x. I'm
>>> open to hardware RAID, but I think the cost won't justify it given
>>> ZFS.
>>>
>>> Given that, what motherboard and RAM configuration would you
>>> recommend to work with FreeBSD [and probably ZFS]? The lists seem to
>>> indicate that more RAM is better with ZFS.
>>>
>>> Thanks.
>>>
>>> [1] - FYI running Bacula, but that's out of scope for this question
>>> [2] - http://www.newegg.com/Product/Product.aspx?Item=N82E16811192058
>>> [3] - nice to have, especially for a failure
>>
>> After creating three different system configurations (Athena,
>> Supermicro, and HP), my configuration of choice is this Supermicro
>> setup:
>>
>> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping)
>> 2. SuperMicro 5046A $750 (+ $43 shipping)
>> 3. LSI SAS 3081E-R $235
>> 4. SATA cables $60
>> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping)
>> 6. Xeon W3520 $310
>>
>> Total price with shipping: $1560
>>
>> Details and links at http://dan.langille.org/2010/02/14/supermicro/
>
> Wow um... That's quite a setup. Do you really need the Xeon W3520? You
> could get a regular core 2 system for much less and still use the ECC
> ram (highly recommended). The case you're looking at only has 6
> hot-swap bays according to the manuals, although the pictures show 8
> (???).

Going to http://www.supermicro.com/products/system/tower/5046/SYS-5046A-X.cfm
it does say 6 hot-swap and two spare. I'm guessing they say that
because the M/B supports only 6 SATA connections:
http://www.supermicro.com/products/motherboard/Core2Duo/X58/C7X58.cfm

> You could shave some off the case and cpu, upgrade your 3081E-R to an
> ARC-1222 for $200 more and have the hardware raid option.

That is a nice card. However, I don't want hardware RAID. I want ZFS.

> If I was building a tower system, I'd put together something like
> this:

Thank you for the suggestions.

> Case with 8 hot-swap SATA bays ($250):
> http://www.newegg.com/Product/Product.aspx?Item=N82E16811192058
>
> Or if you prefer screwless, you can find the case without the 2
> hotswap bays and use an icy dock screwless version.

I do like this case; it's one I have priced:
http://dan.langille.org/2010/02/14/pricing-the-athena/

> Intel server board (for ECC support) ($200):
> http://www.newegg.com/Product/Product.aspx?Item=N82E16813121328

ECC, nice, which is something I've found appealing.

> SAS controller ($120):
> http://www.buy.com/prod/supermicro-lsi-megaraid-lsisas1068e-8-port-sas-raid-controller-16mb/q/loc/101/207929556.html
>
> Note: You'll need to change or remove the mounting bracket since it is
> backwards. I was able to find a bracket with matching screw holes on
> an old nic and secure it to my case. It uses the same chipset as the
> more expensive 3081E-R, if I remember correctly.

I follow what you say, but cannot comprehend why the bracket is
backwards.

> Quad-core CPU ($190):
> http://www.newegg.com/Product/Product.aspx?Item=N82E16819115131
>
> 4x2gb ram sticks ($97*2):
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820139045
>
> Same SATA cables for sata to mini-sas, same CD burner. Total cost
> probably $400 less, which you can use to buy some of the drives.

I put this all together, and named it after you (hope you don't mind):
http://dan.langille.org/2010/02/14/273/

You're right, $400 less. I also wrote up the above suggestions with a
Supermicro case instead:

SUPERMICRO CSE-743T-645B Black 4U Pedestal Chassis w/ 645W Power Supply $320
http://www.newegg.com/Product/Product.aspx?Item=N82E16811152047

I like your suggestions with the above case. It is now my preferred
solution.

> For my personal (overkill) setup I have a chenbro 4U chassis with 16
> hotswap bays and mini-SAS backplanes, and a zippy 2+1 640 watt
> redundant power supply (sounds like a freight train). I cannot express
> the joy I felt in ripping out all the little SATA cables and snaking a
> couple of fat 8087s under the fans. 8 of the bays are dedicated to my
> media array, and the other 8 are there for swapping in and out of
> backup drives mostly, but the time they REALLY come in handy is when
> you need to upgrade
Re: hardware for home use large storage
Dmitry Morozovsky wrote:
> On Mon, 8 Feb 2010, Dan Langille wrote:
>
> DL> I'm looking at creating a large home use storage machine. Budget
> DL> is a concern, but size and reliability are also a priority. Noise
> DL> is also a concern, since this will be at home, in the basement.
> DL> That, and cost, pretty much rules out a commercial case, such as a
> DL> 3U case. It would be nice, but it greatly inflates the budget.
> DL> This pretty much restricts me to a tower case.
>
> [snip]
>
> We use the following at work, but it's still pretty cheap and pretty
> silent:
>
> Chieftec WH-02B-B (9x5.25 bays)
> http://www.chieftec.com/wh02b-b.html

$130 at http://www.ncixus.com/products/33591/WH-02B-B-OP/Chieftec/ but
not available; $87.96 at
http://www.xpcgear.com/chieftec-wh-02b-b-mid-tower-case.html

> filled with 2 x Supermicro CSE-M35T
> http://www.supermicro.nl/products/accessories/mobilerack/CSE-M35T-1.cfm
> for regular storage, 2 x raidz1

I could not find a price on that, but am guessing at $100 each.

> 1 x Promise SuperSwap 1600
> http://www.promise.com/product/product_detail_eng.asp?product_id=169
> for changeable external backups

$100 from
http://www.overstock.com/Electronics/Promise-SuperSwap-1600-Drive-Enclosure/2639699/product.html

So that's $390. Not bad. Still need RAM, M/B, PSU, and possibly video.

> and still have 2 5.25 bays for anything interesting ;-)

I'd be filling those with a DVD-RW and two SATA drives in a gmirror
configuration.

> other parts are regular SocketAM2+ motherboard, Athlon X4, 8G RAM,
> FreeBSD/amd64

Let's say $150 for the M/B, $150 for the CPU, and $200 for the RAM.
Total is $890. Nice.
Re: hardware for home use large storage
Dan Naumov wrote:
> On Sun, Feb 14, 2010 at 11:38 PM, Dan Langille d...@langille.org wrote:
>> [snip]
>>
>> Spec out the box you would build.
>
> ==
> Case: Fractal Design Define R2 - 89 euro -
> http://www.fractal-design.com/?view=product&prod=32

That is a nice case. It's one slot short for what I need. The trays are
great. I want three more slots: two SATA drives for a gmirror base OS,
plus an optical drive. As someone mentioned on IRC, there are many
similar non hot-swap cases. From the website, I couldn't see this for
sale in the USA, but converting your price to US$, it is about $121.

Looking around, this case was suggested to me. I like it a lot:

LIAN LI PC-A71F Black Aluminum ATX Full Tower Computer Case $240
http://www.newegg.com/Product/Product.aspx?Item=N82E1682244

> Mobo/CPU: Supermicro X7SPA-H / Atom D510 - 180-220 euro -
> http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H

Non-ECC RAM only, and ECC is something I'd like to have. $175.

> PSU: Corsair 400CX 80+ - 59 euro -
> http://www.corsair.com/products/cx/default.aspx

http://www.newegg.com/Product/Product.aspx?Item=N82E16817139008 for $50.
Is that sufficient power for up to 10 SATA HDDs and an optical drive?

> RAM: Corsair 2x2GB, DDR2 800MHz SO-DIMM, CL5 - 85 euro

http://www.newegg.com/Product/Product.aspx?Item=N82E16820145238 $82

> ==
> Total: ~435 euro

With my options, it's about $640 with shipping etc.

> The motherboard has 6 native AHCI-capable ports on the ICH9R
> controller and you have a PCI-E slot free if you want to add an
> additional controller card. Feel free to blow the money you've saved
> on crazy fast SATA disks, and if your system workload is going to have
> a lot of random reads, then spend 200 euro on an 80gb Intel X25-M for
> use as a dedicated L2ARC device for your pool.

I have been playing with the idea of an L2ARC device. They sound crazy
cool. Thank you Dan.

-- 
dan
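Should the X25-M idea go ahead, attaching it as L2ARC is a single
command. A minimal sketch, assuming a pool named "storage" and the SSD
attaching as ada10 (both names hypothetical):

  # add the SSD as a cache (L2ARC) device; cache vdevs hold no
  # irreplaceable data and can be removed from a live pool with
  # "zpool remove storage ada10"
  zpool add storage cache ada10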
Re: hardware for home use large storage
Dan Naumov wrote:
> On Mon, Feb 15, 2010 at 12:42 AM, Dan Naumov dan.nau...@gmail.com wrote:
>> [snip]
>
> And to expand a bit, if you want that crazy performance without
> blowing silly amounts of money: get a dock for holding 2 x 2.5" disks
> in a single 5.25" slot and put it at the top, in the only 5.25" bay of
> the case.

That sounds very interesting. I've just been looking around for such a
thing and could not find it. Is there a more specific name? URL?

> Now add an additional PCI-E SATA controller card, like the often
> mentioned PCIe SiI3124.

http://www.newegg.com/Product/Product.aspx?Item=N82E16816124026 for $35

> Now you have 2 x 2.5" disk slots and 8 x 3.5" disk slots, with 6
> native SATA ports on the motherboard and more ports on the controller
> card. Now get 2 x 80gb Intel SSDs and put them into the dock. Now
> partition each of them in the following fashion:
>
> 1: swap: 4-5gb
> 2: freebsd-zfs: ~10-15gb for root filesystem
> 3: freebsd-zfs: rest of the disk: dedicated L2ARC vdev
>
> GMirror your SSD swap partitions. Make a ZFS mirror pool out of your
> SSD root filesystem partitions. Build your big ZFS pool however you
> like out of the mechanical disks you have. Add the 2 x ~60gb
> partitions as dedicated independent L2ARC devices for your SATA disk
> ZFS pool.
>
> Now you have redundant swap, a redundant and FAST root filesystem, and
> your ZFS pool of SATA disks has 120gb worth of L2ARC space on the
> SSDs. The L2ARC vdevs don't need to be redundant, because should an IO
> error occur while reading off L2ARC, the IO is deferred to the real
> data location on the pool of your SATA disks. You can also remove your
> L2ARC vdevs from your pool at will, on a live pool.

That is nice. Thank you.
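A rough command-level sketch of the recipe above, assuming the two SSDs
attach as ada10 and ada11, the root pool is called "zroot", and the
pool of mechanical disks already exists as "storage" (all names and
sizes are illustrative, not from the thread):

  # partition the first SSD: swap, root, remainder for L2ARC
  gpart create -s gpt ada10
  gpart add -t freebsd-swap -s 4G  ada10
  gpart add -t freebsd-zfs  -s 15G ada10
  gpart add -t freebsd-zfs         ada10
  # ...repeat the same layout on ada11

  # gmirror the two swap partitions for redundant swap
  gmirror label -v swap ada10p1 ada11p1

  # ZFS mirror for the root filesystem
  zpool create zroot mirror ada10p2 ada11p2

  # add both large partitions as independent (non-redundant) L2ARC
  # devices for the pool of mechanical disks
  zpool add storage cache ada10p3 ada11p3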