Re: ZFS...
Michelle Sullivan wrote: On 02 May 2019, at 03:39, Steven Hartland wrote: On 01/05/2019 15:53, Michelle Sullivan wrote: Paul Mather wrote: On Apr 30, 2019, at 11:17 PM, Michelle Sullivan wrote: Been there done that though with ext2 rather than UFS.. still got all my data back... even though it was a nightmare.. Is that an implication that had all your data been on UFS (or ext2:) this time around you would have got it all back? (I've got that impression through this thread from things you've written.) That sort of makes it sound like UFS is bulletproof to me. It's definitely not (and far from it) bulletproof - however when the data on disk is not corrupt I have managed to recover it - even if it has been a nightmare - no structure - all files in lost+found etc... or even resorting to R-Studio in the event of lost RAID information etc.. Yes, but you seem to have done this with ZFS too, just not in this particularly bad case. There is no R-Studio for ZFS or I would have turned to it as soon as this issue hit. So as an update, this company: http://www.klennet.com/ produces a ZFS recovery tool: https://www.klennet.com/zfs-recovery/default.aspx and, following several code changes due to my case being an 'edge case', the entire volume (including the zvol - which I previously recovered as it wasn't suffering from the metadata corruption) and all 34 million files is being recovered intact with the entire directory structure. Its only drawback is that it's a Windows-only tool, so I built 'Windows on a stick' and it's running from that. The only thing I had to do was physically pull the 'spare' out, as the spare already had data on it from being previously swapped in, and it confused the hell out of the algorithm that detects the drive order. Regards, Michelle -- Michelle Sullivan http://www.mhix.org/ ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS...
Miroslav Lachman wrote: Alan Somers wrote on 2019/05/09 14:50: [...] On 11.3 and even much older releases, you can greatly speed up scrub and resilver by tweaking some sysctls. If you have spinning rust, raise vfs.zfs.top_maxinflight so they'll do fewer seeks. I used to set it to 8192 on machines with 32GB of RAM. Raising vfs.zfs.resilver_min_time_ms to 5000 helps a little, too. I have this in sysctl.conf:

vfs.zfs.scrub_delay=0
vfs.zfs.top_maxinflight=128
vfs.zfs.resilver_min_time_ms=5000
vfs.zfs.resilver_delay=0

I found it somewhere in the mailing list discussing this issue in the past. Isn't yours 8192 too much? The machine in question has 4x SATA drives on a very dumb and slow controller and only 5GB of RAM. Even if I read this - vfs.zfs.top_maxinflight: Maximum I/Os per top-level vdev - I am still not sure what it really means and how I can "calculate" an optimal value. I calculated it by looking at the IOPS using gstat and then multiplying by the spindles in use. That seemed to give the optimum. Much lower and the drives had too much idle time, which causes 'pauses' whilst the writes happen. "Tuning" for me had no fine tuning; it seems very sledgehammerish ... big changes are noticeable for better or worse.. small changes you cannot tell by eye, and I guess measuring known operations (such as a controlled-environment scrub) might show differing results, but I suspect with other things going on these negligible changes are likely to be useless. As Michelle pointed out, there is a drawback when sysctls are optimized for quick scrubs, but this machine is only running a nightly backup script fetching data from 20 other machines, so the script sets the sysctls back to sane defaults during the backup. Not really.. optimizing for scrub should only affect the system whilst scrubs and resilvers are actually happening..
the rest of the time my systems were not affected (noticeably). My problem was a scrub would kick off and last a couple of weeks (1 week heavily preferencing the scrub), causing the video streaming to become stuttery .. which, watching a movie whilst this was happening, was really not good... especially as it wasn't for just a few seconds/minutes/hours.

sysctl vfs.zfs.scrub_delay=4 > /dev/null
sysctl vfs.zfs.top_maxinflight=32 > /dev/null
sysctl vfs.zfs.resilver_min_time_ms=3000 > /dev/null
sysctl vfs.zfs.resilver_delay=2 > /dev/null

At the end it reloads the optimized settings back from sysctl.conf. Miroslav Lachman -- Michelle Sullivan http://www.mhix.org/
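The "observed IOPS times spindles" rule of thumb described above can be sketched as follows. The figures are illustrative, not measured, and the sysctl lines are printed rather than applied so the sketch is safe to run anywhere:

```shell
#!/bin/sh
# Rough sketch of the tuning rule described above: take the per-disk
# IOPS observed in gstat during a scrub and multiply by the number of
# data-bearing spindles to pick vfs.zfs.top_maxinflight.  Both inputs
# here are hypothetical example figures.
IOPS_PER_SPINDLE=200   # per-disk IOPS seen in gstat
SPINDLES=15            # data-bearing drives in the pool

MAXINFLIGHT=$((IOPS_PER_SPINDLE * SPINDLES))

# Print the scrub-friendly settings; pipe to sh on the target host
# to actually apply them.
echo "sysctl vfs.zfs.scrub_delay=0"
echo "sysctl vfs.zfs.top_maxinflight=${MAXINFLIGHT}"
echo "sysctl vfs.zfs.resilver_min_time_ms=5000"
echo "sysctl vfs.zfs.resilver_delay=0"
```

With the example figures this suggests a maxinflight of 3000, well below Alan's 8192 but far above the conservative 128 in Miroslav's sysctl.conf.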
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 09 May 2019, at 21:27, Bob Bishop wrote: > > >> On 9 May 2019, at 12:17, Michelle Sullivan wrote: >> >> >> >> Michelle Sullivan >> http://www.mhix.org/ >> Sent from my iPad >> >>> On 09 May 2019, at 17:46, Patrick M. Hausen wrote: >>> >>> Hi all, >>> >>>> On 09.05.2019 at 00:55, Michelle Sullivan wrote: >>>> No, one disk in the 16 disk zRAID2 ... previously unseen, but it could be >>>> that the errors have occurred in the last 6 weeks... every time I reboot it >>>> started resilvering, gets to 761M resilvered and then stops. >>> >>> 16 disks in *one* RAIDZ2 vdev? That might be the cause of your insanely >>> long scrubs. In general it is not recommended, though I cannot find the >>> source for that information quickly just now. >> >> I have seen posts on various lists stating don't go over 8.. I know people >> in Oracle; the word is it shouldn't matter... who do you believe? > > Inter alia it depends on the quality/bandwidth of disk controllers. Interestingly, I just got Windows 7 installed on a USB stick with the Windows-based ZFS recovery tool... scrubs and resilvers report around 70MB/s on all versions of FreeBSD I have tried (9.3 thru 13-CURRENT), even on my own build with the Broadcom native SAS driver replacing the FreeBSD one... the results under Windows are immediately different.. it *says* it's using 1.6/1.7 cores, ~2G RAM and getting a solid 384MBps (yes B not b) with 100% disk IO. That's a massive difference. This is using the Windows 7 (SP1) built-in driver... I can only guess it has to be PCI bus handling differences, or the throughput report is wrong.
(Note “solid”: it is fluctuating between 381 and 386, but roughly 97% of the time at 384.) > >> Michelle >> >>> >>> Kind regards, >>> Patrick >>> -- >>> punkt.de GmbH - Internet - Dienstleistungen - Beratung >>> Kaiserallee 13a - Tel.: 0721 9109-0 Fax: -100 >>> 76133 Karlsruhe - i...@punkt.de - http://punkt.de >>> AG Mannheim 108285 - Gf: Juergen Egeling > > -- > Bob Bishop t: +44 (0)118 940 1243 > r...@gid.co.uk m: +44 (0)783 626 4518
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 09 May 2019, at 22:50, Alan Somers wrote: > >> On Thu, May 9, 2019 at 5:37 AM Miroslav Lachman <000.f...@quip.cz> wrote: >> >> Dimitry Andric wrote on 2019/05/09 13:02: >>> On 9 May 2019, at 10:32, Miroslav Lachman <000.f...@quip.cz> wrote: >> >> [...] >> >>>> Disks are OK, monitored by smartmontools. There is nothing odd, just the >>>> long long scrubs. This machine was started with 4x 1TB (now 4x 4TB) and >>>> scrub was slow with 1TB disks too. This machine (HP ML110 G8) was my >>>> first machine with ZFS. If I remember it well it was FreeBSD 7.0, now >>>> running 11.2. Scrub was / is always about one week. (I tried some sysctl >>>> tuning without much gain) >>> >>> Unfortunately https://svnweb.freebsd.org/changeset/base/339034, which >>> greatly speeds up scrubs and resilvers, was not in 11.2 (since it was >>> cut at r334458). >>> >>> If you could update to a more recent snapshot, or try the upcoming 11.3 >>> prereleases, you will hopefully see much shorter scrub times. >> >> Thank you. I will try 11-STABLE / 11.3-PRERELEASE soon and let you know >> about the difference. >> >> Kind regards >> Miroslav Lachman > > On 11.3 and even much older releases, you can greatly speed up scrub > and resilver by tweaking some sysctls. If you have spinning rust, > raise vfs.zfs.top_maxinflight so they'll do fewer seeks. I used to > set it to 8192 on machines with 32GB of RAM. Raising > vfs.zfs.resilver_min_time_ms to 5000 helps a little, too. I tried this, but I found that whilst it could speed up the resilver (and scrubs) by as much as 25%, it also had a performance hit where reads (particularly streaming video - which is what my server mostly did) would “pause” and “stutter” .. the balance came when I brought it back to around 200 * 15. It would still stutter when running multiple streams (to the point of heavy load) but that was kinda expected...
I tried to keep the load distributed off it and let the other front end servers stream, and it seemed to result in a healthy balance. > > -Alan
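The raise-then-restore pattern this exchange describes might look like the wrapper below. The sysctl values are the ones quoted in the thread; the `run` helper, `scrub_tuned` function, and pool name are hypothetical, and it defaults to a dry run that only prints the commands:

```shell
#!/bin/sh
# Dry-run sketch of scrub-time tuning: apply aggressive sysctls, kick
# off the scrub, then restore the gentler defaults quoted above.
# Set APPLY=1 on a real FreeBSD host to execute instead of print.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

scrub_tuned() {
    pool="$1"
    run sysctl vfs.zfs.scrub_delay=0
    run sysctl vfs.zfs.top_maxinflight=128
    run zpool scrub "$pool"
    # ...wait for `zpool status` to report the scrub finished, then
    # put the interactive-friendly values back:
    run sysctl vfs.zfs.scrub_delay=4
    run sysctl vfs.zfs.top_maxinflight=32
}

scrub_tuned tank0
```

The dry-run default mirrors the caution in the thread: big swings in these tunables are very noticeable, so it is worth seeing exactly what will be set before applying it.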
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 09 May 2019, at 19:41, Borja Marcos wrote: > > > >> On 9 May 2019, at 00:55, Michelle Sullivan wrote: >> >> >> >> This is true, but I am of the thought, in alignment with the ZFS devs, that this >> might not be a good idea... if ZFS can't work it out already, the best thing >> to do will probably be get everything off it and reformat… > > That's true, I would rescue what I could and create the pool again, but after > testing the setup thoroughly. > +1 > It would be worth having a look at the excellent guide offered by the > FreeNAS people. It's full of excellent advice and a > priceless list of “don'ts” such as SATA port multipliers, etc. > Yeah, already worked out over time that port multipliers can't be good. >> >>> That should not be hard to write if everything else on the disk has no >>> issues. Don't you say in another message that the system is now returning >>> 100's of drive errors? >> >> No, one disk in the 16 disk zRAID2 ... previously unseen, but it could be >> that the errors have occurred in the last 6 weeks... every time I reboot it >> started resilvering, gets to 761M resilvered and then stops. > > That's a really bad sign. It shouldn't happen. That's since the metadata corruption. That is probably part of the problem. > >>> How does that relate to the statement => Everything on >>> the disk is fine except for a little bit of corruption in the freespace map? >> >> Well, I think it goes through until it hits that little bit of corruption that >> stops it mounting... then stops again.. >> >> I'm seeing 100s of hard errors at the beginning of one of the drives.. they >> were reported in syslog but only just, so could be a new thing. Could be >> previously undetected.. no way to know. > > As for disk monitoring, smartmontools can be pretty good, although only as an > indicator.
> I also monitor my systems using Orca (I wrote a crude “devilator” many years ago) and I gather disk I/O statistics using GEOM, of which the read/write/delete/flush times are very valuable. An ailing disk can be returning valid data but become very slow due to retries. Yes, though often these will show up in syslog (something I monitor religiously... though I concede that when it hits syslog it's probably already an urgent issue). Michelle
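As a sketch of the attribute monitoring discussed above, something like this could pull the two tell-tale counters out of `smartctl -A` output. The sample text in the here-doc is fabricated for illustration; on a real host you would feed the function `smartctl -A /dev/da0` instead:

```shell
#!/bin/sh
# Extract the Reallocated_Sector_Ct and Current_Pending_Sector raw
# values from smartctl attribute output (field 2 is the attribute
# name, field 10 is RAW_VALUE).
check_smart() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
        print $2 "=" $10
    }'
}

# Fabricated sample output; on a real host use:
#   smartctl -A /dev/da0 | check_smart
check_smart <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age  Always       -       8
EOF
```

Run from cron and diffed against the previous sample, this is the kind of periodic gathering that would show whether hundreds of bad sectors appeared suddenly or crept up over weeks.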
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 09 May 2019, at 17:46, Patrick M. Hausen wrote: > > Hi all, > >> On 09.05.2019 at 00:55, Michelle Sullivan wrote: >> No, one disk in the 16 disk zRAID2 ... previously unseen, but it could be >> that the errors have occurred in the last 6 weeks... every time I reboot it >> started resilvering, gets to 761M resilvered and then stops. > > 16 disks in *one* RAIDZ2 vdev? That might be the cause of your insanely > long scrubs. In general it is not recommended, though I cannot find the > source for that information quickly just now. I have seen posts on various lists stating don't go over 8.. I know people in Oracle; the word is it shouldn't matter... who do you believe? Michelle > > Kind regards, > Patrick
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 09 May 2019, at 21:02, Dimitry Andric wrote: > >> On 9 May 2019, at 10:32, Miroslav Lachman <000.f...@quip.cz> wrote: >> >> Patrick M. Hausen wrote on 2019/05/09 09:46: >>> Hi all, >>>> On 09.05.2019 at 00:55, Michelle Sullivan wrote: >>>> No, one disk in the 16 disk zRAID2 ... previously unseen, but it could be >>>> that the errors have occurred in the last 6 weeks... every time I reboot it >>>> started resilvering, gets to 761M resilvered and then stops. >>> 16 disks in *one* RAIDZ2 vdev? That might be the cause of your insanely >>> long scrubs. In general it is not recommended, though I cannot find the >>> source for that information quickly just now. >> >> Extremely slow scrub is an issue even on a 4-disk RAIDZ. I already posted >> about it in the past. This scrub has been running since Sunday 3AM. >> The time to go is a big lie. It was "19hXXm" 12 hours ago.
>>
>> pool: tank0
>> state: ONLINE
>> scan: scrub in progress since Sun May 5 03:01:48 2019
>> 10.8T scanned out of 12.7T at 30.4M/s, 18h39m to go
>> 0 repaired, 84.72% done
>> config:
>>
>> NAME                STATE   READ WRITE CKSUM
>> tank0               ONLINE     0     0     0
>>   raidz1-0          ONLINE     0     0     0
>>     gpt/disk0tank0  ONLINE     0     0     0
>>     gpt/disk1tank0  ONLINE     0     0     0
>>     gpt/disk2tank0  ONLINE     0     0     0
>>     gpt/disk3tank0  ONLINE     0     0     0
>>
>> Disks are OK, monitored by smartmontools. There is nothing odd, just the long long scrubs. This machine was started with 4x 1TB (now 4x 4TB) and scrub was slow with 1TB disks too. This machine (HP ML110 G8) was my first machine with ZFS. If I remember it well it was FreeBSD 7.0, now running 11.2. Scrub was / is always about one week. (I tried some sysctl tuning without much gain) > > Unfortunately https://svnweb.freebsd.org/changeset/base/339034, which > greatly speeds up scrubs and resilvers, was not in 11.2 (since it was > cut at r334458).
> > If you could update to a more recent snapshot, or try the upcoming 11.3 > prereleases, you will hopefully see much shorter scrub times. > > -Dimitry
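The "time to go is a big lie" figure in the quoted status output can at least be sanity-checked against the instantaneous rate. The values below (10.8T of 12.7T scanned at 30.4M/s) are taken straight from the quoted output; the function name is made up, and only the arithmetic is the point:

```shell
#!/bin/sh
# Recompute a scrub ETA from zpool status figures: remaining bytes
# divided by the reported rate.  Treats T as TiB and M as MiB/s,
# which is how zpool status abbreviates them.
scrub_eta_hours() {
    # usage: scrub_eta_hours <scanned_TiB> <total_TiB> <rate_MiBps>
    awk -v s="$1" -v t="$2" -v r="$3" 'BEGIN {
        remaining_mib = (t - s) * 1024 * 1024
        printf "%.1f\n", remaining_mib / r / 3600
    }'
}

scrub_eta_hours 10.8 12.7 30.4   # ~18.2 hours, close to the quoted 18h39m
```

The figure agrees with the quoted "18h39m", so the estimate is only a lie in the sense that the rate itself keeps changing: at 30.4M/s the ETA is honest, but yesterday's "19hXXm" was computed from yesterday's rate.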
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad On 09 May 2019, at 01:55, Walter Parker wrote: >> >> >> ZDB (unless I'm misreading it) is able to find all 34m+ files and >> verifies the checksums. The problem is in the ZFS data structures (one >> definitely, two maybe, metaslabs fail checksums, preventing the mounting >> (even read-only) of the volumes.) >> >>> Especially, how do you know >>> before you recovered the data from the drive? >> See above. >> >>> As ZFS metadata is stored >>> redundantly on the drive and never in an inconsistent form (that is what >>> fsck does, it fixes the inconsistent data that most other filesystems >> store >>> when they crash/have disk issues). >> The problem - unless I'm reading zdb incorrectly - is limited to the >> structure rather than the data. This fits with the fact the drive was >> isolated from user changes when the drive was being resilvered, so the >> data itself was not being altered .. that said, I am no expert so I >> could easily be completely wrong. >> >> What it sounds like you need is a metadata fixer, not a file recovery > tool. This is true, but I am of the thought, in alignment with the ZFS devs, that this might not be a good idea... if ZFS can't work it out already, the best thing to do will probably be get everything off it and reformat... > Assuming the metadata can be fixed, that would be the easy route. That's the thing... I don't know if it can be easily fixed... more, I think the metadata can probably be easily fixed, but I suspect the spacemap can't, and if it can't there is going to be one of two things... either a big hole (or multiple little ones) or the likelihood of new data overwriting, partially or in full, old data - and this would not be good.. > That should not be hard to write if everything else on the disk has no > issues. Don't you say in another message that the system is now returning > 100's of drive errors? No, one disk in the 16 disk zRAID2 ...
previously unseen, but it could be that the errors have occurred in the last 6 weeks... every time I reboot it started resilvering, gets to 761M resilvered and then stops. > How does that relate to the statement => Everything on > the disk is fine except for a little bit of corruption in the freespace map? Well, I think it goes through until it hits that little bit of corruption that stops it mounting... then stops again.. I'm seeing 100s of hard errors at the beginning of one of the drives.. they were reported in syslog but only just, so could be a new thing. Could be previously undetected.. no way to know. > > >> >>> >>> I have a friend/business partner that doesn't want to move to ZFS because >>> his recovery method is wait for a single drive (no redundancy, sometimes >> no >>> backup) to fail and then use ddrescue to image the broken drive to a new >>> drive (ignoring any file corruption because you can't really tell without >>> ZFS). He's been using disk rescue programs for so long that he will not >>> move to ZFS, because it doesn't have a disk rescue program. >> >> The first part is rather cavalier .. the second part I kinda >> understand... it's why I'm now looking at alternatives ... particularly >> having been bitten as badly as I have with an unmountable volume. >> >> On the system I managed for him, we had a system with ZFS crap out. I > restored it from a backup. I continue to believe that people running > systems without backups are living on borrowed time. The idea of relying on > a disk recovery tool is too risky for my taste. > > >>> He has systems >>> on Linux with ext3 and no mirroring or backups. I've asked about moving >>> them to a mirrored ZFS system and he has told me that the customer >> doesn't >>> want to pay for a second drive (but will pay for hours of his time to fix >>> the problem when it happens). You kind of sound like him. >> Yeah..no! I'd be having that on a second (mirrored) drive... like most >> of my production servers.
>> >>> ZFS is risky >>> because there isn't a good drive rescue program. >> ZFS is good for some applications. ZFS is good to prevent cosmic ray >> issues. ZFS is not good when things go wrong. ZFS doesn't usually go >> wrong. I think that about sums it up. >> >> When it does go wrong I restore from backups. Therefore my systems don't > have problems. I'm sorry you had the perfect trifecta that caused you to lose > multiple drives and all your backups at the same time. > > >>> Sun's design was that the >>> system should be redundant by defa
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 09 May 2019, at 03:04, Karl Denninger wrote: > >> On 5/8/2019 11:53, Freddie Cash wrote: >>> On Wed, May 8, 2019 at 9:31 AM Karl Denninger wrote: >>> >>> I have a system here with about the same amount of net storage on it as >>> you did. It runs scrubs regularly; none of them take more than 8 hours >>> on *any* of the pools. The SSD-based pool is of course *much* faster, >>> but even the many-way RaidZ2 on spinning rust is an ~8 hour deal; it >>> kicks off automatically at 2:00 AM when the time comes but is complete >>> before noon. I run them on 14 day intervals. >>> >> (description elided) > > That is a /lot/ bigger pool than either Michelle or I are describing. Not quite... My pool is 16*3T SATA (real spindles, not SSD, and no cache) = 48T raw.. It is storage - remember "write once or twice, read lots" from elsewhere in the thread? > > We're both in the ~20Tb of storage space area 20T + 6T zvol was what was left on it whilst shuffling stuff around.. I had already moved off 8T of data.. the zvol is still fine and accessible. > . You're running 5-10x > that in usable space in some of these pools and yet seeing ~2 day scrub > times on a couple of them (that is, the organization looks pretty > reasonable given the size and so is the scrub time), one that's ~5 days > and likely has some issues with parallelism and fragmentation, and then, > well, two awfuls which are both dedup-enabled. > > -- > Karl Denninger > k...@denninger.net <mailto:k...@denninger.net> > /The Market Ticker/ > /[S/MIME encrypted email preferred]/
Re: ZFS...
Paul Mather wrote: On May 8, 2019, at 9:59 AM, Michelle Sullivan wrote: Paul Mather wrote: due to lack of space. Interestingly have had another drive die in the array - and it doesn't just have one or two sectors down, it has a *lot* - which was not noticed by the original machine - I moved the drive to a byte copier which is where it's reporting 100's of sectors damaged... could this be compounded by the zfs/mfi driver/HBA not picking up errors like it should? Did you have regular pool scrubs enabled? It would have picked up silent data corruption like this. It does for me. Yes, every month (once a month because, (1) the data doesn't change much (new data is added, old is not touched), and (2) to complete it took 2 weeks.) Do you also run sysutils/smartmontools to monitor S.M.A.R.T. attributes? Although imperfect, it can sometimes signal trouble brewing with a drive (e.g., increasing Reallocated_Sector_Ct and Current_Pending_Sector counts) that can lead to proactive remediation before catastrophe strikes. Not automatically. Unless you have been gathering periodic drive metrics, you have no way of knowing whether these hundreds of bad sectors have happened suddenly or slowly over a period of time. No, it's something I have thought about but been unable to spend the time on. -- Michelle Sullivan http://www.mhix.org/
Re: ZFS...
Paul Mather wrote: due to lack of space. Interestingly have had another drive die in the array - and it doesn't just have one or two sectors down, it has a *lot* - which was not noticed by the original machine - I moved the drive to a byte copier which is where it's reporting 100's of sectors damaged... could this be compounded by the zfs/mfi driver/HBA not picking up errors like it should? Did you have regular pool scrubs enabled? It would have picked up silent data corruption like this. It does for me. Yes, every month (once a month because, (1) the data doesn't change much (new data is added, old is not touched), and (2) to complete it took 2 weeks.) Michelle -- Michelle Sullivan http://www.mhix.org/
Re: ZFS...
Borja Marcos via freebsd-stable wrote: On 8 May 2019, at 05:09, Walter Parker wrote: Would a disk rescue program for ZFS be a good idea? Sure. Should the lack of a disk recovery program stop you from using ZFS? No. If you think so, I suggest that you have your data integrity priorities in the wrong order (focusing on small, rare events rather than the common base case). ZFS is certainly different from other filesystems. Its self-healing capabilities help it survive problems that would destroy others. But if you reach a level of damage past that “tolerable” threshold, consider yourself dead. Bingo. Is it possible at all to write an effective repair tool? It would be really complicated. Which is why I don't think a 'repair tool' is the correct way to go.. I get the ZFS devs saying 'no' to it, I really do. A tool to scan and salvage (if possible) the data on it is what it needs, I think... copy off, rebuild the structure (reformat) and copy back. This tool is what I was pointed at: https://www.klennet.com/zfs-recovery/default.aspx ... no idea if it works yet.. but if it does what it says it does, it is the 'missing link' I'm looking for... just I am having issues getting Windows 7 with SP1 on a USB stick to get .NET 4.5 on it to run the software... :/ (only been at it 2 days though, so time yet.) By the way, ddrescue can help in a multiple drive failure scenario with ZFS. Been there done that - that's how I rescued it when it was damaged in shipping.. though I think I used 'recoverdisk' rather than ddrescue ... pretty much the same thing, if not the same code. Sector copied all three dead drives to new drives, put the three dead back in, brought them back online and then let it resilver... the data was recovered intact and not reporting any permanent errors.
If some of the drives are showing the typical problem of “flaky” sectors with a lot of retries slowing down the whole pool, you can shut down the system or at least export the pool, copy the required drive/s to fresh ones, replace the flaky drives and try to import the pool. I would first do the experiment to make sure it's harmless, but ZFS relies on labels written on the disks to import a pool regardless of disk controller topology, device names, uuids, or whatever. So a full disk copy should work. Don't need to test it... been there done that - it works. Michelle, were you doing periodic scrubs? I'm not sure you mentioned it. Yes, though once a month as it took 2 weeks to complete. Michelle -- Michelle Sullivan http://www.mhix.org/
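In outline, the copy-the-flaky-drive procedure described above might look like the following. This is an untested sketch with placeholder device names (da3 as the flaky disk, da7 as the fresh one) and a placeholder pool name, not a recipe to paste:

```shell
# Sketch of the sector-copy rescue described above.  Device and pool
# names are hypothetical; every step mutates real hardware, so this
# is illustration only.
zpool export tank                 # quiesce the pool first

recoverdisk /dev/da3 /dev/da7     # sector copy flaky -> fresh drive
                                  # (ddrescue elsewhere does the same job)

# Physically swap the fresh drive into the flaky drive's bay, then:
zpool import tank                 # the on-disk labels identify the pool,
                                  # regardless of controller topology or
                                  # device names
```

The reason this works at all is the point Borja makes: import is driven by labels written on the disks themselves, so a faithful full-disk copy carries everything ZFS needs to recognise the vdev.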
Re: ZFS...
Karl Denninger wrote: On 5/7/2019 00:02, Michelle Sullivan wrote: The problem I see with that statement is that the ZFS dev mailing lists constantly and consistently follow the line that the data is always right and there is no need for a "fsck" (which I actually get), but it's used to shut down every thread... the irony is I'm now installing Windows 7 and SP1 on a USB stick (well, it's actually installed, but SP1 isn't finished yet) so I can install a ZFS data recovery tool which reports to be able to "walk the data" to retrieve all the files... the irony, eh... install Windows 7 on a USB stick to recover a FreeBSD-installed ZFS filesystem... will let you know if the tool works, but as it was recommended by a dev I'm hopeful... have another array (with ZFS I might add) loaded and ready to go... if the data recovery is successful I'll blow away the original machine and work out what OS and drive setup will be safe for the data in the future. I might even put FreeBSD and ZFS back on it, but if I do it won't be in the current Zraid2 config. Meh. Hardware failure is, well, hardware failure. Yes, power-related failures are hardware failures. Never mind the potential for /software/ failures. Bugs are, well, bugs. And they're a real thing. Never had the shortcomings of UFS bite you on an "unexpected" power loss? Well, I have. Is ZFS absolutely safe against any such event? No, but it's safe*r*. Yes and no ... I'll explain... I've yet to have ZFS lose an entire pool due to something bad happening, but the same basic risk (entire filesystem being gone) Every time I have seen this issue (and it's been more than once - though until now recoverable - even if extremely painful) it's always been during a resilver of a failed drive and something happening... panic, another drive failure, power, etc.. any other time it's rock solid... which is the yes and no... under normal circumstances ZFS is very very good and seems as safe as or safer than UFS...
but my experience is ZFS has one really bad flaw.. if there is a corruption in the metadata - even if the stored data is 100% correct - it will fault the pool and that's it, it's gone, barring some luck and painful recovery (backups aside) ... other filesystems also suffer from this, but there are tools that, the *majority of the time*, will get you out of the s**t with little pain. Barring this Windows-based tool I haven't been able to run yet, ZFS appears to have nothing. has occurred more than once in my IT career with other filesystems -- including UFS, lowly MSDOS and NTFS, never mind their predecessors all the way back to floppy disks and the first 5Mb Winchesters. Absolutely, been there done that.. and btrfs... *ouch* still as bad.. however with the only btrfs install I had (I didn't know it was btrfs underneath, but Netgear NAS...) I was still able to recover the data even though it had screwed the file system so badly I vowed never to consider or use it again on anything ever... I learned a long time ago that two is one and one is none when it comes to data, and WHEN two becomes one you SWEAT, because that second failure CAN happen at the worst possible time. And it does.. As for RaidZ2 .vs. mirrored, it's not as simple as you might think. Mirrored vdevs can only lose one member per mirror set, unless you use three-member mirrors. That sounds insane but actually it isn't in certain circumstances, such as very-read-heavy and high-performance-read environments. I know - this is why I don't use mirrored - because wear patterns will ensure both sides of the mirror are closely matched. The short answer is that a 2-way mirrored set is materially faster on reads but has no acceleration on writes, and can lose one member per mirror. If the SECOND one fails before you can resilver, and that resilver takes quite a long while if the disks are large, you're dead.
However, if you do six drives as a 2x3 way mirror (that is, 3 vdevs each of a 2-way mirror) you now have three parallel data paths going at once and potentially six for reads -- and performance is MUCH better. A 3-way mirror can lose two members (and could be organized as 3x2) but obviously requires lots of drive slots, 3x as much *power* per gigabyte stored (and you pay for power twice; once to buy it and again to get the heat out of the room where the machine is.) my problem (as always) is slots not so much the power. Raidz2 can also lose 2 drives without being dead. However, it doesn't get any of the read performance improvement *and* takes a write performance penalty; Z2 has more write penalty than Z1 since it has to compute and write two parity entries instead of one, although in theory at least it can parallel those parity writes -- albeit at the cost of drive bandwidth congestion (e.g. interfering with other accesses to the same disk at the same time.) In short RaidZx performs about as "well" as the *slowest* disk in the set. Which
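The 2x3 layout Karl describes (three vdevs, each a 2-way mirror) versus a single wide RAIDZ2 can be written out concretely as pool creation commands; the pool and device names below are placeholders:

```shell
# Three 2-way mirror vdevs: three parallel write paths, up to six
# read paths; survives one disk failure per mirror pair.
zpool create fast mirror da0 da1 mirror da2 da3 mirror da4 da5

# One six-disk RAIDZ2 vdev: any two disks can fail, but the vdev
# paces at roughly the speed of its slowest member and gains none
# of the mirror read parallelism.
zpool create wide raidz2 da0 da1 da2 da3 da4 da5
```

Both layouts tolerate two failures in the best case, but only the RAIDZ2 tolerates *any* two; the mirror layout trades that worst-case guarantee for throughput.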
Re: ZFS...
Paul Mather wrote: On May 7, 2019, at 1:02 AM, Michelle Sullivan wrote: On 07 May 2019, at 10:53, Paul Mather wrote: On May 6, 2019, at 10:14 AM, Michelle Sullivan wrote: My issue here (and not really what the blog is about): FreeBSD is defaulting to it. You've said this at least twice now in this thread so I'm assuming you're asserting it to be true. As of FreeBSD 12.0-RELEASE (and all earlier releases), FreeBSD does NOT default to ZFS. The images distributed by freebsd.org, e.g., Vagrant boxes, ARM images, EC2 instances, etc., contain disk images where FreeBSD resides on UFS. For example, here's what you end up with when you launch a 12.0-RELEASE instance using defaults on AWS (us-east-1 region: ami-03b0f822e17669866):

root@freebsd:/usr/home/ec2-user # gpart show
=>       3  20971509  ada0  GPT  (10G)
         3       123     1  freebsd-boot  (62K)
       126  20971386     2  freebsd-ufs  (10G)

And this is what you get when you "vagrant up" the freebsd/FreeBSD-12.0-RELEASE box:

root@freebsd:/home/vagrant # gpart show
=>       3  65013755  ada0  GPT  (31G)
         3       123     1  freebsd-boot  (62K)
       126   2097152     2  freebsd-swap  (1.0G)
   2097278  62914560     3  freebsd-ufs  (30G)
  65011838      1920        - free -  (960K)

When you install from the 12.0-RELEASE ISO, the first option listed during the partitioning stage is "Auto (UFS) Guided Disk Setup". The last option listed---after "Open a shell and partition by hand"---is "Auto (ZFS) Guided Root-on-ZFS". In other words, you have to skip over UFS and manual partitioning to select the ZFS install option. So, I don't see what evidence there is that FreeBSD is defaulting to ZFS. It hasn't up to now. Will FreeBSD 13 default to ZFS? Umm.. well I install by memory stick images and I had a 10.2 and an 11.0, both of which had root on zfs as the default.. I had to manually change them. I haven’t looked at anything later... so did something change? Am I in cloud cuckoo land? I don't know about that, but you may well be misremembering. 
I just pulled down the 10.2 and 11.0 installers from http://ftp-archive.freebsd.org/pub/FreeBSD-Archive/old-releases and in both cases the choices listed in the "Partitioning" step are the same as in the current 12.0 installer: "Auto (UFS) Guided Disk Setup" is listed first and selected by default. "Auto (ZFS) Guided Root-on-ZFS" is listed last (you have to skip past other options such as manually partitioning by hand to select it). I'm confident in saying that ZFS is (or was) not the default partitioning option in either 10.2 or 11.0 as officially released by FreeBSD. Did you use a custom installer you made yourself when installing 10.2 or 11.0? It was an emergency USB stick.. so downloaded straight from the website. My process is boot, select "manual" (so I can set a single partition and a swap partition, as historically it's done other things), select the whole disk and create a partition - this is where I saw it... 'freebsd-zfs' as the default. The second 'create' defaults to 'freebsd-swap', which is always correct. Interestingly the -CURRENT installer just says "freebsd" and not either -ufs or -zfs ... whatever it defaults to I don't know. FreeBSD used to be targeted at enterprise and devs (which is where I found it)... however the last few years have seen a big push into the consumer (compete with Linux) market.. so you have an OS that concerns itself with the desktop and upgrade after upgrade after upgrade (not just patching security issues, but upgrades as well.. just like Windows and OSX)... I get it.. the money is in the keeping of the user base.. but then you install a file system which is dangerous on a single disk by default... dangerous because it’s trusted and “can’t fail” .. until it goes titsup.com and then the entire drive is lost and all the data on it.. it’s the double standard... advocate you need ECC RAM, multiple vdevs etc, then single-drive it.. sorry.. which one is it? Gaarrrhhh! 
As people have pointed out elsewhere in this thread, it's false to claim that ZFS is unsafe on consumer hardware. It's no less safe than UFS on single-disk setups. Because anecdote is not evidence, I will refrain from saying, "I've lost far more data on UFS than I have on ZFS (especially when SUJ was shaking out its bugs)..." >;-) What I will agree with is that, probably due to its relative youth, ZFS has fewer forensics/data recovery tools than UFS. I'm sure this will improve as time goes on. (I even posted a link to an article describing someone adding ZFS support to a forensics toolkit earlier in this thread.) The problem I see with that statement is that the zfs dev mailing lists constantly and consistently follow the line that the data is always right, so there is no need for a “fsck”
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 07 May 2019, at 10:53, Paul Mather wrote: > >> On May 6, 2019, at 10:14 AM, Michelle Sullivan wrote: >> >> My issue here (and not really what the blog is about) FreeBSD is defaulting >> to it. > > You've said this at least twice now in this thread so I'm assuming you're > asserting it to be true. > > As of FreeBSD 12.0-RELEASE (and all earlier releases), FreeBSD does NOT > default to ZFS. > > The images distributed by freebsd.org, e.g., Vagrant boxes, ARM images, EC2 > instances, etc., contain disk images where FreeBSD resides on UFS. For > example, here's what you end up with when you launch a 12.0-RELEASE instance > using defaults on AWS (us-east-1 region: ami-03b0f822e17669866): > > root@freebsd:/usr/home/ec2-user # gpart show > => 3 20971509 ada0 GPT (10G) > 3 123 1 freebsd-boot (62K) > 126 20971386 2 freebsd-ufs (10G) > > And this is what you get when you "vagrant up" the > freebsd/FreeBSD-12.0-RELEASE box: > > root@freebsd:/home/vagrant # gpart show > => 3 65013755 ada0 GPT (31G) > 3 123 1 freebsd-boot (62K) > 126 2097152 2 freebsd-swap (1.0G) > 2097278 62914560 3 freebsd-ufs (30G) > 65011838 1920- free - (960K) > > > When you install from the 12.0-RELEASE ISO, the first option listed during > the partitioning stage is "Auto (UFS) Guided Disk Setup". The last option > listed---after "Open a shell and partition by hand" is "Auto (ZFS) Guided > Root-on-ZFS". In other words, you have to skip over UFS and manual > partitioning to select the ZFS install option. > > So, I don't see what evidence there is that FreeBSD is defaulting to ZFS. It > hasn't up to now. Will FreeBSD 13 default to ZFS? > Umm.. well I install by memory stick images and I had a 10.2 and an 11.0 both of which had root on zfs as the default.. I had to manually change them. I haven’t looked at anything later... so did something change? Am I in cloud cookoo land? 
> >> FreeBSD used to be targeted at enterprise and devs (which is where I found >> it)... however the last few years have been a big push into the consumer >> (compete with Linux) market.. so you have an OS that concerns itself with >> the desktop and upgrade after upgrade after upgrade (not just patching >> security issues, but upgrades as well.. just like windows and OSX)... I get >> it.. the money is in the keeping of the user base.. but then you install a >> file system which is dangerous on a single disk by default... dangerous >> because it’s trusted and “can’t fail” .. until it goes titsup.com and then >> the entire drive is lost and all the data on it.. it’s the double >> standard... advocate you need ECC ram, multiple vdevs etc, then single drive >> it.. sorry.. which one is it? Gaarrrhhh! > > > As people have pointed out elsewhere in this thread, it's false to claim that > ZFS is unsafe on consumer hardware. It's no less safe than UFS on > single-disk setups. > > Because anecdote is not evidence, I will refrain from saying, "I've lost far > more data on UFS than I have on ZFS (especially when SUJ was shaking out its > bugs)..." >;-) > > What I will agree with is that, probably due to its relative youth, ZFS has > less forensics/data recovery tools than UFS. I'm sure this will improve as > time goes on. (I even posted a link to an article describing someone adding > ZFS support to a forensics toolkit earlier in this thread.) The problem I see with that statement is that the zfs dev mailing lists constantly and consistently follow the line that the data is always right, so there is no need for a “fsck” (which I actually get), but it’s used to shut down every thread... the irony is I’m now installing Windows 7 and SP1 on a USB stick (well, it’s actually installed, but SP1 isn’t finished yet) so I can install a ZFS data recovery tool which reports to be able to “walk the data” to retrieve all the files... the irony eh... 
install Windows 7 on a USB stick to recover a FreeBSD-installed ZFS filesystem... will let you know if the tool works, but as it was recommended by a dev I’m hopeful... have another array (with ZFS I might add) loaded and ready to go... if the data recovery is successful I’ll blow away the original machine and work out what OS and drive setup will be safe for the data in the future. I might even put FreeBSD and ZFS back on it, but if I do it won’t be in the current RaidZ2 config. > > Cheers, > > Paul. ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
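As an aside on what “walking the data” means at the lowest level: recovery tools of this kind ignore the damaged pool metadata and scan the raw disks for recognizable on-disk structures instead. A minimal sketch of the very first step - finding candidate uberblocks by their magic number (0x00bab10c is the real ZFS uberblock magic; a byte-swapped match indicates a pool written on an opposite-endian machine) - might look like the following. The function name and scan granularity are illustrative only; a real tool does far more validation (checksums, transaction group ordering, label offsets):

```python
import struct

# ZFS uberblock magic number ("oo-ba-bloc").
UB_MAGIC = 0x00bab10c

def find_uberblock_candidates(image: bytes, step: int = 1024) -> list:
    """Scan a raw disk image for offsets that look like uberblocks.

    Uberblock array entries are at least 1 KiB apart, so only aligned
    offsets are checked. Returns offsets where the first 8 bytes match
    the magic in either byte order.
    """
    hits = []
    for off in range(0, len(image) - 7, step):
        (word,) = struct.unpack_from("<Q", image, off)  # little-endian
        if word == UB_MAGIC:
            hits.append(off)
            continue
        (word,) = struct.unpack_from(">Q", image, off)  # big-endian
        if word == UB_MAGIC:
            hits.append(off)
    return hits
```

From each candidate, a recovery tool can then read the block pointer to the most recent consistent object set and walk the tree downward, which is how intact file data can be pulled out even when a spacemap or other metadata fails its checksum.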
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 06 May 2019, at 22:23, Walter Cramer wrote: > >> On Mon, 6 May 2019, Patrick M. Hausen wrote: >> >> Hi! >> >>> On 30.04.2019 at 18:07, Walter Cramer wrote: > >>> With even a 1Gbit ethernet connection to your main system, savvy use of >>> (say) rsync (net/rsync in Ports), and the sort of "know your data / divide >>> & conquer" tactics that Karl mentions, you should be able to complete >>> initial backups (on both backup servers) in <1 month. After that - rsync >>> can generally do incremental backups far, far faster. >> >> ZFS can do incremental snapshots and send/receive much faster than rsync on >> the file level. And e.g. FreeNAS comes with all the bells and whistles >> already in place - just a matter of point and click to replicate one set of >> datasets on one server to another one … >> > True. But I was making a brief suggestion to Michelle - who does not seem to > be a trusting fan of ZFS - hoping that she might actually implement it, I implemented it for 8 years. It’s great on enterprise hardware in enterprise DCs (except when it isn’t, but that’s a rare occurrence.. as I have found).. but it is (in my experience) an absolute f***ing disaster waiting to happen on any consumer hardware... how many laptops do you know with more than one drive? My issue here (and not really what the blog is about) is FreeBSD is defaulting to it. FreeBSD used to be targeted at enterprise and devs (which is where I found it)... however the last few years have seen a big push into the consumer (compete with Linux) market.. so you have an OS that concerns itself with the desktop and upgrade after upgrade after upgrade (not just patching security issues, but upgrades as well.. just like Windows and OSX)... I get it.. the money is in the keeping of the user base.. but then you install a file system which is dangerous on a single disk by default... dangerous because it’s trusted and “can’t fail” .. 
until it goes titsup.com and then the entire drive is lost and all the data on it.. it’s the double standard... advocate you need ECC RAM, multiple vdevs etc, then single-drive it.. sorry.. which one is it? Gaarrrhhh! Back to installing Windows 7 (yes really!) and the ZFS file recovery tool someone made... (yes really!) > or something similar. Or at least an already-tediously-long mailing list > thread would end. Rsync is good enough for her situation, and would let her > use UFS on her off-site backup servers, if she preferred that. Upon reflection, as most data on the drive is write once, read lots, yes I should have. This machine is mostly used as a large media server; media is put on, it is cataloged and moved around to logical places, then it never changes until it’s deleted. I made the mistake of moving stuff onto it to reshuffle the main data server when it died... I have no backups of some critical data - that’s why I’m p**sed.. it’s not FreeBSD or ZFS’s fault, it’s my own stupidity for trusting ZFS would be good for a couple of weeks whilst I got everything organized... Michelle >> >> *Local* replication is a piece of cake today, if you have the hardware. >> >> Kind regards, >> Patrick >> -- >> punkt.de GmbH  Internet - Dienstleistungen - Beratung >> Kaiserallee 13a  Tel.: 0721 9109-0 Fax: -100 >> 76133 Karlsruhe  i...@punkt.de  http://punkt.de >> AG Mannheim 108285  Gf: Juergen Egeling
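For the rsync-vs-ZFS replication point discussed above, the two approaches look roughly like this in practice. This is a command sketch only: the pool, dataset, and host names are made up, the first send must be a full stream, and nothing here will run without live pools on both sides:

```shell
# Incremental backup with rsync: walks the entire file tree to find changes.
rsync -a --delete /tank/media/ backuphost:/backups/media/

# Incremental replication with ZFS snapshots: sends only changed blocks.
zfs snapshot tank/media@mon
zfs send tank/media@mon | ssh backuphost zfs receive backuppool/media   # first time: full copy
zfs snapshot tank/media@tue
zfs send -i @mon tank/media@tue | ssh backuphost zfs receive backuppool/media
```

The snapshot-based send avoids the per-file tree walk entirely, which is why it scales so much better than rsync on datasets with tens of millions of files.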
Re: ZFS...
Pete French wrote: On 05/05/2019 04:06, Michelle Sullivan wrote: Which I find interesting in itself as I have a machine running 9.3 which started life as a 5.x (which tells you how old it is) and it’s still running on the same *Compaq* RAID5 with UFS on it... with the original drives, with a hot spare that still hasn’t been used... and the only thing done to it hardware-wise is I replaced the motherboard 12 months ago as it just stopped POSTing and I couldn’t work out what failed... never had a drive corruption barring the fscks following hard power issues... it went with me from Brisbane to Canberra, back to Brisbane by back of car, then to Malta, back from Malta and is still downstairs... it’s my primary MX server and primary resolver for home and handles around 5k email per day.. Heh, OK, that's cool :-) Some of my old HP RAID systems started life as Compaq ones - you never installed the firmware update which simply changed the name it printed on boot, then? Umm, does it change the big startup "COMPAQ" graphic? If not then dunno... if it does... nope :) My personal server with the dead battery has been going at least 12 years. Had to replace the drives (and HP SAS drives are still silly prices sadly), one of the onboard ether ports has died, but otherwise still going strong. IIRC I've put 3 new clock batteries in over the years... and it's all SCSI... 18GB (no SAS on the machine) :P ... (in fact, 32bit and not capable of driving a SAS card - unless you can get PCI or ISA SAS cards :P ) Not had the long distance travel of yours though. I did ship some machines to Jersey once, but by boat, and all the drives which had been on the crossing failed one by one within a few months of arriving. Makes me wonder how rough the sea on that crossing actually was. The biggest issue I had was the idiots who unloaded the container at Customs.. not saying much except they loaded it backwards (literally) ... 
a 3KVA UPS (with batteries in it) was put at the top and by the time it got from Botany to me it had made its way to the bottom... Those were in a Compaq RAID pedestal too. After that I shipped machines, but took the drives in my hand luggage on planes always. Actually, not sure they would let me do that these days, haven't tried in years. Good question. -- Michelle Sullivan http://www.mhix.org/
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 05 May 2019, at 05:36, Chris wrote: > > Sorry, to clarify, Michelle, I do believe your tale of events, just that I > meant it reads like a tale as it's so unusual. There are multiple separate instances of problems over 8 years, but the final killer was without a doubt a catalog of disasters.. > > I also agree that there probably at this point of time should be more > zfs tools written for the few situations that do happen when things > get broken. This is my thought.. though I am in agreement with the devs that a ZFS “fsck” is not the way to go. I think we (anyone using zfs) need a “salvage what data you can to elsewhere” type tool... I am yet to explore the one written under windows that a dev sent me to see if that works (only because of the logistics of getting a windows 7 image on a USB drive that I can put into the server for recovery attempts.) If it works, a version for the command line would be the real answer to my prayers (and others' I imagine.) > > Although I still stand by my opinion I consider ZFS a huge amount more > robust than UFS, UFS always felt like I only had to sneeze the wrong > way and I would get issues. There was even one occasion simply > installing the OS on its defaults gave me corrupted data on UFS (9.0 > release had a nasty UFS journalling bug which corrupted data without any > power cuts etc.). Which I find interesting in itself as I have a machine running 9.3 which started life as a 5.x (which tells you how old it is) and it’s still running on the same *Compaq* RAID5 with UFS on it... with the original drives, with a hot spare that still hasn’t been used... and the only thing done to it hardware-wise is I replaced the motherboard 12 months ago as it just stopped POSTing and I couldn’t work out what failed... never had a drive corruption barring the fscks following hard power issues... 
it went with me from Brisbane to Canberra, back to Brisbane by back of car, then to Malta, back from Malta and is still downstairs... it’s my primary MX server and primary resolver for home and handles around 5k email per day.. > > In future I suggest you use mirror if the data matters. I know it > costs more in capacity for redundancy but in todays era of large > drives its the only real sensible option. Now it is, and it was on my list of things to start just before this happened... in fact I have already got 4*6T drives to copy everything off, ready to rebuild the entire pool with 16*6T drives in a RAID 10-like config... the power/corruption beat me to it. > > On the drive failures you have clearly been quite unlucky, and the > other stuff is unusual. > Drive failure wise, I think my “luck” has been normal... remember this is an 8 year old system; drives are only certified for 3 years... getting 5 years when 24x7 is not bad (especially considering its workload). The problem has always been how zfs copes, and this has been getting better over time, but this metadata corruption is similar to something I have seen before, and that is where I have a problem with it... (especially when zfs devs start making statements about how the system is always right and everything else is because of hardware, and if you’re not running enterprise hardware you deserve what you get... then advocating installing it on laptops etc..!) > Best of luck Thanks, I’ll need it as my changes to the code did not allow the mount though it did allow zdb to parse the drive... guess what I thought was there in zdb is not the same code in the zfs module. Michelle > >> On Sat, 4 May 2019 at 09:54, Pete French wrote: >> >> >> >>> On 04/05/2019 01:05, Michelle Sullivan wrote: >>> New batteries are only $19 on eBay for most battery types... 
>> >> Indeed, my problem is actual physical access to the machine, which I >> haven't seen in ten years :-) I even have a replacement server sitting >> behind my desk which we never quite got around to installing. I think >> the next move it makes will be to the cloud though, so am not too worried.
Re: ZFS...
New batteries are only $19 on eBay for most battery types... Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 03 May 2019, at 23:30, Pete French wrote: > > > >> On 03/05/2019 14:07, Michelle Sullivan wrote: >> I don’t think it will do that in write-through.. it will every time in write-back. > > Yes, it really shouldn't do that. > > My server is so old that the battery on the RAID has failed, which > definitely makes it go into write-through mode. However not an ideal solution > ;-)
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 03 May 2019, at 22:51, Kevin P. Neal wrote: > > On Fri, May 03, 2019 at 08:25:08PM +1000, Michelle Sullivan wrote: >>> On 03 May 2019, at 20:09, Borja Marcos via freebsd-stable >>> wrote: >>> >>> >>> >>>> On 3 May 2019, at 11:55, Pete French wrote: >>>> >>>> >>>> >>>>> On 03/05/2019 08:09, Borja Marcos via freebsd-stable wrote: >>>>> >>>>> The right way to use disks is to give ZFS access to the plain CAM >>>>> devices, not through some so-called JBOD on a RAID >>>>> controller which, at least for a long time, has been a *logical* “RAID0” >>>>> volume on a single disk. That additional layer can >>>>> completely break the semantics of transaction writes and cache flushes. >>>>> With some older cards it can be tricky to achieve, from patching source >>>>> drivers to enabling a sysctl tunable or even >>>>> flashing the card to turn it into a plain HBA with no RAID features (or >>>>> minimal ones). >>>> >>>> Oddly enough I got bitten by something like this yesterday. I have a >>>> machine containing an HP P400 RAID controller, which is nice enough, but I >>>> run ZFS so I have made the drives all into RAID-0 as being as close as I >>>> can get to accessing the raw SAS drives. >> >> I got bitten by that on this hardware originally... switching to RAID-0 and >> separate drives then switching to write-through (not write-back and >> definitely not write-back with a bad BBU) seemed to solve it. > > I have an old Dell R610 with a PERC 6/i and MegaRAID SAS driver Ver 4.23. > When I use mfiutil to set caching to write-through it still goes through > the cache. Which means that if a drive fails and the machine reboots the > firmware stops the boot because it has data in the cache that it wants > to store on the failed drive. So a normal failure of a drive in a three > way ZFS mirror that shouldn't cause a loss of service actually does. > I don’t think it will do that in write-through.. it will every time in write-back. 
> Thumbs down to RAID cards. > -- > Kevin P. Neal  http://www.pobox.com/~kpn/ > > "Good grief, I've just noticed I've typed in a rant. Sorry chaps!" > Keir Finlow Bates, circa 1998
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 03 May 2019, at 20:09, Borja Marcos via freebsd-stable > wrote: > > > >> On 3 May 2019, at 11:55, Pete French wrote: >> >> >> >>> On 03/05/2019 08:09, Borja Marcos via freebsd-stable wrote: >>> >>> The right way to use disks is to give ZFS access to the plain CAM devices, >>> not through some so-called JBOD on a RAID >>> controller which, at least for a long time, has been a *logical* “RAID0” >>> volume on a single disk. That additional layer can >>> completely break the semantics of transaction writes and cache flushes. >>> With some older cards it can be tricky to achieve, from patching source >>> drivers to enabling a sysctl tunable or even >>> flashing the card to turn it into a plain HBA with no RAID features (or >>> minimal ones). >> >> Oddly enough I got bitten by something like this yesterday. I have a machine >> containing an HP P400 RAID controller, which is nice enough, but I run ZFS >> so I have made the drives all into RAID-0 as being as close as I can get to >> accessing the raw SAS drives. I got bitten by that on this hardware originally... switching to RAID-0 and separate drives then switching to write-through (not write-back and definitely not write-back with a bad BBU) seemed to solve it. Michelle >> >> BSD sees them as da0, da1, da2, da3 - but the RAID controller only presents >> one of them to the BIOS, so my booting has to be all from that drive. This >> has been da0 for as long as I can remember, but yesterday it decided to start >> using what BSD sees as da1. Of course this is very hard to recognise as da0 >> and da1 are pretty much mirrors of each other. Spent a long time trying to >> work out why the fixes I was applying to da0 were not being used at boot >> time. > > Hmm. What happens when you do a “camcontrol devlist”? > > camcontrol tags da0 -v? > > How is the controller recognized by FreeBSD? 
For some of them it’s possible > to instruct the controller to present the physical devices to CAM. Of course > you need to be careful to avoid any logical volume configuration in that > case. > > But I would only tinker with this at system installation time; making such a > change on a running system with valid data can be disastrous. > > For mfi recognized cards there is a tunable: hw.mfi.allow_cam_disk_passthrough > > For aac cards it was a matter of commenting a couple of source code lines in > the driver (at your own risk of course). I’ve been running a > server for years doing that. > > Borja.
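To make the mfi passthrough suggestion above concrete: loader tunables are set at boot time in /boot/loader.conf. A minimal sketch (the tunable named above is for mfi(4) controllers only; whether your card honors it depends on model and firmware, so treat this as illustrative, not a recipe):

```shell
# /boot/loader.conf
# Expose the physical disks behind an mfi(4) RAID controller as CAM da(4)
# devices, so ZFS can consume the raw disks instead of per-disk "RAID0"
# logical volumes. As noted above, only do this at installation time --
# never flip it on a system with logical volumes carrying live data.
hw.mfi.allow_cam_disk_passthrough="1"
```

After a reboot, `camcontrol devlist` should show the individual drives, which can then be given to `zpool create` directly.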
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 03 May 2019, at 04:04, N.J. Mann wrote: > > Hi, > > > I will ignore the insult and just say again "come into contact". Yes, > I do know what I am talking about and have even seen it happen. > It wasn’t an insult (certainly not intended).. and I did say “sorta impossible”, and there is a whole series of events that would have to be a precursor for it to happen.. all of which I’d consider highly unlikely (less than a 0.0001% chance) where they happened and were not corrected before an event happened... even if you were to actually drop the phase on a floating terminal and take out the probability of that happening... I.e. you’d have to cut/drop three safety connections, have them go unnoticed, and then drop cables in two directions on a pole before you drop your phase on the connection... I should say.. bit of a reach... but then having a transformer blow up, and someone take out a power pole and have an 11kv line drop on a 240v phase within 6 hours whilst resilvering a drive would be a reach as well... so point conceded, it could happen. > > Best wishes, > Nick. > --
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 03 May 2019, at 03:18, N.J. Mann wrote: > > Hi, > > > On Friday, May 03, 2019 03:00:05 +1000 Michelle Sullivan > wrote: >>>> I am sorry to hear about your loss of data, but where does the 11kV come >>>> from? >>>> I can understand 415V, i.e. two phases in contact, but the type of overhead >>>> lines in the pictures you reference are three phase, each typically 240V to >>>> neutral and 415V between two phases. >>>> >>> Bottom lines on the power pole are normal 240/415 .. top lines are the 11KV >>> distribution network. >> >> Oh and just so you know, it’s sorta impossible to get 415 down a 240v >> connection > > No it is not. As I said, if two phases come into contact you can have 415v > between > live and neutral. > > You’re not an electrician then.. the connection point on my house has the earth connected to the return on the pole and that also connected to the ground stake (using 16mm copper). You’d have to cut that link before dropping a phase on the return to get 415 past the distribution board... sorta impossible... cut the ground link first, then it’s possible... but as every connection has the same, that’s a lot of ground links to cut to make it happen... unless you drop the return on both sides of your pole and your ground stake and then drop a phase on that floating terminal ... > Best wishes, > Nick. > --
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 03 May 2019, at 02:24, Michelle Sullivan wrote: > > N.J. Mann wrote: >> Hi, >> >> >> On Thursday, May 02, 2019 09:27:36 +1000 Michelle Sullivan >> wrote: >>>> On 02 May 2019, at 02:16, Chris wrote: >>>> >>>> Your story is so unusual I am wondering if its not fiction, I mean all >>>> sorts of power cuts where it just so happens the UPS fails every time, >>> The only “fiction” is the date.. was the 10th not the 19th March... >>> https://www.southcoastregister.com.au/story/5945663/homes-left-without-power-after-electrical-pole-destroyed-in-sanctuary-point-accident/ >>> >>> UPSes do glitch out sometimes, but rarely.. they do have problems when >>> 11kv comes down a 240v line though... >> I am sorry to hear about your loss of data, but where does the 11kV come >> from? >> I can understand 415V, i.e. two phases in contact, but the type of overhead >> lines in the pictures you reference are three phase, each typically 240V to >> neutral and 415V between two phases. >> > Bottom lines on the power pole are normal 240/415 .. top lines are the 11KV > distribution network. > Oh and just so you know, it’s sorta impossible to get 415 down a 240v connection; for that to happen you’d need to disconnect any return (neutral) and then connect it to another phase... and as most connections are TNE that means dropping the cables on the incoming connection and having a really dodgy connection. It is very unusual to get 11kv down a distribution phase and pretty much the only time I have ever seen it is when a cable is cut (an 11kv line) and it falls onto a supply cable... lightning strikes are the most common cause. The second most common (and they are really not common) is someone taking out a pole. The system is designed to drop power out instantly when it happens, which is why it is rare to do damage, but we (sparkies) all know that it’s milliseconds, not “instantly”. 
Unfortunate part of this is if they hit the power pole next up the road.. there were no 11kv distribution cables.. as you can’t see in the photos, the pole on the other side of the road had a transformer (not the same one that blew the night before...) It was a classic example of Murphy’s law (if something can go wrong to make a bad situation worse, it will).. if it wasn’t in the middle of a resilver it would not have had this issue. If a transformer hadn’t blown on the network 6 hours before, I wouldn’t have even been connected to that substation. If the transformer that blew hadn’t, ZFS would have just done the rollback I did the night before... if I hadn’t gone to the server room and checked everything and restarted the resilver, it wouldn’t have been doing anything to the drives. If the UPSes hadn’t failed on the night, it would have probably been waiting for me to roll back the 5 seconds when the pole was taken out... etc... etc... So many... “if this didn’t happen it would have been ok...” > >> Best wishes, >> Nick. > > > -- > Michelle Sullivan > http://www.mhix.org/
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 02 May 2019, at 09:46, Michelle Sullivan wrote: > > What I do know is in the second round -FfX wouldn’t work, *after the second round
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 02 May 2019, at 03:39, Steven Hartland wrote: > > > >> On 01/05/2019 15:53, Michelle Sullivan wrote: >> Paul Mather wrote: >>>> On Apr 30, 2019, at 11:17 PM, Michelle Sullivan wrote: >>>> >>>> Been there done that though with ext2 rather than UFS.. still got all my >>>> data back... even though it was a nightmare.. >>> >>> >>> Is that an implication that had all your data been on UFS (or ext2:) this >>> time around you would have got it all back? (I've got that impression >>> through this thread from things you've written.) That sort of makes it >>> sound like UFS is bulletproof to me. >> >> It's definitely not (and far from it) bulletproof - however when the data on >> disk is not corrupt I have managed to recover it - even if it has been a >> nightmare - no structure - all files in lost+found etc... or even resorting >> to r-studio in the event of lost raid information etc.. > Yes but you seem to have done this with ZFS too, just not in this > particularly bad case. > There is no r-studio for zfs or I would have turned to it as soon as this issue hit. > If you imagine that the in-memory update for the metadata was corrupted and > then written out to disk, which is what you seem to have experienced with > your ZFS pool, then you'd be in much the same position. >> >> This case - from what my limited knowledge has managed to fathom - is a >> spacemap that has become corrupt due to a partial write during the hard power >> failure. This was the second hard outage during the resilver process >> following a drive platter failure (on a RAIDZ2 - so a single platter failure >> should be completely recoverable in all cases - except HBA failure or other >> corruption, which does not appear to be the case).. the spacemap fails >> checksum (no surprise there, being that it was part-written) however it >> cannot be repaired (for whatever reason)... now I get that this is an >> interesting case...
one cannot just assume anything about the corrupt >> spacemap... it could be complete and just the checksum is wrong, or it could be >> completely corrupt and ignorable.. but from what I understand of ZFS (and please >> watchers chime in if I'm wrong) the spacemap is just the freespace map.. if >> corrupt or missing one cannot just 'fix it' because there is a very good >> chance that the fix would corrupt something that is actually allocated, and >> therefore the best solution (to "fix it") would be to consider it 100% >> full and therefore 'dead space' .. but zfs doesn't do that - probably a good >> thing - the result being that a drive that is supposed to be good (and zdb >> reports some +36m objects there) becomes completely unreadable ... my >> thought (desire/want) on a 'walk' tool would be a last-resort tool that >> could walk the datasets and send them elsewhere (like zfs send) so that I >> could create a new pool elsewhere and send the data it knows about to >> another pool and then blow away the original - if there are corruptions or >> data missing, that's my problem; it's a last resort.. but in the case where the >> critical structures become corrupt it means a local recovery option is >> enabled.. it means that if the data is all there and the corruption is just >> a spacemap, one can transfer the entire drive/data to a new pool whilst the >> original host is rebuilt... this would *significantly* help most people with >> large pools who have to blow them away and re-create the pools because of >> errors/corruptions etc... and with the addition of rsync-style checksumming >> of files it would be trivial to just 'fix' the data corrupted or missing >> from a mirror host rather than transferring the entire pool from (possibly) >> offsite > > From what I've read that's not a partial write issue, as in that case the > pool would have just rolled back.
It sounds more like the write was > successful but the data in that write was trashed due to your power incident > and that was replicated across ALL drives. > I think this might be where the problem started.. it was already rolling back from the first power issue (it did exactly what was expected and programmed: it rolled back 5 seconds.. which, as no-one had write access to it from the start of the resilver, I really didn’t care about as the only changes were from the resilver itself). Now your assertion/musing may be correct... all drives got trashed data.. I think not, but unless we get into it and examine it we won’t know. What I do know is in the second round
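The "rolled back 5 seconds" behaviour discussed above comes from how ZFS imports a pool: it keeps a ring of uberblocks and activates the one with the highest transaction group (txg) number that still passes its checksum, and `zpool import -F` can deliberately rewind further back to an older valid txg. A minimal conceptual sketch in Python (illustrative names and a toy CRC, not the real on-disk format):

```python
# Conceptual sketch of uberblock selection / txg rewind. Names and the
# checksum are illustrative only -- this is not ZFS's actual on-disk layout.
from dataclasses import dataclass
import zlib

@dataclass
class Uberblock:
    txg: int          # transaction group number
    payload: bytes    # stands in for the pointer to the block-tree root
    checksum: int     # checksum over the payload

def make_ub(txg, payload):
    return Uberblock(txg, payload, zlib.crc32(payload))

def valid(ub):
    # An uberblock is usable only if its stored checksum still matches.
    return zlib.crc32(ub.payload) == ub.checksum

def select_uberblock(ring, rewind=0):
    """Pick the valid uberblock with the highest txg; `rewind` skips the
    newest valid entries, roughly what `zpool import -F` does."""
    good = sorted((ub for ub in ring if valid(ub)),
                  key=lambda u: u.txg, reverse=True)
    return good[rewind] if rewind < len(good) else None

ring = [make_ub(t, b"root@%d" % t) for t in range(100, 105)]
ring[-1].checksum ^= 1   # simulate a torn write: newest txg fails checksum

assert select_uberblock(ring).txg == 103            # newest *valid* txg wins
assert select_uberblock(ring, rewind=1).txg == 102  # -F-style rewind goes older
```

If the newest uberblock was torn by the power cut, the import silently lands on the previous consistent txg, which is exactly a few seconds of "rollback".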
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 02 May 2019, at 02:16, Chris wrote: > > Your story is so unusual I am wondering if it's not fiction, I mean all > sorts of power cuts where it just so happens the UPS fails every time, The only “fiction” is the date.. it was the 10th not the 19th of March... https://www.southcoastregister.com.au/story/5945663/homes-left-without-power-after-electrical-pole-destroyed-in-sanctuary-point-accident/ UPSes do glitch out sometimes, but rarely.. they do have problems when 11kV comes down a 240V line though... > then you decide to ship a server halfway round the world, and on top > of that you get a way above average rate of hard drive failures. But > aside from all this you managed to recover multiple times. > Incorrect.. I shipped the server around the world 18 months ago.. (Oct 2017).. before that the power problems... well Malta is legendary for them... This was the last to hit me before I moved back home to Australia after being there for 8 years... https://www.timesofmalta.com/articles/view/20170826/local/hundreds-of-households-without-power-after-fire-in-distribution-centre.656490 > ZFS is never claimed to be a get out of jail free card, but it did > survive in your case multiple times, I suggest tho if you value > redundancy, do not use RAIDZ but use Mirror instead. I don't know why > people keep persisting with raid 5/6 nowadays with drives as large as > they are. Could that be because at the time of building it, the largest drives were 4T... had the 6T drives been available to me I would have mirrored 6T drives instead of RAIDZ2'ing 16x3T drives. > > I have used ZFS since the days of FreeBSD 8.x and its resilience > compared to the likes of ext is astounding, and especially compared to > UFS. I’m not disputing its resilience to errors in the file data, it is rather good, but when it comes to the metadata, that’s when I have always had problems.. it’s ok until it isn’t, then it’s lucky if you can get it back...
and until now I’ve had that luck. > > Before marking it down think how would UFS or ext have managed the > scenarios you presented in your blog. Well I have 2 servers with zfs, the rest are UFS or HPFS... the only other issue I had was a (mirrored drive) with HPFS... it got corrupted where it (the FSCK-like tools) couldn’t fix it... but the drive was still accessible and the backups were on the zfs drives... (timemachines in zvols over iscsi)... I didn’t need to go to timemachine (though I did check the data for consistency after “restore”) .. I got new drives, replaced them, mirrored them, then copied over everything except the OS by mounting one of the drives in an external caddy... solved the underlying “unfixable” error in the HPFS structures... > > Also think about where you're hosting your data with all your power > failures and the UPS equipment you utilise as well. Well I have insurance quotes for new UPSes that I’m waiting on replacement so that’s sorta moot... I could post the images of them here if you like or don’t believe me? Seriously it is unusual, I get it, but all my ZFS problems have been due to failures whilst resilvering ... it’s always (over 8 years of running these servers) resilvering that does it... it’ll be happily progressing and another drive fails, power goes out, kernel panic, etc... then the problems start, and if it does it twice you better start praying. This is my experience. Michelle > >> On Mon, 29 Apr 2019 at 16:26, Michelle Sullivan wrote: >> >> I know I'm not going to be popular for this, but I'll just drop it here >> anyhow. >> >> http://www.michellesullivan.org/blog/1726 >> >> Perhaps one should reconsider either: >> >> 1. Looking at tools that may be able to recover corrupt ZFS metadata, or >> 2. Defaulting to non ZFS filesystems on install.
>> >> -- >> Michelle Sullivan >> http://www.mhix.org/
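For the mirror-versus-RAIDZ trade-off debated above, a back-of-envelope usable-capacity comparison (raw drive counts only; this deliberately ignores ZFS metadata and slop overheads):

```python
# Back-of-envelope usable capacity for the layouts discussed in the thread.
def raidz_usable(n_drives, parity, size_tb):
    """RAIDZ-p: capacity of all drives minus p parity drives' worth."""
    return (n_drives - parity) * size_tb

def mirror_usable(n_drives, size_tb, way=2):
    """N-way mirror vdevs: one drive's worth of capacity per mirror set."""
    return n_drives // way * size_tb

# 16x3T RAIDZ2 (the pool in this thread): survives any 2 drive failures.
assert raidz_usable(16, 2, 3) == 42
# Same 16 drives as 8 two-way mirror pairs: much faster resilver, less space,
# and a second failure *within the same pair* is fatal.
assert mirror_usable(16, 3) == 24
```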
Re: ZFS...
Paul Mather wrote: On Apr 30, 2019, at 11:17 PM, Michelle Sullivan wrote: Been there done that though with ext2 rather than UFS.. still got all my data back... even though it was a nightmare.. Is that an implication that had all your data been on UFS (or ext2:) this time around you would have got it all back? (I've got that impression through this thread from things you've written.) That sort of makes it sound like UFS is bulletproof to me. It's definitely not (and far from it) bulletproof - however when the data on disk is not corrupt I have managed to recover it - even if it has been a nightmare - no structure - all files in lost+found etc... or even resorting to r-studio in the event of lost raid information etc.. There are levels of corruption. Maybe what you suffered would have taken down UFS, too? Pretty sure not - and even if it would have - with the files intact I have always been able to recover them... r-studio being the last resort. I guess there's no way to know unless there's some way you can recreate exactly the circumstances that took down your original system (but this time with your data on UFS). ;-) True. This case - from what my limited knowledge has managed to fathom - is a spacemap that has become corrupt due to a partial write during the hard power failure. This was the second hard outage during the resilver process following a drive platter failure (on a RAIDZ2 - so a single platter failure should be completely recoverable in all cases - except HBA failure or other corruption, which does not appear to be the case).. the spacemap fails checksum (no surprise there, being that it was part-written) however it cannot be repaired (for whatever reason)... now I get that this is an interesting case... one cannot just assume anything about the corrupt spacemap... it could be complete and just the checksum is wrong, or it could be completely corrupt and ignorable..
but from what I understand of ZFS (and please watchers chime in if I'm wrong) the spacemap is just the freespace map.. if corrupt or missing one cannot just 'fix it' because there is a very good chance that the fix would corrupt something that is actually allocated, and therefore the best solution (to "fix it") would be to consider it 100% full and therefore 'dead space' .. but zfs doesn't do that - probably a good thing - the result being that a drive that is supposed to be good (and zdb reports some +36m objects there) becomes completely unreadable ... my thought (desire/want) on a 'walk' tool would be a last-resort tool that could walk the datasets and send them elsewhere (like zfs send) so that I could create a new pool elsewhere and send the data it knows about to another pool and then blow away the original - if there are corruptions or data missing, that's my problem; it's a last resort.. but in the case where the critical structures become corrupt it means a local recovery option is enabled.. it means that if the data is all there and the corruption is just a spacemap, one can transfer the entire drive/data to a new pool whilst the original host is rebuilt... this would *significantly* help most people with large pools who have to blow them away and re-create the pools because of errors/corruptions etc... and with the addition of rsync-style checksumming of files it would be trivial to just 'fix' the data corrupted or missing from a mirror host rather than transferring the entire pool from (possibly) offsite Regards, -- Michelle Sullivan http://www.mhix.org/
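To make the space map point above concrete: conceptually a space map is an append-only log of alloc/free records that is replayed to reconstruct a metaslab's free list, which is why a corrupt one cannot be patched up safely — a wrong guess hands out blocks that are actually allocated. An illustrative Python sketch (toy structure, not ZFS's actual on-disk format):

```python
# Illustrative sketch of a space map as an append-only log of ALLOC/FREE
# records, replayed to recover the current free set. Toy model only.
def replay_spacemap(records, total_blocks):
    """records: list of (op, start, length); returns the set of free blocks."""
    free = set(range(total_blocks))
    for op, start, length in records:
        blocks = range(start, start + length)
        if op == "ALLOC":
            free.difference_update(blocks)   # blocks handed out
        elif op == "FREE":
            free.update(blocks)              # blocks returned
    return free

log = [("ALLOC", 0, 4), ("ALLOC", 10, 2), ("FREE", 1, 1)]
free = replay_spacemap(log, 16)
assert 1 in free and 0 not in free and 10 not in free

# If the log is corrupt, any reconstruction risks marking an allocated block
# free; the only conservative "repair" is to treat the whole metaslab as
# allocated (100% full) so no live data can be overwritten:
conservative_free = set()
assert len(conservative_free) == 0
```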
Re: ZFS...
Paul Mather wrote: On Apr 30, 2019, at 8:14 PM, Michelle Sullivan wrote: Michelle Sullivan http://www.mhix.org/ Sent from my iPad On 01 May 2019, at 01:15, Karl Denninger wrote: IMHO non-ECC memory systems are ok for personal desktop and laptop machines where loss of stored data requiring a restore is acceptable (assuming you have a reasonable backup paradigm for same) but not for servers and *especially* not for ZFS storage. I don't like the price of ECC memory and I really don't like Intel's practices when it comes to only enabling ECC RAM on their "server" class line of CPUs either but it is what it is. Pay up for the machines where it matters. And the irony is the FreeBSD policy to default to zfs on new installs using the complete drive.. even when there is only one disk available and regardless of the cpu or ram class... with one usb stick I have around here it attempted to use zfs on one of my laptops. ZFS has MUCH more to recommend it than just the "self-healing" properties discussed in this thread. Its pooled storage model, good administration and snapshot/clone support (enabling features such as boot environments) make it preferable over UFS as a default file system. You can even gain the benefits of self-healing (for silent data corruption) for single-drive systems via "copies=2" or "copies=3" on file sets. Damned if you do, damned if you don’t comes to mind. Not really. Nobody is forcing anyone only to use ZFS as a choice of file system. As you say above, it is a default (a very sensible one, IMHO, but even then, it's a default, not a mandate). If you believe ZFS is not right for you, do a UFS installation instead. BTW, I disagree that you need top-notch server-grade hardware to use ZFS. Its design embodies the notion of being distrustful of the hardware on which it is running, and it is targeted to be able to survive consumer hardware (as has been pointed out elsewhere in this thread), e.g., HBAs without BBUs.
I am using ZFS on a Raspberry Pi with an external USB drive. How's that for server-grade hardware? :-) Was I drunk posting again? I thought others were advocating that server grade hardware was suitable for ZFS and if you are using consumer grade, you get what you pay for and don't blame ZFS etc.. This is an interesting issue... 2 thoughts come to mind... ZFS safe to use on consumer hardware or not? ECC necessary or not? "The data on disk is always right" or not? .. it can't be all of the above by the very nature of the arguments that all seem to be against me, but not against each other... even though they are directly in conflict (and I have seen this on other ZFS lists... usually just before or after the justification for no 'FSCK for ZFS' (which after looking deeply into how ZFS works, I mostly agree with - which I stated earlier) - though a 'ZFS walk' tool may be the compromise that satisfies those who believe that an FSCK should be available and usually have no idea why it probably can never happen. I will point out that someone from this thread messaged me this: https://www.klennet.com/zfs-recovery/default.aspx - which seems to be exactly what I'm talking about - a 'zfs walk' sort of tool... it's winblows only ... but if it does what it says on the packet.. this would probably be the missing link that would appease most ZFS detractors and people like me - who think ZFS is good for those with server grade hardware, but really not a good idea for the general linux user... :) (*waits for the flames*) -- Michelle Sullivan http://www.mhix.org/
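The `copies=2`/`copies=3` mentioned above are real ZFS dataset properties (`zfs set copies=2 pool/dataset`): they store extra "ditto" copies of each block, which is how even a single-drive pool can self-heal isolated bad sectors found by a scrub. A rough simulation of why extra copies help, assuming independent sector errors (real-world failures are often correlated, so treat this as a best case):

```python
# Rough simulation: a block is lost only if *every* ditto copy happens to
# land on a bad sector. Assumes independent sector errors (best case).
import random

def survives(n_copies, bad_fraction, trials=10000, seed=42):
    rng = random.Random(seed)
    lost = sum(all(rng.random() < bad_fraction for _ in range(n_copies))
               for _ in range(trials))
    return 1 - lost / trials

one = survives(1, 0.01)   # copies=1: ~1% of blocks lost
two = survives(2, 0.01)   # copies=2: loss requires both copies bad (~1e-4)
assert two > one
assert two > 0.995
```

Note this protects against isolated media errors, not whole-drive loss: both copies still live on the same disk.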
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 01 May 2019, at 12:37, Karl Denninger wrote: > > On 4/30/2019 20:59, Michelle Sullivan wrote >>> On 01 May 2019, at 11:33, Karl Denninger wrote: >>> >>>> On 4/30/2019 19:14, Michelle Sullivan wrote: >>>> >>>> Michelle Sullivan >>>> http://www.mhix.org/ >>>> Sent from my iPad >>>> >>> Nope. I'd much rather *know* the data is corrupt and be forced to >>> restore from backups than to have SILENT corruption occur and perhaps >>> screw me 10 years down the road when the odds are my backups have >>> long-since been recycled. >> Ahh yes the be all and end all of ZFS.. stops the silent corruption of >> data.. but don’t install it on anything unless it’s server grade with >> backups and ECC RAM, but it’s good on laptops because it protects you from >> silent corruption of your data when 10 years later the backups have >> long-since been recycled... umm is that not a circular argument? >> >> Don’t get me wrong here.. and I know you (and some others are) zfs in the DC >> with 10s of thousands in redundant servers and/or backups to keep your >> critical data corruption free = good thing. >> >> ZFS on everything is what some say (because it prevents silent corruption) >> but then you have default policies to install it everywhere .. including >> hardware not equipped to function safely with it (in your own arguments) and >> yet it’s still good because it will still prevent silent corruption even >> though it relies on hardware that you can trust... umm say what? >> >> Anyhow veered way way off (the original) topic... >> >> Modest (part consumer grade, part commercial) suffered irreversible data >> loss because of a (very unusual, but not impossible) double power outage.. >> and no tools to recover the data (or part data) unless you have some form of >> backup because the file system deems the corruption to be too dangerous to >> let you access any of it (even the known good bits) ... 
>> >> Michelle > > IMHO you're dead wrong Michelle. I respect your opinion but disagree > vehemently. I guess we’ll have to agree to disagree then, but I think your attitude to pronounce me “dead wrong” is short-sighted, because it smacks of “I’m right because ZFS is the answer to all problems.” .. I’ve been around in the industry long enough to see a variety of issues... some disasters, some not so... I also should know better than to run without backups, but financial constraints precluded me, as they will many non-commercial people. > > I run ZFS on both of my laptops under FreeBSD. Both have > non-power-protected SSDs in them. Neither is mirrored or Raidz-anything. > > So why run ZFS instead of UFS? > > Because a scrub will detect data corruption that UFS cannot detect *at all.* I get it, I really do, but that balances out against: if you can’t rebuild it, make sure you have (tested and working) backups and be prepared for downtime when such corruption does occur. > > It is a balance-of-harms test and you choose. I can make a very clean > argument that *greater information always wins*; that is, I prefer in > every case to *know* I'm screwed rather than not. I can defend against > being screwed with some amount of diligence but in order for that > diligence to be reasonable I have to know about the screwing in a > reasonable amount of time after it happens. Not disagreeing (and have not been.) > > You may have never had silent corruption bite you. I have... but not with data on disks.. most of my silent corruption issues have been with a layer or two above the hardware... like subversion commits overwriting previous commits without notification (damn I wish I could reliably replicate it!) > I have had it happen > several times over my IT career.
If that happens to you the odds are > that it's absolutely unrecoverable and whatever gets corrupted is > *gone.* Every drive corruption I have suffered in my career I have been able to recover, all or partial data, except where the hardware itself was totally hosed (i.e. clean-room options only)... even with btrfs.. yuk.. yuck.. yuk.. oh what a mess that was... still get nightmares on that one... but I still managed to get most of the data off... in fact I put it onto this machine I currently have problems with.. so after the nightmare of btrfs, looks like zfs eventually nailed me. > The defensive measures against silent corruption require > retention of backup data *literally forever* for the entire useful life > of the information because from the point of corruption forward *the > b
Re: ZFS...
Xin LI wrote: On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan wrote: but in my recent experience 2 issues colliding at the same time results in disaster Do we know exactly what kind of corruption happened to your pool? If you see it twice in a row, it might suggest a software bug that should be investigated. Oh I did spot one interesting bug... though it is benign... Check out the following (note the difference between 'zpool status' and 'zpool status -v'):

root@colossus:/mnt # zpool status
  pool: storage
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 29 20:22:03 2019
        6.54T scanned at 0/s, 6.54T issued at 0/s, 28.8T total
        445G resilvered, 22.66% done, no estimated completion time
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     5
          raidz2-0  ONLINE       0     0    20
            mfid11  ONLINE       0     0     0
            mfid10  ONLINE       0     0     0
            mfid8   ONLINE       0     0     0
            mfid7   ONLINE       0     0     0
            mfid0   ONLINE       0     0     0
            mfid5   ONLINE       0     0     0
            mfid4   ONLINE       0     0     0
            mfid3   ONLINE       0     0     0
            mfid2   ONLINE       0     0     0
            mfid14  ONLINE       0     0     0
            mfid15  ONLINE       0     0     0
            mfid6   ONLINE       0     0     0
            mfid9   ONLINE       0     0     0
            mfid13  ONLINE       0     0     0
            mfid1   ONLINE       0     0     0

errors: 4 data errors, use '-v' for a list

root@colossus:/mnt # zpool status
  pool: storage
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 29 20:22:03 2019
        6.54T scanned at 0/s, 6.54T issued at 0/s, 28.8T total
        445G resilvered, 22.66% done, no estimated completion time
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     5
          raidz2-0  ONLINE       0     0    20
            mfid11  ONLINE       0     0     0
            mfid10  ONLINE       0     0     0
            mfid8   ONLINE       0     0     0
            mfid7   ONLINE       0     0     0
            mfid0   ONLINE       0     0     0
            mfid5   ONLINE       0     0     0
            mfid4   ONLINE       0     0     0
            mfid3   ONLINE       0     0     0
            mfid2   ONLINE       0     0     0
            mfid14  ONLINE       0     0     0
            mfid15  ONLINE       0     0     0
            mfid6   ONLINE       0     0     0
            mfid9   ONLINE       0     0     0
            mfid13  ONLINE       0     0     0
            mfid1   ONLINE       0     0     0

errors: 4 data errors, use '-v' for a list

root@colossus:/mnt # zpool status -v
  pool: storage
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 29 20:22:03 2019
        6.54T scanned at 0/s, 6.54T issued at 0/s, 28.8T total
        445G resilvered, 22.66% done, no estimated completion time
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     5
          raidz2-0  ONLINE       0     0    20
            mfid11  ONLINE       0     0     0
            mfid10  ONLINE       0     0     0
            mfid8   ONLINE       0     0     0
            mfid7   ONLINE       0     0     0
            mfid0   ONLINE       0     0     0
            mfid5   ONLINE       0     0     0
            mfid4   ONLINE       0     0     0
            mfid3   ONLINE       0     0     0
            mfid2   ONLINE       0     0     0
            mfid14  ONLINE       0     0     0
            mfid15  ONLINE       0     0     0
            mfid6   ONLINE       0     0     0
            mfid9   ONLINE       0     0     0
            mfid13  ONLINE       0     0     0
            mfid1   ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        :<0x3e>
        :<0x5d>
        storage:<0x0>
        storage@now:<0x0>

root@colossus:/mnt # zpool status -v
  pool: storage
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 29 20:22:03 2019
        6.54T scanned at 0/s, 6.54T issued at 0/s, 28.8T total
        445G resilvered, 22.66% done, no estimated completion time
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     7
          raidz2-0  ONLINE       0     0    28
            mfid11  ONLINE       0
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 01 May 2019, at 11:33, Karl Denninger wrote: > > >> On 4/30/2019 19:14, Michelle Sullivan wrote: >> >> Michelle Sullivan >> http://www.mhix.org/ >> Sent from my iPad >> >>> On 01 May 2019, at 01:15, Karl Denninger wrote: >>> >>> >>> IMHO non-ECC memory systems are ok for personal desktop and laptop >>> machines where loss of stored data requiring a restore is acceptable >>> (assuming you have a reasonable backup paradigm for same) but not for >>> servers and *especially* not for ZFS storage. I don't like the price of >>> ECC memory and I really don't like Intel's practices when it comes to >>> only enabling ECC RAM on their "server" class line of CPUs either but it >>> is what it is. Pay up for the machines where it matters. >> And the irony is the FreeBSD policy to default to zfs on new installs using >> the complete drive.. even when there is only one disk available and >> regardless of the cpu or ram class... with one usb stick I have around here >> it attempted to use zfs on one of my laptops. >> >> Damned if you do, damned if you don’t comes to mind. >> > Nope. I'd much rather *know* the data is corrupt and be forced to > restore from backups than to have SILENT corruption occur and perhaps > screw me 10 years down the road when the odds are my backups have > long-since been recycled. Ahh yes, the be-all and end-all of ZFS.. stops the silent corruption of data.. but don’t install it on anything unless it’s server grade with backups and ECC RAM, but it’s good on laptops because it protects you from silent corruption of your data when 10 years later the backups have long since been recycled... umm, is that not a circular argument? Don’t get me wrong here.. and I know you (and some others) are running zfs in the DC with 10s of thousands in redundant servers and/or backups to keep your critical data corruption free = good thing.
ZFS on everything is what some say (because it prevents silent corruption) but then you have default policies to install it everywhere .. including hardware not equipped to function safely with it (in your own arguments) and yet it’s still good because it will still prevent silent corruption even though it relies on hardware that you can’t trust... umm, say what? Anyhow, veered way way off (the original) topic... A modest (part consumer grade, part commercial) system suffered irreversible data loss because of a (very unusual, but not impossible) double power outage.. and no tools to recover the data (or part data) unless you have some form of backup, because the file system deems the corruption to be too dangerous to let you access any of it (even the known good bits) ... Michelle > Karl Denninger > k...@denninger.net > /The Market Ticker/ > /[S/MIME encrypted email preferred]/
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 01 May 2019, at 01:15, Karl Denninger wrote: > > > IMHO non-ECC memory systems are ok for personal desktop and laptop > machines where loss of stored data requiring a restore is acceptable > (assuming you have a reasonable backup paradigm for same) but not for > servers and *especially* not for ZFS storage. I don't like the price of > ECC memory and I really don't like Intel's practices when it comes to > only enabling ECC RAM on their "server" class line of CPUs either but it > is what it is. Pay up for the machines where it matters. And the irony is the FreeBSD policy to default to zfs on new installs using the complete drive.. even when there is only one disk available and regardless of the cpu or ram class... with one usb stick I have around here it attempted to use zfs on one of my laptops. Damned if you do, damned if you don’t comes to mind. Michelle
Re: ZFS...
Walter Cramer wrote: Brief "Old Man" summary/perspective here... Computers and hard drives are complex, sensitive physical things. They, or the data on them, can be lost to fire, flood, lightning strikes, theft, transportation screw-ups, and more. Mass data corruption by faulty hardware or software is mostly rare, but does happen. Then there's the users - authorized or not - who are inept or malicious. Yup You can spend a fortune to make loss of the "live" data in your home server / server room / data center very unlikely. Is that worth the time and money? Depends on the business case. At any scale, it's best to have a manager - who understands both computers and the bottom line - keep a close eye on this. That would sorta be my point.. (and yet the default FreeBSD install - can't remember which version, could be everything current - is to push everything onto a filesystem that relies on perfect (or as near as it gets) hardware when you know, we all know, that the target hardware is consumer grade).. I have 2 machines - almost identical here.. the differences only being the motherboard, CPU and RAM.. the cases are both identical, the PSUs are too, even the drives and controllers are (for the zfs part at least).. Both Supermicro cases with dual PSUs, both with 16x ST3VN* drives (Iron Wolf, NAS drives), both with LSI HBAs, both with Kingston dual 128GB flash drives mirrored for the base OS. One with 32G non-ECC RAM and an onboard RAID for the OS drives, the other with a Supermicro board and 16GB ECC RAM and FreeBSD (GEOM) mirroring for the OS drive. "Real" protection from data loss means multiple off-site and generally off-line backups. You could spend a fortune on that, too...but for your use case (~21TB in an array that could hold ~39TB, and what sounds like a "home power user" budget), I'd say to put together two "backup servers" - cheap little (aka transportable) FreeBSD systems with, say 7x6GB HD's, raidz1.
At the time 6TB ("T" :) ) were not available - 3's and 4's were the top available... it's just rolled on with replacements since. With even a 1Gbit ethernet connection to your main system, savvy use of (say) rsync (net/rsync in Ports), and the sort of "know your data / divide & conquer" tactics that Karl mentions, you should be able to complete initial backups (on both backup servers) in <1 month. After that - rsync can generally do incremental backups far, far faster. How often you gently haul the backup servers to/from your off-site location(s) depends on a bunch of factors - backup frequency, cost of bandwidth, etc. 2x bonded Gig connections each server... offsite backup not possible (feasible) due to the Australian mess that they call broadband (12MBps max - running down a hill, on a good day with the wind behind you, and 30 hail Marys every 20 yards) Never skimp on power supplies. Hence, dual Supermicro, with dual 6kVA HP UPSes with batteries replaced every 36 months and a generator. -Walter [Credits: Nothing above is original. Others have already made most of my points in this thread. It's pretty much all decades-old computer wisdom in any case.] Yup, I know the drill. Michelle On Tue, 30 Apr 2019, Michelle Sullivan wrote: Karl Denninger wrote: On 4/30/2019 05:14, Michelle Sullivan wrote: On 30 Apr 2019, at 19:50, Xin LI wrote: On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan wrote: but in my recent experience 2 issues colliding at the same time results in disaster Do we know exactly what kind of corruption happened to your pool? If you see it twice in a row, it might suggest a software bug that should be investigated. All I know is it’s a checksum error on a meta slab (122) and from what I can gather it’s the spacemap that is corrupt... but I am no expert. I don’t believe it’s a software fault as such, because this was caused by a hard outage (damaged UPSes) whilst resilvering a single (but completely failed) drive.
...and after the first outage a second occurred (same as the first but more damaging to the power hardware)... the host itself was not damaged nor were the drives or controller. Note that ZFS stores multiple copies of its essential metadata, and in my experience with my old, consumer grade crappy hardware (non-ECC RAM, with several faulty, single hard drive pool: bad enough to crash almost monthly and damages my data from time to time), This was a top-end consumer-grade motherboard with non-ECC RAM that had been running for 8+ years without fault (except for hard drive platter failures). Uptime would have been years if it wasn’t for patching. Yuck. I'm sorry, but that may well be what nailed you. ECC is not just about the random cosmic ray. It also saves your bacon when there are power glitches. No. Sorry no. If the data is only half written to disk, ECC isn't going to save you at all... it's all about power on the drives to complete the write.
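Walter's rsync suggestion above can be sketched roughly as follows; the paths, the hostname `backup1`, and the exact flags are illustrative assumptions (net/rsync on both ends), not anything specified in the thread:

```shell
#!/bin/sh
# Hypothetical nightly push from the main ZFS box to one of the cheap
# raidz1 backup servers over the 1Gbit link. "backup1" and the paths
# are made-up placeholders.
SRC="/storage/data/"
DEST="backup1:/backup/data/"

# -a preserves permissions/times, -H keeps hard links, --delete keeps
# the mirror exact, --partial resumes interrupted transfers. The first
# run is the slow full copy; later runs only move changed files, which
# is why incrementals are far, far faster.
rsync -aH --delete --partial "$SRC" "$DEST"
```

With ~21TB split into a few datasets along "know your data / divide & conquer" lines, each dataset can be synced on its own schedule rather than in one monolithic run.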
Re: ZFS...
This issue is definitely related to sudden unexpected loss of power during resilver.. not ECC/non-ECC issues. Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 01 May 2019, at 00:12, Alan Somers wrote: > >> On Tue, Apr 30, 2019 at 8:05 AM Michelle Sullivan wrote: >> >> >> >> Michelle Sullivan >> http://www.mhix.org/ >> Sent from my iPad >> >>>> On 01 May 2019, at 00:01, Alan Somers wrote: >>>> >>>> On Tue, Apr 30, 2019 at 7:30 AM Michelle Sullivan >>>> wrote: >>>> >>>> Karl Denninger wrote: >>>>> On 4/30/2019 05:14, Michelle Sullivan wrote: >>>>>>>> On 30 Apr 2019, at 19:50, Xin LI wrote: >>>>>>>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan >>>>>>>> wrote: >>>>>>>> but in my recent experience 2 issues colliding at the same time >>>>>>>> results in disaster >>>>>>> Do we know exactly what kind of corruption happen to your pool? If you >>>>>>> see it twice in a row, it might suggest a software bug that should be >>>>>>> investigated. >>>>>>> >>>>>>> All I know is it’s a checksum error on a meta slab (122) and from what >>>>>>> I can gather it’s the spacemap that is corrupt... but I am no expert. >>>>>>> I don’t believe it’s a software fault as such, because this was cause >>>>>>> by a hard outage (damaged UPSes) whilst resilvering a single (but >>>>>>> completely failed) drive. ...and after the first outage a second >>>>>>> occurred (same as the first but more damaging to the power hardware)... >>>>>>> the host itself was not damaged nor were the drives or controller. >>>>> . >>>>>>> Note that ZFS stores multiple copies of its essential metadata, and in >>>>>>> my experience with my old, consumer grade crappy hardware (non-ECC RAM, >>>>>>> with several faulty, single hard drive pool: bad enough to crash almost >>>>>>> monthly and damages my data from time to time), >>>>>> This was a top end consumer grade mb with non ecc ram that had been >>>>>> running for 8+ years without fault (except for hard drive platter >>>>>> failures.). 
Uptime would have been years if it wasn’t for patching. >>>>> Yuck. >>>>> >>>>> I'm sorry, but that may well be what nailed you. >>>>> >>>>> ECC is not just about the random cosmic ray. It also saves your bacon >>>>> when there are power glitches. >>>> >>>> No. Sorry no. If the data is only half to disk, ECC isn't going to save >>>> you at all... it's all about power on the drives to complete the write. >>> >>> ECC RAM isn't about saving the last few seconds' worth of data from >>> before a power crash. It's about not corrupting the data that gets >>> written long before a crash. If you have non-ECC RAM, then a cosmic >>> ray/alpha ray/row hammer attack/bad luck can corrupt data after it's >>> been checksummed but before it gets DMAed to disk. Then disk will >>> contain corrupt data and you won't know it until you try to read it >>> back. >> >> I know this... unless I misread Karl’s message he implied the ECC would have >> saved the corruption in the crash... which is patently false... I think >> you’ll agree.. > > I don't think that's what Karl meant. I think he meant that the > non-ECC RAM could've caused latent corruption that was only detected > when the crash forced a reboot and resilver. > >> >> Michelle >> >> >>> >>> -Alan >>> >>>>> >>>>> Unfortunately however there is also cache memory on most modern hard >>>>> drives, most of the time (unless you explicitly shut it off) it's on for >>>>> write caching, and it'll nail you too. Oh, and it's never, in my >>>>> experience, ECC. >>> >>> Fortunately, ZFS never sends non-checksummed data to the hard drive. >>> So an error in the hard drive's cache ram will usually get detected by >>> the ZFS checksum. >>> >>>> >>>> No comment on that - you're right in the first part, I can't comment if >>>> there are drives with ECC. >>>> >>>>> >>>
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 01 May 2019, at 00:01, Alan Somers wrote: > >>> >>> Unfortunately however there is also cache memory on most modern hard >>> drives, most of the time (unless you explicitly shut it off) it's on for >>> write caching, and it'll nail you too. Oh, and it's never, in my >>> experience, ECC. > > Fortunately, ZFS never sends non-checksummed data to the hard drive. > So an error in the hard drive's cache ram will usually get detected by > the ZFS checksum. True, but a drive losing power mid-write will ensure the checksum doesn’t match the data (even if it is written before the data)... you need to ensure all the data and the checksum is written before drive power down.. and in the event of unexpected hard power fail, you can’t guarantee this. Battery backup in the controller that has a write cache and re-writes the last few writes on power restore on the other hand will save you.. which is why the other machine at my disposal hasn’t failed to date. Michelle
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 01 May 2019, at 00:01, Alan Somers wrote: > >> On Tue, Apr 30, 2019 at 7:30 AM Michelle Sullivan wrote: >> >> Karl Denninger wrote: >>> On 4/30/2019 05:14, Michelle Sullivan wrote: >>>>>> On 30 Apr 2019, at 19:50, Xin LI wrote: >>>>>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan >>>>>> wrote: >>>>>> but in my recent experience 2 issues colliding at the same time results >>>>>> in disaster >>>>> Do we know exactly what kind of corruption happen to your pool? If you >>>>> see it twice in a row, it might suggest a software bug that should be >>>>> investigated. >>>>> >>>>> All I know is it’s a checksum error on a meta slab (122) and from what I >>>>> can gather it’s the spacemap that is corrupt... but I am no expert. I >>>>> don’t believe it’s a software fault as such, because this was cause by a >>>>> hard outage (damaged UPSes) whilst resilvering a single (but completely >>>>> failed) drive. ...and after the first outage a second occurred (same as >>>>> the first but more damaging to the power hardware)... the host itself was >>>>> not damaged nor were the drives or controller. >>> . >>>>> Note that ZFS stores multiple copies of its essential metadata, and in my >>>>> experience with my old, consumer grade crappy hardware (non-ECC RAM, with >>>>> several faulty, single hard drive pool: bad enough to crash almost >>>>> monthly and damages my data from time to time), >>>> This was a top end consumer grade mb with non ecc ram that had been >>>> running for 8+ years without fault (except for hard drive platter >>>> failures.). Uptime would have been years if it wasn’t for patching. >>> Yuck. >>> >>> I'm sorry, but that may well be what nailed you. >>> >>> ECC is not just about the random cosmic ray. It also saves your bacon >>> when there are power glitches. >> >> No. Sorry no. If the data is only half to disk, ECC isn't going to save >> you at all... 
it's all about power on the drives to complete the write. > > ECC RAM isn't about saving the last few seconds' worth of data from > before a power crash. It's about not corrupting the data that gets > written long before a crash. If you have non-ECC RAM, then a cosmic > ray/alpha ray/row hammer attack/bad luck can corrupt data after it's > been checksummed but before it gets DMAed to disk. Then disk will > contain corrupt data and you won't know it until you try to read it > back. I know this... unless I misread Karl’s message he implied the ECC would have saved the corruption in the crash... which is patently false... I think you’ll agree.. Michelle > > -Alan > >>> >>> Unfortunately however there is also cache memory on most modern hard >>> drives, most of the time (unless you explicitly shut it off) it's on for >>> write caching, and it'll nail you too. Oh, and it's never, in my >>> experience, ECC. > > Fortunately, ZFS never sends non-checksummed data to the hard drive. > So an error in the hard drive's cache ram will usually get detected by > the ZFS checksum. > >> >> No comment on that - you're right in the first part, I can't comment if >> there are drives with ECC. >> >>> >>> In addition, however, and this is something I learned a LONG time ago >>> (think Z-80 processors!) is that as in so many very important things >>> "two is one and one is none." >>> >>> In other words without a backup you WILL lose data eventually, and it >>> WILL be important. >>> >>> Raidz2 is very nice, but as the name implies it you have two >>> redundancies. If you take three errors, or if, God forbid, you *write* >>> a block that has a bad checksum in it because it got scrambled while in >>> RAM, you're dead if that happens in the wrong place. >> >> Or in my case you write part data therefore invalidating the checksum... >>> >>>> Yeah.. 
unlike UFS that has to get really really hosed to restore from >>>> backup with nothing recoverable it seems ZFS can get hosed where issues >>>> occur in just the wrong bit... but mostly it is recoverable (and my >>>> experience has been some nasty shit that always ended up being >>>> recoverable.) >>>> >>>> Michelle
Re: ZFS...
Karl Denninger wrote: On 4/30/2019 03:09, Michelle Sullivan wrote: Consider.. If one triggers such a fault on a production server, how can one justify transferring from backup multiple terabytes (or even petabytes now) of data to repair an unmountable/faulted array because all backup solutions I know currently would take days if not weeks to restore the sort of store ZFS is touted with supporting. Had it happen on a production server a few years back with ZFS. The *hardware* went insane (disk adapter) and scribbled on *all* of the vdevs. The machine crashed and would not come back up -- at all. I insist on (and had) emergency boot media physically in the box (a USB key) in any production machine and it was quite-quickly obvious that all of the vdevs were corrupted beyond repair. There was no rational option other than to restore. It was definitely not a pleasant experience, but this is why when you get into systems and data store sizes where it's a five-alarm pain in the neck you must figure out some sort of strategy that covers you 99% of the time without a large amount of downtime involved, and in the 1% case accept said downtime. In this particular circumstance the customer didn't want to spend on a doubled-and-transaction-level protected on-site (in the same DC) redundancy setup originally so restore, as opposed to fail-over/promote and then restore and build a new "redundant" box where the old "primary" resided was the most-viable option. Time to recover essential functions was ~8 hours (and over 24 hours for everything to be restored.) How big was the storage area? -- Michelle Sullivan http://www.mhix.org/
Re: ZFS...
Karl Denninger wrote: On 4/30/2019 05:14, Michelle Sullivan wrote: On 30 Apr 2019, at 19:50, Xin LI wrote: On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan wrote: but in my recent experience 2 issues colliding at the same time results in disaster Do we know exactly what kind of corruption happened to your pool? If you see it twice in a row, it might suggest a software bug that should be investigated. All I know is it’s a checksum error on a meta slab (122) and from what I can gather it’s the spacemap that is corrupt... but I am no expert. I don’t believe it’s a software fault as such, because this was caused by a hard outage (damaged UPSes) whilst resilvering a single (but completely failed) drive. ...and after the first outage a second occurred (same as the first but more damaging to the power hardware)... the host itself was not damaged nor were the drives or controller. Note that ZFS stores multiple copies of its essential metadata, and in my experience with my old, consumer grade crappy hardware (non-ECC RAM, with several faulty, single hard drive pool: bad enough to crash almost monthly and damages my data from time to time), This was a top-end consumer-grade motherboard with non-ECC RAM that had been running for 8+ years without fault (except for hard drive platter failures). Uptime would have been years if it wasn’t for patching. Yuck. I'm sorry, but that may well be what nailed you. ECC is not just about the random cosmic ray. It also saves your bacon when there are power glitches. No. Sorry no. If the data is only half written to disk, ECC isn't going to save you at all... it's all about power on the drives to complete the write. Unfortunately however there is also cache memory on most modern hard drives, most of the time (unless you explicitly shut it off) it's on for write caching, and it'll nail you too. Oh, and it's never, in my experience, ECC. No comment on that - you're right in the first part, I can't comment if there are drives with ECC.
In addition, however, and this is something I learned a LONG time ago (think Z-80 processors!) is that as in so many very important things "two is one and one is none." In other words without a backup you WILL lose data eventually, and it WILL be important. Raidz2 is very nice, but as the name implies you have two redundancies. If you take three errors, or if, God forbid, you *write* a block that has a bad checksum in it because it got scrambled while in RAM, you're dead if that happens in the wrong place. Or in my case you write partial data, therefore invalidating the checksum... Yeah.. unlike UFS that has to get really really hosed to restore from backup with nothing recoverable it seems ZFS can get hosed where issues occur in just the wrong bit... but mostly it is recoverable (and my experience has been some nasty shit that always ended up being recoverable.) Michelle Oh that is definitely NOT true again, from hard experience, including (but not limited to) on FreeBSD. My experience is that ZFS is materially more-resilient but there is no such thing as "can never be corrupted by any set of events." The latter part is true - and my blog and my current situation is not limited to or aimed at FreeBSD specifically, FreeBSD is my experience. The former part... it has been very resilient, but I think (based on this certain set of events) it is easily corruptible and I have just been lucky. You just have to hit a certain write to activate the issue, and whilst that write and issue might be very very difficult (read: hit and miss) to hit in normal everyday scenarios it can and will eventually happen. Backup strategies for moderately large (e.g. many terabytes) to very large (e.g. petabytes and beyond) stores get quite complex but they're also very necessary. And therein lies the problem. If you don't have a many-tens-of-thousands-of-dollars backup solution, you're either: 1/ down for a long time. 2/ losing all data and starting again... ..and that's the problem...
UFS you can recover most (in most situations), and providing the *data* is there uncorrupted by the fault you can get it all off with various tools even if it is a complete mess. Here I am with the data that is apparently OK, but the metadata is corrupt (and note: as I had stopped writing to the drive when it started resilvering, the data - all of it - should be intact... even if a mess.) Michelle -- Michelle Sullivan http://www.mhix.org/
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 30 Apr 2019, at 19:50, Xin LI wrote: > > >> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan wrote: >> but in my recent experience 2 issues colliding at the same time results in >> disaster > > Do we know exactly what kind of corruption happened to your pool? If you see > it twice in a row, it might suggest a software bug that should be > investigated. All I know is it’s a checksum error on a meta slab (122) and from what I can gather it’s the spacemap that is corrupt... but I am no expert. I don’t believe it’s a software fault as such, because this was caused by a hard outage (damaged UPSes) whilst resilvering a single (but completely failed) drive. ...and after the first outage a second occurred (same as the first but more damaging to the power hardware)... the host itself was not damaged nor were the drives or controller. > > Note that ZFS stores multiple copies of its essential metadata, and in my > experience with my old, consumer grade crappy hardware (non-ECC RAM, with > several faulty, single hard drive pool: bad enough to crash almost monthly > and damages my data from time to time), This was a top-end consumer-grade motherboard with non-ECC RAM that had been running for 8+ years without fault (except for hard drive platter failures). Uptime would have been years if it wasn’t for patching. > I've never seen a corruption this bad and I was always able to recover the > pool. So far, same. > At a previous employer, the only case where we had the pool corrupted enough to > the point that mount was not allowed was because two host nodes happened to > import the pool at the same time, which is a situation that can be avoided > with SCSI reservation; their hardware was of much better quality, though.
> > Speaking of a tool like 'fsck': I think I'm mostly convinced that it's not > necessary, because at the point ZFS says the metadata is corrupted, it means > that this metadata was really corrupted beyond repair (all replicas were > corrupted; otherwise it would recover by finding out the right block and > rewriting the bad ones). I see this message all the time and mostly agree.. actually I do agree with possibly a minor exception, but so minor it’s probably not worth it. However as I suggested in my original post.. the pool says the files are there, a tool that would send them (aka zfs send) but ignoring errors to spacemaps etc would be really useful (to me.) > > An interactive tool may be useful (e.g. "I saw data structure version 1, 2, 3 > available, and all with bad checksum, choose which one you would want to > try"), but I think they wouldn't be very practical for use with large data > pools -- unlike traditional filesystems, ZFS uses copy-on-write and heavily > depends on the metadata to find where the data is, and a regular "scan" is > not really useful. zdb -AAA showed (shows) 36m files.. which suggests the data is intact, but it aborts the mount with an I/O error because it says the metadata has three errors.. 2 ‘metadata’ and one “” (storage being the pool name).. it does import, and it attempts to resilver but reports the resilver finishes at some 780M (ish).. export/import and it does it all again... zdb without -AAA aborts loading metaslab 122.
Michelle
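For anyone else hitting a metaslab/spacemap checksum error like the one described above, the usual hands-off diagnostic sequence looks something like this. The pool name `storage` matches the thread, but treat the whole thing as a hedged sketch and verify each flag against your own zpool(8)/zdb(8) versions before running it:

```shell
# 1. Try a read-only import first, so nothing gets rewritten on disk.
zpool import -o readonly=on -f storage

# 2. If that fails, try rewinding to an earlier transaction group.
#    -F discards the last few txgs; it only helps if an older,
#    intact uberblock still exists.
zpool import -F -o readonly=on storage

# 3. Inspect metadata without importing: -e works on an exported
#    pool, and repeating -A relaxes zdb's assertion checking
#    (the "-AAA" mentioned above).
zdb -e -AAA -d storage

# 4. Last resort: set the recovery tunable in /boot/loader.conf
#    (vfs.zfs.recover="1"), reboot, and retry the read-only import.
```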
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 30 Apr 2019, at 18:44, rai...@ultra-secure.de wrote: > > On 2019-04-30 10:09, Michelle Sullivan wrote: > >> Now, yes most production environments have multiple backing stores so >> will have a server or ten to switch to whilst the store is being >> recovered, but it still wouldn’t be a pleasant experience... not to >> mention the possibility that if one store is corrupted there is a >> chance that the other store(s) would also be affected in the same way >> if in the same DC... (e.g. a DC fire - which I have seen) .. and if you >> have multi-DC stores to protect from that.. size of the pipes between >> DCs comes clearly into play. > > > I have one customer with about 13T of ZFS - and because it would take a while > to restore (actual backups), it zfs-sends delta-snapshots every hour to a > standby-system. > > It was handy when we had to rebuild the system with different HBAs. > > I wonder what would happen if you scaled that up by just 10 (storage) and had the master blow up where it needs to be restored from backup.. how long would one be praying to higher powers that there is no problem with the backup...? (As in no outage or error causing a complete outage.)... don’t get me wrong.. we all get to that position at some time, but in my recent experience 2 issues colliding at the same time results in disaster. 13T is really not something I have issues with as I can usually cobble something together with 16T.. (at least until 6T drives became a viable (cost and availability at short notice) option... even 10T is becoming easier to get a hold of now.. but I have a measly 96T here and it takes weeks even with gigabit bonded interfaces when I need to restore.
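The hourly delta-snapshot replication described above can be sketched like this; the pool/dataset names and the standby host are invented for illustration, and a real deployment would normally use a tool from ports (e.g. sysutils/zrepl) rather than a bare cron script:

```shell
#!/bin/sh
# Hypothetical hourly cron job: snapshot, then send only the delta
# since the previous snapshot to the standby box. "tank/data" and
# "standby" are made-up names.
DS="tank/data"
NOW="$(date +%Y%m%d%H)"
# Newest existing snapshot of this dataset (-d 1 limits depth).
PREV="$(zfs list -H -t snapshot -o name -s creation -d 1 ${DS} | tail -1)"

zfs snapshot "${DS}@${NOW}"

# -i sends only the increment between PREV and the new snapshot;
# -F on the receiving side rolls the standby back to the last
# common snapshot before applying it.
zfs send -i "${PREV}" "${DS}@${NOW}" | \
    ssh standby zfs receive -F "${DS}"
```

After an initial full `zfs send | zfs receive`, each hourly run only moves an hour's worth of changed blocks, which is what makes the standby approach feasible even on modest links.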
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 30 Apr 2019, at 17:10, Andrea Venturoli wrote: > >> On 4/30/19 2:41 AM, Michelle Sullivan wrote: >> >> The system was originally built on 9.0, and got upgraded throughout the >> years... zfsd was not available back then. So I get your point, but maybe you >> didn’t realize this blog was a history of 8+ years? > > That's one of the first things I thought about while reading the original > post: what can be inferred from it is that ZFS might not have been that good > in the past. > It *could* still suffer from the same problems or it *could* have improved > and be more resilient. > Answering that would be interesting... > Without a doubt it has come a long way, but in my opinion, until there is a tool to walk the data (to transfer it out) or something that can either repair or invalidate metadata (such as a spacemap corruption) there is still a fatal flaw that makes it questionable to use... and that is for one reason alone (regardless of my current problems.) Consider.. If one triggers such a fault on a production server, how can one justify transferring from backup multiple terabytes (or even petabytes now) of data to repair an unmountable/faulted array because all backup solutions I know currently would take days if not weeks to restore the sort of store ZFS is touted with supporting. Now, yes most production environments have multiple backing stores so will have a server or ten to switch to whilst the store is being recovered, but it still wouldn’t be a pleasant experience... not to mention the possibility that if one store is corrupted there is a chance that the other store(s) would also be affected in the same way if in the same DC... (e.g. a DC fire - which I have seen) .. and if you have multi-DC stores to protect from that.. size of the pipes between DCs comes clearly into play. Thoughts?
Michelle
Re: ZFS...
Comments inline.. Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 30 Apr 2019, at 03:06, Alan Somers wrote: > >> On Mon, Apr 29, 2019 at 10:23 AM Michelle Sullivan >> wrote: >> >> I know I'm not going to be popular for this, but I'll just drop it here >> anyhow. >> >> http://www.michellesullivan.org/blog/1726 >> >> Perhaps one should reconsider either: >> >> 1. Looking at tools that may be able to recover corrupt ZFS metadata, or >> 2. Defaulting to non ZFS filesystems on install. >> >> -- >> Michelle Sullivan >> http://www.mhix.org/ > > Wow, losing multiple TB sucks for anybody. I'm sorry for your loss. > But I want to respond to a few points from the blog post. > > 1) When ZFS says that "the data is always correct and there's no need > for fsck", they mean metadata as well as data. The spacemap is > protected in exactly the same way as all other data and metadata. (to > be pedantically correct, the labels and uberblocks are protected in a > different way, but still protected). The only way to get metadata > corruption is due to a disk failure (3-disk failure when using RAIDZ2), > or due to a software bug. Sadly, those do happen, and they're > devilishly tricky to track down. The difference between ZFS and older > filesystems is that older filesystems experience corruption during > power loss _by_design_, not merely due to software bugs. A perfectly > functioning UFS implementation will experience corruption during power > loss, and that's why it needs to be fscked. It's not just > theoretical, either. I use UFS on my development VMs, and they > frequently experience corruption after a panic (which happens all the > time because I'm working on kernel code). I know, which is why I have ZVOLs with UFS filesystems in them for the development VMs... in a perfect world the power would have been all good, the UPSes would not be damaged and the generator would not run out of fuel because of the extended outage...
in fact if it was a perfect world I wouldn’t have my own mini DC at home. > > 2) Backups are essential with any filesystem, not just ZFS. After > all, no amount of RAID will protect you from an accidental "rm -rf /". You only do it once... I did it back in 1995... haven’t ever done it again. > > 3) ZFS hotspares can be swapped in automatically, though they don't by > default. It sounds like you already figured out how to assign a spare > to the pool. To use it automatically, you must set the "autoreplace" > pool property and enable zfsd. The latter can be done with "sysrc > zfsd_enable="YES"". The system was originally built on 9.0, and got upgraded throughout the years... zfsd was not available back then. So I get your point, but maybe you didn’t realize this blog was a history of 8+ years? > > 4) It sounds like you're having a lot of power trouble. Have you > tried sysutils/apcupsd from ports? I did... Malta was notorious for it. Hence 6kVA UPSes in the bottom of each rack (4 racks), cross-connected with the rack next to it and a backup generator... Australia on the other hand is a lot more stable (at least where I am)... 2 power issues in 2 years... both within 10 hours... one was a transformer, the other when some idiot took out a power pole (and I mean actually took it out, it was literally snapped in half... how they got out of the car and did a runner before the police or Ambos got there I’ll never know.) > It's fairly handy. It can talk to > a wide range of UPSes, and can be configured to do stuff like send you > an email on power loss, and power down the server if the battery gets > too low. > They could help this... all 4 UPSes are toast now. One caught fire, one no longer detects AC input, the other two I’m not even trying after the first catching fire... the lot are being replaced on insurance. It’s a catalog of errors that most wouldn’t normally experience. However it does show (to me) that ZFS on everything is a really bad idea...
particularly for home users where there is unknown hardware and you know they will mistreat it... they certainly won’t have ECC RAM in laptops etc... unknown caching facilities etc.. it’s a recipe for losing the root drive... Regards, Michelle
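Alan's hot-spare advice quoted above comes down to two pool settings plus the daemon; the pool name `storage` and the disk device `da10` are assumptions for illustration only:

```shell
# Give the pool a spare and let ZFS pull it in automatically when
# a vdev fails.
zpool add storage spare da10        # da10: hypothetical spare disk
zpool set autoreplace=on storage

# zfsd(8) is the fault-management daemon that actually performs the
# swap; it did not exist on the 9.0-era install discussed here.
sysrc zfsd_enable="YES"
service zfsd start
```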
Re: ZFS...
Michelle Sullivan http://www.mhix.org/ Sent from my iPad > On 30 Apr 2019, at 03:13, Kurt Jaeger wrote: > > Hi! > >> I know I'm not going to be popular for this, but I'll just drop it here >> anyhow. >> >> http://www.michellesullivan.org/blog/1726 > > With all due respect, I think if that filesystem/server you describe > has not kept up with all those mishaps, I think it's not perfect, but > nothing is. The killer was the catalog of errors, where you have a resilver in progress, and not one but two power failures...and just to let you know... one of the 6kVA UPSes caught fire, the other no longer recognizes AC input... so it was not your normal event. I had a good run with 8 years of running ZFS on this server. > >> Perhaps one should reconsider either: >> >> 2. Defaulting to non ZFS filesystems on install. > > I had more cases of UFS being toast than ZFS until now. I’ve toasted many, however I’ve always been able to get the majority (if not all) of the data. > >> 1. Looking at tools that may be able to recover corrupt ZFS metadata, or > > Here I agree! Making tools available to dig around zombie zpools, > which is icky in itself, would be helpful! The one tool I think would be useful - denied because "the data on disk is always right", which is not a fact for ZFS - is a zfs send with -AAA (like zdb), or a "zfs walk" tool that works similar to zfs send but where you can tell it to ignore the checksum errors (particularly in the structures of ZFS rather than on the data itself) so you can send what’s left of your data to another box either in part or fully. Particularly as in my case all the tools tell me all the data is there and intact and it’s just the metadata that can’t be recovered/repaired. Regards, Michelle
ZFS...
I know I'm not going to be popular for this, but I'll just drop it here anyhow. http://www.michellesullivan.org/blog/1726 Perhaps one should reconsider either: 1. Looking at tools that may be able to recover corrupt ZFS metadata, or 2. Defaulting to non ZFS filesystems on install. -- Michelle Sullivan http://www.mhix.org/
Re: FCP-0101: Deprecating most 10/100 Ethernet drivers
tech-lists wrote: I'm astonished you're considering removing rl given how common it is. I'll second that comment - though no disrespect to Brooks. Brooks as far as I can see is just the messenger. -- Michelle Sullivan http://www.mhix.org/
Re: drm / drm2 removal in 12
blubee blubeeme wrote: On Sat, Aug 25, 2018 at 10:04 AM Mark Linimon wrote: On Sat, Aug 25, 2018 at 07:07:24AM +0800, blubee blubeeme wrote: Are these guys insane and please avoid the nonsense about you're doing this in your spare time. Let us know how whatever OS you wind up using instead works for you. I suggest you look for one that will put up with your constant harangues. There are very few people on the mailing lists as nasty and rude as yourself. It is tiresome, demotivating, and childish. Please go elsewhere. mcl Your opinion has been noted but this issue isn't about me. It's about the Graphics devs coding themselves into a corner and looking for an easy button so they can continue to feel good about their toy. There's a reason the changes they tried to force down the FreeBSD source tree were reverted; they do not meet any standards of quality. I have no inside knowledge other than my ability to think clearly and it's obvious that the FreeBSD team wanted to hint at them that their code doesn't pass the sniff test. Instead of being whiny brats, improve your code and have it work without breaking compatibility with what has been working for quite a long time. Here's the play by play; You guys push this mess contaminating the FreeBSD source tree, some long-standing user tries to update their machines and it blows up; 1) Most will just leave 2) Some will complain 2a) You guys will say; Read UPDATING. bleh bleh blen They'll get aggravated, thereby aggravating people who came to this platform for Stability. Users who actually use FreeBSD to get things done do not have time to trawl these mailing lists, they have real world problems to solve with real world constraints. There is an OS with kqueue and all those things; it's called Linux. You can go there and play w/ that stuff to your heart's content. If you want your code to get merged, make sure it follows the guidelines and does not break the systems for people who are already using the platform. 
Now, I understand hearing harsh criticism about your work might hurt your feelings and all, but here's the antidote; work harder, improve your code, try again when your code quality improves. You guys cannot expect people to accept these kludges in the systems that they run every day. It's an open source project, you can't get mad because your code isn't accepted; it's your job as engineers to do better, not to expect users to jump through hoops to accommodate your subpar attempts at coding. This isn't about me, it's about the quality of code that you guys are trying to submit. Not much to disagree with what you say here.. because that's why I no longer work on using FreeBSD, instead having created my own fork which is something I can call stable, that can be patched for security issues and something that is usable across my environment. The one thing you should be aware of, and what I do disagree with you over, is who you are speaking to: ML is a long-standing 'old hat' of FreeBSD and someone I respect, and I know he would not be putting 'kludges' and substandard code into the trees... Direct your anger elsewhere, whilst still making valid points. Regards, Michelle -- Michelle Sullivan http://www.mhix.org/
Re: hw.vga.textmode=1 and the installation media
Eugene M. Zheganin wrote: Hi, would be really nice if the 11.2 and subsequent versions would come with hw.vga.textmode=1 as the default in the installation media. Because you know, there's a problem with some vendors (like HP) whose servers are incapable of showing graphics in IPMI with the default hw.vga.textmode=0 (yeah, I'm aware that most of the vendors don't have this issue), and there's still a bug that prevents this from being set from a loader prompt - the USB keyboard doesn't work at least in 11.0 there (seems to be some sort of FreeBSD "holy cow", along with sshd starting last, after all the local daemons. I would ask again to fix the latter as I did in past years, but it really seems to be a cornerstone which FreeBSD is built upon). Yeah, the USB 'bug' has been there since FreeBSD 7/8.x. I have a couple of loader commands that make it work again... will let you know later when my container arrives (next week) as the details are stored on the HP blades (being transported.) It also breaks most/all of the Softlayer consoles since 8.3(ish) - didn't manage to completely fix this as I had no BIOS access to ensure the USB is in the correct mode. Of course don't forget that the older iLO based HP servers also have these problems - except if you pay the license fee for the advanced iLO support that supports graphics mode... as well as 2 cursor mode (what the f*** would anyone want 2 cursors on a console for anyhow?... never worked that out.) Michelle
Re: 9.3 to 11.1 upgrade
Zoran Kolic wrote: Is it possible, like 9.3 to 11.0 ? Binary. I have a box, which has no reliable hardware, and would like to avoid multiple disk spin. Best regards Generally (and previously) the advice is to go via the next major version so you should go: 9.3 -> 10.x -> 11.x not 9.3 -> 11.x Regards, -- Michelle Sullivan http://www.mhix.org/
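The staged path with freebsd-update looks roughly like this (release numbers are illustrative - pick the latest supported point release of each major version):

```sh
freebsd-update -r 10.4-RELEASE upgrade
freebsd-update install     # installs the new kernel
shutdown -r now
freebsd-update install     # after the reboot: installs the new userland
# rebuild/reinstall third-party packages, run 'freebsd-update install'
# once more if prompted, then repeat the same dance for 11.x:
freebsd-update -r 11.1-RELEASE upgrade
```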
Re: CVE-2016-7434 NTP
Xin LI wrote: We plan to issue an EN to update the base system ntp to 4.2.8p9. The high impact issue is Windows only by the way. I don't think I'm even impacted - but $security team are going nuts about getting patched on all systems :/ Michelle
Re: CVE-2016-7434 NTP
Dimitry Andric wrote: On 08 Dec 2016, at 06:08, Michelle Sullivan <miche...@sorbs.net> wrote: Are we going to get a patch for CVE-2016-7434 on FreeBSD 9.3? On Nov 22, in r309009, Xin Li merged ntp 4.2.8p9, which fixes this issue, to stable/9: https://svnweb.freebsd.org/changeset/base/309009 Unfortunately the commit message did not mention the CVE identifier. I can't find any corresponding security advisory either. -Dimitry No updates needed to update system to 9.3-RELEASE-p52. No updates are available to install. Run '/usr/sbin/freebsd-update fetch' first. [root@gauntlet /]# ntpd --version ntpd 4.2.8p8-a (1) So no then... 9.3 is still supposedly supported so I'm not talking about -STABLE. Michelle
CVE-2016-7434 NTP
Are we going to get a patch for CVE-2016-7434 on FreeBSD 9.3? Michelle
Re: freebsd-update borked?
Yass Amed wrote: On 08/04/2016 07:00 AM, freebsd-stable-requ...@freebsd.org wrote: Message: 1 Date: Wed, 03 Aug 2016 17:51:13 +0200 From: Michelle Sullivan<miche...@sorbs.net> To: Stéphane Dupille via freebsd-stable<freebsd-stable@freebsd.org> Subject: freebsd-update borked? Message-ID:<57a212f1.6060...@sorbs.net> Content-Type: text/plain; CHARSET=US-ASCII; format=flowed As per the subject... [root@cheetah ~]# freebsd-update -r 9.3-RELEASE upgrade Looking up update.FreeBSD.org mirrors... none found. Fetching metadata signature for 9.2-RELEASE from update.FreeBSD.org... done. Fetching metadata index... done. Fetching 1 metadata patches. done. Applying metadata patches... done. Fetching 1 metadata files... gunzip: (stdin): unexpected end of file metadata is corrupt. [root@cheetah ~]# mv /var/db/freebsd-update /var/db/freebsd-update.9.2 [root@cheetah ~]# mkdir /var/db/freebsd-update [root@cheetah ~]# freebsd-update -r 9.3-RELEASE upgrade Looking up update.FreeBSD.org mirrors... none found. Fetching public key from update.FreeBSD.org... done. Fetching metadata signature for 9.2-RELEASE from update.FreeBSD.org... done. Fetching metadata index... done. Fetching 2 metadata files... gunzip: (stdin): unexpected end of file metadata is corrupt. [root@cheetah ~]# uname -a FreeBSD cheetah.sorbs.net 9.2-RELEASE-p15 FreeBSD 9.2-RELEASE-p15 #0: Mon Nov 3 20:31:29 UTC 2014 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 See this thread https://forums.freebsd.org/threads/28992/ you might have a DNS issue according to this line "Looking up update.FreeBSD.org mirrors... none found ". OR, "sudo rm -f /var/db/freebsd-update/*.gz" and re-download the file again. Well the: [root@cheetah ~]# mv /var/db/freebsd-update /var/db/freebsd-update.9.2 [root@cheetah ~]# mkdir /var/db/freebsd-update Would have done the same thing... DNS should be fine... 
appears to be: [michelle@cheetah /usr/home/michelle]$ dig update.freebsd.org ; <<>> DiG 9.8.4-P2 <<>> update.freebsd.org ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39566 ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 3, ADDITIONAL: 6 ;; QUESTION SECTION: ;update.freebsd.org. IN A ;; ANSWER SECTION: update.freebsd.org. 600 IN CNAME update5.freebsd.org. update5.freebsd.org. 600 IN A 204.9.55.80 ;; AUTHORITY SECTION: freebsd.org. 600 IN NS ns2.isc-sns.com. freebsd.org. 600 IN NS ns1.isc-sns.net. freebsd.org. 600 IN NS ns3.isc-sns.info. ;; ADDITIONAL SECTION: ns1.isc-sns.net. 3600 IN A 72.52.71.1 ns1.isc-sns.net. 3600 IN AAAA 2001:470:1a::1 ns2.isc-sns.com. 3600 IN A 63.243.194.1 ns2.isc-sns.com. 3600 IN AAAA 2001:5a0:10::1 ns3.isc-sns.info. 3600 IN A 63.243.194.1 ns3.isc-sns.info. 3600 IN AAAA 2001:5a0:10::1 ;; Query time: 164 msec ;; SERVER: 89.150.192.2#53(89.150.192.2) ;; WHEN: Thu Aug 4 22:41:04 2016 ;; MSG SIZE rcvd: 294 And actually it'll do this: [michelle@cheetah /usr/home/michelle]$ host -t srv _http._tcp.update.freebsd.org _http._tcp.update.freebsd.org has SRV record 1 40 80 update6.freebsd.org. _http._tcp.update.freebsd.org has SRV record 1 50 80 update5.freebsd.org. _http._tcp.update.freebsd.org has SRV record 1 5 80 update3.freebsd.org. _http._tcp.update.freebsd.org has SRV record 1 35 80 update4.freebsd.org. Which actually returns this: [michelle@cheetah /usr/home/michelle]$ host -t srv _http._tcp.update.freebsd.org | sed -nE "s/update.freebsd.org (has SRV record|server selection) //p" | cut -f 1,2,4 -d ' ' | sed -e 's/\.$//' | sort _http._tcp.1 35 update4.freebsd.org _http._tcp.1 40 update6.freebsd.org _http._tcp.1 5 update3.freebsd.org _http._tcp.1 50 update5.freebsd.org You'll note however, that now... [michelle@cheetah /usr/home/michelle]$ sudo freebsd-update -r 9.3-RELEASE upgrade Password: Looking up update.FreeBSD.org mirrors... none found. Fetching metadata signature for 9.2-RELEASE from update.FreeBSD.org... done. Fetching metadata index... done. 
Fetching 2 metadata files... done. Inspecting system... done. The following components of FreeBSD seem to be installed: kernel/generic world/base world/doc The following components of FreeBSD do not seem to be installed: src/src world/games world/lib32 Does this look reasonable (y/n)? y Fetching metadata signature for 9.3-RELEASE from update.FreeBSD.org... done. Fetching metadata index... done. Fetching 1 metadata patches. done. . . etc.. So what ever was broken is not now... and it still says "Looking up update.FreeBSD.org mirrors... none found." Thanks for trying... FYI: 9.3 will reach end-of-life in less than 4mos (try one of the 10.Xs). Not a hope in hell of that they don't work on this box, but even so the recommended path is go to the latest be
freebsd-update borked?
As per the subject... [root@cheetah ~]# freebsd-update -r 9.3-RELEASE upgrade Looking up update.FreeBSD.org mirrors... none found. Fetching metadata signature for 9.2-RELEASE from update.FreeBSD.org... done. Fetching metadata index... done. Fetching 1 metadata patches. done. Applying metadata patches... done. Fetching 1 metadata files... gunzip: (stdin): unexpected end of file metadata is corrupt. [root@cheetah ~]# mv /var/db/freebsd-update /var/db/freebsd-update.9.2 [root@cheetah ~]# mkdir /var/db/freebsd-update [root@cheetah ~]# freebsd-update -r 9.3-RELEASE upgrade Looking up update.FreeBSD.org mirrors... none found. Fetching public key from update.FreeBSD.org... done. Fetching metadata signature for 9.2-RELEASE from update.FreeBSD.org... done. Fetching metadata index... done. Fetching 2 metadata files... gunzip: (stdin): unexpected end of file metadata is corrupt. [root@cheetah ~]# uname -a FreeBSD cheetah.sorbs.net 9.2-RELEASE-p15 FreeBSD 9.2-RELEASE-p15 #0: Mon Nov 3 20:31:29 UTC 2014 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 -- Michelle Sullivan http://www.mhix.org/
Re: mfi driver performance too bad on LSI MegaRAID SAS 9260-8i
tly as well... neither would run ZFS, both would use their on-card RAID and UFS on top of them... ZFS would be reserved for the multi-user NFS file servers. (and trust me here, when it comes to media servers - where the media is just stored, not changed/updated/edited - the 16i with a good highspeed SSD as 'Cachecade' really performs well... and on a moderately powerful MB/CPU combo with good RAM and several gigabit interfaces it's surprising how many unicast transcoded media streams it can handle... (read: my twin fibres are saturated before the machine reaches anywhere near full load, and I can still write at 13MBps from my old Mac Mini over NFS... which is about all it can do without any load either.) So, the moral of the story/choices: Don't go with ZFS because people tell you it's best, because it isn't; go with ZFS if it suits your hardware and application, and if ZFS suits your application, get hardware for it. Regards, -- Michelle Sullivan http://www.mhix.org/
Re: FreeBSD Quarterly Status Report - First Quarter 2016 (fwd)
Warren Block wrote: Introduction The first quarter of 2016 showed that FreeBSD retains a strong sense of ipseity. Improvements were pervasive, lending credence to the concept of meliorism. Panegyrics are relatively scarce, but not for lack of need. Perhaps this missive might serve that function in some infinitesimal way. There was propagation, reformation, randomization, accumulation, emulation, transmogrification, debuggenation, and metaphrasal during this quarter. In the financioartistic arena, pork snout futures narrowly edged out pointilism, while parietal art remained fixed. In all, a discomfiture of abundance. View the rubrics below, and marvel at their profusion and magnitude! Marvel! You're trolling right? -- Michelle Sullivan http://www.mhix.org/
Re: Periodic jobs triggering panics in 10.1 and 10.2
Michael B. Eichorn wrote: > > I'm sorry, but I really don't get your point, PCBSD has shown a great > reason why zfs on root and on laptops/desktops is a good idea... boot > It has? As this is FreeBSD not PCBSD I must have missed that one... > environments. They have pretty much figured out how to use snapshots to > go from A-B ping-pong installations to A-B-C-D-E installations. I > am even aware of people using it to run Release and Current on the same > machine. Unfortunately at the moment the system requires GRUB, but > there is ongoing work to add the ability to the FreeBSD bootloader. > But it's not there yet... and would you consider this for someone who is not that technical? (Not that technical != non technical) > Further IIRC zfs send-receive has a history involving a developer who > wanted a better rsync for transferring his work to a laptop. As I said previously, these features are the ones you listed as 'additional' (ie your afterthoughts) > In addition > we have pretty much Moore's Lawed our way to the point where a new > laptop today can out spec a typical server from when ZFS was first > implemented. > I have yet to see a 6 spindle laptop... in fact I've yet to see a 3+ spindle laptop... I could be recalling wrongly but I'm pretty sure a number of emails have been seen on @freebsd.org lists that say, "don't use zfs on single spindle machines".. what I do know is that personally I have a machine with a hardware RAID and 16 drives... Initially I configured it with 1 large LD RAID6+HSP and put zfs on it (because I wanted to take advantage of the 'on the fly compression')... it's a backup store... and every scrub found checksum errors - on files that had not been written to since the last scrub. I reconfigured it as 16 x single disk RAID0 drives - identical hardware, just a different config, put raidz2 across 15 drives and left one as a spare, and now I don't have any errors except when a drive fails, and even then it 'self heals'... 
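The working layout described above - raidz2 across 15 single-disk logical drives plus one spare - would be created with something like this (pool and device names are illustrative):

```sh
# each daN is a single-disk RAID0 logical drive exported by the controller
zpool create backup raidz2 \
    da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 da12 da13 da14 \
    spare da15
zpool set autoreplace=on backup   # let the hot spare kick in automatically
```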
> Hiding features because you 'can' shoot your foot off is hardly a > typical UNIXy way of thinking anyway. Not talking about 'hiding' features, even though this thread started with someone suggesting 'hiding' a bug by using the -J and -j options for cron! Look, I'm being quite confrontational here in this message, there are a lot of people that don't like me here, and I don't like some of them myself so the feeling is very mutual; the point I'm trying to make is quite simple. I see it almost daily, FreeBSD people saying "install ZFS, that'll solve your problems" and "ZFS is the way forward" ... just the same way as they did with PkgNG etc... (not going to say anything on that, don't want an argument on that, this is not about 'that'..) ZFS has its place, it is very good at some things, it brings features that people need. ZFS does not work (is not stable) on i386 without recompiling the kernel, but it is presented as an installation option. ZFS is compiled in by default in i386 kernels without the necessary option change to make it "stable". We have been told the kernel option change will never be put there by default. freebsd-update will remove/replace a kernel compiled with the option. i386 is still a tier-1 platform. 32bit laptops are still available for purchase at major retailers (eg: Best Buy). I do not believe zfs should be available by default when it is not stable on all tier-1 platforms. I believe it should be fixed to be stable before it's added as an installation option on tier-1 platforms, and if it cannot/will not be fixed to 'stable' status then it should never make it into the defaults available... it should be limited to advanced installations where the people who know will probably know how to fix things or what to expect. ..anyhow, my thoughts on the subject.. why, I don't know, because in the time it has taken me to write this, it occurred to me that I don't really give a stuff if people see FreeBSD as stable or unstable anymore. 
I put forward experiences and what I see, and the questions/answers I have to deal with here, and am usually ignored or argued with, and I spend 30 minutes (or more) writing emails explaining stuff/defending myself to people who don't care and think (like me) they know best, when I could actually be doing the work I get paid for. On that note I will leave you to consider and discard my thoughts as trivial and pointless and reply as such, and get on with making my stuff better by actually listening to people who use it. -- Michelle Sullivan http://www.mhix.org/
Re: Periodic jobs triggering panics in 10.1 and 10.2
Michael B. Eichorn wrote: > On Tue, 2015-12-08 at 16:31 -0600, Dustin Wenz wrote: > >> I suspect this is a zfs bug that is triggered by the access patterns >> in the periodic scripts. There is significant load on the system when >> the scheduled processes start, because all jails execute the same >> scripts at the same time. >> >> I've been able to alleviate this problem by disabling the security >> scans within the jails, but leave it enabled on the root host. >> > > To avoid the problem of jails all starting things at the same time, use > the cron(8) flags -j and -J to set a 'jitter' which will cause cron to > sleep for a random period of specified duration (60 sec max). Cron > flags can be set using the rc.conf variable 'cron_flags'. > No, that will just hide it (if successful at all) and it won't work in all cases. ... i386 is even worse for similar (not the same) instability triggered by the same scripts ... because zfs should not be used with the stock i386 kernel (which means if you're using it the whole patching process with freebsd-update won't work or will 'undo' your kernel config.) Personally I think zfs should be optional only for 'advanced' users and come with a whole host of warnings about what it is not suitable for; however, it seems to be treated as a magic bullet for data corruption issues, yet all I have seen is an ever-growing list of where it causes problems.. when did UFS become an unreliable FS that is susceptible to chronic data corruption? -- Michelle Sullivan http://www.mhix.org/
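For reference, the jitter suggestion amounts to a single rc.conf line (values illustrative; per cron(8), -j jitters non-root jobs and -J root jobs, 60 seconds maximum):

```sh
# /etc/rc.conf -- delay each cron job by a random 0-30 seconds so the
# jails don't all fire the periodic scripts at the same instant
cron_flags="-j 30 -J 30"
```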
Re: Periodic jobs triggering panics in 10.1 and 10.2
Jan Bramkamp wrote: > On 09/12/15 13:45, Michelle Sullivan wrote: >> >> No that will just hide it (if successful at all) and it won't work in >> all cases. >> >> ... i386 is even worse for similar (not the same) instability triggered >> by the same scripts ... because zfs should not be used with the stock >> i386 kernel (which means if you're using it the whole patching process >> with freebsd-update won't work or will 'undo' your kernel config.) > > Do you have a good idea how to prevent users from shooting themselves > in the foot by running ZFS on 32 Bit kernels? Yes: default to not having zfs available on any platform, and allow people who know what they are doing to turn it on. I mean, "prevent users from shooting themselves in the foot" - how about by not having an option to install a zfs root on the default install disks? > >> Personally I think zfs should be optional only for 'advanced' users and >> come with a whole host of warnings about what it is not suitable for >> however, it seems to be treated as a magic bullet for data corruption >> issues yet all I have seen is an ever growing list of where it causes >> problems.. when did UFS become an unreliable FS that is susceptible to >> chronic data corruption? > > As storage capacity grew a lot faster than reliability. Yeah, that's why we have these multi-tens-of-terabyte laptops that must have a zfs root install... > > UFS is a good file system for its time, but it trusts hardware > absolutely. Modern hardware doesn't deserve this level of trust. Ok, at this point we have to question things... Does your average home machine need zfs? (because windows doesn't) ... does your average laptop require zfs (or even benefit) ...? In fact when I look at it, I'm running 70+ servers and a few desktops and I'm running 5 of them with zfs... 
2 of them absolutely need it, 2 of them are solaris (which probably doesn't count and certainly doesn't have relevance to FreeBSD), and the other is a 2005 P4 based server that is completely unusable because zfs on i386 doesn't work with the stock kernel, and guess what ... it has 73G 15k SCSI server drives in it, so it probably has reliable hardware that doesn't suffer from "Modern hardware doesn't deserve this level of trust" > ZFS detects and recovers without dataloss from most errors caused by > the limited hardware reliability. Currently I've had more problems with the reliability of zfs in FreeBSD than with the reliability of hardware.. I do get your point though... > > ZFS isn't just a tool to deal with hardware limitations it's also a > convenience I no longer want to give up. Snapshots and replication > streams simplify backups and a background scrub once a week (or month) > sure beats waiting for fsck. Now this is the one set of reasons I can really appreciate, and had it been the opening argument I'd have understood your position, but it seems this is a side note to the above, and the above is where I see it's completely useless... When ZFS was first developed, a friend and I in Sun had lots of fun setting up servers where we just chucked any old drives we could lay our hands on into a pool ... this we found very cool, and this was where 'unreliable' hardware was an understatement - the drives were pulled from machines because SMART (and other tools) were reporting the drive(s) failing - but it was a workaround for bad sectors etc... Seriously though, the default to install with zfs and root on zfs is a really bad idea - the people who know how not to shoot themselves in the foot are the people that don't need a selectable option in the install, because they know how to configure it... they're the people who will probably be in every manual and advanced option they can find anyhow (or just using boot servers and predefined install scripts)!! 
Regards, -- Michelle Sullivan http://www.mhix.org/
Re: 10.2-Beta i386..what's wrong..?
Glen Barber wrote: On Fri, Jul 24, 2015 at 02:54:00AM +0200, Michelle Sullivan wrote: Actually I'm quite successfully running zfs on i386 (in a VM) ... here's the trick (which leads me to suspect ARC handling as the problem) - when I get to 512M of kernel space or less than 1G of RAM available system wide, I export/import the zfs pool... Using this formula I have uptimes of months... I haven't yet tried the 'ARC patch' that was proposed recently... Which FreeBSD version is this? Things changed since 10.1-RELEASE and what will be 10.2-RELEASE enough that I can't even get a single-disk ZFS system (in VirtualBox) to boot on i386. During 10.1-RELEASE testing, I only saw problems with multi-disk setup (mirror, raidzN), but the FreeBSD kernel grew since 10.1-RELEASE, so this is not unexpected. 9.2-i386 and 9.3-i386 - I don't run 10 on anything. -- Michelle Sullivan http://www.mhix.org/
Re: 10.2-Beta i386..what's wrong..?
Glen Barber wrote: ZFS on i386 requires KSTACK_PAGES=4 in the kernel configuration to work properly, as noted in the 10.1-RELEASE errata (and release notes, if I remember correctly). We cannot set KSTACK_PAGES=4 in GENERIC by default, as it is too disruptive. Why? If you are using ZFS on i386, you *must* build your own kernel for this. It is otherwise unsupported by default. Why is zfs on i386 so hard? Why is it even in the GENERIC kernel if it's unsupported? -- Michelle Sullivan http://www.mhix.org/
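For anyone hitting this, the errata workaround boils down to a custom kernel config; a minimal sketch (the config name ZFSKERN is made up):

```
# sys/i386/conf/ZFSKERN -- i386 kernel with the deeper kernel stacks
# that ZFS needs, per the 10.1-RELEASE errata
include GENERIC
ident   ZFSKERN
options KSTACK_PAGES=4
```

Note that, as discussed in the thread, freebsd-update's binary kernel updates will replace this custom kernel, so such a machine has to build its kernels from source.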
Re: 10.2-Beta i386..what's wrong..?
Glen Barber wrote: On Thu, Jul 23, 2015 at 07:44:43PM -0500, Mark Linimon wrote: On Fri, Jul 24, 2015 at 12:43:43AM +, Glen Barber wrote: Even on amd64, you need to tune the system with less than 4GB RAM. The only correct answer to how much RAM do you need to run ZFS is always more AFAICT. There's a bit more to it than that. You *can* successfully run an amd64 ZFS system with certain tunings (vfs.kmem_max IIRC), but you also need to adjust things like disabling prefetching with less than 4GB RAM (accessible to the OS). So yeah, more RAM is always a thing in this playing field. Glen Actually I'm quite successfully running zfs on i386 (in a VM) ... here's the trick (which leads me to suspect ARC handling as the problem) - when I get to 512M of kernel space or less than 1G of RAM available system wide, I export/import the zfs pool... Using this formula I have uptimes of months... I haven't yet tried the 'ARC patch' that was proposed recently... -- Michelle Sullivan http://www.mhix.org/
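The usual loader tunables for a RAM-starved system, for comparison with the export/import trick (values are illustrative, not recommendations; set in /boot/loader.conf):

```sh
# /boot/loader.conf
vfs.zfs.arc_max="256M"           # cap the ARC well below the i386 kernel VA limit
vfs.zfs.prefetch_disable="1"     # prefetch hurts with < 4GB of RAM
vm.kmem_size="512M"              # likely the tunable meant by "vfs.kmem_max" above
```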
Re: ZFS out of swap space
armonia wrote: Not solve the problem, I booted from 11.0 STABLE and when you try to import a pool the system goes into freeze. (zpool import -f zroot) Just hang when attempting to mount the problematic section zroot / var / db / mysql / billing In this home are installed properly, I even removed through truncate -s 0 15 gigs in the file / home / user, but it does not help. More options? All attempts to freeze mount lead as well as attempts to boot the system. All I can suggest is boot and wait - see if it comes back - if not I am out of suggestions - zpool import -Ff storage worked for me when I was running out of swap while trying to import on a 9.2 system (storage being my pool name). After the import I exported, booted back to 9.2, and it all worked... issued a 'scrub' *after* booting back to 9.2 (don't do it before) Michelle Wednesday, 25 March 2015, 16:49 +01:00 from Michelle Sullivan miche...@sorbs.net: armonia wrote: -- Hello. Please help . I mirror ZFS 9.3 , after an active it by using mysql read \ write from an external script something broken. The operating system is not loaded at the time of Mount local filesystems pool consists of a mirror (raid 1 ) + hot swap, zfs partitions on a separate . zpool import -f -R /tmp zroot freezes try to import a pool from a LiveCD (10.1) - zpool import -f -R / tmp zroot out as in deep thought and then wrote something like pid 99217 (sh), uid 0, was killed: out of swap space pid 896 (ssh), uid 0, was killed: out of swap space zpool made of 2 disks to GPT and one as hot-swap. Write an 11-STABLE USB drive, boot from it to single user, run your import, then run an export, then reboot into 9.3 and import without flags. 
Regards, -- Michelle Sullivan http://www.mhix.org/
Re: ZFS out of swap space
armonia wrote: How much are you waiting time after import? I was waiting for 4 days for the import to complete - however I do have a 16 drive 48T pool. Look a screenshot: http://i58.tinypic.com/mvkj00.jpg It helped you because you certainly do import 11 branch? It helped because there were some patches in the 11 branch that handle errors in the pool better than in 9.x... when it imported I was then able to export and restart back to 9.x which corrected the uncaught errors and it was importable straight back into 9.x.. I was then able to correct the rest of the errors by doing a scrub in 9.x (which ultimately was all caused by power, UPS and battery failure whilst in the middle of a resilver.) Regards, -- Michelle Sullivan http://www.mhix.org/
Re: ZFS out of swap space
armonia wrote: -- Hello. Please help . I mirror ZFS 9.3 , after an active it by using mysql read \ write from an external script something broken. The operating system is not loaded at the time of Mount local filesystems pool consists of a mirror (raid 1 ) + hot swap, zfs partitions on a separate . zpool import -f -R /tmp zroot freezes try to import a pool from a LiveCD (10.1) - zpool import -f -R / tmp zroot out as in deep thought and then wrote something like pid 99217 (sh), uid 0, was killed: out of swap space pid 896 (ssh), uid 0, was killed: out of swap space zpool made of 2 disks to GPT and one as hot-swap. Write an 11-STABLE USB drive, boot from it to single user, run your import, then run an export, then reboot into 9.3 and import without flags. Regards, -- Michelle Sullivan http://www.mhix.org/
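Spelled out, the suggested rescue sequence is (pool name from the post; run from the 11-STABLE stick in single-user mode):

```sh
zpool import -f -N zroot   # -N: import without mounting any datasets
zpool export zroot         # write out a clean pool state
reboot                     # back into the installed 9.3 system
# then, from 9.3:
zpool import zroot
zpool scrub zroot          # only *after* the pool is back on 9.3
```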