Re: [9fans] a 9P session between debian client and Plan 9 server side
Now I start the server with the -D flag and try `echo hello > foo` on Linux. The server side on Plan 9 says:

-5- Twalk tag 1 fid 408 newfid 437 nwname 1 0:foo
-5- Rwalk tag 1 nwqid 1 0:( 0 )
-5- Tstat tag 1 fid 437
-5- Rstat tag 1 stat 'foo' 'bootes' 'bootes' 'unknown' q ( 0 ) m 0666 at 1325748319 mt 1325748319 l 0 t 0 d 0
-5- Tclunk tag 1 fid 437
-5- Rclunk tag 1
-5- Twalk tag 2 fid 408 newfid 437 nwname 1 0:foo
-5- Rwalk tag 2 nwqid 1 0:( 0 )
-5- Tstat tag 2 fid 437
-5- Rstat tag 2 stat 'foo' 'bootes' 'bootes' 'unknown' q ( 0 ) m 0666 at 1325748319 mt 1325748319 l 0 t 0 d 0
-5- Tclunk tag 2 fid 437
-5- Rclunk tag 2
-5- Twalk tag 2 fid 408 newfid 438 nwname 1 0:foo
-5- Rwalk tag 2 nwqid 1 0:( 0 )
-5- Topen tag 2 fid 438 mode 1
fid mode is 0x1
-5- Ropen tag 2 qid ( 0 ) iounit 0
-5- Tstat tag 2 fid 438
-5- Rstat tag 2 stat 'foo' 'bootes' 'bootes' 'unknown' q ( 0 ) m 0666 at 1325748319 mt 1325748319 l 0 t 0 d 0
-5- Twrite tag 6 fid 438 offset 0 count 6 'hello'
-5- Rwrite tag 6 count 6
-5- Tclunk tag 6 fid 438
-5- Rclunk tag 6

I will read more 9P examples in order to implement wstat. Thank you Yaroslav and Andrey; it is worthwhile to learn from you.
[9fans] venti and contrib: RFC
Hello,

Summary of the previous episodes: my Plan 9 installation was still the initial one as far as partitioning is concerned. Since I had not grasped the purpose of venti, "other" was empty, everything going into the venti archive. And I was doing a number of installs/de-installs of kerTeX for test purposes---boom!: disk full, and I needed to find a way to load an alternate root to fix things---or reinstall.

But this leads to questions regarding the contrib stuff.

When one has the sources, archiving the sources with history makes sense. To take the example of kerTeX, there is a map describing where each file eventually goes, so the sources vary a little, but the result may be arbitrary. Secondly, the binaries compiled from the sources may vary even if the sources do not. So the compiled result is not worth archiving. (The convenience of having a fallback snapshot so as not to disrupt work is there; in case of a bigger disaster, the time needed to recompile everything is acceptable---for kerTeX, even if the result is several tens of Mb, this is a matter of minutes.) Furthermore, for experimental work, archiving a transient state is not worth the disk space.

With the design of namespace manipulations, a Plan 9 user can redirect writes where he wants them to happen---venti or not venti, that is the question. But the user has to know. Is there a policy described somewhere?

The problem, I think, is that on other systems one thinks about backup and archiving _after_ the fact---and decides what goes into backups. Here, powerful tools are there by default, but the user may be unaware of the consequences. Perhaps it should be proposed by default, for the "let's see what Plan 9 is" case, to get fossil only, and to switch to venti when things are clear?

Cheers,
--
Thierry Laronde  tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] venti and contrib: RFC
> So the compiled result is not worth archiving.

it has been more than once that in tracking down a problem, i've found that the known working executable worked but the source from that point in history didn't. and vice versa. having the executables and libraries archived was very valuable.

otoh, just to use round numbers, if your build creates 100mb of new data and all of it hits venti before being replaced, then you've got 10,000 builds/TB. in practice, i think most people push to venti only once a day, so this is practically infinite. or, put another way, that's $100/10,000 = 1¢/build. since the standard deprecated comment is worth 2¢, it appears that these days 100mb is not worth commenting on. ☺.

- erik
Re: [9fans] venti and contrib: RFC
On Thu, Jan 05, 2012 at 07:59:00AM -0500, erik quanstrom wrote:
> > So the compiled result is not worth archiving.
>
> it has been more than once that in tracking down a problem, i've found that the known working executable worked but the source from that point in history didn't. and vice versa. having the executables and libraries archived was very valuable.
>
> otoh, just to use round numbers, if your build creates 100mb of new data and all of it hits venti before being replaced, then you've got 10,000 builds/TB. in practice, i think most people push to venti only once a day, so this is practically infinite. or, put another way, that's $100/10,000 = 1¢/build. since the standard deprecated comment is worth 2¢, it appears that these days 100mb is not worth commenting on. ☺.

Perhaps, but it seems to me like digging ore, extracting the small valuable percentage, and forging a ring; then throwing it back in the ore and storing the whole...

Secondly, I still use definitive optical storage from time to time (disks go in a vault)... with KerGIS and others, and kerTeX, this still fits 3 times on a CDROM. So...

And finally, doesn't the increase in size of the disks, with no decrease of the reliability, increase the probability of disk failure? Unfortunately, one no longer finds small (what were huge some years ago) disks...

PS: a Plan 9 tester will begin by devoting a partition on a disk to see. The iso is around 300Mb, so allocating 512 or 1024Mb will seem enough. If he's hooked on Plan 9---that happened to me ;)---sooner or later a problem will occur.

--
Thierry Laronde  tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] venti and contrib: RFC
On Thu, Jan 05, 2012 at 02:20:34PM +0100, tlaro...@polynum.com wrote:
> And finally, didn't the increase in size of the disks, with no decrease

no increase, of course. If the probability of failure for a sector is P, increasing the number of sectors increases the probability of disk failure.

--
Thierry Laronde  tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] venti and contrib: RFC
> Perhaps, but it seems to me like digging ore, extracting the small
> percentage of valuable; forging a ring; and throwing it in the ore,
> and storing the whole...

generally it's apparent which files are worth investigating, and between history (list of changes by date) and a binary search, it shouldn't take more than a handful of tries to narrow things down. in practice, i've found the executables more helpful than the source.

> Secondly, I still use optical definitive storage from time to time
> (disks go in a vault)... with KerGIS and others, and kerTeX, this
> still fits 3 times on a CDROM. So...

if you are using venti, there is no reason to re-archive closed arenas. (and there's no a priori reason that your optical backup must include history.)

> And finally, didn't the increase in size of the disks, with no
> decrease of the reliability, increase the probability of disk
> failure? Unfortunately, one no longer finds small (what were huge
> some years ago) disks...

i think disk reliability is a term that gets canceled out. if you have n copies of an executable, whatever the reliability of the drive, each copy is exactly as likely to be intact.

> PS: a Plan 9 tester will begin by devoting a partition on a disk to
> see. The iso is around 300Mb, so allocating 512 or 1024Mb will seem
> enough. If he's hooked on Plan 9---that happened to me ;)---sooner
> or later a problem will occur.

that's not what i did. i started with several 18gb scsi drives.

- erik
Re: [9fans] venti and contrib: RFC
On Thu, Jan 05, 2012 at 08:27:50AM -0500, erik quanstrom wrote:
> > Secondly, I still use optical definitive storage from time to time
> > (disks go in a vault)... with KerGIS and others, and kerTeX, this
> > still fits 3 times on a CDROM. So...
>
> if you are using venti, there is no reason to re-archive closed arenas. (and there's no a priori reason that your optical backup must include history.)

Because I use CVS (not on Plan 9), and I back up my CVS. So, sources with history. I do not consider CDROMs to be eternal, so a small number are kept, and the oldest is destroyed when the new one is burnt.

--
Thierry Laronde  tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] venti and contrib: RFC
I am not sure I understand your question. Nothing forces you to dump the full Fossil tree to Venti every night. You can run snap manually whenever you want, or run it only on a part of the tree. You can also individually exclude files from the snapshots using the DMTMP bit. If you really want to avoid archiving binaries, you could simply add a cron job which automatically applies the DMTMP bit to the binaries just before the archival snapshot. Fossil and Venti are very flexible; you can do almost anything you want.

--
David du Colombier
Re: [9fans] venti and contrib: RFC
> Because I use CVS (not on Plan 9), and I back up my CVS. So, sources
> with history. I do not consider CDROMs to be eternal, so a small
> number are kept, and the oldest is destroyed when the new one is
> burnt.

sorry. i thought we were talking about organizing plan 9 storage. never mind.

- erik
Re: [9fans] venti and contrib: RFC
On Thu Jan 5 08:28:57 EST 2012, tlaro...@polynum.com wrote:
> On Thu, Jan 05, 2012 at 02:20:34PM +0100, tlaro...@polynum.com wrote:
> > And finally, didn't the increase in size of the disks, with no decrease
>
> no increase, of course. If the probability of failure for a sector is P, increasing the number of sectors increases the probability of disk failure.

sector failure != disk failure. disk failure is generally due to the heads, and generally independent of the size of the device.

- erik
Re: [9fans] venti and contrib: RFC
On Thu, Jan 05, 2012 at 09:14:28AM -0500, erik quanstrom wrote:
> sorry. i thought we were talking about organizing plan 9 storage. never mind.

I use CVS on NetBSD now. But even on Plan 9, I want my sources with history. This means that on Plan 9, I will make a separate partition for my sources; add a log file to register comments about changes; and back up the whole arena.

--
Thierry Laronde  tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] venti and contrib: RFC
On Thu, Jan 05, 2012 at 02:48:10PM +0100, David du Colombier wrote:
> Fossil and Venti are very flexible; you can do almost anything you want.

No doubt about that. But perhaps the other users were smart enough to have understood all this at installation time; when I first installed Plan 9, it was not for the archival features, and I spent my time looking at the distributed system, the namespaces and so on, not at venti. The question is more about the defaults and/or the documentation.

--
Thierry Laronde  tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] venti and contrib: RFC
On Thu, Jan 5, 2012 at 10:15 AM, tlaro...@polynum.com wrote:
> But perhaps the other users are smart enough to have understood all this at installation time, but when I first installed Plan9, that was not for the archival features. And I spent my time on Plan9 looking for the distributed system, the namespace and so on, not on venti. The question is more about the defaults and/or the documentation.

The default is that you have so little data in comparison to a modern disk that there is no good reason not to save full snapshots. As Erik and others have pointed out, if you do find reason to exclude certain trees from the snapshots, you can use chmod +t. The system is working as intended.

Russ
Re: [9fans] venti and contrib: RFC
On Thu, Jan 05, 2012 at 10:44:18AM -0500, Russ Cox wrote:
> The default is that you have so little data in comparison to a modern disk that there is no good reason not to save full snapshots. As Erik and others have pointed out, if you do find reason to exclude certain trees from the snapshots, you can use chmod +t. The system is working as intended.

Quoting ``Installing the Plan 9 Distribution'':

	You need an x86-based PC with 32MB of RAM, a supported video card, and a hard disk with at least 300MB of unpartitioned space and a free primary partition slot.

Yes, this is from the printed edition of the Plan 9 Programmer's Manual, 3rd Edition. But I don't see why caveats would hurt a newcomer, who is probably not devoting an entire new disk to a system he doesn't know yet and wants to try, but giving Plan 9 some room on a disk populated with other data. And giving a hint about the archival features would not hurt either.

--
Thierry Laronde  tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] venti and contrib: RFC
The third edition was published in June 2000. It predates both Venti (April 2002) and Fossil (January 2003). That documentation was about installing Plan 9 on a standalone terminal running kfs, not a file server.

--
David du Colombier
Re: [9fans] venti and contrib: RFC
I doubt anyone would object if you want to change the text and submit to the website owners. ron
Re: [9fans] Killing venti
Russ Cox r...@swtch.com writes:
> run venti/sync.

Ah. Cool. Gotta love those undocumented commands. :)

While probing the distal edges of Venti's documented functionality, I also came across the following, which have similar (but not identical) effect:

	hget http://$vthost:$vtwebport/flushicache
	hget http://$vthost:$vtwebport/flushdcache

These HTTP requests initiate flushes of the index and arena block caches, respectively, and don't return a response until the respective flush is complete.

--
Smiley  smi...@icebubble.org  PGP key ID: BC549F8B
Fingerprint: 9329 DB4A 30F5 6EDA D2BA 3489 DAB7 555A BC54 9F8B
Re: [9fans] ramfs, fossil, venti etc.
Steve Simon st...@quintile.net writes:
> Even fossil can be grown, though you will need a new bigger partition or grow the existing one using fs(3); this can then be refreshed from a venti snapshot.

Quid pro quo: IIRC, if you do this (using flfmt), you will retain venti archives of the fossil filesystem, but lose the ephemeral snapshots, as well as any data marked +t. There's currently no way to resize a fossil file system in place, is there?

--
Smiley  smi...@icebubble.org  PGP key ID: BC549F8B
Fingerprint: 9329 DB4A 30F5 6EDA D2BA 3489 DAB7 555A BC54 9F8B
Re: [9fans] venti and contrib: RFC
On Thu, 05 Jan 2012 17:39:07 +0100 tlaro...@polynum.com wrote:
> Quoting ``Installing the Plan 9 Distribution'':
>
>	You need an x86-based PC with 32MB of RAM, a supported video card, and a hard disk with at least 300MB of unpartitioned space and a free primary partition slot.
>
> Yes, this is from the printed edition of the Plan 9 Programmer's Manual, 3rd Edition. But I don't see why caveats would hurt a newcomer, who is probably not devoting an entire new disk to a system he doesn't know yet and wants to try, but giving Plan 9 some room on a disk populated with other data.

Are you going to update the wiki then? Newbies need all the help they can get! If you do, make sure to mention they should use RAID 1! Not when they are just playing with Plan 9, but before they start storing precious data. On a modern consumer disk, if you read 1TB, you have an 8% chance of a silent bad read sector. That is more important to worry about in today's world than optimizing disk space use.

If you partition disk space, chances are it will be used suboptimally (that is, it will turn out you guessed partition sizes wrong for the actual use). These days (for most people) the *bulk* of their disks contains user data, so there is really no point in partitioning. Just make sure your truly critical data is backed up separately (repeat until you are satisfied).
Re: [9fans] (no subject)
erik quanstrom wrote:
> for me, the most important questions are
> - how do i set up a raid/hot spares, and
> - can i do this without rebooting.

Of course, and right now I'm doing exactly that using a different operating system. Can I do that on Plan 9? I don't know; I'm trying to find out, without much success.

> wikipedia.

I really don't understand why you are sending me to read wikipedia. Generally, I think of myself as a decent speaker; I know how to make myself clear. It's obvious that in this case I failed. In my previous job I worked on a file system that, among other things, also implements redundancy. If I implemented these things, I guess I know about them without having to read wikipedia.

The machine I use today for storage also runs bits of my own software. It's very easy to administer, it tells me when disks are broken, I can just add disks for more storage without a reboot, and I can hot-swap disks. Can Plan 9 do this? I don't know; I guess not? It's fine by me; I'm willing to sacrifice performance and ease of administration for an operating system I like better. I'm willing to implement myself what I need that doesn't exist yet, though I have a very hard time understanding what's missing from these very, very vague discussions.

> there's nothing strange about a sata device or even a raid of various devices of any type being presented with an ide programming interface. one could just as easily slap an ahci programming interface on, but either requires translation software/hardware.

I agree there's nothing inherently strange, but in practice it's uncommon, at least in my experience. But then again, I'm not a hardware guy, so my experience means nothing.

--
Aram Hăvărneanu
Re: [9fans] venti and contrib: RFC
> On Thu, Jan 5, 2012 at 10:15 AM, tlaro...@polynum.com wrote:
> > But perhaps the other users are smart enough to have understood all this at installation time, but when I first installed Plan9, that was not for the archival features. And I spent my time on Plan9 looking for the distributed system, the namespace and so on, not on venti. The question is more about the defaults and/or the documentation.
>
> The default is that you have so little data in comparison to a modern disk that there is no good reason not to save full snapshots. As Erik and others have pointed out, if you do find reason to exclude certain trees from the snapshots, you can use chmod +t. The system is working as intended.
>
> Russ

For reference, I set up our current Plan 9 system about half a year ago. We have 3.8 TB of Venti storage total. We have used 2.8 GB of that, with basically no precautions taken to set anything +t; in general, if it's around at 4 a.m., it's going into Venti. I figure we have roughly another 2,000 years of storage left at the current rate :)

John
Re: [9fans] venti and contrib: RFC
> if you read 1TB, you have 8% chance of a silent bad read sector.
> More important to worry about that in today's world than optimizing
> disk space use.

do you have a citation for this? i know if you work out the numbers from the BER, this is about what you get, but in practice i do not see this 8%. we do pattern writes all the time, and i can't recall the last time i saw a silent read error.

- erik
Re: [9fans] venti and contrib: RFC
On Thu, Jan 05, 2012 at 09:36:13AM -0800, ron minnich wrote:
> I doubt anyone would object if you want to change the text and submit to the website owners.

That was my intention, but first I wanted to submit some stuff to the list, in order not to publish nonsense. [But probably some people equate Laronde and nonsense...]

I do want to submit the stuff for review, since I have been playing with the installation process to use it as a repair tool: mounting the CD image; extracting the El Torito 2.88Mb image; mounting that file as a FAT; extracting 9load and repopulating my plan9:9fat (9pcflop, having a kernel and a spartan root, is then a repair tool); jivaro'ing the CD image to put it in the 100 Mb 9fat, in order to install from there, the Plan 9 FAT being seen from NetBSD by playing with a disklabel; etc. I don't mind a software incident if I have fun and use it to learn...

--
Thierry Laronde  tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] venti and contrib: RFC
On Thu, Jan 05, 2012 at 10:07:08AM -0800, John Floren wrote:
> For reference, I set up our current Plan 9 system about half a year ago. We have 3.8 TB of Venti storage total. We have used 2.8 GB of that, with basically no precautions taken to set anything +t; in general, if it's around at 4 a.m., it's going into Venti. I figure we have roughly another 2,000 years of storage left at the current rate :)

The TB were there because you planned to use TeXlive, that's all... ;)

--
Thierry Laronde  tlaronde +AT+ polynum +dot+ com
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: [9fans] Killing venti
venti/sync calls vtsync, which is documented in venti-client(2). Fortunately, you don't have to flush the dcache or icache before shutting down Venti, especially since flushing the icache will likely take a very long time.

--
David du Colombier
Re: [9fans] venti and contrib: RFC
On Thu, 05 Jan 2012 13:01:52 EST erik quanstrom quans...@quanstro.net wrote:
> > if you read 1TB, you have 8% chance of a silent bad read sector.
> > More important to worry about that in today's world than optimizing
> > disk space use.
>
> do you have a citation for this? i know if you work out the numbers from the BER, this is about what you get, but in practice i do not see this 8%. we do pattern writes all the time, and i can't recall the last time i saw a silent read error.

Silent == unseen! Do you log RAID errors? It's the only way to catch them. That number is derived purely from a bit error rate (I think vendors base that on the Reed-Solomon code used). No idea how uniformly random the data (or medium) is in practice. I thought the practice was worse!
Re: [9fans] venti and contrib: RFC
On Thu, 05 Jan 2012 10:07:08 PST John Floren j...@jfloren.net wrote:
> For reference, I set up our current Plan 9 system about half a year ago. We have 3.8 TB of Venti storage total. We have used 2.8 GB of that, with basically no precautions taken to set anything +t; in general, if it's around at 4 a.m., it's going into Venti. I figure we have roughly another 2,000 years of storage left at the current rate :)

I first read that 2.8 GB as 2.8 TB and was utterly confused!

You'd save a bunch of energy if you only powered up the venti disks once a day @ 4AM for a few minutes (and on demand, when you look at /n/dump). Though venti might have fits! And the disks might too! So maybe this calls for a two-level venti: first to an SSD RAID, and a much less frequent venti/copy to hard disks.

venti doesn't have a scrub command, does it? zfs scrub was instrumental in warning me that I needed new disks.
Re: [9fans] venti and contrib: RFC
On Thu Jan 5 13:26:16 EST 2012, ba...@bitblocks.com wrote:
> Silent == unseen! Do you log RAID errors? It's the only way to catch them. That number is derived purely from a bit error rate (I think vendors base that on the Reed-Solomon code used). No idea how uniformly random the data (or medium) is in practice. I thought the practice was worse!

i thought your definition of silent was "not caught by the on-drive ecc". i think this is not very likely, and we're explicitly checking for this by running massive numbers of disks through pattern writes with verification, and we don't see it.

- erik
Re: [9fans] venti and contrib: RFC
> On Thu, 05 Jan 2012 10:07:08 PST John Floren j...@jfloren.net wrote:
> > For reference, I set up our current Plan 9 system about half a year ago. We have 3.8 TB of Venti storage total. We have used 2.8 GB of that, with basically no precautions taken to set anything +t; in general, if it's around at 4 a.m., it's going into Venti. I figure we have roughly another 2,000 years of storage left at the current rate :)
>
> I first read that 2.8 GB as 2.8 TB and was utterly confused!
>
> You'd save a bunch of energy if you only powered up the venti disks once a day @ 4AM for a few minutes (and on demand, when you look at /n/dump). Though venti might have fits! And the disks might too! So maybe this calls for a two-level venti: first to an SSD RAID, and a much less frequent venti/copy to hard disks.
>
> venti doesn't have a scrub command, does it? zfs scrub was instrumental in warning me that I needed new disks.

Well, we need the venti disks powered on whenever we're using the system, right? Since most of the filesystem actually lives on Venti and fossil just has pointers to it? Also, I think it's probably better for disks to stay on all the time rather than go on-off-on-off. And compared to the rest of the machine room, keeping a Coraid running all the time isn't that big of a thing.

John
Re: [9fans] venti and contrib: RFC
> You'd save a bunch of energy if you only powered up the venti disks once a day @ 4AM for a few minutes (and on demand, when you look at /n/dump). Though venti might have fits! And the disks might too! So maybe this calls for a two-level venti: first to an SSD RAID, and a much less frequent venti/copy to hard disks.
>
> venti doesn't have a scrub command, does it? zfs scrub was instrumental in warning me that I needed new disks.

they're using coraid storage. all this is taken care of for them by the SR appliance.

- erik
Re: [9fans] Killing venti
On Thu, Jan 5, 2012 at 12:35 PM, smi...@icebubble.org wrote:
> > run venti/sync.
>
> Ah. Cool. Gotta love those undocumented commands. :)
>
> While probing the distal edges of Venti's documented functionality, I also came across the following, which have similar (but not identical) effect:
>
>	hget http://$vthost:$vtwebport/flushicache
>	hget http://$vthost:$vtwebport/flushdcache
>
> These HTTP requests initiate flushes of the index and arena block caches, respectively, and don't return a response until the respective flush is complete.

Honestly, you don't even have to run venti/sync. Every command that writes to venti ends by doing a sync.

You probably don't want to use those hget commands. They are safe, of course, but it is equally safe not to run them. The icache in particular can take a long time to flush, and venti will recover the entries (in less time than the flush would have taken) the next time it starts.

Russ
Re: [9fans] venti and contrib: RFC
> > venti doesn't have a scrub command, does it? zfs scrub was instrumental in warning me that I needed new disks.
>
> they're using coraid storage. all this is taken care of for them by the SR appliance.

Out of curiosity, how? ZFS blocks are checksummed. ZFS scrub reads not physical blocks on disks but logical ZFS blocks, and validates their checksums. How can the Coraid appliance determine whether the data it reads is valid or not, since it works below the file system layer and only understands physical blocks?

--
Aram Hăvărneanu
Re: [9fans] venti and contrib: RFC
On Thu Jan 5 14:13:55 EST 2012, ara...@mgk.ro wrote:
> Out of curiosity, how? ZFS blocks are checksummed. ZFS scrub reads not physical blocks on disks but logical ZFS blocks, and validates their checksums. How can the Coraid appliance determine whether the data it reads is valid or not, since it works below the file system layer and only understands physical blocks?

all redundant raid types have parity (even raid 1; fun fact!).

- erik
Re: [9fans] venti and contrib: RFC
but john, the whole of your venti would easily fit in even a small server's memory, now and forever ;)

ron
Re: [9fans] venti and contrib: RFC
> For reference, I set up our current Plan 9 system about half a year ago. We have 3.8 TB of Venti storage total. We have used 2.8 GB of that, with basically no precautions taken to set anything +t; in general, if it's around at 4 a.m., it's going into Venti. I figure we have roughly another 2,000 years of storage left at the current rate :)

in 10 years, we've managed to store only 645gb of stuff in ken fs. so duplicates are stored as dupes. large chunks of it are email (from the pre-nupas days) or imported from even older systems. the worm is only 3tb. the system was built last nov with the then-best-option 1tb drives. today, the whole worm could be put on 2 drives with raid 10. in 2 years, when it's time to replace the drives, the worm should be less than 2/3 full, assuming ~300% year/year acceleration in storage used.

my personal system, a mere 6 years old, has only about 12.5gb of junk (in a 1500gb worm!), and its drives are ripe for replacement.

by the way, in thinking a bit more about the BER and scrubbing: my 3 raids have been scrubbing continuously for 3 years (so that's hundreds of times), and except when i actually had a bad disk, i have not seen a URE.

- erik
Re: [9fans] venti and contrib: RFC
On Thu, 05 Jan 2012 13:43:49 EST erik quanstrom quans...@quanstro.net wrote:
> i thought your definition of silent was "not caught by the on-drive ecc". i think this is not very likely, and we're explicitly checking for this by running massive numbers of disks through pattern writes with verification, and we don't see it.

Hmm. You are right! I meant *uncorrectable* read errors (URE), which are not necessarily *undetectable* errors (where a data pattern switches to another pattern mapping to the same syndrome bits). Clearly my memory by now has had much more massive bit errors!

Still, a consumer disk URE rate of 10^-14 coupled with large disk sizes does mean RAID is essential.

Are these new disks? The rate goes up with age. Do SMART stats show any new errors? It is also possible vendors are *conservatively* specifying 10^-14 (though I no longer know how they arrive at the URE number!). Can you share what you did discover? [offline, if you don't want to broadcast]

You've probably read http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.ps
Re: [9fans] venti and contrib: RFC
On Thu, 05 Jan 2012 13:50:48 EST erik quanstrom quans...@quanstro.net wrote:
> > venti doesn't have a scrub command, does it? zfs scrub was instrumental in warning me that I needed new disks.
>
> they're using coraid storage. all this is taken care of for them by the SR appliance.

When are you going to sell these retail?! The question was about venti, though.
Re: [9fans] venti and contrib: RFC
> > they're using coraid storage. all this is taken care of for them by the SR appliance.
>
> When are you going to sell these retail?! The question was about venti, though.

i'm not sure i follow. why can't venti assume a perfect array-of-bytes device and let the appliance take care of it? if you care enough to get ecc memory, your data path should be 100% ecc protected.

- erik
Re: [9fans] venti and contrib: RFC
> You'd save a bunch of energy if you only powered up the venti disks once @ 4AM for a few minutes (and on demand when you look at /n/dump).

If fossil is set up to dump to venti, then it needs venti to work at all. Fossil is a write cache, so just after the dump at 4am fossil is empty and consists only of a pointer to the root of the dump in venti; all reads are then satisfied from venti alone, until some data is written.

-Steve
Re: [9fans] venti and contrib: RFC
2012/1/5 Bakul Shah ba...@bitblocks.com:
> You'd save a bunch of energy if you only powered up the venti disks once @ 4AM for a few minutes (and on demand when you look at /n/dump). Though venti might have fits! And the disks might too! So maybe this calls for a two-level venti: first to an SSD RAID, and a much less frequent venti/copy to hard disks.

I think you're confusing kenfs+worm with fossil+venti, in the sense that ken fs is a complete cache for the worm while fossil is only a write cache for venti. You need venti running all the time.

--
- Yaroslav
Re: [9fans] venti and contrib: RFC
On Thu Jan 5 16:24:58 EST 2012, yari...@gmail.com wrote:
> I think you're confusing kenfs+worm with fossil+venti, in the sense that ken fs is a complete cache for the worm while fossil is only a write cache for venti. You need venti running all the time.

this only works if you assume that everything in the worm is also in the cache, and you've configured your cache to cache the worm. since these days the cache is often similar in performance to the worm, the default we use is to not copy the worm into the cache; that would just result in more i/o.

so, tl;dr: you need the worm available at all times, be it venti+fossil or ken fs/cwfs.

- erik
Re: [9fans] venti and contrib: RFC
erik quanstrom wrote:
> do you have a citation for this? i know if you work out the numbers from the BER, this is about what you get, but in practice i do not see this 8%. we do pattern writes all the time, and i can't recall the last time i saw a silent read error.

Yes, the real numbers are much, much lower, but still significant because they affect RAID reconstruction. See this[1] paper. An unrelated but interesting fact from that paper: nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise-class disk drives.

[1] L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. An Analysis of Data Corruption in the Storage Stack. In FAST, 2008. http://www.cs.toronto.edu/~bianca/papers/fast08.pdf

--
Aram Hăvărneanu