Re: [zfs-discuss] path-name encodings
Anton B. Rang [EMAIL PROTECTED] wrote:
> > OK, thanks. I still haven't got any answer to my original question, though. I.e., is there some way to know what text the filename is, or do I have to make a more or less wild guess what encoding the program that created the file used?
> You have to guess.

Ouch! Guessing sucks. (By the way, that's why I switched to ZFS with its internal checksums, so that I wouldn't have to guess if my data was OK.) Thanks for the answer, though. Do you happen to know where programs in (Open)Solaris look when they want to know how to encode text to be used in a filename? Is it LC_CTYPE?

> NFS doesn't provide a mechanism to send the encoding with the filename; I don't believe that CIFS does, either.

Really?!? That's insane! How do programs know how to encode filenames to be sent over NFS or CIFS?

> If you're writing the application, you could store the encoding as an extended attribute of the file. This would be useful, for instance, for an AFP server.

OK. But then I'd have to hack a similar change into all other programs that I use, too.

> The trick is that in order to support such things as casesensitivity=false for CIFS, the OS needs to know what characters are uppercase vs lowercase, which means it needs to know about encodings, and reject codepoints which cannot be classified as uppercase vs lowercase.

I don't see why the OS would care about that. Isn't that the job of the CIFS daemon?

> The CIFS daemon can do it, but it would require that the daemon cache the whole directory in memory (at least, to get reasonable efficiency).

I guess that depends on what file access functions there are for the file system.

> If you leave it up to the CIFS daemon, you also wind up with problems if you have a single sharepoint shared between local users, NFS & CIFS -- the NFS client can create two files named a and A, but the CIFS client can only see one of those.

Not necessarily. There could be some (nonstandard) way of accessing such duplicates (e.g., by having the CIFS daemon append [dup-N] or somesuch to the name). And even if that problem did exist it might still be OK for CIFS access to have that limitation.

Regards, Marcus

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
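The [dup-N] idea is easy to sketch in userland. This is a hypothetical illustration of the disambiguation scheme proposed above (the suffix format and function name are made up, not any real CIFS server's behaviour): given names from a case-sensitive directory, build a case-insensitive view in which every entry remains addressable.

```python
def case_insensitive_view(names):
    """Map case-colliding names to unique view names by appending a
    [dup-N] suffix to every entry after the first collision."""
    seen = {}   # case-folded name -> number of entries handed out so far
    view = {}   # name shown to the CIFS client -> real on-disk name
    for name in names:
        folded = name.casefold()
        n = seen.get(folded, 0)
        seen[folded] = n + 1
        # First entry keeps its name; later case-collisions get a suffix.
        view_name = name if n == 0 else f"{name}[dup-{n}]"
        view[view_name] = name
    return view

# An NFS client created both "a" and "A"; the CIFS client sees both.
print(case_insensitive_view(["a", "A", "readme"]))
# → {'a': 'a', 'A[dup-1]': 'A', 'readme': 'readme'}
```

The sketch ignores the corner case where a generated suffix itself collides, which a real implementation would have to handle.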
Re: [zfs-discuss] path-name encodings
[EMAIL PROTECTED] (Joerg Schilling) wrote:
> [...] ISO-8859-1 (the low 8 bits of UNICODE) [...]

Unicode is not an encoding, but you probably mean the low 8 bits of UCS-2 or the first 256 codepoints in Unicode or somesuch.

Regards, Marcus
Re: [zfs-discuss] path-name encodings
Marcus Sundman [EMAIL PROTECTED] wrote:
> > [...] ISO-8859-1 (the low 8 bits of UNICODE) [...]
> Unicode is not an encoding, but you probably mean the low 8 bits of UCS-2 or the first 256 codepoints in Unicode or somesuch.

Unicode _is_ an encoding that uses 21 (IIRC) bits. UCS-2 is a way to _represent_ the low 16 bits of UNICODE that allows some tricks to go beyond 16 bits. Microsoft, e.g., does not go beyond 16 bits. ISO-8859-1 is a representation of the low 8 bits of UNICODE (well, ISO-8859-1 is older than UNICODE ;-). ISO-8859-1 cannot encode more than the 8 least significant bits of Unicode.

Jörg

-- EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin [EMAIL PROTECTED](uni) [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
[zfs-discuss] What is likely the best way to accomplish this task?
I realize I can't remove devices from a vdev, which, well, sucks and all, but I'm not going to complain about that. ;) I have 4x500G disks in a RAIDZ. I'd like to repurpose one of them as I'm finding that all that space isn't really needed and that one disk would serve me much better elsewhere (as the second half of a mirror in a machine going into colo).

SYS1 124G 1.21T 29.9K /SYS1

The only other storage in the machine is the small 60G pool made up of the remains of the OS disks, which is not going to be enough space to hold this all while I rebuild the array. Is there any way that I could possibly get one of the disks out of the pool long enough to be used as temp space while I rebuild the array with just 3 disks? I'm thinking either to pull it out of the machine and scribble all over it in another machine so that it no longer has its ZFS bits on it. If this plan works, then it would just be a blank disk (although it will have the same dev id, so I don't know if ZFS will pick it back up anyway). The other plan is to replace it with a file that resides on the 60G partition. This seems more likely to work, but I don't know how that'd work out with the fact that the file would be a lot smaller than the actual pool. Maybe a thinly provisioned dev would do the trick, as I could then make it look like 500G, but it would only really use what it needed to resync the pool. Is this logic all correct based on how ZFS works on snv_64a? I could also upgrade first if that's at all recommended; it's something I've been meaning to do anyway.

-brian

-- Perl can be fast and elegant as much as J2EE can be fast and elegant. In the hands of a skilled artisan, it can and does happen; it's just that most of the shit out there is built by people who'd be better suited to making sure that my burger is cooked thoroughly. -- Jonathan Patschke
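The thinly provisioned stand-in can be sketched with an ordinary sparse file. Everything here is illustrative: the path is made up, the demo uses 500M so it runs anywhere (the real stand-in would be 500G), and the zpool steps are shown as comments because they need a live pool and a real device name.

```shell
# Create a sparse file: the apparent size is set immediately, but disk
# blocks are only allocated as data is actually written to it.
# (Demo size 500M; for the scenario above you would use -s 500G.)
truncate -s 500M /tmp/tmpdisk.img
ls -ls /tmp/tmpdisk.img   # first column: blocks actually allocated (~0)

# Sketch of the swap itself (assumes the pool is SYS1 and the pulled
# disk is c1t3d0 -- both hypothetical):
#   zpool replace SYS1 c1t3d0 /tmp/tmpdisk.img
#   zpool status SYS1         # wait for the resilver to finish
```

The caveat is that the resilver will write real data into the file, so the 60G pool still has to hold one disk's worth of the pool's allocated blocks, not the full 500G apparent size.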
Re: [zfs-discuss] path-name encodings
Bart Smaalders [EMAIL PROTECTED] wrote:
> > OK, thanks. I still haven't got any answer to my original question, though. I.e., is there some way to know what text the filename is, or do I have to make a more or less wild guess what encoding the program that created the file used?
> How do you expect the filesystem to know this? Open(2) takes 3 args; none of them have anything to do with the encoding.

A while ago, when discussing things with some filesystem guys, I made the proposal to introduce a new syscall to inform the kernel about the locale coding used by a process. If the kernel (or filesystem) then likes to store file names in a kernel-specific way and if there is an in-kernel libiconv, the kernel could convert from/to the userland view. A problem that remains is a userland coding that probably cannot represent all characters used inside the kernel view.

> There are two characters not allowed in filenames: NULL and '/'. Everything else is meaning imparted by the user, just like the contents of text documents.

Platforms that insist on UTF-8 coding for filenames often disallow octet codings that are not valid inside a UTF-8 character sequence.

> The OS doesn't care; the user does. If a user creates a file named ?? in his home directory, but my encoding doesn't contain these characters, what should ls -l display? You also assume that knowing the encoding will transfer meaning... but a directory containing files named ??, ??? and ?? may as well be line noise for most of us. The OS doesn't care one whit about language or encodings (save the optional upper/lower case accommodation for CIFS). The OS simply stores files under names that don't contain either '/' or NULL. UTF8 is the answer here. If you care about anything more than simple ascii and you work in more than a single locale/encoding, use UTF8. You may not understand the meaning of a filename, but at least you'll see the same characters as the person who wrote it.

UTF-8 may be the answer for many but definitely not all problems. UTF-8 may cause fewer problems in 5 years (if more people then use it) than the problems known with UTF-8 today.

Jörg
Re: [zfs-discuss] What is likely the best way to accomplish this task?
On 3/4/08, Brian Hechinger [EMAIL PROTECTED] wrote:
> I have 4x500G disks in a RAIDZ. I'd like to repurpose one of them [...] Is there any way that I could possibly get one of the disks out of the pool long enough to be used as temp space while I rebuild the array with just 3 disks?

I'd suggest going the pull-one-disk route. Stick it into another machine, reformat it with gparted (http://gparted-livecd.tuxfamily.org/), and you're on your way.
Re: [zfs-discuss] What is likely the best way to accomplish this task?
> have 4x500G disks in a RAIDZ. I'd like to repurpose [...] as the second half of a mirror in a machine going into colo.

rsync or zfs send -R the 128G to the machine going to the colo. If you need more space in colo, remove one disk (faulting sys1) and add (stripe) it on colo. (Note: you will need to destroy the pool on colo after copying everything back, in order to attach rather than add the disk going to colo.) Destroy and remake sys1 as 2+1 and copy it back. Removing a vdev is coming, but it will be a whole vdev, i.e.: remove the 3+1 after you added a 2+1.
Re: [zfs-discuss] What is likely the best way to accomplish this task?
On Tue, Mar 04, 2008 at 09:48:05AM -0500, Rob Logan wrote:
> > have 4x500G disks in a RAIDZ. I'd like to repurpose [...] as the second half of a mirror in a machine going into colo.
> rsync or zfs send -R the 128G to the machine going to the colo

Yeah, that's the fallback plan, which I was trying to avoid, but it might just be the way to go.

-brian
Re: [zfs-discuss] Mirroring to a smaller disk
Jonathan,

On Tue, Mar 04, 2008 at 12:37:33AM -0800, Jonathan Loran wrote:
> I'm not sure I follow how this would work.

The keyword here is thin provisioning. The sparse zvol only uses as much space as the actual data needs. So, if you use a sparse zvol, you may mirror to a smaller disk, iff you use as much space as is physically available to the sparse zvol.

> I do have tons of space on the old array. It's only 15% utilized, hence my original comment. How does my data get into the /test/old zvol (zpool foo)? What would I end up with. There's no zvol on foo.

After detaching /test/old, you may reconfigure your old array. At that point, foo is on a zvol on the pool bar. How to get the data over depends on how your reconfiguration of the old array impacts the pool and vdev size. If it gets smaller, you cannot attach it to the pool where your data currently resides and have to go the send|receive route... Putting the zpool on a zvol permanently might not be something you want, as this creates some overhead, which I can't quantify, and you mentioned some performance issues you're already experiencing.

> This seems a bit like black magic. Maybe that's what I need, eh?

Feel the magic at http://www.cuddletech.com/blog/pivot/entry.php?id=729

Greetings, Patrick
Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
[slightly different angle below...]

Nathan Kroenert wrote:
> Hey, Bob, Though I have already got the answer I was looking for here, I thought I'd at least take the time to provide my point of view as to my *why*... First: I don't think any of us have forgotten the goodness that ZFS's checksum *can* bring. I'm also keenly aware that we have some customers running HDS / EMC boxes who disable the ZFS checksum by default because they 'don't want to have files break due to a single bit flip...' and they really don't care where the flip happens, and they don't want to 'waste' disks or bandwidth allowing ZFS to do its own protection when they already pay for it inside their zillion dollar disk box. (Some say waste, some call it insurance... ;). Oracle users in particular seem to have this mindset, though that's another thread entirely. :)

If you look at the zfs-discuss archives, you will find anecdotes of failing raid arrays (yes, even expensive ones) and SAN switches causing corruption which was detected by ZFS. A telltale sign of borken hardware is someone complaining that ZFS checksums are borken, only to find out their hardware is at fault. As for Oracle, modern releases of the Oracle database also have checksumming enabled by default, so there is some merit to the argument that ZFS checksums are redundant. IMNSHO, ZFS is not being designed to replace ASM.

> I'd suspect we don't hear people whining about single bit flips, because they would not know if it's happening unless the app sitting on top had its own protection. Or - if the error is obvious, or crashes their system... Or if they were running ZFS, but at this stage, we cannot delineate between single bit or massively crapped out errors, so what's to say we are NOT seeing it? Also - Don't assume bit rot on disk is the only way we can get single bit errors. Considering that until very recently (and quite likely even now to a reasonable extent), most CPU's did not have data protection in *every* place data transited through, single bit flips are still a very real possibility, and becoming more likely as process shrinks continue. Granted, on CPU's with Register Parity protection, undetected doubles are more likely to 'slip under the radar', as registers are typically protected with parity at best, if at all... A single bit flip in the parity protected register will be detected, a double won't.

It depends on the processor. Most of the modern SPARC processors have extensive error detection and correction inside. But processors are still different than memories in that the time a datum resides in a single location is quite short. We worry more about random data losses when the datum is stored in one place for a long time, which is why you see different sorts of data protection at the different layers of a system design. To put this in more mathematical terms, there is a failure rate for each failure mode, but your exposure to the failure mode is time bounded.

> It does seem that some of us are getting a little caught up in disks and their magnificence in what they write to the platter and read back, and overlooking the potential value of a simple (though potentially computationally expensive) circus trick, which might, just might, make your broken 1TB archive useful again... I don't think it's a good idea for us to assume that it's OK to 'leave out' potential goodness for the masses that want to use ZFS in non-enterprise environments like laptops / home PC's, or use commodity components in conjunction with the Big Stuff... (Like white box PC's connected to an EMC or HDS box...) Anyhoo - I'm glad we have pretty much already done this work once before. It gives me hope that we'll see it make a comeback. ;) (And I look forward to Jeff & Co developing a hyper cool way of generating 12800 checksums using all 64 threads of a Niagara 2, using the same source data in cache, so we don't need to hit memory, so that it happens in the blink of an eye. or two. ok - maybe three... ;) Maybe we could also use the SPU's as well... OK - So, I'm possibly dreaming here, but hell, if I'm dreaming, why not dream big. :)

I sense that the requested behaviour here is to be able to get to the corrupted contents of a file, even if we know it is corrupted. I think this is a good idea because: 1. The block is what is corrupted, not necessarily my file. A single block may contain several files which are grouped together, checksummed, and written to disk. 2. The current behaviour of returning EIO when read()ing a file up to the (possible) corruption point is rather irritating, but probably the right thing to do. Since we know the files affected, we could write a savior, providing we can get some reasonable response other than EIO. As Jeff points out, I'm not sure that automatic repair is the right answer, but a manual savior might work better than
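The register-parity point raised in this thread is easy to see with a toy model (a pure illustration, not any real CPU's protection scheme): even parity over a word detects any single-bit flip but is blind to double flips.

```python
def parity(word: int) -> int:
    """Even-parity bit of a word: XOR of all its bits."""
    p = 0
    while word:
        p ^= word & 1
        word >>= 1
    return p

stored = 0b10110100
stored_parity = parity(stored)

single = stored ^ 0b00000100   # flip one bit
double = stored ^ 0b00000101   # flip two bits

print(parity(single) != stored_parity)  # True: single flip detected
print(parity(double) != stored_parity)  # False: double flip slips through
```

The double flip preserves the bit count's evenness, so the check passes on corrupt data, which is exactly why parity alone is weaker than an ECC or a block checksum.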
Re: [zfs-discuss] Mirroring to a smaller disk
Patrick Bachmann wrote:
> After detaching /test/old, you may reconfigure your old array. At that point, foo is on a zvol on the pool bar. How to get the data over depends on how your reconfiguration of the old array impacts the pool and vdev size. If it gets smaller, you cannot attach it to the pool where your data currently resides and have to go the send|receive route...

Well, there's the rub. I will be reconfiguring the old array identically to the new one. It will be smaller. It's always something, isn't it. I have to say, though, this is very slick and I can see this sparse zvol trick will be handy in the future. Thanks!

Jon
Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
On Tue, 4 Mar 2008, Richard Elling wrote:
> Also note: the checksums don't have enough information to recreate the data for very many bit changes. Hashes might, but I don't know anyone using sha256.

It is indeed important to recognize that the checksums are a way to detect that the data is incorrect rather than a way to tell that the data is correct. There may be several permutations of wrong data which can result in the same checksum, but the probability of encountering those permutations due to natural causes is quite small.

Bob

== Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
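Bob's point -- several permutations of wrong data can share one checksum -- is immediate with a deliberately weak toy checksum. (This is only an illustration; ZFS uses fletcher2/fletcher4 or sha256, where a chance collision is astronomically unlikely.)

```python
def toy_checksum(data: bytes) -> int:
    """Deliberately weak checksum: sum of bytes mod 256.
    It is order-blind, so any permutation of the same bytes collides."""
    return sum(data) % 256

good = b"ab"
bad = b"ba"   # different data, same bytes in a different order
print(toy_checksum(good) == toy_checksum(bad))  # True: collision
```

Detection therefore only tells you the data is wrong when the checksum mismatches; a match is strong evidence, not proof, that the data is right.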
Re: [zfs-discuss] path-name encodings
[EMAIL PROTECTED] (Joerg Schilling) wrote:
> > > [...] ISO-8859-1 (the low 8 bits of UNICODE) [...]
> > Unicode is not an encoding, but you probably mean the low 8 bits of UCS-2 or the first 256 codepoints in Unicode or somesuch.
> Unicode _is_ an encoding that uses 21 (IIRC) bits.

AFAIK you are incorrect. Unicode is a standard that, among other things, defines a _number_ for each character. A number does not equal 21 bits, even if it so happens that the highest codepoint number in the current version is no more than 21 bits long. Unicode defines (at least) 3 encodings to represent those characters: UTF-8, UTF-16 and UTF-32. Well, it doesn't very much matter exactly how the terms are defined, as long as everybody knows what's what. So, I'm sorry for nitpicking.

- Marcus
Re: [zfs-discuss] path-name encodings
Marcus Sundman wrote:
> > UTF8 is the answer here. If you care about anything more than simple ascii and you work in more than a single locale/encoding, use UTF8. You may not understand the meaning of a filename, but at least you'll see the same characters as the person who wrote it.
> I think you are a bit confused. A) If you meant that _I_ should use UTF-8 then that alone won't help. Let's say the person who created the file used ISO-8859-1 and named it 'häst', i.e., 0x68e47374. If I then use UTF-8 when displaying the filename my program will be faced with the problem of what to do with the second byte, 0xe4, which can't be decoded using UTF-8. (häst is 0x68c3a47374 in UTF-8, in case someone wonders.)

What I mean is very simple: The OS has no way of merging your various encodings. If I create a directory, and have people from around the world create a file in that directory named after themselves in their own character sets, what should I see when I invoke:

% ls -l | less

in that directory? If you wish to share filenames across locales, I suggest you and everyone else writing to that directory use an encoding that will work across all those locales. The encoding that works well for this on Unix systems is UTF8, since it leaves '/' and NULL alone.

- Bart

-- Bart Smaalders Solaris Kernel Performance [EMAIL PROTECTED] http://blogs.sun.com/barts You will contribute more with mercurial than with thunderbird.
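The häst byte sequences quoted above can be checked directly (Python is used here purely to illustrate the encodings under discussion):

```python
name = "häst"

iso = name.encode("iso-8859-1")
utf8 = name.encode("utf-8")
print(iso.hex())   # 68e47374   -- the ISO-8859-1 bytes from the example
print(utf8.hex())  # 68c3a47374 -- the UTF-8 bytes

# A UTF-8 reader handed the ISO-8859-1 bytes chokes on the lone 0xe4,
# which looks like a multi-byte lead with no valid continuation:
try:
    iso.decode("utf-8")
except UnicodeDecodeError as e:
    print("undecodable at byte offset", e.start)
```

This is exactly the dilemma in the thread: the bytes on disk are fine, but without out-of-band knowledge of the encoding a UTF-8 consumer cannot render them.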
Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?
> Also note: the checksums don't have enough information to recreate the data for very many bit changes. Hashes might, but I don't know anyone using sha256.

My ~/Documents uses sha256 checksums, but then again, it also uses copies=2 :)

-mg
[zfs-discuss] raidz in zfs questions
Guys, I have 2 questions.

1. In ZFS, can you currently add more disks to an existing raidz? This is important to me as I slowly add disks to my system one at a time.

2. In a raidz, do all the disks have to be the same size? This is the one thing that has always been a pain with a RAID 5, as you need to split the disks into multiple RAID 5's to use all the space. I never understood why the system does not do the same thing behind the scenes for you. E.g., 320GB x 4 and 1TB x 4: the ultimate is to only lose 1TB to parity instead of 1.32TB. To me it is not the space loss so much as the extra drive slot that I lose. Instead I normally just create a RAID 5 of 320 x 8 and a RAID 5 of 680 x 4. I would prefer just to create one raidz with all the disks and let the system work out the best way to get the most data out of it.

I did do a test of the raidz with different size disks in a virtual machine and it would not let me create the raidz without using -f. Once I created it with -f, the pool size seems a little strange and I can't work out the combo of disks and space lost to parity to get the filesize it returned.

Chris

This message posted from opensolaris.org
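The parity arithmetic in the question checks out. A raidz1 vdev of n members yields (n-1) times its smallest member, since every member is sized down to the smallest and one member's worth goes to parity; the split-into-partitions scheme (320 + 680 per 1TB disk) recovers the difference:

```python
def raidz1_usable(disks):
    """Usable space of one raidz1 vdev (sizes in GB):
    (n - 1) * smallest member."""
    return (len(disks) - 1) * min(disks)

raw_total = 4 * 320 + 4 * 1000  # 5280 GB across all eight disks

# Naive layout: one raidz1 vdev per disk size, whole disks.
naive = raidz1_usable([320] * 4) + raidz1_usable([1000] * 4)
print(naive, raw_total - naive)   # 3960 usable, 1320 lost ("1.32TB")

# Split layout: cut each 1TB disk into 320 + 680 partitions, then build
# an 8-wide group of 320s and a 4-wide group of 680s.
split = raidz1_usable([320] * 8) + raidz1_usable([680] * 4)
print(split, raw_total - split)   # 4280 usable, 1000 lost ("1TB")
```

The catch with the split layout is that two partitions of the same physical disk end up in different groups, so one drive failure costs redundancy in both groups at once, which is part of why ZFS doesn't do this behind the scenes.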
Re: [zfs-discuss] List of ZFS patches to be released with Solaris 10 U5
On Mar 4, 2008, at 5:13 PM, Ben Grele wrote:
> Experts, Do you know where I could find the list of all the ZFS patches that will be released with Solaris 10 U5? My customer told me that they've seen such a list for prior update releases. I've not been able to find anything like it in the usual places.

Yes, something akin to George Wilson's post for s10u4 would be nice: http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024516.html

/dale