On Feb 17, 2009, at 11:03 AM, Jan-Benedict Glaw wrote:

> This happens.
...
> So technically, "encoding" is a per-file property on some
> filesystems (those that don't care about a filename's contents, as
> long as it doesn't contain the directory delimiter (typically '/'
> or '\\') or the '\0' (end of string)).

Ugh. This would be fine if the filesystem stored and provided the information about what encoding was used for each name, but I'm betting they don't do that. :-)

So, what should Tahoe do?

1. Always treat filenames as opaque blobs. This means Tahoe is losing information that some filesystems (e.g. NTFS) provide, and making it harder for users on the other side of Tahoe to unambiguously decode those filenames.

2. If the filesystem guarantees a specific encoding, use that one, else treat the filename as an opaque blob.

3. If the filesystem guarantees a specific encoding, use that one, else if it provides a "default" encoding, then try to decode with that one, and if decoding fails then reject the filename and ask the user to fix it up.

3.b. ... and if decoding fails then treat the filename as an opaque blob.

3.c. ... and if decoding fails then try to decode it with a few dozen of our favorite encodings in descending order of popularity ...

4. Any other options?

Thanks!

Zooko
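
For concreteness, options 2 and 3.b above could be sketched roughly as follows in Python. This is only an illustration, not existing Tahoe code: the function name `decode_filename` and its parameters `guaranteed` and `default` are hypothetical, and it assumes the caller already knows which encoding (if any) the filesystem guarantees or advertises as a default.

```python
from typing import Optional, Union


def decode_filename(raw: bytes,
                    guaranteed: Optional[str] = None,
                    default: Optional[str] = None) -> Union[str, bytes]:
    """Hypothetical helper sketching options 2 and 3.b.

    raw        -- the filename bytes as returned by the filesystem
    guaranteed -- encoding the filesystem guarantees, if any (assumption)
    default    -- "default" encoding, e.g. from the locale, if any (assumption)
    """
    if guaranteed is not None:
        # Option 2: the filesystem promises this encoding, so use it.
        return raw.decode(guaranteed)
    if default is not None:
        try:
            # Option 3: try the advertised default encoding first.
            return raw.decode(default)
        except UnicodeDecodeError:
            # Option 3.b: decoding failed, fall through to the opaque blob.
            pass
    # No usable encoding: keep the name as an opaque blob of bytes.
    return raw
```

With this sketch, `decode_filename(b"caf\xc3\xa9", default="utf-8")` yields a decoded string, while `decode_filename(b"caf\xe9", default="utf-8")` returns the original bytes untouched. Option 3 proper would raise an error at the point where this sketch falls back, and option 3.c would loop over a list of candidate encodings there instead.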
