Re: normalizing filenames and strings
On Fri, 06-04-2007 at 21:37 +0200, Denis Jacquerye wrote:
> The filechooser doesn't use the existing filename if what is typed is canonically equivalent, even if autocompletion gets it right.
> I had already opened http://bugzilla.gnome.org/show_bug.cgi?id=421736 and http://bugzilla.gnome.org/show_bug.cgi?id=423242

Ah, bummer :) Do you want to take a stab at debugging this? I can guide you through the GtkFileChooser code if you get stuck.

Federico
___
desktop-devel-list mailing list
desktop-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/desktop-devel-list
Re: normalizing filenames and strings
On 4/5/07, Federico Mena Quintero wrote:
> On Thu, 29-03-2007 at 09:52 +0200, Alexander Larsson wrote:
>> On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
>>> [Those functions don't normalize currently; maybe they need to... do you have a reproducible bug?]
>>
>> NO! They should not normalize! Then you can't open a file that has an unnormalized filename.
>
> You mean
>
> 1. user types a UTF-8 filename to open it
> 2. the app does g_filename_from_utf8() and gets normalized
> 3. the file can't be opened because its filename on disk is not normalized
>
> ?
>
> But you would have to type the human-readable-filename in the first place, whereas normally one picks the file from a list (in which case there is no problem, since we know the filename-on-disk in advance).
>
> It *is* a problem for Save As (or anything that requires you to type a possibly-existing filename), since the file dialog (or whatever) needs to correlate the human-readable-filename with the filenames that are on disk.
>
> I really don't know if GtkFileChooserEntry and the file chooser in SAVE mode deal with this correctly. Denis, could you please test that case and file a bug against the file chooser if it doesn't work?

The filechooser doesn't use the existing filename if what is typed is canonically equivalent, even if autocompletion gets it right.
I had already opened http://bugzilla.gnome.org/show_bug.cgi?id=421736 and http://bugzilla.gnome.org/show_bug.cgi?id=423242

> [Alex is right; the functions shouldn't normalize out of the box, at least not g_filename_from_utf8(). I don't really see a problem with g_filename_to_utf8() normalizing for the app's benefit, although this makes it more likely to uncover bugs with apps that try to round-trip filenames.]
Re: normalizing filenames and strings
On Thu, 29-03-2007 at 09:52 +0200, Alexander Larsson wrote:
> On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
>> [Those functions don't normalize currently; maybe they need to... do you have a reproducible bug?]
>
> NO! They should not normalize! Then you can't open a file that has an unnormalized filename.

You mean

1. user types a UTF-8 filename to open it
2. the app does g_filename_from_utf8() and gets normalized
3. the file can't be opened because its filename on disk is not normalized

?

But you would have to type the human-readable-filename in the first place, whereas normally one picks the file from a list (in which case there is no problem, since we know the filename-on-disk in advance).

It *is* a problem for Save As (or anything that requires you to type a possibly-existing filename), since the file dialog (or whatever) needs to correlate the human-readable-filename with the filenames that are on disk.

I really don't know if GtkFileChooserEntry and the file chooser in SAVE mode deal with this correctly. Denis, could you please test that case and file a bug against the file chooser if it doesn't work?

[Alex is right; the functions shouldn't normalize out of the box, at least not g_filename_from_utf8(). I don't really see a problem with g_filename_to_utf8() normalizing for the app's benefit, although this makes it more likely to uncover bugs with apps that try to round-trip filenames.]

Federico
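The Save As correlation described above (matching a typed, human-readable name against the names actually on disk) can be sketched as follows. This is a minimal illustration in Python, not GtkFileChooser code: `find_on_disk_name` is a hypothetical helper, `unicodedata.normalize` stands in for GLib's normalization, and the directory listing is simulated with a plain list of byte strings.

```python
import unicodedata

def find_on_disk_name(typed_utf8, directory_entries):
    """Return the on-disk name (bytes) canonically equivalent to typed_utf8, or None.

    directory_entries is a list of raw byte names, as a directory listing
    would return them. An exact byte match wins over a merely equivalent one.
    """
    typed_bytes = typed_utf8.encode("utf-8")
    if typed_bytes in directory_entries:
        return typed_bytes
    wanted = unicodedata.normalize("NFC", typed_utf8)
    for raw in directory_entries:
        try:
            candidate = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8, so it cannot be compared as text
        if unicodedata.normalize("NFC", candidate) == wanted:
            return raw  # hand back the bytes exactly as stored
    return None

# The stored name is decomposed; the user types the precomposed form.
entries = ["re\u0301sume\u0301.txt".encode("utf-8"), b"notes.txt"]
typed = "r\u00e9sum\u00e9.txt"
match = find_on_disk_name(typed, entries)
```

The point of returning `raw` rather than the normalized string is that the on-disk bytes are never rewritten; normalization is used only to decide which existing entry the user meant.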
Re: normalizing filenames and strings
On Fri, 2007-03-30 at 13:39 +1000, Andrew Cowie wrote:
> On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
>> g_filename_to_utf8() and g_filename_from_utf8()
>
> Just when I thought I was beginning to know something, an old hand like Federico goes and describes something that is enormously powerful and _very_ well presented in the docs, that I just hadn't come across yet. I love hanging out in the GNOME community.
>
> I just looked through some of the GTK sources and I think that things like GtkFileChooser use g_filename_from_utf8() etc automatically. Is that true?

Yes. The file chooser returns filenames in filename encoding, whereas it always gets utf8 from the GtkEntry it has. So, it has to use g_filename_from_utf8() internally.

Alexander Larsson, Red Hat, Inc
Re: normalizing filenames and strings
On Wed, 2007-03-28 at 21:40 +0200, Denis Jacquerye wrote:
> On 3/28/07, Alexander Larsson wrote:
>> On Wed, 2007-03-28 at 19:43 +0200, Denis Jacquerye wrote:
>>> On 3/28/07, Alexander Larsson wrote:
>>>> On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
>>>>> Most applications that operate on files will accept file name arguments when invoked. What are we supposed to do with these? Bear in mind that the argument isn't only used by shell junkies. It's also used when, for example, you double-click a JPG to open EOG. Nautilus passes the file name to EOG. If we don't normalize, users might have a hard time opening files from the command line.
>>>>
>>>> Filenames on disk can *never* *ever* be changed. They are byte strings and must be treated as such, otherwise you can't open or operate on the file they reference. However, when creating a *new* file, given a utf8 string as filename, we can normalize it before creating the file.
>>>
>>> For a name given on the command line or at invocation, applications could test for the requested name; if it does not exist, they should try the existing canonically equivalent filenames.
>>
>> No, it's never right to guess like this. It can lead to all sorts of problems, and it is a performance drag. File names are exact identifiers, not UI strings.
>
> So how should it be done? If I have a file é (precomposed) and I type é (composed), how is the existing file going to be opened?

If the file already exists it's rarely a problem. The typeahead matching in the file selector can do normalization, or you could click on the filename in the file selector. In the shell you could use tab completion.

Of course, there are some situations where things can be tricky in the shell, like the case of an unnormalized single-character filename. But there are many other similar cases, like files with newlines in them, filenames that start with dashes (-rf comes to mind), or filenames that are also a uri.
The more we have applications try to do magic with the passed-in filename (like guessing if it's a filename or a uri), the more strange corner-case behaviours we get. It's up to the filesystem to handle filenames in a consistent way. If the filesystem normalizes, that is fine, but if it doesn't, we shouldn't try to work around it.

Alexander Larsson, Red Hat, Inc
Re: normalizing filenames and strings
On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
> If you have applications that don't work when you have Unicode filenames, it's a telltale sign that they are not using g_filename_to_utf8() and g_filename_from_utf8() correctly:
>
> http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#file-name-encodings
>
> I'd be interested in knowing of applications that use this API properly, but that still have problems with composed/decomposed names. In that case, we may need to explicitly normalize inside those functions: they are the central point of change between UTF-8 (GNOME's encoding for strings) and the file system's own encoding.
>
> [Those functions don't normalize currently; maybe they need to... do you have a reproducible bug?]

NO! They should not normalize! Then you can't open a file that has an unnormalized filename.

Alexander Larsson, Red Hat, Inc
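The objection above can be made concrete with a small sketch. It uses Python rather than GLib, and `to_utf8_normalizing` is a hypothetical stand-in for a conversion function that normalizes, which (as the thread notes) g_filename_to_utf8() and g_filename_from_utf8() do not currently do. The sketch only compares bytes; it does not touch a real filesystem.

```python
import unicodedata

# Name as stored on disk: "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed).
on_disk = "cafe\u0301.txt".encode("utf-8")

def to_utf8_normalizing(raw):
    """Hypothetical filename -> UTF-8 conversion that also normalizes to NFC."""
    return unicodedata.normalize("NFC", raw.decode("utf-8"))

def from_utf8(text):
    """Plain UTF-8 -> filename conversion, with no normalization."""
    return text.encode("utf-8")

round_tripped = from_utf8(to_utf8_normalizing(on_disk))

print(on_disk)                   # b'cafe\xcc\x81.txt'
print(round_tripped)             # b'caf\xc3\xa9.txt'
print(round_tripped == on_disk)  # False
```

On a filesystem that compares names byte for byte, opening `round_tripped` would not find the file stored as `on_disk`, which is the failure being warned about.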
Re: normalizing filenames and strings
On Thu, 2007-03-29 at 09:52 +0200, Alexander Larsson wrote:
> On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
>> If you have applications that don't work when you have Unicode filenames, it's a telltale sign that they are not using g_filename_to_utf8() and g_filename_from_utf8() correctly:
>>
>> http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#file-name-encodings
>>
>> I'd be interested in knowing of applications that use this API properly, but that still have problems with composed/decomposed names. In that case, we may need to explicitly normalize inside those functions: they are the central point of change between UTF-8 (GNOME's encoding for strings) and the file system's own encoding.

I think the expectation is false. GNOME apps shouldn't treat these filenames as equivalent, exactly like Makefile and makefile should be 2 different filenames, except when completing.

>> [Those functions don't normalize currently; maybe they need to... do you have a reproducible bug?]
>
> NO! They should not normalize! Then you can't open a file that has an unnormalized filename.

And that may introduce security bugs: if you transform a filename behind the user's back, you have to be pretty sure of your utf8 decoder; see http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt for examples of subtleties. Windows IIS even had a security bug where the attacker crafted a URL containing special characters in an alternative (illegal) unicode encoding, which wrongly passed safety tests but, once canonicalized, enabled access to system paths (and even remote command execution, cf. http://seclists.org/bugtraq/2000/Oct/0271.html for details).

Xav
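As an illustration of the decoder concern above, here is a minimal Python sketch of an overlong (illegal) UTF-8 sequence slipping past a byte-level check and being rejected by a strict decoder. The byte string is a constructed example of the general class of attack, not a reproduction of the exploit in the linked advisory, and `decode_strict` is a hypothetical helper name.

```python
def decode_strict(raw):
    """Decode raw bytes as UTF-8, returning None for any ill-formed sequence."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None

# b"\xc0\xaf" is an overlong two-byte encoding of "/" (U+002F); it is not valid UTF-8.
overlong_path = b"..\xc0\xaf..\xc0\xafetc"

# A naive safety test that looks for a literal "/" byte sees nothing suspicious.
contains_slash_byte = b"/" in overlong_path

# A strict decoder refuses the input instead of quietly turning it into "../../etc".
decoded = decode_strict(overlong_path)
```

The design choice worth noting is the order of operations: validate and decode strictly first, then run path checks on the decoded text, so that no later canonicalization step can change what was checked.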
Re: normalizing filenames and strings
Denis Jacquerye wrote:
> Most apps don't handle search/compare/sort properly.

Actually, this point hasn't really been noticed in the discussion. (Denis has been diligently filing bugs for applications that do not search correctly.)

For instance, gthumb used g_pattern_match_simple in some search functions. g_pattern_match_simple uses UTF8 arguments for the pattern and target strings, but it doesn't normalize them. It probably should, I think...

- Mike
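A rough sketch of the search problem just described, using Python's `fnmatch` as a stand-in for g_pattern_match_simple (the two are different APIs; only the lack of normalization is being illustrated). `match_normalized` is a hypothetical wrapper that brings pattern and target into the same normalization form before matching.

```python
import fnmatch
import unicodedata

def nfc(s):
    """Return s in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)

def match_normalized(pattern, text):
    """Glob-match after putting both strings into the same normalization form."""
    return fnmatch.fnmatchcase(nfc(text), nfc(pattern))

name = "cafe\u0301.jpg"   # file name stored in decomposed form
pattern = "caf\u00e9*"    # user types the precomposed form

plain = fnmatch.fnmatchcase(name, pattern)    # compares code points as-is
normalized = match_normalized(pattern, name)  # compares canonical forms
```

Here `plain` fails to match even though both strings display identically, while `normalized` matches. This only affects comparison for searching; the file name itself is left untouched.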
Re: normalizing filenames and strings
On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
> g_filename_to_utf8() and g_filename_from_utf8()

Just when I thought I was beginning to know something, an old hand like Federico goes and describes something that is enormously powerful and _very_ well presented in the docs, that I just hadn't come across yet. I love hanging out in the GNOME community.

I just looked through some of the GTK sources and I think that things like GtkFileChooser use g_filename_from_utf8() etc automatically. Is that true?

AfC
Sydney
Re: normalizing filenames and strings
On Tue, 2007-03-27 at 13:15 -0400, Dr. Michael J. Chudobiak wrote:
> Filenames could also be NFC normalized when created, although that's not absolutely necessary.
>
> It would be nice if gnome mandated a standard approach for normalization. Does everyone like NFC? (http://unicode.org/reports/tr15 for info.)
>
> This could be fixed at a low level, in gtk filechooser for some cases or in apps. Gnome-vfs should handle that too.
>
> It would be nice if gnome-vfs could handle this in the background, so coders don't have to worry about uri escaping and normalization at the same time. (The existing normalization functions have to be used on unescaped URIs. It's already tricky enough keeping track of gnome-vfs escaping issues...)

It's very hard and quite expensive to handle normalization automatically at the low level. You have to intercept every I/O operation, and it can introduce very strange behaviour (since we can't control what's already on the disk). We have to accept that unix filenames are strings of bytes and that we just cannot enforce any meaning on them (although we can do our best to try to make them some normalized form of utf8).

For uri escaping I'm doing my best to make it not an issue in the new GVFS API that is to replace gnome-vfs. (By not using uris much in the API.)

In practice I don't think there is an enormous problem. Most files are either selected in the fileselector/filemanager (so we don't care about normalization, just the filename bytestring that was selected) or, for new files, typed into the file selector. If the fileselector can do some normalization for typed-in names, we shouldn't really, in normal use, cause any duplicate unnormalized filenames.

Alexander Larsson, Red Hat, Inc
Re: normalizing filenames and strings
On Wed, 2007-03-28 at 16:46 +0200, Alexander Larsson wrote:
> On Tue, 2007-03-27 at 13:15 -0400, Dr. Michael J. Chudobiak wrote:
>> Filenames could also be NFC normalized when created, although that's not absolutely necessary.
>>
>> It would be nice if gnome mandated a standard approach for normalization. Does everyone like NFC? (http://unicode.org/reports/tr15 for info.)
>>
>> This could be fixed at a low level, in gtk filechooser for some cases or in apps. Gnome-vfs should handle that too.
>>
>> It would be nice if gnome-vfs could handle this in the background, so coders don't have to worry about uri escaping and normalization at the same time. (The existing normalization functions have to be used on unescaped URIs. It's already tricky enough keeping track of gnome-vfs escaping issues...)
>
> It's very hard and quite expensive to handle normalization automatically at the low level. You have to intercept every I/O operation, and it can introduce very strange behaviour (since we can't control what's already on the disk). We have to accept that unix filenames are strings of bytes and that we just cannot enforce any meaning on them (although we can do our best to try to make them some normalized form of utf8).
>
> For uri escaping I'm doing my best to make it not an issue in the new GVFS API that is to replace gnome-vfs. (By not using uris much in the API.)
>
> In practice I don't think there is an enormous problem. Most files are either selected in the fileselector/filemanager (so we don't care about normalization, just the filename bytestring that was selected) or, for new files, typed into the file selector. If the fileselector can do some normalization for typed-in names, we shouldn't really, in normal use, cause any duplicate unnormalized filenames.

IMHO the only work needed to handle this is in all filename-selection widgets, which should do completion based on similar unicode names (like the fileselector does already for names differing only by case).
Xav
Re: normalizing filenames and strings
On 3/28/07, Xavier Bestel wrote:
> On Wed, 2007-03-28 at 16:46 +0200, Alexander Larsson wrote:
>> In practice I don't think there is an enormous problem. Most files are either selected in the fileselector/filemanager (so we don't care about normalization, just the filename bytestring that was selected) or, for new files, typed into the file selector. If the fileselector can do some normalization for typed-in names, we shouldn't really, in normal use, cause any duplicate unnormalized filenames.

The more I think about it, the more I think filenames should be normalized with NFC when created, in addition to checking for existing equivalents to detect name conflicts. The W3C's motivations for normalizing to NFC as early as possible seem very reasonable.

> IMHO the only work needed to handle this is in all filename-selection widgets, which should do completion based on similar unicode names (like the fileselector does already for names differing only by case).

The Gtk fileselector already offers canonically equivalent names with autocompletion, thanks to GtkEntryCompletion already doing the right thing (here normalization and casefolding).

Denis Moyogo Jacquerye
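Completion that combines normalization with casefolding, as mentioned above, might look roughly like this. It is a Python sketch of the general idea, not GtkEntryCompletion's actual implementation; `completion_key` and `complete` are hypothetical names, and the choice of NFD around `casefold()` is an assumption made for the example.

```python
import unicodedata

def completion_key(s):
    """Build a comparison key: decompose, casefold, then decompose again."""
    nfd = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", nfd.casefold())

def complete(prefix, names):
    """Return the names whose key starts with the key of the typed prefix."""
    key = completion_key(prefix)
    return [n for n in names if completion_key(n).startswith(key)]

# The first name is stored decomposed and capitalized.
names = ["Re\u0301sume\u0301.odt", "README", "report.txt"]

# The user types lowercase "r" plus precomposed "é".
hits = complete("r\u00e9", names)
```

The completion returns the stored name unchanged, so whatever is selected still refers to the exact on-disk entry; only the comparison is insensitive to case and to composed versus decomposed forms.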
Re: normalizing filenames and strings
On Wed, 2007-03-28 at 16:55 +0200, Xavier Bestel wrote:
> On Wed, 2007-03-28 at 16:46 +0200, Alexander Larsson wrote:
>> On Tue, 2007-03-27 at 13:15 -0400, Dr. Michael J. Chudobiak wrote:
>>> Filenames could also be NFC normalized when created, although that's not absolutely necessary.
>>>
>>> It would be nice if gnome mandated a standard approach for normalization. Does everyone like NFC? (http://unicode.org/reports/tr15 for info.)
>>>
>>> This could be fixed at a low level, in gtk filechooser for some cases or in apps. Gnome-vfs should handle that too.
>>>
>>> It would be nice if gnome-vfs could handle this in the background, so coders don't have to worry about uri escaping and normalization at the same time. (The existing normalization functions have to be used on unescaped URIs. It's already tricky enough keeping track of gnome-vfs escaping issues...)
>>
>> It's very hard and quite expensive to handle normalization automatically at the low level. You have to intercept every I/O operation, and it can introduce very strange behaviour (since we can't control what's already on the disk). We have to accept that unix filenames are strings of bytes and that we just cannot enforce any meaning on them (although we can do our best to try to make them some normalized form of utf8).
>>
>> For uri escaping I'm doing my best to make it not an issue in the new GVFS API that is to replace gnome-vfs. (By not using uris much in the API.)
>>
>> In practice I don't think there is an enormous problem. Most files are either selected in the fileselector/filemanager (so we don't care about normalization, just the filename bytestring that was selected) or, for new files, typed into the file selector. If the fileselector can do some normalization for typed-in names, we shouldn't really, in normal use, cause any duplicate unnormalized filenames.
> IMHO the only work needed to handle this is in all filename-selection widgets, which should do completion based on similar unicode names (like the fileselector does already for names differing only by case).

Most applications that operate on files will accept file name arguments when invoked. What are we supposed to do with these?

Bear in mind that the argument isn't only used by shell junkies. It's also used when, for example, you double-click a JPG to open EOG. Nautilus passes the file name to EOG.

If we don't normalize, users might have a hard time opening files from the command line. If we do normalize, then people will pretty much never be able to open files that have unnormalized file names, which seems like a much more serious problem.

--
Shaun
Re: normalizing filenames and strings
On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
> Most applications that operate on files will accept file name arguments when invoked. What are we supposed to do with these? Bear in mind that the argument isn't only used by shell junkies. It's also used when, for example, you double-click a JPG to open EOG. Nautilus passes the file name to EOG.
>
> If we don't normalize, users might have a hard time opening files from the command line.

Filenames on disk can *never* *ever* be changed. They are byte strings and must be treated as such, otherwise you can't open or operate on the file they reference.

However, when creating a *new* file, given a utf8 string as filename, we can normalize it before creating the file.

Alexander Larsson, Red Hat, Inc
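The split described above (normalize only when creating, never when opening) can be sketched in Python as follows. `create_new` is a hypothetical helper, NFC is assumed as the target form because it is the one proposed elsewhere in the thread, and the example assumes a filesystem that stores names as given, in a throwaway temporary directory.

```python
import os
import tempfile
import unicodedata

def create_new(directory, utf8_name):
    """Create an empty file named with the NFC form of utf8_name; return the bytes used."""
    raw = unicodedata.normalize("NFC", utf8_name).encode("utf-8")
    path = os.path.join(os.fsencode(directory), raw)
    with open(path, "xb"):  # "x" fails if the exact name already exists
        pass
    return raw

with tempfile.TemporaryDirectory() as d:
    # New file: the typed name is decomposed, the created name is NFC.
    raw = create_new(d, "cafe\u0301.txt")

    # Existing files: list as bytes and open with exactly those bytes.
    listed = os.listdir(os.fsencode(d))
    for entry in listed:
        with open(os.path.join(os.fsencode(d), entry), "rb"):
            pass
```

Passing a bytes path to `os.listdir` makes it return bytes, so existing names are handled as opaque identifiers and never pass through a text conversion that could alter them.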
Re: normalizing filenames and strings
On 3/28/07, Alexander Larsson wrote:
> On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
>> Most applications that operate on files will accept file name arguments when invoked. What are we supposed to do with these? Bear in mind that the argument isn't only used by shell junkies. It's also used when, for example, you double-click a JPG to open EOG. Nautilus passes the file name to EOG. If we don't normalize, users might have a hard time opening files from the command line.
>
> Filenames on disk can *never* *ever* be changed. They are byte strings and must be treated as such, otherwise you can't open or operate on the file they reference. However, when creating a *new* file, given a utf8 string as filename, we can normalize it before creating the file.

For a name given on the command line or at invocation, applications could test for the requested name; if it does not exist, they should try the existing canonically equivalent filenames.

Denis Moyogo Jacquerye
Re: normalizing filenames and strings
On Wednesday, 28 March 2007 at 11:50 -0500, Shaun McCance wrote:
> Most applications that operate on files will accept file name arguments when invoked. What are we supposed to do with these?

I don't know, I always use bash completion, which should avoid the need to use a canonical encoding, except, of course, in the example which started this thread (the 1st char is the problematic one). Well, copy/paste to the shell should solve that. IMHO this is more a bash problem than a GNOME problem: it will happen with /bin/rm too.

> Bear in mind that the argument isn't only used by shell junkies. It's also used when, for example, you double-click a JPG to open EOG. Nautilus passes the file name to EOG.

In the nautilus-to-eog case, there should be no problem because the filename shouldn't be touched in between, be it normalized or not.

> If we don't normalize, users might have a hard time opening files from the command line.

Sure.

> If we do normalize, then people will pretty much never be able to open files that have unnormalized file names, which seems like a much more serious problem.

Not if fileselectors autocomplete correctly: they are currently more-or-less case-insensitive, they just have to become utf8-flavor-insensitive.

Xav
Re: normalizing filenames and strings
On Wed, 2007-03-28 at 19:43 +0200, Denis Jacquerye wrote:
> On 3/28/07, Alexander Larsson wrote:
>> On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
>>> Most applications that operate on files will accept file name arguments when invoked. What are we supposed to do with these? Bear in mind that the argument isn't only used by shell junkies. It's also used when, for example, you double-click a JPG to open EOG. Nautilus passes the file name to EOG. If we don't normalize, users might have a hard time opening files from the command line.
>>
>> Filenames on disk can *never* *ever* be changed. They are byte strings and must be treated as such, otherwise you can't open or operate on the file they reference. However, when creating a *new* file, given a utf8 string as filename, we can normalize it before creating the file.
>
> For a name given on the command line or at invocation, applications could test for the requested name; if it does not exist, they should try the existing canonically equivalent filenames.

No, it's never right to guess like this. It can lead to all sorts of problems, and it is a performance drag. File names are exact identifiers, not UI strings.

Alexander Larsson, Red Hat, Inc
Re: normalizing filenames and strings
On 3/28/07, Alexander Larsson wrote:
> On Wed, 2007-03-28 at 19:43 +0200, Denis Jacquerye wrote:
>> On 3/28/07, Alexander Larsson wrote:
>>> On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
>>>> Most applications that operate on files will accept file name arguments when invoked. What are we supposed to do with these? Bear in mind that the argument isn't only used by shell junkies. It's also used when, for example, you double-click a JPG to open EOG. Nautilus passes the file name to EOG. If we don't normalize, users might have a hard time opening files from the command line.
>>>
>>> Filenames on disk can *never* *ever* be changed. They are byte strings and must be treated as such, otherwise you can't open or operate on the file they reference. However, when creating a *new* file, given a utf8 string as filename, we can normalize it before creating the file.
>>
>> For a name given on the command line or at invocation, applications could test for the requested name; if it does not exist, they should try the existing canonically equivalent filenames.
>
> No, it's never right to guess like this. It can lead to all sorts of problems, and it is a performance drag. File names are exact identifiers, not UI strings.

So how should it be done? If I have a file é (precomposed) and I type é (composed), how is the existing file going to be opened?

Denis Moyogo Jacquerye
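To make the question above concrete: the two spellings of é look the same on screen but are different byte strings. The following Python sketch shows this, assuming the typed form is "e" followed by a combining acute accent; `unicodedata` is used here only for illustration and is not part of the GLib API under discussion.

```python
import unicodedata

precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + U+0301 COMBINING ACUTE ACCENT

print(precomposed.encode("utf-8"))  # b'\xc3\xa9'
print(decomposed.encode("utf-8"))   # b'e\xcc\x81'

# As byte strings (which is what the filesystem sees) they differ.
same_bytes = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# After normalizing both to the same form they compare equal.
same_nfc = (unicodedata.normalize("NFC", precomposed)
            == unicodedata.normalize("NFC", decomposed))
same_nfd = (unicodedata.normalize("NFD", precomposed)
            == unicodedata.normalize("NFD", decomposed))
```

So a byte-exact lookup of one form does not find a file stored under the other, while a comparison done after normalization treats them as the same name.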
Re: normalizing filenames and strings
On Wednesday, 28 March 2007 at 21:40 +0200, Denis Jacquerye wrote:
> On 3/28/07, Alexander Larsson wrote:
>> On Wed, 2007-03-28 at 19:43 +0200, Denis Jacquerye wrote:
>>> On 3/28/07, Alexander Larsson wrote:
>>>> On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
>>>>> Most applications that operate on files will accept file name arguments when invoked. What are we supposed to do with these? Bear in mind that the argument isn't only used by shell junkies. It's also used when, for example, you double-click a JPG to open EOG. Nautilus passes the file name to EOG. If we don't normalize, users might have a hard time opening files from the command line.
>>>>
>>>> Filenames on disk can *never* *ever* be changed. They are byte strings and must be treated as such, otherwise you can't open or operate on the file they reference. However, when creating a *new* file, given a utf8 string as filename, we can normalize it before creating the file.
>>>
>>> For a name given on the command line or at invocation, applications could test for the requested name; if it does not exist, they should try the existing canonically equivalent filenames.
>>
>> No, it's never right to guess like this. It can lead to all sorts of problems, and it is a performance drag. File names are exact identifiers, not UI strings.
>
> So how should it be done? If I have a file é (precomposed) and I type é (composed), how is the existing file going to be opened?

You don't. Command-line applications don't second-guess their target.

Xav
Re: normalizing filenames and strings
On 3/28/07, Xavier Bestel wrote:
> On Wednesday, 28 March 2007 at 21:40 +0200, Denis Jacquerye wrote:
>> On 3/28/07, Alexander Larsson wrote:
>>> On Wed, 2007-03-28 at 19:43 +0200, Denis Jacquerye wrote:
>>>> On 3/28/07, Alexander Larsson wrote:
>>>>> On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
>>>>>> Most applications that operate on files will accept file name arguments when invoked. What are we supposed to do with these? Bear in mind that the argument isn't only used by shell junkies. It's also used when, for example, you double-click a JPG to open EOG. Nautilus passes the file name to EOG. If we don't normalize, users might have a hard time opening files from the command line.
>>>>>
>>>>> Filenames on disk can *never* *ever* be changed. They are byte strings and must be treated as such, otherwise you can't open or operate on the file they reference. However, when creating a *new* file, given a utf8 string as filename, we can normalize it before creating the file.
>>>>
>>>> For a name given on the command line or at invocation, applications could test for the requested name; if it does not exist, they should try the existing canonically equivalent filenames.
>>>
>>> No, it's never right to guess like this. It can lead to all sorts of problems, and it is a performance drag. File names are exact identifiers, not UI strings.
>>
>> So how should it be done? If I have a file é (precomposed) and I type é (composed), how is the existing file going to be opened?
>
> You don't. Command-line applications don't second-guess their target.

Interestingly enough, Mac OS X normalizes (NFD) at the filesystem level.
http://developer.apple.com/qa/qa2001/qa1235.html

Denis Moyogo Jacquerye
Re: normalizing filenames and strings
On Tue, 27-03-2007 at 18:55 +0200, Denis Jacquerye wrote:
> There's an interesting Unicode bug with all gnome apps at the moment. Canonically equivalent file names are not considered equivalent by applications.

If you have applications that don't work when you have Unicode filenames, it's a telltale sign that they are not using g_filename_to_utf8() and g_filename_from_utf8() correctly:

http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#file-name-encodings

I'd be interested in knowing of applications that use this API properly, but that still have problems with composed/decomposed names. In that case, we may need to explicitly normalize inside those functions: they are the central point of change between UTF-8 (GNOME's encoding for strings) and the file system's own encoding.

[Those functions don't normalize currently; maybe they need to... do you have a reproducible bug?]

Federico
Re: normalizing filenames and strings
Filenames could also be NFC normalized when created, although that's not absolutely necessary.

It would be nice if gnome mandated a standard approach for normalization. Does everyone like NFC? (http://unicode.org/reports/tr15 for info.)

This could be fixed at a low level, in gtk filechooser for some cases or in apps. Gnome-vfs should handle that too.

It would be nice if gnome-vfs could handle this in the background, so coders don't have to worry about uri escaping and normalization at the same time. (The existing normalization functions have to be used on unescaped URIs. It's already tricky enough keeping track of gnome-vfs escaping issues...)

- Mike