Re: normalizing filenames and strings

2007-04-11 Thread Federico Mena Quintero
On Fri, 2007-04-06 at 21:37 +0200, Denis Jacquerye wrote:

 The filechooser doesn't use the existing filename if what is typed is
 canonically equivalent, even if autocompletion gets it right.
 I had already opened
 http://bugzilla.gnome.org/show_bug.cgi?id=421736 and
 http://bugzilla.gnome.org/show_bug.cgi?id=423242

Ah, bummer :)

Do you want to take a stab at debugging this?  I can guide you through
the GtkFileChooser code if you get stuck.

  Federico

___
desktop-devel-list mailing list
desktop-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/desktop-devel-list

Re: normalizing filenames and strings

2007-04-06 Thread Denis Jacquerye
On 4/5/07, Federico Mena Quintero [EMAIL PROTECTED] wrote:
 On Thu, 2007-03-29 at 09:52 +0200, Alexander Larsson wrote:
  On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
  
   [Those functions don't normalize currently; maybe they need to... do you
   have a reproducible bug?]
 
  NO! They should not normalize! Then you can't open a file that has an
  unnormalized filename.

 You mean

 1. user types a UTF-8 filename to open it
 2. the app calls g_filename_from_utf8() and gets back a normalized name
 3. the file can't be opened because its filename on disk is not normalized

 ?

 But you would have to type the human-readable-filename in the first
 place, whereas normally one picks the file from a list (in which case
 there is no problem, since we know the filename-on-disk in advance).

 It *is* a problem for Save As (or anything that requires you to type a
 possibly-existing filename), since the file dialog (or whatever) needs
 to correlate the human-readable-filename with the filenames that are on
 disk.  I really don't know if GtkFileChooserEntry and the file chooser
 in SAVE mode deal with this correctly.

 Denis, could you please test that case and file a bug against the file
 chooser if it doesn't work?

The filechooser doesn't use the existing filename if what is typed is
canonically equivalent, even if autocompletion gets it right.
I had already opened
http://bugzilla.gnome.org/show_bug.cgi?id=421736 and
http://bugzilla.gnome.org/show_bug.cgi?id=423242

 [Alex is right; the functions shouldn't normalize out of the box, at
 least not g_filename_from_utf8().  I don't really see a problem with
 g_filename_to_utf8() normalizing for the app's benefit, although this
 makes it more likely to uncover bugs with apps that try to round-trip
 filenames.]

Re: normalizing filenames and strings

2007-04-04 Thread Federico Mena Quintero
On Thu, 2007-03-29 at 09:52 +0200, Alexander Larsson wrote:
 On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
  
  [Those functions don't normalize currently; maybe they need to... do you
  have a reproducible bug?]
 
 NO! They should not normalize! Then you can't open a file that has an
 unnormalized filename.

You mean

1. user types a UTF-8 filename to open it
2. the app calls g_filename_from_utf8() and gets back a normalized name
3. the file can't be opened because its filename on disk is not normalized

?
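
A minimal sketch of the failure described in steps 1-3, using Python's
`unicodedata` as a stand-in for GLib's normalization. The `disk` dictionary
and both conversion functions are hypothetical; they only model a byte-exact
filesystem and are not the GLib API.

```python
import unicodedata

# Hypothetical byte-exact "filesystem": keys are the raw names on disk.
# The stored name is decomposed (NFD): "e" followed by U+0301.
disk = {"re\u0301sume\u0301.txt".encode("utf-8"): "contents"}

def from_utf8_normalizing(name: str) -> bytes:
    """Model of a UTF-8 -> filename conversion that also applies NFC."""
    return unicodedata.normalize("NFC", name).encode("utf-8")

def from_utf8_plain(name: str) -> bytes:
    """Model of a conversion that leaves the code points untouched."""
    return name.encode("utf-8")

# The user enters exactly the form that is stored on disk.
typed = "re\u0301sume\u0301.txt"

found_plain = from_utf8_plain(typed) in disk              # lookup succeeds
found_normalizing = from_utf8_normalizing(typed) in disk  # lookup fails
```

With the normalizing conversion the bytes handed to the filesystem no longer
match the stored name, so the file cannot be found even though the user typed
it correctly.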

But you would have to type the human-readable-filename in the first
place, whereas normally one picks the file from a list (in which case
there is no problem, since we know the filename-on-disk in advance).

It *is* a problem for Save As (or anything that requires you to type a
possibly-existing filename), since the file dialog (or whatever) needs
to correlate the human-readable-filename with the filenames that are on
disk.  I really don't know if GtkFileChooserEntry and the file chooser
in SAVE mode deal with this correctly.

Denis, could you please test that case and file a bug against the file
chooser if it doesn't work?
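
One way a Save As dialog could correlate the typed name with the names on
disk is sketched below. This is an assumption-laden illustration, not the
GtkFileChooser code: `find_equivalent` and `listing` are hypothetical names,
and Python's `unicodedata` stands in for GLib's normalization.

```python
import unicodedata
from typing import Iterable, Optional

def find_equivalent(typed: str, names_on_disk: Iterable[bytes]) -> Optional[bytes]:
    """Return the on-disk name canonically equivalent to `typed`, if any."""
    wanted = unicodedata.normalize("NFC", typed)
    for raw in names_on_disk:
        try:
            shown = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8, so it cannot match typed text
        if unicodedata.normalize("NFC", shown) == wanted:
            return raw  # hand back the exact bytes, never a rewritten name
    return None

# The directory holds a decomposed name; the user types the precomposed form.
listing = [b"notes.txt", "cafe\u0301.odt".encode("utf-8")]
match = find_equivalent("caf\u00e9.odt", listing)
```

Normalization is used only for comparison; the value returned is the
unmodified on-disk byte string, so the existing file is the one overwritten.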

[Alex is right; the functions shouldn't normalize out of the box, at
least not g_filename_from_utf8().  I don't really see a problem with
g_filename_to_utf8() normalizing for the app's benefit, although this
makes it more likely to uncover bugs with apps that try to round-trip
filenames.]

 Federico


Re: normalizing filenames and strings

2007-03-30 Thread Alexander Larsson
On Fri, 2007-03-30 at 13:39 +1000, Andrew Cowie wrote:
 On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
  g_filename_to_utf8() and g_filename_from_utf8()
 
 Just when I thought I was beginning to know something, an old hand like
 Federico goes and describes something that is enormously powerful and
 _very_ well presented in the docs, that I just hadn't come across yet. I
 love hanging out in the GNOME community.
 
 I just looked through some of the GTK sources and I think that things
 like GtkFileChooser use g_filename_from_utf8() etc automatically. Is
 that true?

Yes. The file chooser returns filenames in filename encoding, whereas it
always gets utf8 from the GtkEntry it has. So, it has to use
g_filename_from_utf8() internally.



Re: normalizing filenames and strings

2007-03-29 Thread Alexander Larsson
On Wed, 2007-03-28 at 21:40 +0200, Denis Jacquerye wrote:
 On 3/28/07, Alexander Larsson [EMAIL PROTECTED] wrote:
  On Wed, 2007-03-28 at 19:43 +0200, Denis Jacquerye wrote:
   On 3/28/07, Alexander Larsson [EMAIL PROTECTED] wrote:
On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
 On Wed, 2007-03-28 at 16:55 +0200, Xavier Bestel wrote:
 Most applications that operate on files will accept file name
 arguments when invoked.  What are we supposed to do with these?
 Bear in mind that the argument isn't only used by shell junkies.
 It's also used when, for example, you double-click a JPG to open
 EOG.  Nautilus passes the file name to EOG.

 If we don't normalize, users might have a hard time opening
 files from the command line.
   
Filenames on disk can *never* *ever* be changed. They are byte strings
and must be treated as such, otherwise you can't open or operate on the
file they reference.
   
However, when creating a *new* file, given a utf8 string as filename, we
can normalize it before creating the file.
  
   For command line or invoked name, applications could test for the
   requested name; if nonexistent, they should try the existing
   canonically equivalent filenames.
 
  No, it's never right to guess like this. It can lead to all sorts of
  problems, and it is a performance drag. File names are exact
  identifiers, not UI strings.
 
 So how should it be done? If I have a file é (precomposed) and I
 type é (decomposed), how is the existing file going to be opened?

If the file already exists, it's rarely a problem. The typeahead matching
in the file selector can do normalization, or you could click on the
filename in the file selector. In the shell you could use tab
completion.

Of course, there are some situations where things can be tricky in the
shell, like the case of an unnormalized single-character filename. But
there are many other similar cases, like files with newlines in their names,
filenames that start with dashes (-rf comes to mind), or filenames
that are also a URI. The more we have applications try to do magic
with the passed-in filename (like guessing whether it's a filename or a URI),
the more strange corner-case behaviours we get.

It's up to the filesystem to handle filenames in a consistent way. If the
filesystem normalizes, that is fine, but if it doesn't, we shouldn't try
to work around it.


Re: normalizing filenames and strings

2007-03-29 Thread Alexander Larsson
On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:

 If you have applications that don't work when you have Unicode
 filenames, it's a telltale sign that they are not using
 g_filename_to_utf8() and g_filename_from_utf8() correctly:
 
 http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#file-name-encodings
 
 I'd be interested in knowing of applications that use this API properly,
 but that still have problems with composed/decomposed names.  In that
 case, we may need to explicitly normalize inside those functions:  they
 are the central point of change between UTF-8 (GNOME's encoding for
 strings) and the file system's own encoding.
 
 [Those functions don't normalize currently; maybe they need to... do you
 have a reproducible bug?]

NO! They should not normalize! Then you can't open a file that has an
unnormalized filename.



Re: normalizing filenames and strings

2007-03-29 Thread Xavier Bestel
On Thu, 2007-03-29 at 09:52 +0200, Alexander Larsson wrote:
 On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
 
  If you have applications that don't work when you have Unicode
  filenames, it's a telltale sign that they are not using
  g_filename_to_utf8() and g_filename_from_utf8() correctly:
  
  http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#file-name-encodings
  
  I'd be interested in knowing of applications that use this API properly,
  but that still have problems with composed/decomposed names.  In that
  case, we may need to explicitly normalize inside those functions:  they
  are the central point of change between UTF-8 (GNOME's encoding for
  strings) and the file system's own encoding.

I think the expectation is false. GNOME apps shouldn't treat these
filenames as equivalent, exactly like Makefile and makefile should
be 2 different filenames, except when completing.

  [Those functions don't normalize currently; maybe they need to... do you
  have a reproducible bug?]
 
 NO! They should not normalize! Then you can't open a file that has an
 unnormalized filename.

And that may introduce security bugs: if you transform a filename behind
the user's back, you have to be pretty sure of your UTF-8 decoder - see
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt for examples
of subtleties.
Windows IIS even had a security bug where the attacker crafted a URL
with special characters in an alternative (illegal) Unicode encoding,
which wrongly passed safety tests but, once canonicalized, enabled access
to system paths (and even remote command execution, cf.
http://seclists.org/bugtraq/2000/Oct/0271.html for details).
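
To make the "alternative (illegal) encoding" concrete, here is a small sketch
of an overlong UTF-8 sequence slipping past a byte-level check. The helper
name and the sample inputs are hypothetical, and the snippet illustrates the
general class of bug rather than the exact IIS exploit.

```python
def decodes_strictly(raw: bytes) -> bool:
    """True if `raw` is well-formed UTF-8 according to Python's strict decoder."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xC0 0xAF is an overlong (illegal) two-byte encoding of "/" (U+002F).
overlong_slash = b"..\xc0\xaf.."
plain_slash = b"../.."

# A naive safety test that only looks for the "/" byte lets the input through.
naive_check_passes = b"/" not in overlong_slash

# A strict decoder refuses the sequence instead of turning it into "/".
strict_accepts_overlong = decodes_strictly(overlong_slash)
strict_accepts_plain = decodes_strictly(plain_slash)
```

A lenient decoder that maps the overlong sequence to "/" after the safety
test has run is what turns this into a path traversal.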

Xav




Re: normalizing filenames and strings

2007-03-29 Thread Dr. Michael J. Chudobiak
Denis Jacquerye wrote:
 Most apps don't handle search/compare/sort properly.

Actually, this point hasn't really been noticed in the discussion. 
(Denis has been diligently filing bugs for applications that do not 
search correctly.)

For instance, gthumb used g_pattern_match_simple in some search 
functions. g_pattern_match_simple uses UTF8 arguments for the pattern 
and target strings, but it doesn't normalize them. It probably should, I 
think...
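
A rough sketch of what normalizing before pattern matching could look like.
Python's `fnmatch` stands in for g_pattern_match_simple here (it also accepts
`[...]` sets, which the GLib function does not), and
`pattern_match_normalized` is a hypothetical helper, not gthumb code.

```python
import fnmatch
import unicodedata

def pattern_match_normalized(pattern: str, target: str) -> bool:
    """Glob-match after bringing both strings to the same normal form (NFC)."""
    norm_pattern = unicodedata.normalize("NFC", pattern)
    norm_target = unicodedata.normalize("NFC", target)
    return fnmatch.fnmatchcase(norm_target, norm_pattern)

pattern = "*\u00e9*"           # user searches for precomposed é
target = "cafe\u0301.jpg"      # filename stored with e + combining accent

raw_result = fnmatch.fnmatchcase(target, pattern)
normalized_result = pattern_match_normalized(pattern, target)
```

Without normalization the search misses the file; with both sides normalized
it matches, while the filename itself is left untouched.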

- Mike


Re: normalizing filenames and strings

2007-03-29 Thread Andrew Cowie
On Wed, 2007-03-28 at 19:52 -0600, Federico Mena Quintero wrote:
 g_filename_to_utf8() and g_filename_from_utf8()

Just when I thought I was beginning to know something, an old hand like
Federico goes and describes something that is enormously powerful and
_very_ well presented in the docs, that I just hadn't come across yet. I
love hanging out in the GNOME community.

I just looked through some of the GTK sources and I think that things
like GtkFileChooser use g_filename_from_utf8() etc automatically. Is
that true?

AfC
Sydney




Re: normalizing filenames and strings

2007-03-28 Thread Alexander Larsson
On Tue, 2007-03-27 at 13:15 -0400, Dr. Michael J. Chudobiak wrote:
  Filenames could also be NFC normalized when created, although that's
  not absolutely necessary.
 
 It would be nice if gnome mandated a standard approach for 
 normalization. Does everyone like NFC? (http://unicode.org/reports/tr15 
 for info.)
 
  This could be fixed at a low level, in gtk filechooser for some cases
  or in apps. Gnome-vfs should handle that too.
 
 It would be nice if gnome-vfs could handle this in the background, so 
 coders don't have to worry about uri escaping and normalization at the 
 same time. (The existing normalization functions have to be used on 
 unescaped URIs. It's already tricky enough keeping track of gnome-vfs 
 escaping issues...)

It's very hard and quite expensive to handle normalization automatically
at the low level. You have to intercept every I/O operation, and it can
introduce very strange behaviour (since we can't control what's already
on the disk). We have to accept that Unix filenames are strings of bytes
and that we just cannot enforce any meaning on them (although we can do
our best to try to make them some normalized form of UTF-8).
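
The "strings of bytes" point can be illustrated with a short sketch that
keeps the raw bytes as the identifier and derives a separate label for the
UI. The helper and sample name are hypothetical; the idea is similar in
spirit to GLib's g_filename_display_name(), but this is not its
implementation.

```python
def display_name(raw: bytes) -> str:
    """Best-effort text for the UI; undecodable bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")

# A Latin-1 encoded name: perfectly legal on disk, but not valid UTF-8.
raw_name = b"caf\xe9.txt"
label = display_name(raw_name)

# The label is lossy, so it must never be converted back and used to open
# the file; only `raw_name` identifies it.
round_trip_is_safe = label.encode("utf-8") == raw_name
```

Any operation on the file keeps using `raw_name`; `label` exists only to be
shown to the user.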

For uri escaping I'm doing my best to make it not an issue in the new
GVFS API that is to replace gnome-vfs. (By not using uris much in the
API.)

In practice I don't think there is an enormous problem. Most files are
either selected in the fileselector/filemanager (so we don't care about
normalization, just the filename bytestring that was selected) or, for
new files, typed into the file selector. If the fileselector can do some
normalization for typed-in names, we shouldn't in normal use create
any duplicate unnormalized filenames.



Re: normalizing filenames and strings

2007-03-28 Thread Xavier Bestel
On Wed, 2007-03-28 at 16:46 +0200, Alexander Larsson wrote:
 On Tue, 2007-03-27 at 13:15 -0400, Dr. Michael J. Chudobiak wrote:
   Filenames could also be NFC normalized when created, although that's
   not absolutely necessary.
  
  It would be nice if gnome mandated a standard approach for 
  normalization. Does everyone like NFC? (http://unicode.org/reports/tr15 
  for info.)
  
   This could be fixed at a low level, in gtk filechooser for some cases
   or in apps. Gnome-vfs should handle that too.
  
  It would be nice if gnome-vfs could handle this in the background, so 
  coders don't have to worry about uri escaping and normalization at the 
  same time. (The existing normalization functions have to be used on 
  unescaped URIs. It's already tricky enough keeping track of gnome-vfs 
  escaping issues...)
 
 It's very hard and quite expensive to handle normalization automatically
 at the low level. You have to intercept every I/O operation, and it can
 introduce very strange behaviour (since we can't control what's already
 on the disk). We have to accept that Unix filenames are strings of bytes
 and that we just cannot enforce any meaning on them (although we can do
 our best to try to make them some normalized form of UTF-8).
 
 For uri escaping I'm doing my best to make it not an issue in the new
 GVFS API that is to replace gnome-vfs. (By not using uris much in the
 API.)
 
 In practice I don't think there is an enormous problem. Most files are
 either selected in the fileselector/filemanager (so we don't care about
 normalization, just the filename bytestring that was selected) or, for
 new files, typed into the file selector. If the fileselector can do some
 normalization for typed-in names, we shouldn't in normal use create
 any duplicate unnormalized filenames.

IMHO the only work needed to handle this is in all filename-selection
widgets, which should do completion based on similar unicode names (like
the fileselector does already for names differing only by case).

Xav




Re: normalizing filenames and strings

2007-03-28 Thread Denis Jacquerye
On 3/28/07, Xavier Bestel [EMAIL PROTECTED] wrote:
 On Wed, 2007-03-28 at 16:46 +0200, Alexander Larsson wrote:
  In practice I don't think there is an enormous problem. Most files are
  either selected in the fileselector/filemanager (so we don't care about
  normalization, just the filename bytestring that was selected) or, for
  new files, typed into the file selector. If the fileselector can do some
  normalization for typed-in names, we shouldn't in normal use create
  any duplicate unnormalized filenames.

The more I think about it, the more I think filenames should be
normalized with NFC when created, in addition to checking for
existing equivalents to catch name conflicts. The W3C's motivations for
normalizing to NFC as early as possible seem very reasonable.
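
A minimal sketch of that policy, assuming names are already available as
text: normalize only the name about to be created, and refuse it if a
canonically equivalent name is already present. `name_for_new_file` is a
hypothetical helper, and Python's `unicodedata` stands in for GLib's
normalization.

```python
import unicodedata
from typing import Iterable

def name_for_new_file(typed: str, existing: Iterable[str]) -> str:
    """NFC-normalize a new name; refuse if an equivalent name already exists."""
    candidate = unicodedata.normalize("NFC", typed)
    for name in existing:
        # Existing names are only compared, never rewritten.
        if unicodedata.normalize("NFC", name) == candidate:
            raise FileExistsError(f"equivalent name already present: {name!r}")
    return candidate

# Typed in decomposed form, created in NFC.
new_name = name_for_new_file("e\u0301te\u0301.txt", ["a.txt"])
```

Existing files keep whatever bytes they have; only the freshly created name
is brought to NFC.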

 IMHO the only work needed to handle this is in all filename-selection
 widgets, which should do completion based on similar unicode names (like
 the fileselector does already for names differing only by case).

The Gtk fileselector already offers canonically equivalent names with
autocompletion thanks to GtkEntryCompletion already doing the right
thing (here normalization and casefolding).
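
As an illustration of matching by normalization plus casefolding, the sketch
below builds a comparison key and uses it for prefix completion. It is not
the GtkEntryCompletion source; `completion_key` and `completions` are
hypothetical, and the choice of NFC before casefolding is an assumption made
for the example.

```python
import unicodedata
from typing import List

def completion_key(text: str) -> str:
    """Key used only for comparison: normalize, then fold case."""
    return unicodedata.normalize("NFC", text).casefold()

def completions(prefix: str, names: List[str]) -> List[str]:
    """Return the names whose key starts with the key of the typed prefix."""
    key = completion_key(prefix)
    return [name for name in names if completion_key(name).startswith(key)]

names = ["e\u0301cole.txt", "Ecole.txt", "notes.txt"]
# The user types an uppercase, precomposed É followed by "c".
offered = completions("\u00c9c", names)
```

The names offered are the original strings from the listing, so picking one
yields the real on-disk spelling rather than what was typed.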

Denis Moyogo Jacquerye


Re: normalizing filenames and strings

2007-03-28 Thread Shaun McCance
On Wed, 2007-03-28 at 16:55 +0200, Xavier Bestel wrote:
 On Wed, 2007-03-28 at 16:46 +0200, Alexander Larsson wrote:
  On Tue, 2007-03-27 at 13:15 -0400, Dr. Michael J. Chudobiak wrote:
Filenames could also be NFC normalized when created, although that's
not absolutely necessary.
   
   It would be nice if gnome mandated a standard approach for 
   normalization. Does everyone like NFC? (http://unicode.org/reports/tr15 
   for info.)
   
This could be fixed at a low level, in gtk filechooser for some cases
or in apps. Gnome-vfs should handle that too.
   
   It would be nice if gnome-vfs could handle this in the background, so 
   coders don't have to worry about uri escaping and normalization at the 
   same time. (The existing normalization functions have to be used on 
   unescaped URIs. It's already tricky enough keeping track of gnome-vfs 
   escaping issues...)
  
  It's very hard and quite expensive to handle normalization automatically
  at the low level. You have to intercept every I/O operation, and it can
  introduce very strange behaviour (since we can't control what's already
  on the disk). We have to accept that Unix filenames are strings of bytes
  and that we just cannot enforce any meaning on them (although we can do
  our best to try to make them some normalized form of UTF-8).
  
  For uri escaping I'm doing my best to make it not an issue in the new
  GVFS API that is to replace gnome-vfs. (By not using uris much in the
  API.)
  
  In practice I don't think there is an enormous problem. Most files are
  either selected in the fileselector/filemanager (so we don't care about
  normalization, just the filename bytestring that was selected) or, for
  new files, typed into the file selector. If the fileselector can do some
  normalization for typed-in names, we shouldn't in normal use create
  any duplicate unnormalized filenames.
 
 IMHO the only work needed to handle this is in all filename-selection
 widgets, which should do completion based on similar unicode names (like
 the fileselector does already for names differing only by case).

Most applications that operate on files will accept file name
arguments when invoked.  What are we supposed to do with these?
Bear in mind that the argument isn't only used by shell junkies.
It's also used when, for example, you double-click a JPG to open
EOG.  Nautilus passes the file name to EOG.

If we don't normalize, users might have a hard time opening
files from the command line.

If we do normalize, then people will pretty much never be able
to open files that have unnormalized file names, which seems
like a much more serious problem.

--
Shaun




Re: normalizing filenames and strings

2007-03-28 Thread Alexander Larsson
On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
 On Wed, 2007-03-28 at 16:55 +0200, Xavier Bestel wrote:
 Most applications that operate on files will accept file name
 arguments when invoked.  What are we supposed to do with these?
 Bear in mind that the argument isn't only used by shell junkies.
 It's also used when, for example, you double-click a JPG to open
 EOG.  Nautilus passes the file name to EOG.
 
 If we don't normalize, users might have a hard time opening
 files from the command line.

Filenames on disk can *never* *ever* be changed. They are byte strings
and must be treated as such, otherwise you can't open or operate on the
file they reference.

However, when creating a *new* file, given a utf8 string as filename, we
can normalize it before creating the file.



Re: normalizing filenames and strings

2007-03-28 Thread Denis Jacquerye
On 3/28/07, Alexander Larsson [EMAIL PROTECTED] wrote:
 On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
  On Wed, 2007-03-28 at 16:55 +0200, Xavier Bestel wrote:
  Most applications that operate on files will accept file name
  arguments when invoked.  What are we supposed to do with these?
  Bear in mind that the argument isn't only used by shell junkies.
  It's also used when, for example, you double-click a JPG to open
  EOG.  Nautilus passes the file name to EOG.
 
  If we don't normalize, users might have a hard time opening
  files from the command line.

 Filenames on disk can *never* *ever* be changed. They are byte strings
 and must be treated as such, otherwise you can't open or operate on the
 file they reference.

 However, when creating a *new* file, given a utf8 string as filename, we
 can normalize it before creating the file.

For command line or invoked name, applications could test for the
requested name; if nonexistent, they should try the existing
canonically equivalent filenames.

Denis Moyogo Jacquerye


Re: normalizing filenames and strings

2007-03-28 Thread Xavier Bestel
On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
 Most applications that operate on files will accept file name
 arguments when invoked.  What are we supposed to do with these?

I don't know, I always use bash completion, which should avoid the need
to use canonical-encoding - except, of course, in the example which
started this thread (1st char is the problematic one). Well, copy/paste
to the shell should solve that.
IMHO this is more a bash problem than a GNOME problem: it will happen
with /bin/rm too.

 Bear in mind that the argument isn't only used by shell junkies.
 It's also used when, for example, you double-click a JPG to open
 EOG.  Nautilus passes the file name to EOG.

In the nautilus-to-eog case, there should be no problem because the
filename shouldn't be touched in between, be it normalized or not.

 If we don't normalize, users might have a hard time opening
 files from the command line.

Sure.

 If we do normalize, then people will pretty much never be able
 to open files that have unnormalized file names, which seems
 like a much more serious problem. 

Not if fileselectors autocomplete correctly: they are currently
more-or-less case-insensitive, they just have to become
utf8-flavor-insensitive.

Xav




Re: normalizing filenames and strings

2007-03-28 Thread Alexander Larsson
On Wed, 2007-03-28 at 19:43 +0200, Denis Jacquerye wrote:
 On 3/28/07, Alexander Larsson [EMAIL PROTECTED] wrote:
  On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
   On Wed, 2007-03-28 at 16:55 +0200, Xavier Bestel wrote:
   Most applications that operate on files will accept file name
   arguments when invoked.  What are we supposed to do with these?
   Bear in mind that the argument isn't only used by shell junkies.
   It's also used when, for example, you double-click a JPG to open
   EOG.  Nautilus passes the file name to EOG.
  
   If we don't normalize, users might have a hard time opening
   files from the command line.
 
  Filenames on disk can *never* *ever* be changed. They are byte strings
  and must be treated as such, otherwise you can't open or operate on the
  file they reference.
 
  However, when creating a *new* file, given a utf8 string as filename, we
  can normalize it before creating the file.
 
 For command line or invoked name, applications could test for the
 requested name; if nonexistent, they should try the existing
 canonically equivalent filenames.

No, it's never right to guess like this. It can lead to all sorts of
problems, and it is a performance drag. File names are exact
identifiers, not UI strings.



Re: normalizing filenames and strings

2007-03-28 Thread Denis Jacquerye
On 3/28/07, Alexander Larsson [EMAIL PROTECTED] wrote:
 On Wed, 2007-03-28 at 19:43 +0200, Denis Jacquerye wrote:
  On 3/28/07, Alexander Larsson [EMAIL PROTECTED] wrote:
   On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
On Wed, 2007-03-28 at 16:55 +0200, Xavier Bestel wrote:
Most applications that operate on files will accept file name
arguments when invoked.  What are we supposed to do with these?
Bear in mind that the argument isn't only used by shell junkies.
It's also used when, for example, you double-click a JPG to open
EOG.  Nautilus passes the file name to EOG.
   
If we don't normalize, users might have a hard time opening
files from the command line.
  
   Filenames on disk can *never* *ever* be changed. They are byte strings
   and must be treated as such, otherwise you can't open or operate on the
   file they reference.
  
   However, when creating a *new* file, given a utf8 string as filename, we
   can normalize it before creating the file.
 
  For command line or invoked name, applications could test for the
  requested name; if nonexistent, they should try the existing
  canonically equivalent filenames.

 No, it's never right to guess like this. It can lead to all sorts of
 problems, and it is a performance drag. File names are exact
 identifiers, not UI strings.

So how should it be done? If I have a file é (precomposed) and I
type é (decomposed), how is the existing file going to be opened?
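
For readers who cannot see the difference on screen, this short sketch shows
the two forms of é at the byte level. It uses only Python's standard
`unicodedata` module; the variable names are illustrative.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The filesystem sees two different byte strings...
precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
same_bytes = precomposed_bytes == decomposed_bytes

# ...although the two strings are canonically equivalent.
same_after_nfc = (unicodedata.normalize("NFC", precomposed)
                  == unicodedata.normalize("NFC", decomposed))
```

Both render identically, which is why a typed name can fail to match a file
that appears to have exactly that name.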

Denis Moyogo Jacquerye

Re: normalizing filenames and strings

2007-03-28 Thread Xavier Bestel
On Wed, 2007-03-28 at 21:40 +0200, Denis Jacquerye wrote:
 On 3/28/07, Alexander Larsson [EMAIL PROTECTED] wrote:
  On Wed, 2007-03-28 at 19:43 +0200, Denis Jacquerye wrote:
   On 3/28/07, Alexander Larsson [EMAIL PROTECTED] wrote:
On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
 On Wed, 2007-03-28 at 16:55 +0200, Xavier Bestel wrote:
 Most applications that operate on files will accept file name
 arguments when invoked.  What are we supposed to do with these?
 Bear in mind that the argument isn't only used by shell junkies.
 It's also used when, for example, you double-click a JPG to open
 EOG.  Nautilus passes the file name to EOG.

 If we don't normalize, users might have a hard time opening
 files from the command line.
   
Filenames on disk can *never* *ever* be changed. They are byte strings
and must be treated as such, otherwise you can't open or operate on the
file they reference.
   
However, when creating a *new* file, given a utf8 string as filename, we
can normalize it before creating the file.
  
   For command line or invoked name, applications could test for the
   requested name; if nonexistent, they should try the existing
   canonically equivalent filenames.
 
  No, it's never right to guess like this. It can lead to all sorts of
  problems, and it is a performance drag. File names are exact
  identifiers, not UI strings.
 
 So how should it be done? If I have a file é (precomposed) and I
 type é (decomposed), how is the existing file going to be opened?

You don't. Command-line applications don't second-guess their target.

Xav



Re: normalizing filenames and strings

2007-03-28 Thread Denis Jacquerye
On 3/28/07, Xavier Bestel [EMAIL PROTECTED] wrote:
 On Wed, 2007-03-28 at 21:40 +0200, Denis Jacquerye wrote:
  On 3/28/07, Alexander Larsson [EMAIL PROTECTED] wrote:
   On Wed, 2007-03-28 at 19:43 +0200, Denis Jacquerye wrote:
On 3/28/07, Alexander Larsson [EMAIL PROTECTED] wrote:
 On Wed, 2007-03-28 at 11:50 -0500, Shaun McCance wrote:
  On Wed, 2007-03-28 at 16:55 +0200, Xavier Bestel wrote:
  Most applications that operate on files will accept file name
  arguments when invoked.  What are we supposed to do with these?
  Bear in mind that the argument isn't only used by shell junkies.
  It's also used when, for example, you double-click a JPG to open
  EOG.  Nautilus passes the file name to EOG.
 
  If we don't normalize, users might have a hard time opening
  files from the command line.

 Filenames on disk can *never* *ever* be changed. They are byte strings
 and must be treated as such, otherwise you can't open or operate on 
 the
 file they reference.

 However, when creating a *new* file, given a utf8 string as filename, 
 we
 can normalize it before creating the file.
   
For command line or invoked name, applications could test for the
requested name; if nonexistent, they should try the existing
canonically equivalent filenames.
  
   No, it's never right to guess like this. It can lead to all sorts of
   problems, and it is a performance drag. File names are exact
   identifiers, not UI strings.
 
  So how should it be done? If I have a file é (precomposed) and I
  type é (decomposed), how is the existing file going to be opened?

 You don't. Command-line applications don't second-guess their target.

Interestingly enough Mac OS X normalizes (NFD) at the filesystem level.
http://developer.apple.com/qa/qa2001/qa1235.html

Denis Moyogo Jacquerye

Re: normalizing filenames and strings

2007-03-28 Thread Federico Mena Quintero
On Tue, 2007-03-27 at 18:55 +0200, Denis Jacquerye wrote:

 There's an interesting Unicode bug with all gnome apps at the moment.
 Canonically equivalent file names are not considered equivalent by
 applications.

If you have applications that don't work when you have Unicode
filenames, it's a telltale sign that they are not using
g_filename_to_utf8() and g_filename_from_utf8() correctly:

http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#file-name-encodings

I'd be interested in knowing of applications that use this API properly,
but that still have problems with composed/decomposed names.  In that
case, we may need to explicitly normalize inside those functions:  they
are the central point of change between UTF-8 (GNOME's encoding for
strings) and the file system's own encoding.

[Those functions don't normalize currently; maybe they need to... do you
have a reproducible bug?]

  Federico


Re: normalizing filenames and strings

2007-03-27 Thread Dr. Michael J. Chudobiak
 Filenames could also be NFC normalized when created, although that's
 not absolutely necessary.

It would be nice if gnome mandated a standard approach for 
normalization. Does everyone like NFC? (http://unicode.org/reports/tr15 
for info.)

 This could be fixed at a low level, in gtk filechooser for some cases
 or in apps. Gnome-vfs should handle that too.

It would be nice if gnome-vfs could handle this in the background, so 
coders don't have to worry about uri escaping and normalization at the 
same time. (The existing normalization functions have to be used on 
unescaped URIs. It's already tricky enough keeping track of gnome-vfs 
escaping issues...)

- Mike