[chromium-dev] Re: Changes to FilePath?

2009-05-14 Thread Greg Spencer
On Wed, May 13, 2009 at 7:24 PM, Brett Wilson bre...@chromium.org wrote:

 You can't actually canonicalize a filename on Windows, so I think it's
 dangerous to write a component that claims to do it.


You can do it under controlled conditions, and especially if the file exists
on the disk already and is accessible.  For instance, if you don't try to
handle (non-deterministic) 8.3 names of files that don't exist yet/anymore
and NTFS mount points, I think you can fairly safely apply the regular
rules to canonicalize paths (and even if you applied the rules to those,
most of the time they would still work).  I would make sure that the class
only claims to canonicalize paths that it really knows it can do, of course.
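A minimal Python sketch of that policy, for illustration only. `canonicalize_if_known` is a hypothetical helper, not Chromium or O3D code. It answers only for paths that exist right now, letting os.path.realpath resolve symlinks and '..' against the real directory tree. For anything else it returns None rather than guessing.

```python
import os
import tempfile
from typing import Optional


def canonicalize_if_known(path: str) -> Optional[str]:
    """Return a canonical absolute path, or None if we can't vouch for it.

    Hypothetical helper. It only answers for paths that exist right now,
    because resolving symlinks and '..' correctly requires asking the
    filesystem. Anything else is refused rather than guessed at.
    """
    if not os.path.exists(path):
        return None
    # realpath resolves symlinks, '.' and '..' against the real tree;
    # normcase then folds case and slashes on Windows only.
    return os.path.normcase(os.path.realpath(path))


with tempfile.TemporaryDirectory() as root:
    sub = os.path.join(root, "sub")
    os.mkdir(sub)
    spelled_oddly = os.path.join(sub, "..", "sub")
    print(canonicalize_if_known(spelled_oddly) == canonicalize_if_known(sub))  # True
    print(canonicalize_if_known(os.path.join(root, "missing")))  # None
```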

Look, I know there are tough problems here, but why not TRY to solve them as
well as possible.  FilePath is fine for simple manipulations, and is a good,
lightweight container if you're not planning on doing anything complex with
the file names.  If you actually need to do more interesting things with
them, like display the names, convert to relative paths, compare them for
equality or pass them off to a third party in a particular encoding, it's
not sufficient.

I could write a half-assed implementation that kinda works if you don't
throw anything wonky at it.  I've got that now.  I want something more
bulletproof.  It can't be perfect because file paths are non-deterministic
on all three systems in not so obvious ways, but why should everyone who
needs more than FilePath have to climb that learning curve?  And we can only
give out information that is as good as we get from the OS -- if the OS
isn't able to present a filesystem that makes sense, we can only provide the
best gibberish we can get our hands on.

-Greg.

--~--~-~--~~~---~--~~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
http://groups.google.com/group/chromium-dev
-~--~~~~--~~--~--~---



[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Greg Spencer
(ping)
So, I had another idea.  How about a separate file path manipulation class
that has a well defined character encoding, so that we can do filename
manipulations like with FilePath (and a few more).  It could convert from a
FilePath if given an encoding, and convert back to a FilePath with the
platform's default encoding (using LC_*/LANG on Linux, falling back to
ASCII), or a given encoding.  It could touch the filesystem so that it could
know what encoding methods and manipulations were valid for the
platform/drive combination.
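The following is a rough sketch of such a class. It assumes Python and a hypothetical `EncodedPath` type, which is not part of base. The class stores Unicode text together with the encoding it came from. Converting from native bytes requires the caller to name an encoding. Converting back uses either an explicit encoding or the locale's preferred one. The filesystem probing described above is left out.

```python
import locale
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EncodedPath:
    """Hypothetical path type that always knows its encoding (not base API).

    `text` is the path as Unicode; `encoding` records where it came from.
    """
    text: str
    encoding: str

    @classmethod
    def from_native(cls, raw: bytes, encoding: str) -> "EncodedPath":
        # The caller must name the encoding of the native bytes; invalid
        # sequences raise UnicodeDecodeError instead of being guessed at.
        return cls(raw.decode(encoding), encoding)

    def to_native(self, encoding: Optional[str] = None) -> bytes:
        # Use the explicit encoding if given, else the locale's preferred
        # encoding, else ASCII if the locale reports an empty name.
        target = encoding or locale.getpreferredencoding(False) or "ascii"
        return self.text.encode(target)


p = EncodedPath.from_native(b"caf\xc3\xa9.txt", "utf-8")
assert p.text == "caf\u00e9.txt"
assert p.to_native("latin-1") == b"caf\xe9.txt"
```

Decoding failures raise here instead of being papered over, so a wrong encoding guess surfaces at the boundary rather than deep inside a third-party library.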

Since it seems like this is not really something that Chromium needs or
wants right now (and it doesn't belong in base anyhow because of needing to
touch the filesystem), I think I'll work on this for O3D, and later you can
see if you want to use it for Chromium.

-Greg.

On Wed, Apr 29, 2009 at 3:58 PM, Greg Spencer gspen...@google.com wrote:

 On Wed, Apr 29, 2009 at 12:22 PM, Mark Mentovai m...@chromium.org wrote:

 I understand your problem.  You're saying I have user-supplied data
 that I want to build a filename from, and I have this pathname that
 I want to display back to the user.  I agree that it would be good to

 have a way to handle these cases in base.  I don't know if FilePath
 proper is the right place to do it.  If we do it in FilePath, it still
 won't really be right.


 OK, so it sounds like you're telling me not to use FilePath to represent
 file paths from a disk for my purposes because they can't ever be converted
 reliably to a particular encoding on Linux (which is a requirement for me,
 because of the third party libraries that require a particular encoding).

 That's fine, but what do I do instead?  Roll my own FilePath clone that has
 some encoding assumptions?  I can do that, but it has the same issues as the
 ones you're worried about with FilePath, so it seems better to solve the
 issue in one place rather than have two versions that are both insufficient.
  Man, it would be better if FilePath could reliably know its encoding!  (I
 realize that Linux makes this impossible, it just seems like it would be
 better that way. :-)

 Since Linux is the only platform where the encoding is unclear, what if we
 did the best we could on Linux:

 When constructing a FilePath from a char* string on Linux:
 - Test the input string for byte values > 127 to determine if it's really just
 ASCII (and if so, we're out of the woods).
 - Then check LANG, LC_CTYPE, LC_ALL (through appropriate Linux APIs) for an
 encoding that we can support, and note the encoding for later if we are
 requested to do a conversion.
 - If we run into an invalid sequence during a conversion, or an encoding we
 can't convert from, then use a CHECK to crash.
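The three steps above could look roughly like this in Python. This is a sketch under stated assumptions. The environment is passed in as a dict so the logic is testable. LC_ALL, LC_CTYPE and LANG are checked in POSIX precedence order, with LC_ALL winning. An exception stands in for the CHECK. `locale_codeset` and `decode_native_path` are hypothetical names.

```python
from typing import Mapping


def locale_codeset(env: Mapping[str, str]) -> str:
    """Return the charset named by the locale environment.

    Checks LC_ALL, then LC_CTYPE, then LANG (POSIX precedence). For a value
    like 'ja_JP.eucJP@mod' the codeset is 'eucJP'. A value with no codeset
    ('C', 'POSIX', bare 'en_US') is treated as ASCII, which is a
    simplification: a bare language name really implies a locale default.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            if "." in value:
                return value.split(".", 1)[1].split("@", 1)[0]
            return "ascii"
    return "ascii"


def decode_native_path(raw: bytes, env: Mapping[str, str]) -> str:
    """Hypothetical decoder following the three steps above."""
    # Step 1: pure ASCII means the same under any ASCII-compatible codeset.
    if all(byte < 128 for byte in raw):
        return raw.decode("ascii")
    # Step 2: find the codeset the locale variables claim.
    codeset = locale_codeset(env)
    # Step 3: fail loudly, standing in for a CHECK. decode() raises
    # LookupError for an unknown codec, UnicodeDecodeError for bad bytes.
    return raw.decode(codeset)


assert decode_native_path(b"readme.txt", {}) == "readme.txt"
assert decode_native_path(b"caf\xc3\xa9", {"LANG": "fr_FR.UTF-8"}) == "caf\u00e9"
# LC_ALL overrides LANG.
assert decode_native_path(
    b"caf\xe9", {"LANG": "fr_FR.UTF-8", "LC_ALL": "fr_FR.ISO-8859-1"}) == "caf\u00e9"
```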

 This should work on most filenames, in almost all situations -- I'll bet
 most filenames are ASCII, even on foreign systems, and the ones that aren't
 ASCII have set LANG to something in /etc/profile, so all filenames created
 by any app running on that machine should match that encoding.

 Where they don't do that correctly, they're already getting garbage (and
 should expect garbage) from any application they use, not just Chrome, since
 there is no way *any* app can decode a path with multiple encodings in it,
 or where the encoding is different than LANG (or LC_*) says it is.

 Chrome already crashes like this when it encounters situations where it's
 just impossible to know what's right, so it's consistent with Chrome's
 behavior in other areas.


 it should be the caller's responsibility to only deal with user-created names
 with this interface.


 What do you mean here?  Isn't that the case now with FilePath?  (It's the
 file_util routines that actually read the filesystem and make FilePaths out
 of them, after all).  As for your suggestion to only deal with path
 components, how would you propose to parse user-supplied paths into one of
 these?


  2) I'd like to make it possible to instantiate a POSIX FilePath object on
  Windows and a Windows FilePath on POSIX platforms.  This is because some
  libraries (e.g. the zip library, or tar files) use POSIX semantics for
  their paths even on Windows (I haven't seen a use case for Windows paths on
  POSIX yet, actually).  This would make it possible to use the nice API that
  FilePath has to manipulate paths appropriately for these other libraries.
  This could be easily accomplished by having POSIX and Windows versions of
  FilePath, and then typedef'ing FilePath differently on different platforms
  to one of these versions.

 Sounds pretty Pythonic.

 FilePath already sort of has some support for this - it does a bunch
 of things based on feature macros, mostly so that as I was writing it,
 I could test the Windows semantics without having to (shudder) resort
 to running on Windows.  These could probably be adapted to do what
 you're asking.


 Cool.


  3) It would be helpful to have real path normalization for each of the
  platforms (although I know what a testing nightmare 

[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Mark Mentovai

If you've got a file that begins its life as something on-disk, and
you just need to carry the path to it around, then that's fine, it
should live its life as a FilePath.

If you've got to create a file using some name where the name is some
constant in code, use FilePath with ASCII constants.  AppendASCII
exists to stick new ASCII components onto existing FilePaths.  This is
fine and is considered safe because ASCII is a subset of any rational
filesystem encoding.
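Why ASCII appends are safe can be shown at the byte level. This is a hypothetical Python analogue of AppendASCII that works on raw native bytes. It is not the actual FilePath implementation. The existing bytes are never decoded, and only ASCII is accepted for the new component.

```python
def append_ascii(native: bytes, component: str, sep: bytes = b"/") -> bytes:
    """Hypothetical byte-level analogue of appending an ASCII component.

    The existing native bytes are never decoded, so their encoding (even
    an unknown one) is preserved exactly. Non-ASCII input is rejected.
    """
    encoded = component.encode("ascii")  # UnicodeEncodeError if not ASCII
    if native and not native.endswith(sep):
        native += sep
    return native + encoded


# Latin-1 'ü' below is not valid UTF-8, but appending still works.
assert append_ascii(b"/home/g\xfcnter", "cache.db") == b"/home/g\xfcnter/cache.db"

# ASCII bytes decode to the same text under common filesystem encodings.
assert all(b"cache.db".decode(enc) == "cache.db"
           for enc in ("utf-8", "latin-1", "euc_jp", "cp1252"))
```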

If you've got to take an arbitrary FilePath and convert it for display
to the user, or take an arbitrary string in a known encoding and
re-encode it for the filesystem, then we don't have anything in
FilePath for this.  I believe that if we do add something, it should
strictly operate only on single pathname components at a time, and not
entire pathnames.  We could add it to FilePath or we could add it
somewhere else, because it is sort of distinct from what FilePath is
really supposed to be, which is just a container for ferrying around
native paths.
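The following is a sketch of the component-at-a-time idea for display. It is written in Python and assumes '/' separators and a caller-supplied encoding; `display_components` is a hypothetical name. Each component is decoded independently, so one undecodable name degrades only itself.

```python
from typing import List


def display_components(native: bytes, encoding: str = "utf-8") -> List[str]:
    """Decode a '/'-separated native path one component at a time.

    For display only: a component that is invalid in `encoding` gets
    U+FFFD replacement characters while its neighbours decode cleanly.
    The result is lossy, so never convert it back to open the file.
    """
    return [part.decode(encoding, errors="replace")
            for part in native.split(b"/") if part]


shown = display_components(b"/home/caf\xc3\xa9/g\xfcnter/notes.txt")
assert shown == ["home", "caf\u00e9", "g\ufffdnter", "notes.txt"]
```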

 It's also a specification and implementation nightmare.  Everyone has
 a different idea of what normalization means.  What's your idea?

 Yes, I know it's a nightmare all around, but I think it would be useful to
 have something that addresses this.  My idea would be the same as Python's
 os.path.normpath, mainly because it's a well-tested, seasoned example with
 test cases.  Windows also has a routine for this (PathCanonicalize) that
 could be used (but I know it doesn't work for UNC paths).

Why would it be useful?  Do you want to compare paths for equality?
Then we should have an API that compares paths for equality.  It would
have to hit the disk to do so.  You might need general-purpose
canonization to implement that on some systems.  Great, you need to
hit the disk to do that too.  It's fine if you want these things, but
we can't put them into FilePath.  It's important that FilePath remain
lightweight and not make any system calls, because system calls can
block and FilePath is just a data carrier.

os.path.normpath is known to be buggy.  It might be well-tested and
seasoned, but only within the confines of its known limitations.
Watch this.

m...@anodizer bash$ ls -l a/b/../c
-rw-r--r--  1 mark  staff  0 May 13 15:47 a/b/../c
m...@anodizer bash$ python
Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os.path
>>> os.path.normpath('a/b/../c')
'a/c'
>>> ^D
m...@anodizer bash$ ls -l a/c
ls: a/c: No such file or directory
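The same trap can be rebuilt in a scratch directory. The layout below is an assumption, since the transcript above doesn't show how a/b was set up. The sketch makes a/b a symlink so that the kernel's answer and normpath's answer diverge. os.path.realpath, which does consult the filesystem, agrees with the kernel.

```python
import os
import tempfile
from typing import Tuple


def normpath_vs_disk(root: str) -> Tuple[bool, bool, bool]:
    """Recreate the a/b/../c case under `root`, assuming a/b is a symlink.

    Layout built here: real/sub/, real/c, a/, and a/b -> ../real/sub.
    The kernel resolves a/b/../c to real/c; normpath rewrites it to a/c.
    """
    os.makedirs(os.path.join(root, "real", "sub"))
    open(os.path.join(root, "real", "c"), "w").close()
    os.mkdir(os.path.join(root, "a"))
    os.symlink(os.path.join("..", "real", "sub"), os.path.join(root, "a", "b"))

    path = os.path.join(root, "a", "b", "..", "c")
    lexical = os.path.normpath(path)
    real_c = os.path.join(root, "real", "c")
    return (
        os.path.exists(path),     # the OS follows the symlink first
        os.path.exists(lexical),  # a/c was never created
        os.path.realpath(path) == os.path.realpath(real_c),  # disk-aware
    )


with tempfile.TemporaryDirectory() as tmp:
    print(normpath_vs_disk(tmp))  # (True, False, True) on a POSIX system
```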

 Probably the same as os.path.normcase in Python.  I want this stuff so that
 I can make sure that I can at least semi-reliably compare/manipulate
 FilePaths to do things like absolute-relative path conversion, or store
 FilePaths in a set or map and be sure I don't have multiple entries pointing
 to the same file.  Without these kinds of operations, doing these things is
 pretty much impossible.

I don't think os.path.normcase does what you're asking for either.

m...@anodizer bash$ ls -lid /System/Library
81 drwxr-xr-x  64 root  wheel  2176 May 12 18:37 /System/Library
m...@anodizer bash$ ls -lid /system/LIBRARY
81 drwxr-xr-x  64 root  wheel  2176 May 12 18:37 /system/LIBRARY
m...@anodizer bash$ python
Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.platform
'darwin'
>>> import os.path
>>> os.path.normcase('/System/Library')
'/System/Library'
>>> os.path.normcase('/system/LIBRARY')
'/system/LIBRARY'
>>> ^D

Even os.path.realpath returns the same results.
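Part of the reason is that normcase is a fixed string transform chosen by OS family, not a query to the volume. Both variants can be seen from any platform using the posixpath and ntpath modules that back os.path.

```python
import ntpath
import posixpath

# The POSIX flavor (os.path on Linux and Mac) leaves case alone...
print(posixpath.normcase("/system/LIBRARY"))    # /system/LIBRARY
# ...while the Windows flavor lowercases and turns '/' into '\'.
print(ntpath.normcase("C:/Program Files/FOO"))  # c:\program files\foo
```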

Again, it sounds like what you really want is a pathname comparator
that hits the disk.  You really can't do this stuff correctly on most
systems without talking to the filesystem.  You can't even do
general-purpose canonization without talking to the filesystem.
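A minimal Python sketch of such a disk-hitting comparator identifies a file by the (st_dev, st_ino) pair from os.stat, which is the comparison os.path.samefile makes. `file_identity` and `dedupe` are hypothetical helpers. Because they call stat, they can block and they raise for missing files.

```python
import os
import tempfile
from typing import Dict, Iterable, List, Tuple


def file_identity(path: str) -> Tuple[int, int]:
    """(device, inode) of the file `path` names, following symlinks.

    Hits the disk, so it can block and raises for missing files. It is the
    same test os.path.samefile performs.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def dedupe(paths: Iterable[str]) -> List[str]:
    """Keep the first spelling seen for each distinct underlying file."""
    seen: Dict[Tuple[int, int], str] = {}
    for p in paths:
        seen.setdefault(file_identity(p), p)
    return list(seen.values())


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    target = os.path.join(root, "f.txt")
    open(target, "w").close()
    spellings = [target,
                 os.path.join(root, ".", "f.txt"),
                 os.path.join(root, "sub", "..", "f.txt")]
    print(len(dedupe(spellings)))  # 1
```

Used as a set or dict key, the pair collapses different spellings of one file into a single entry. That is the set/map behavior asked about earlier in the thread.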

Let me make clear: I'm not trying to shoot down the idea of needing to
be able to compare paths or even necessarily canonize them.  I'm
arguing primarily against doing it in FilePath, but I'm also
trying to illustrate that doing proper comparisons and canonization is
harder than it seems, that even seasoned and well-tested APIs are
limited in ways that developers don't necessarily expect, and that the
semantics and expectations need to be well-defined.

Mark




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Darin Fisher
On Tue, Apr 28, 2009 at 2:47 PM, Greg Spencer gspen...@google.com wrote:

 On Tue, Apr 28, 2009 at 2:41 PM, Amanda Walker ama...@chromium.org wrote:


 On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer gspen...@google.com
 wrote:
  1) I'd like to add some explicit routines for converting to/from UTF8 and
  UTF16.  While it's nice (and important) that FilePath uses the platform's
  native string, we've found that many third party libraries have made other
  assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
  regardless of platform, and converting a FilePath to and from those forms is
  a platform-dependent exercise which should be centralized into the class
  (i.e. adding ToUTF8 and ToWide functions to the class, and explicit
  constructors that take each type).

 One thing many of us have found, across multiple projects, is that
 wchar_t is fraught with complication as soon as more than one platform
 is involved. wchar_t == UTF16 is a Windowsism (gcc defaults to 4
 bytes, for example, and L"mumble" gets stored in UCS-4, not UTF-16).
 Chrome started with more or less what you are suggesting, and we moved
 off of it after much pain.
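The width problem is quick to illustrate in Python. One character outside the Basic Multilingual Plane takes two UTF-16 code units but only one UTF-32 code unit. Separately, ctypes reports the size of the C wchar_t on the running platform, typically 2 on Windows and 4 with gcc on Linux and Mac.

```python
import ctypes

clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

utf16_units = len(clef.encode("utf-16-le")) // 2  # 2: a surrogate pair
utf32_units = len(clef.encode("utf-32-le")) // 4  # 1: a single code unit

# Size of the C wchar_t on this platform (typically 2 on Windows, 4 with
# gcc on Linux and Mac), so "count of wchar_t" is not portable.
print(utf16_units, utf32_units, ctypes.sizeof(ctypes.c_wchar))
```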


 I understand those issues quite well (but I probably should call the
 conversion method ToUTF16, now that you mention it).  And char* isn't
 necessarily UTF8 on all platforms either.

 OK, so what's the currently recommended path for converting to UTF16 or
 UTF8 from a FilePath?



That conversion is not defined.  If you are on Linux, the contents of the
file path is just an array of bytes.  It might be UTF-8, in which case you
can convert to UTF-16.  However, it may also be some crazy encoding or it
may not match any encoding.  This OS does not require it to match an
encoding.

When we need to convert a FilePath to Unicode, we use the SysWideToNativeMB
and SysNativeMBToWide functions from base.  This works by inspecting what
the system thinks the current multi-byte encoding is.  On Mac that is UTF-8.
 On Linux, it depends on the value of $LANG.  Each time we do such a
conversion, we are introducing a potential bug in the product (on Linux at
least), so we try hard to avoid them.
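A small Python sketch makes the risk concrete. The same on-disk bytes decode to different text under different assumed codesets, and may not decode at all under UTF-8. The byte string is made up for illustration.

```python
raw = b"r\xe9sum\xe9.txt"  # hypothetical bytes found on a Linux disk

# Under a Latin-1 locale these bytes read as "résumé.txt"...
assert raw.decode("latin-1") == "r\u00e9sum\u00e9.txt"
# ...under a Cyrillic CP1251 locale, 0xE9 is 'й' instead...
assert raw.decode("cp1251") == "r\u0439sum\u0439.txt"
# ...and under UTF-8 they are not valid at all.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
assert not valid_utf8
```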

-Darin




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Greg Spencer
On Wed, May 13, 2009 at 1:03 PM, Mark Mentovai m...@chromium.org wrote:

 If you've got to take an arbitrary FilePath and convert it for display
 to the user, or take an arbitrary string in a known encoding and
 re-encode it for the filesystem, then we don't have anything in
 FilePath for this.  I believe that if we do add something, it should
 strictly operate only on single pathname components at a time, and not
 entire pathnames.  We could add it to FilePath or we could add it
 somewhere else, because it is sort of distinct from what FilePath is
 really supposed to be, which is just a container for ferrying around
 native paths.


OK, I can see the allure of dealing in terms of lists of encoded strings so
that you can encode them separately.  For my purposes, I need to get a string
encoded as UTF16 (on Windows) or UTF8 (on other platforms) that represents a
filename so that I can pass it to third party APIs, so it has to include the
path separators.  But that can be done as a join operation when I get the
string out.
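The join step can be sketched in Python for the POSIX case, assuming the native components are bytes in a known source encoding. `join_for_library` is a hypothetical helper. Each component is decoded strictly, so a bad one raises instead of being silently garbled. The joined string is then encoded for the library.

```python
from typing import List


def join_for_library(components: List[bytes], source_encoding: str,
                     target_encoding: str, sep: str = "/") -> bytes:
    """Hypothetical helper: strict per-component decode, then join+encode.

    Raises UnicodeDecodeError (or UnicodeEncodeError) instead of producing
    a string the third-party library would misinterpret.
    """
    parts = [c.decode(source_encoding) for c in components]
    return sep.join(parts).encode(target_encoding)


# Latin-1 native components, handed to a library that insists on UTF-8.
utf8 = join_for_library([b"", b"home", b"g\xfcnter", b"a.zip"], "latin-1", "utf-8")
assert utf8 == b"/home/g\xc3\xbcnter/a.zip"
```

An empty first component yields a leading separator, which is one way to represent an absolute path in this scheme.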

 It's also a specification and implementation nightmare.  Everyone has
  a different idea of what normalization means.  What's your idea?

  Yes, I know it's a nightmare all around, but I think it would be useful to
  have something that addresses this.  My idea would be the same as Python's
  os.path.normpath, mainly because it's a well-tested, seasoned example with
  test cases.  Windows also has a routine for this (PathCanonicalize) that
  could be used (but I know it doesn't work for UNC paths).

 Why would it be useful?  Do you want to compare paths for equality?


Yes, for instance to be able to place them into a map or set and be sure I
only have one entry for a particular file.  And I want to be able to do
absolute to relative path conversions (as far as possible, anyhow).  And yes,
I know that those are *really hard* to do properly, which argues even more for
implementing one in a common library so that individual developers don't roll
their own all the time, thinking that it is easy (and consequently producing
buggy implementations).
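For reference, a purely lexical absolute-to-relative conversion is short; the hard part is everything it ignores. The following Python sketch uses a hypothetical `lexical_relpath` helper. It strips the shared leading components, then climbs out of the rest of the start directory with '..'.

```python
from pathlib import PurePosixPath


def lexical_relpath(target: str, start: str) -> str:
    """Relative path from absolute directory `start` to absolute `target`.

    Purely lexical, no disk access: drops the shared leading components,
    then climbs out of the rest of `start` with '..'. If any of those
    remaining `start` components is a symlink, '..' may not go where this
    assumes, and '..' already present in the inputs is not collapsed.
    """
    t = PurePosixPath(target).parts
    s = PurePosixPath(start).parts
    common = 0
    while common < min(len(t), len(s)) and t[common] == s[common]:
        common += 1
    rel = [".."] * (len(s) - common) + list(t[common:])
    return "/".join(rel) or "."


print(lexical_relpath("/src/o3d/core/file.cc", "/src/o3d/tests"))  # ../core/file.cc
```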


 Then we should have an API that compares paths for equality.  It would
 have to hit the disk to do so.  You might need general-purpose
 canonization to implement that on some systems.  Great, you need to
 hit the disk to do that too.  It's fine if you want these things, but
 we can't put them into FilePath.  It's important that FilePath remain
 lightweight and not make any system calls, because system calls can
 block and FilePath is just a data carrier.


Which is why I proposed in my last message not putting them into FilePath,
since I can see that it is not your intention that it support anything that
hits the filesystem (and I can see why you would want that).

 os.path.normpath is known to be buggy.  It might be well-tested and
 seasoned, but only within the confines of its known limitations.
 Watch this. [...]


Yes, I'm aware that you can create situations (especially with symbolic links)
where the same path conversions will succeed or fail depending on the
filesystem contents.  This is why the class would have to have access to the
filesystem.


 Again, it sounds like what you really want is a pathname comparator
 that hits the disk.  You really can't do this stuff correctly on most
 systems without talking to the filesystem.  You can't even do
 general-purpose canonization without talking to the filesystem.


Yep.  Totally agreed. (and normcase is probably not the behavior I'm looking
for, you're right).


 Let me make clear: I'm not trying to shoot down the idea of needing to
 be able to compare paths or even necessarily canonize them.  I'm
 arguing primarily against doing it in FilePath, but I'm also
 trying to illustrate that doing proper comparisons and canonization is
 harder than it seems, that even seasoned and well-tested APIs are
 limited in ways that developers don't necessarily expect, and that the
 semantics and expectations need to be well-defined.


Very well illustrated, and I assure you that I'm well aware that it's a
bitch to do right.

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Greg Spencer
On Wed, May 13, 2009 at 2:05 PM, Darin Fisher da...@chromium.org wrote:

 That conversion is not defined.  If you are on Linux, the contents of the
 file path is just an array of bytes.  It might be UTF-8, in which case you
 can convert to UTF-16.  However, it may also be some crazy encoding or it
 may not match any encoding.  This OS does not require it to match an
 encoding.

 When we need to convert a FilePath to Unicode, we use the SysWideToNativeMB
 and SysNativeMBToWide functions from base.  This works by inspecting what
 the system thinks the current multi-byte encoding is.  On Mac that is UTF-8.
  On Linux, it depends on the value of $LANG.  Each time we do such a
 conversion, we are introducing a potential bug in the product (on Linux at
 least), so we try hard to avoid them.


Yes, I know that this is how it works (see earlier messages in this thread),
but can you tell me if there are any Linux apps that manage to do this
correctly (i.e. without having this bug), and how they do it?

I can't see how any Linux app can do any better than looking at LANG and
LC_CTYPE and hoping that they're set correctly.  Certainly there's no way to
decode a pathname that includes multiple encodings, and I have no idea what
happens with NFS mounts between machines with different settings.

I'm just saying why not just do as well as can be done by the best app out
there, and punt after that?
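One existing "best effort, then punt" scheme is Python 3's surrogateescape error handler (PEP 383), which os.fsdecode and os.fsencode use on POSIX. Bytes that don't decode are carried as lone surrogates, so the original name round-trips exactly. The sketch below uses an explicit UTF-8 codec to stay deterministic, and the mixed-encoding path is a made-up example.

```python
raw = b"g\xfcnter/caf\xc3\xa9"  # made-up path: a Latin-1 name, a UTF-8 name

# Decode with the locale's codeset (UTF-8 assumed here). Bytes that don't
# decode become lone surrogates U+DC80..U+DCFF instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")
assert text == "g\udcfcnter/caf\u00e9"

# The round trip is exact, so `text` can still name the original file.
assert text.encode("utf-8", errors="surrogateescape") == raw

# For display, substitute U+FFFD for what couldn't be decoded.
shown = raw.decode("utf-8", errors="replace")
assert shown == "g\ufffdnter/caf\u00e9"
```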

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Scott Hess

This post made me think that we should have infrastructure so that
certain unit tests can opt to run in a restricted environment to
enforce that someone doesn't come along and add filesystem-access code
or other known-bad synchronous APIs.

I realize that that is probably hard, and that patches would be
welcome.  Just throwing it out there in hopes that someone says "Hey,
I know how to do that" and someone else says "Hey, do that."

-scott

[It could also be a rathole that only seems like a good idea until you
actually try it, like getting const-ness propagation thoroughly
correct.]
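The idea can be illustrated with a Python sketch, offered as an illustration only. A hypothetical `no_filesystem()` context manager patches a few common file APIs to raise while a test runs. It only catches calls that go through the patched functions, which hints at why doing this thoroughly is hard.

```python
import builtins
import contextlib
import os
import posixpath
from unittest import mock


class FilesystemAccessError(AssertionError):
    """Raised when code inside no_filesystem() touches the disk."""


@contextlib.contextmanager
def no_filesystem():
    """Hypothetical test guard: make a few common file APIs fail loudly.

    Only the functions patched here are covered; code that reaches the OS
    another way (C extensions or os.open, for example) slips through.
    """
    def forbid(*args, **kwargs):
        raise FilesystemAccessError(f"filesystem access with {args!r}")

    with mock.patch.object(builtins, "open", forbid), \
         mock.patch.object(os, "stat", forbid), \
         mock.patch.object(os, "lstat", forbid), \
         mock.patch.object(os, "listdir", forbid):
        yield


with no_filesystem():
    assert posixpath.join("a", "b") == "a/b"  # pure string work is fine
    try:
        os.stat(".")
        touched = True
    except FilesystemAccessError:
        touched = False
assert not touched
```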


On Wed, May 13, 2009 at 1:03 PM, Mark Mentovai m...@chromium.org wrote:

 Why would it be useful?  Do you want to compare paths for equality?
 Then we should have an API that compares paths for equality.  It would
 have to hit the disk to do so.  You might need general-purpose
 canonization to implement that on some systems.  Great, you need to
 hit the disk to do that too.  It's fine if you want these things, but
 we can't put them into FilePath.  It's important that FilePath remain
 lightweight and not make any system calls, because system calls can
 block and FilePath is just a data carrier.


[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Amanda Walker

Perhaps what we need is a companion to FilePath.  For example:

FilePath: much as it is now, lightweight, alternative to string manipulation.
FileReference: heavierweight, can talk to the file system and have
carnal knowledge of platform specifics for things like resolving /
canonicalizing pathnames, determining whether or not they refer to the
same files, generating C strings that can be passed to 3rd party
libraries, etc.
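Python's pathlib draws a similar line, which makes the proposed split easy to picture. PurePath does lexical work only. Path adds methods such as resolve() and samefile() that go to the filesystem. The file names in this short sketch are made up.

```python
import tempfile
from pathlib import Path, PurePath

# PurePath plays the FilePath role: joining, splitting and comparing are
# string operations, and '..' is kept verbatim rather than resolved.
light = PurePath("reports") / "2009" / ".." / "q2.txt"
assert light.name == "q2.txt" and ".." in light.parts

# Path plays the FileReference role: samefile() and resolve() ask the
# filesystem, so they can block and can fail.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "sub").mkdir()
    heavy = Path(root) / "q2.txt"
    heavy.touch()
    other = Path(root) / "sub" / ".." / "q2.txt"
    print(other == heavy)                      # False: different spellings
    print(other.samefile(heavy))               # True: same file on disk
    print(other.resolve() == heavy.resolve())  # True
```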

--Amanda


On Wed, May 13, 2009 at 5:22 PM, Greg Spencer gspen...@google.com wrote:
 On Wed, May 13, 2009 at 1:03 PM, Mark Mentovai m...@chromium.org wrote:

 If you've got to take an arbitrary FilePath and convert it for display
 to the user, or take an arbitrary string in a known encoding and
 re-encode it for the filesystem, then we don't have anything in
 FilePath for this.  I believe that if we do add something, it should
 strictly operate only on single pathname components at a time, and not
 entire pathnames.  We could add it to FilePath or we could add it
 somewhere else, because it is sort of distinct from what FilePath is
 really supposed to be, which is just a container for ferrying around
 native paths.


 OK, I can see the allure of dealing in terms of lists of encoded strings so
 that you can encode them separately.  For my purposes, I need to get a string
 encoded as UTF16 (on Windows) or UTF8 (on other platforms) that represents a
 filename so that I can pass it to third party APIs, so it has to include the
 path separators.  But that can be done as a join operation when I get the
 string out.





[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Brett Wilson

On Wed, May 13, 2009 at 3:51 PM, Amanda Walker ama...@chromium.org wrote:

 Perhaps what we need is a companion to FilePath.  For example:

 FilePath: much as it is now, lightweight, alternative to string manipulation.
 FileReference: heavierweight, can talk to the file system and have
 carnal knowledge of platform specifics for things like resolving /
 canonicalizing pathnames, determining whether or not they refer to the
 same files, generating C strings that can be passed to 3rd party
 libraries, etc.

I think this is very dangerous.

I think Greg should not be talking to the filesystem when inserting
filenames into a set. We don't allow filesystem access from the UI
thread of Chrome, and I think other parts of our system should also
not do filesystem access on their critical threads, especially if they
want to be more part of Chrome in the future.
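If a heavier, disk-touching class does exist, one way to respect that rule is to confine its calls to a dedicated worker and hand the caller a future. The following Python sketch works under that assumption. `resolve_async` and the single-thread pool are hypothetical and are not Chrome's actual threading API.

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor

# One background worker owns all disk access, so a latency-sensitive
# thread only ever receives futures and never blocks on the filesystem.
_file_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="file")


def resolve_async(path: str) -> "Future[str]":
    """Schedule os.path.realpath(path) on the file thread; returns at once."""
    return _file_pool.submit(os.path.realpath, path)


def on_resolved(fut: "Future[str]") -> None:
    # add_done_callback runs this on the worker thread, or immediately on
    # the caller's thread if the future had already finished. A real UI
    # would post the result back to its own event loop from here.
    print("resolved:", fut.result())


pending = resolve_async(".")
pending.add_done_callback(on_resolved)
```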

Brett




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Amanda Walker

On Wed, May 13, 2009 at 7:07 PM, Brett Wilson bre...@chromium.org wrote:
 On Wed, May 13, 2009 at 3:51 PM, Amanda Walker ama...@chromium.org wrote:

 Perhaps what we need is a companion to FilePath.  For example:

 FilePath: much as it is now, lightweight, alternative to string manipulation.
 FileReference: heavierweight, can talk to the file system and have
 carnal knowledge of platform specifics for things like resolving /
 canonicalizing pathnames, determining whether or not they refer to the
 same files, generating C strings that can be passed to 3rd party
 libraries, etc.

 I think this is very dangerous.

 I think Greg should not be talking to the filesystem when inserting
 filenames into a set. We don't allow filesystem access from the UI
 thread of Chrome, and I think other parts of our system should also
 not do filesystem access on their critical threads, especially if they
 want to be more part of Chrome in the future.

But in context, he's passing these things to 3rd party libraries that
will be doing plenty of file system access (importing and exporting
data, for example).  That's why I was suggesting something separate
from FilePath for such use.

--Amanda




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Darin Fisher
On Wed, May 13, 2009 at 2:20 PM, Greg Spencer gspen...@google.com wrote:

 On Wed, May 13, 2009 at 2:05 PM, Darin Fisher da...@chromium.org wrote:

 That conversion is not defined.  If you are on Linux, the contents of the
 file path is just an array of bytes.  It might be UTF-8, in which case you
 can convert to UTF-16.  However, it may also be some crazy encoding or it
 may not match any encoding.  The OS does not require it to match an
 encoding.

 When we need to convert a FilePath to Unicode, we use the
 SysWideToNativeMB and SysNativeMBToWide functions from base.  This works by
 inspecting what the system thinks the current multi-byte encoding is.  On
 Mac that is UTF-8.  On Linux, it depends on the value of $LANG.  Each time
 we do such a conversion, we are introducing a potential bug in the product
 (on Linux at least), so we try hard to avoid them.


 Yes, I know that this is how it works (see earlier messages in this
 thread), but can you tell me if there are any Linux apps that manage to do
 this correctly (i.e. without having this bug), and how they do it?

 I can't see how any Linux app can do any better than looking at LANG and
 LC_CTYPE and hoping that they're set correctly.  Certainly there's no way to
 decode a pathname that includes multiple encodings, and I have no idea what
 happens with NFS mounts between machines with different settings.

 I'm just saying why not just do as well as can be done by the best app out
 there, and punt after that?

 -Greg.



Sorry to repeat information.  This is a long thread!

The solution is to not convert to UTF-16 unless you are trying to generate
a string to display to the user.  Then you should use the LANG information
to determine how best to render the text for display to the user.

The program should try its best to preserve the file path in the original
form and not try to convert to UTF-16 and back again since that conversion
may be lossy.

I know this doesn't really help.  I think it is reasonable to have a utility
somewhere to perform a conversion to UTF-16 (or UTF-8), but it should come
with a stern warning, and I kind of prefer it not being a method on FilePath
since I would prefer people not be tempted to overuse it.

-Darin
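Darin's advice amounts to keeping the original bytes as the only copy used for file access, and converting only when something is shown to the user. A minimal sketch, assuming the native bytes are meant to be UTF-8 (the Mac case, and common but not guaranteed on Linux, where $LANG may say otherwise). `Utf8SequenceLength` and `DisplayNameForUser` are hypothetical helpers, not functions in base. Invalid bytes become U+FFFD only in the returned display copy, so the lossy step never flows back into a path.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: length of the well-formed UTF-8 sequence starting at
// s[i], or 0 if the bytes there are not valid UTF-8 (bad lead byte, truncated
// sequence, overlong form, surrogate, or a value above U+10FFFF).
size_t Utf8SequenceLength(const std::string& s, size_t i) {
  const unsigned char c = static_cast<unsigned char>(s[i]);
  size_t len;
  uint32_t cp;
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
  else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
  else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
  else return 0;
  if (i + len > s.size()) return 0;
  for (size_t k = 1; k < len; ++k) {
    const unsigned char cc = static_cast<unsigned char>(s[i + k]);
    if ((cc & 0xC0) != 0x80) return 0;  // Not a continuation byte.
    cp = (cp << 6) | (cc & 0x3F);
  }
  static const uint32_t kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};
  if (cp < kMinForLength[len] || cp > 0x10FFFF ||
      (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  return len;
}

// Hypothetical display-only conversion. The input bytes are never modified;
// each invalid byte is shown as U+FFFD in the returned copy.
std::string DisplayNameForUser(const std::string& native_bytes) {
  std::string out;
  for (size_t i = 0; i < native_bytes.size();) {
    const size_t n = Utf8SequenceLength(native_bytes, i);
    if (n == 0) {
      out += "\xEF\xBF\xBD";  // U+FFFD REPLACEMENT CHARACTER, UTF-8 encoded.
      ++i;
    } else {
      out.append(native_bytes, i, n);
      i += n;
    }
  }
  return out;
}
```

A caller would keep opening the file with the original bytes and hand only `DisplayNameForUser(bytes)` to the UI.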




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Brett Wilson

On Wed, May 13, 2009 at 4:34 PM, Amanda Walker ama...@chromium.org wrote:
 But in context, he's passing these things to 3rd party libraries that
 will be doing plenty of file system access (importing and exporting
 data, for example).  That's why I was suggesting something separate
 from FilePath for such use.

Then he doesn't need canonicalization at all. He needs to know how the
third party library is going to use the string for filesystem access
and then do the corresponding transformations. That does not involve
filesystem access.

Brett




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Greg Spencer
On Wed, May 13, 2009 at 4:35 PM, Darin Fisher da...@chromium.org wrote:

 The solution is to not convert to UTF-16 unless you are trying to
 generate a string to display to the user.  Then you should use the LANG
 information to determine how best to render the text for display to the
 user.


Yeah, that would be nice, and I agree, but the reason I need it is that some
third party APIs (probably wrongly) take UTF16 to represent an input file in
their API.  So in order for the third party API to load the file properly, I
need a UTF16 version of the file path.  Also, in all of the O3D code, we
assume that strings are encoded in UTF8 (which is fine and correct for any
string except for filenames on Linux), so any string that might come from
the user would come in as UTF8, and I'd have to translate it into a FilePath
(somehow).
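The conversion Greg needs for a UTF-16 API, starting from an O3D string that is already UTF-8, requires no filesystem access; the hazard Darin describes lies in applying it to native Linux path bytes that may not be UTF-8 at all. A sketch that refuses malformed input rather than guessing. `Utf8ToUtf16ForApi` is a hypothetical name, and returning `std::nullopt` instead of substituting U+FFFD is an assumption, chosen so that a damaged name cannot silently point the library at a different file.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: strict UTF-8 -> UTF-16 conversion for a third-party
// API. Any malformed input yields std::nullopt instead of a guessed string.
std::optional<std::u16string> Utf8ToUtf16ForApi(const std::string& utf8) {
  std::u16string out;
  size_t i = 0;
  while (i < utf8.size()) {
    const unsigned char c = static_cast<unsigned char>(utf8[i]);
    size_t len;
    uint32_t cp;
    if (c < 0x80) { len = 1; cp = c; }
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else return std::nullopt;
    if (i + len > utf8.size()) return std::nullopt;  // Truncated sequence.
    for (size_t k = 1; k < len; ++k) {
      const unsigned char cc = static_cast<unsigned char>(utf8[i + k]);
      if ((cc & 0xC0) != 0x80) return std::nullopt;
      cp = (cp << 6) | (cc & 0x3F);
    }
    // Reject overlong forms, surrogate code points, and values past U+10FFFF.
    static const uint32_t kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMinForLength[len] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF))
      return std::nullopt;
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      // Code points outside the BMP are encoded as a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return out;
}
```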


 I know this doesn't really help.  I think it is reasonable to have a
 utility somewhere to perform a conversion to UTF-16 (or UTF-8), but it
 should come with a stern warning, and I kind of prefer it not being a method
 on FilePath since I would prefer people not be tempted to overuse it.


Yeah, I think we've beat that to death: it won't be in FilePath.

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Greg Spencer
On Wed, May 13, 2009 at 4:07 PM, Brett Wilson bre...@chromium.org wrote:

 I think this is very dangerous.

 I think Greg should not be talking to the filesystem when inserting
 filenames into a set. We don't allow filesystem access from the UI
 thread of Chrome, and I think other parts of our system should also
 not do filesystem access on their critical threads, especially if they
 want to be more part of Chrome in the future.


Well, so the use I have for this in O3D at the moment is in our importer,
which currently is a separate command-line tool that reads Collada files and
writes out our wire format for geometry.  So it isn't meant to be occurring
in a UI thread, but I could see times when it might be useful to know for
sure if two files reference the same file in the UI thread (dragging and
dropping a file onto a drop zone, for instance).

I do need to know if I have the same file more than once in a set because
the COLLADA file might reference the same texture multiple times, or (more
dangerous) it might reference a file that is one file on Windows,
but (incorrectly) maps to two different files in the (Unix-path-format) .tgz
files.  To detect that, I need canonicalization.

I also need to convert paths in the Collada file to relative paths in our
tgz files.  In order to do that, I need to be able to normalize the path to
the Collada file so I can normalize the paths to the referenced texture
files and strip off common base directories.
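The relative-path step can be done lexically, without any disk access, which also sidesteps Brett's threading concern. A sketch using C++17's `std::filesystem::path::lexically_normal` and `lexically_relative` as a stand-in (these are standard-library functions, not FilePath methods). `ArchiveName` and its parameters are hypothetical; it assumes references in the Collada file are relative to the Collada file's own directory, and because it never resolves symlinks it cannot detect two spellings that only the filesystem knows are the same file.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolves `reference` (as written in the Collada file)
// against the Collada file's directory, normalizes it lexically, and returns
// it relative to `archive_root` with '/' separators for the .tgz. Returns ""
// if the result would escape the root, so the caller can report it instead
// of writing "../" into the archive. Nothing on disk is consulted.
std::string ArchiveName(const fs::path& archive_root,
                        const fs::path& collada_file,
                        const fs::path& reference) {
  const fs::path resolved =
      (collada_file.parent_path() / reference).lexically_normal();
  const fs::path relative =
      resolved.lexically_relative(archive_root.lexically_normal());
  if (relative.empty() || *relative.begin() == fs::path("..")) return "";
  return relative.generic_string();
}
```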

I'd really like to avoid the filesystem access too -- it's a real pain in
the ass to do, which is why it hasn't been done yet.  Currently, the user
has to tell me the string to strip off of the pathnames to make them
relative, and if files collide or split, then the output is just 2x bigger,
or just doesn't work.  I'd like to fix those things, but to do it right, I
need a better set of tools, and it seemed to me that if I was needing these
tools, then someone else could use them too.

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Brett Wilson

On Wed, May 13, 2009 at 6:12 PM, Greg Spencer gspen...@google.com wrote:
 Well, so the use I have for this in O3D at the moment is in our importer,
 which currently is a separate command-line tool that reads Collada files and
 writes out our wire format for geometry.  So it isn't meant to be occurring
 in a UI thread, but I could see times when it might be useful to know for
 sure if two files reference the same file in the UI thread (dragging and
 dropping a file onto a drop zone, for instance).
 I do need to know if I have the same file more than once in a set because
 the COLLADA file might reference the same texture multiple times, or (more
 dangerous) it might reference a file that is one file on Windows,
 but (incorrectly) maps to two different files in the (Unix-path-format) .tgz
 files.  To detect that, I need canonicalization.

You can't actually canonicalize a filename on Windows, so I think it's
dangerous to write a component that claims to do it.

I think you just need to come up with some simple rules that make it
work most of the time. Personally I would do ASCII lowercasing and
stop worrying about it. If you use ICU to lower-case correctly,
Windows won't necessarily agree and you won't be able to use that
file.
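Brett's suggestion can be sketched as a lexical set key. ASCII letters are folded by hand rather than with `std::tolower`, whose result depends on the current C locale, and non-ASCII bytes are left untouched, in line with his caveat that a fuller case mapping may not agree with Windows. Treating '\\' and '/' as the same separator is an added assumption for Collada references; `AsciiFoldedKey` and `CountDistinctTextures` are hypothetical names.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical dedup key: ASCII 'A'-'Z' become 'a'-'z' and '\\' becomes '/'.
// Every other byte, including all non-ASCII bytes, is kept exactly as-is.
std::string AsciiFoldedKey(const std::string& path) {
  std::string key = path;
  for (char& ch : key) {
    if (ch >= 'A' && ch <= 'Z') {
      ch = static_cast<char>(ch - 'A' + 'a');
    } else if (ch == '\\') {
      ch = '/';
    }
  }
  return key;
}

// Hypothetical helper: counts texture references that are distinct under
// that key, e.g. to avoid packing the same texture twice.
size_t CountDistinctTextures(const std::vector<std::string>& references) {
  std::set<std::string> seen;
  for (const std::string& ref : references) seen.insert(AsciiFoldedKey(ref));
  return seen.size();
}
```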

Brett




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Brett Wilson

On Wed, May 13, 2009 at 7:24 PM, Brett Wilson bre...@chromium.org wrote:
 You can't actually canonicalize a filename on Windows, so I think it's
 dangerous to write a component that claims to do it.

I guess you could call GetShortPathName every time you see a name. But
I think that's a crazy solution. I still think you should do my
suggestion below.


 I think you just need to come up with some simple rules that make it
 work most of the time. Personally I would do ASCII lowercasing and
 stop worrying about it. If you use ICU to lower-case correctly,
 Windows won't necessarily agree and you won't be able to use that
 file.




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Darin Fisher
FYI:  Don't use GetShortPathName.  It isn't supported on some Windows
systems.  We had a significant number of users that could not use Firefox
until we stopped using it.
-Darin


On Wed, May 13, 2009 at 7:29 PM, Brett Wilson bre...@chromium.org wrote:


 I guess you could call GetShortPathName every time you see a name. But
 I think that's a crazy solution. I still think you should do my
 suggestion below.


  I think you just need to come up with some simple rules that makes it
  work most of the time. Personally I would do ASCII lowercasing and
  stop worrying about it. If you use ICU to lower-case correctly,
  Windows won't necessarily agree and you won't be able to use that
  file.

 





[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Darin Fisher
I mean... there's a registry setting or something that can be set to disable it.
-darin

On Wed, May 13, 2009 at 8:40 PM, Darin Fisher da...@chromium.org wrote:

 FYI:  Don't use GetShortPathName.  It isn't supported on some Windows
 systems.  We had a significant number of users that could not use Firefox
 until we stopped using it.
 -Darin





[chromium-dev] Re: Changes to FilePath?

2009-04-29 Thread Mark Mentovai

Greg Spencer wrote:
 So there's currently no right way to do the conversion, but I still think
 that the FilePath constructor is probably in the best position to inspect
 LC_ALL, etc. and do as close to the right thing as possible.  I doubt most
 Linux developers even think about this, and so the chances that they will
 implement anything other than assuming that it's ASCII are slim -- this
 would allow us to at least implement a baseline for them.

Not doing the conversion is kinda the point.  Well, it's exactly the point.

(Hi, I'm the author of FilePath.)

If you've got an arbitrary path, it might be encoded in some scheme,
and it might not, and it might contain a mix of encodings.  The point
of FilePath is we know it's a path and we don't necessarily know
anything else.  Chromium didn't use to have FilePath.  Everything
was a wstring which implied UTF-16/32, and the conversions implied
UTF-8 because we couldn't do anything smarter, and there was all sorts
of potential for messing things up.  Not a pretty story.  When
FilePath was born, the *Hack methods showed up to give us a way to
transition the old-style wstring APIs to new-style FilePath APIs at
reasonable cut points, instead of having to do everything all at once.

I understand your problem.  You're saying I have user-supplied data
that I want to build a filename from, and I have this pathname that
I want to display back to the user.  I agree that it would be good to
have a way to handle these cases in base.  I don't know if FilePath
proper is the right place to do it.  If we do it in FilePath, it still
won't really be right.  If we had something, it should probably be
made to operate only on single pathname components, and it should be
the caller's responsibility to only deal with user-created names with
this interface.
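One possible reading of Mark's suggestion, sketched under assumptions: accept exactly one user-typed pathname component and reject anything that could change directories. `ComponentFromUserName` is a hypothetical helper, and rejecting '\\' on every platform is a deliberately conservative choice (it is an ordinary filename character on POSIX). Encoding the accepted name for the filesystem is left out of the sketch.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: accepts a user-supplied name only if it is a single
// path component. Separators ('/' and, conservatively, '\\'), NUL bytes, and
// the special names "." and ".." are rejected, so the result can be appended
// to a directory without escaping it.
std::optional<std::string> ComponentFromUserName(const std::string& name) {
  if (name.empty() || name == "." || name == "..") return std::nullopt;
  for (char ch : name) {
    if (ch == '/' || ch == '\\' || ch == '\0') return std::nullopt;
  }
  return name;
}
```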

 2) I'd like to make it possible to instantiate a POSIX FilePath object on
 Windows and a Windows FilePath on POSIX platforms.  This is because some
 libraries (e.g. the zip library, or tar files), use POSIX semantics for
 their paths even on Windows (I haven't seen a use case for Windows paths on
 POSIX yet, actually).   This would make it possible to use the nice API that
 FilePath has to manipulate paths appropriately for these other libraries.
 This could be easily accomplished by having POSIX and Windows versions of
 FilePath, and then typedef'ing FilePath differently on different platforms
 to one of these versions.

Sounds pretty Pythonic.

FilePath already sort of has some support for this - it does a bunch
of things based on feature macros, mostly so that as I was writing it,
I could test the Windows semantics without having to (shudder) resort
to running on Windows.  These could probably be adapted to do what
you're asking.
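Greg's item 2 can also be sketched without virtual functions by parameterizing the class on a traits type, which speaks to his size/speed question: each instantiation holds only a string. `BasicPath`, `PosixPathTraits`, `WindowsPathTraits`, and `ArchivePath` are hypothetical names; this is an alternative to the feature macros Mark mentions, not a description of them, and only two operations are shown.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical traits types: separator rules for each path flavor.
struct PosixPathTraits {
  static bool IsSeparator(char ch) { return ch == '/'; }
  static constexpr char kPreferredSeparator = '/';
};

struct WindowsPathTraits {
  // Windows accepts both separators; backslash is the preferred one.
  static bool IsSeparator(char ch) { return ch == '\\' || ch == '/'; }
  static constexpr char kPreferredSeparator = '\\';
};

// Hypothetical path class; the flavor is chosen at compile time.
template <typename Traits>
class BasicPath {
 public:
  explicit BasicPath(std::string value) : value_(std::move(value)) {}

  const std::string& value() const { return value_; }

  // Joins one component, inserting the preferred separator if needed.
  BasicPath Append(const std::string& component) const {
    if (value_.empty()) return BasicPath(component);
    std::string joined = value_;
    if (!Traits::IsSeparator(joined.back())) {
      joined += Traits::kPreferredSeparator;
    }
    return BasicPath(joined + component);
  }

  // Text after the last separator. This sketch does not strip trailing
  // separators or handle drive letters.
  std::string BaseName() const {
    for (size_t i = value_.size(); i > 0; --i) {
      if (Traits::IsSeparator(value_[i - 1])) return value_.substr(i);
    }
    return value_;
  }

 private:
  std::string value_;
};

// Archive entries (tar, zip) use '/' regardless of the host platform.
using ArchivePath = BasicPath<PosixPathTraits>;
```

A platform typedef such as `using NativePath = BasicPath<WindowsPathTraits>;` on Windows would give the arrangement Greg describes, while `ArchivePath` stays POSIX everywhere.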

 3) It would be helpful to have real path normalization for each of the
 platforms (although I know what a testing nightmare that can be).  I might
 try and tackle this if people think it would be beneficial.

It's also a specification and implementation nightmare.  Everyone has
a different idea of what normalization means.  What's your idea?

 4) Make sure we handle case sensitivity vs case preservation correctly.
 It's unclear to me that FilePath does this correctly on the Mac -- Mac file
 names are case preserving, but case insensitive, Unix filenames are both
 (and windows filenames are neither :-).

Again with the normalization.  What do you want this stuff for?
What's your idea of how this should work?

Remember: FilePath is specified to be light and to never touch the
disk.  If you've got a disk-touching operation, it probably doesn't
belong in FilePath proper.

Mark




[chromium-dev] Re: Changes to FilePath?

2009-04-29 Thread Greg Spencer
On Wed, Apr 29, 2009 at 12:22 PM, Mark Mentovai m...@chromium.org wrote:

 I understand your problem.  You're saying I have user-supplied data
 that I want to build a filename from, and I have this pathname that
 I want to display back to the user.  I agree that it would be good to
 have a way to handle these cases in base.  I don't know if FilePath
 proper is the right place to do it.  If we do it in FilePath, it still
 won't really be right.


OK, so it sounds like you're telling me not to use FilePath to represent
file paths from a disk for my purposes because they can't ever be converted
reliably to a particular encoding on Linux (which is a requirement for me,
because of the third party libraries that require a particular encoding).

That's fine, but what do I do instead?  Roll my own FilePath clone that has
some encoding assumptions?  I can do that, but it has the same issues as the
ones you're worried about with FilePath, so it seems better to solve the
issue in one place rather than have two versions that are both insufficient.
Man, it would be better if FilePath could reliably know its encoding!  (I
realize that Linux makes this impossible, it just seems like it would be
better that way. :-)

Since Linux is the only platform where the encoding is unclear, what if we
did the best we could on Linux:

When constructing a FilePath from a char* string on Linux:
- Test the input string for values > 127 to determine if it's really just
ASCII (and if so, we're out of the woods).
- Then check LANG, LC_CTYPE, LC_ALL (through appropriate Linux APIs) for an
encoding that we can support, and note the encoding for later if we are
requested to do a conversion.
- If we run into an invalid sequence during a conversion, or an encoding we
can't convert from, then use a CHECK to crash.
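Greg's three steps map onto standard POSIX calls: `setlocale` plus `nl_langinfo(CODESET)` report the codeset the C library derives from LC_ALL, LC_CTYPE and LANG, and `iconv` does the conversion. A sketch with hypothetical function names; where the proposal would CHECK-crash, it returns `std::nullopt` so the caller can decide. It assumes the iconv implementation recognizes the reported codeset name and, like the proposal itself, that the locale in effect now matches the one the file was named under.

```cpp
#include <clocale>
#include <cstddef>
#include <iconv.h>
#include <langinfo.h>
#include <optional>
#include <string>

// Hypothetical step 1: pure 7-bit ASCII needs no conversion at all.
bool IsAscii(const std::string& bytes) {
  for (unsigned char c : bytes) {
    if (c > 127) return false;
  }
  return true;
}

// Hypothetical step 2: the codeset derived from LC_ALL / LC_CTYPE / LANG.
std::string NativeCodeset() {
  std::setlocale(LC_CTYPE, "");  // Adopt the environment's locale settings.
  return nl_langinfo(CODESET);
}

// Hypothetical step 3: native bytes -> UTF-8. Returns std::nullopt on an
// invalid or truncated sequence, or an unsupported codeset.
std::optional<std::string> NativeToUtf8(const std::string& bytes,
                                        const std::string& codeset) {
  if (IsAscii(bytes)) return bytes;  // ASCII is already valid UTF-8.
  iconv_t cd = iconv_open("UTF-8", codeset.c_str());
  if (cd == reinterpret_cast<iconv_t>(-1)) return std::nullopt;
  std::string in = bytes;  // iconv wants a mutable input pointer.
  // Assumes 4 output bytes per input byte (plus slack) is enough.
  std::string out(in.size() * 4 + 16, '\0');
  char* in_ptr = in.data();
  size_t in_left = in.size();
  char* out_ptr = out.data();
  size_t out_left = out.size();
  size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
  if (rc != static_cast<size_t>(-1)) {
    // Flush any pending shift state for stateful encodings.
    rc = iconv(cd, nullptr, nullptr, &out_ptr, &out_left);
  }
  iconv_close(cd);
  if (rc == static_cast<size_t>(-1)) return std::nullopt;
  out.resize(out.size() - out_left);
  return out;
}
```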

This should work on most filenames, in almost all situations -- I'll bet
most filenames are ASCII, even on foreign systems, and the ones that aren't
ASCII have set LANG to something in /etc/profile, so all filenames created
by any app running on that machine should match that encoding.

Where they don't do that correctly, they're already getting garbage (and
should expect garbage) from any application they use, not just Chrome, since
there is no way *any* app can decode a path with multiple encodings in it,
or where the encoding is different than LANG (or LC_*) says it is.

Chrome already crashes like this when it encounters situations where it's
just impossible to know what's right, so it's consistent with Chrome's
behavior in other areas.


 it should be the caller's responsibility to only deal with user-created
 names with this interface.


What do you mean here?  Isn't that the case now with FilePath?  (It's the
file_util routines that actually read the filesystem and make FilePaths out
of them, after all).  As for your suggestion to only deal with path
components, how would you propose to parse user-supplied paths into one of
these?


  2) I'd like to make it possible to instantiate a POSIX FilePath object on
  Windows and a Windows FilePath on POSIX platforms.  This is because some
  libraries (e.g. the zip library, or tar files), use POSIX semantics for
  their paths even on Windows (I haven't seen a use case for Windows paths on
  POSIX yet, actually).   This would make it possible to use the nice API that
  FilePath has to manipulate paths appropriately for these other libraries.
  This could be easily accomplished by having POSIX and Windows versions of
  FilePath, and then typedef'ing FilePath differently on different platforms
  to one of these versions.

 Sounds pretty Pythonic.

 FilePath already sort of has some support for this - it does a bunch
 of things based on feature macros, mostly so that as I was writing it,
 I could test the Windows semantics without having to (shudder) resort
 to running on Windows.  These could probably be adapted to do what
 you're asking.


Cool.


  3) It would be helpful to have real path normalization for each of the
  platforms (although I know what a testing nightmare that can be).  I might
  try and tackle this if people think it would be beneficial.

 It's also a specification and implementation nightmare.  Everyone has
 a different idea of what normalization means.  What's your idea?


Yes, I know it's a nightmare all around, but I think it would be useful to
have something that addresses this.  My idea would be the same as Python's
os.path.normpath, mainly because it's a well-tested, seasoned example with
test cases.  Windows also has a routine for this (PathCanonicalize) that
could be used (but I know it doesn't work for UNC paths).
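Python's `posixpath.normpath` algorithm is short enough to port directly; the following sketch follows it step by step. Like Python's version it is purely lexical and never touches the disk, so it reduces "link/.." to "." even when `link` is a symlink to another directory. `NormPath` is a hypothetical name, and Windows rules (drive letters, UNC prefixes, backslashes) are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical NormPath: a port of the algorithm in Python's
// posixpath.normpath. Purely lexical; the filesystem is never consulted.
std::string NormPath(const std::string& path) {
  if (path.empty()) return ".";
  // POSIX gives exactly two leading slashes an implementation-defined
  // meaning, so "//" is kept; one, or three or more, collapse to "/".
  size_t initial_slashes = 0;
  if (path[0] == '/') {
    initial_slashes = (path.compare(0, 2, "//") == 0 &&
                       path.compare(0, 3, "///") != 0) ? 2 : 1;
  }
  std::vector<std::string> parts;
  size_t start = 0;
  while (start <= path.size()) {
    size_t end = path.find('/', start);
    if (end == std::string::npos) end = path.size();
    const std::string comp = path.substr(start, end - start);
    start = end + 1;
    if (comp.empty() || comp == ".") continue;  // Skip "//" and "/./".
    if (comp != ".." || (initial_slashes == 0 && parts.empty()) ||
        (!parts.empty() && parts.back() == "..")) {
      parts.push_back(comp);  // Keep leading ".." in relative paths.
    } else if (!parts.empty()) {
      parts.pop_back();  // ".." cancels the previous real component.
    }
    // Otherwise ".." directly under the root is dropped ("/.." is "/").
  }
  std::string result(initial_slashes, '/');
  for (size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) result += '/';
    result += parts[i];
  }
  return result.empty() ? "." : result;
}
```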

  4) Make sure we handle case sensitivity vs case preservation correctly.
  It's unclear to me that FilePath does this correctly on the Mac -- Mac file
  names are case preserving, but case insensitive, Unix filenames are both
  (and windows filenames are neither :-).

 Again with the normalization.  What do you want this stuff for?
 What's your 

[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Thomas Van Lenten
On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer gspen...@google.com wrote:

 Hi Chromium Developers,

 I'm working on Google's O3D (http://code.google.com/p/o3d), and we
 (naturally) share some of Chrome's base classes for our code, including the
 very useful class FilePath.

 However, in using FilePath in the last few months, I've seen that it needs
 some refinement.  I'd like to augment the FilePath class with some things
 that would make it more generally useful -- it's very nicely set up, but
 it's missing a few things that make it harder to work with than it needs to
 be:

 1) I'd like to add some explicit routines for converting to/from UTF8 and
 UTF16.  While it's nice (and important) that FilePath uses the platform's
 native string, we've found that many third party libraries have made other
 assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
 regardless of platform, and converting a FilePath to and from those forms is
 a platform-dependent exercise which should be centralized into the class
 (i.e. adding ToUTF8 and ToWide functions to the class, and explicit
 constructors that take each type).

 2) I'd like to make it possible to instantiate a POSIX FilePath object on
 Windows and a Windows FilePath on POSIX platforms.  This is because some
 libraries (e.g. the zip library, or tar files), use POSIX semantics for
 their paths even on Windows (I haven't seen a use case for Windows paths on
 POSIX yet, actually).   This would make it possible to use the nice API that
 FilePath has to manipulate paths appropriately for these other libraries.
 This could be easily accomplished by having POSIX and Windows versions of
 FilePath, and then typedef'ing FilePath differently on different platforms
 to one of these versions.

 3) It would be helpful to have real path normalization for each of the
 platforms (although I know what a testing nightmare that can be).  I might
 try and tackle this if people think it would be beneficial.

 4) Make sure we handle case sensitivity vs case preservation correctly.
 It's unclear to me that FilePath does this correctly on the Mac -- Mac file
 names are case preserving, but case insensitive, Unix filenames are both
 (and windows filenames are neither :-).


FYI - it's a drive format time option on the Mac, so they can be case
preserving and case sensitive.

TVL




 So, is there any resistance to any of the above?  Do you have other
 suggestions that I might take into account?  Am I violating any design
 assumptions of FilePath?  For #2, is speed/size enough of a concern to avoid
 a virtual base class (I wouldn't think so, but you never know..)?

 -Greg.
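Items 2 and 3 above could be prototyped roughly as follows. This is a minimal sketch, not Chromium's FilePath API: `BasicPath`, `PosixSeparators`, `WindowsSeparators`, `NormalizeLexically` and the `NativePath` typedef are hypothetical names. The normalization is purely lexical. It collapses `.`, `..` and repeated separators without consulting the disk, so it cannot account for symlinks, 8.3 short names or mount points, and it does not special-case drive letters or UNC prefixes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical separator policies; not part of Chromium's base/.
struct PosixSeparators {
  static bool IsSeparator(char c) { return c == '/'; }
  static constexpr char kPreferred = '/';
};

struct WindowsSeparators {
  static bool IsSeparator(char c) { return c == '\\' || c == '/'; }
  static constexpr char kPreferred = '\\';
};

// A path class parameterized on separator rules, so POSIX-style paths
// (e.g. zip or tar entries) can be manipulated on any host platform.
template <typename Rules>
class BasicPath {
 public:
  explicit BasicPath(std::string value) : value_(std::move(value)) {}
  const std::string& value() const { return value_; }

  // Joins a component using the policy's preferred separator.
  BasicPath Append(const std::string& component) const {
    if (value_.empty()) return BasicPath(component);
    std::string joined = value_;
    if (!Rules::IsSeparator(joined.back())) joined += Rules::kPreferred;
    return BasicPath(joined + component);
  }

  // Lexical normalization only: never touches the filesystem.
  BasicPath NormalizeLexically() const {
    const bool absolute = !value_.empty() && Rules::IsSeparator(value_[0]);
    std::vector<std::string> parts;
    std::string current;
    auto flush = [&]() {
      if (current.empty() || current == ".") {
        // Empty and "." components contribute nothing.
      } else if (current == ".." && !parts.empty() && parts.back() != "..") {
        parts.pop_back();  // ".." cancels the previous component.
      } else if (current == ".." && absolute) {
        // ".." at the root of an absolute path stays at the root.
      } else {
        parts.push_back(current);
      }
      current.clear();
    };
    for (char c : value_) {
      if (Rules::IsSeparator(c)) {
        flush();
      } else {
        current += c;
      }
    }
    flush();
    std::string out =
        absolute ? std::string(1, Rules::kPreferred) : std::string();
    for (std::size_t i = 0; i < parts.size(); ++i) {
      if (i > 0) out += Rules::kPreferred;
      out += parts[i];
    }
    if (out.empty()) out = ".";
    return BasicPath(out);
  }

 private:
  std::string value_;
};

using PosixPath = BasicPath<PosixSeparators>;
using WindowsPath = BasicPath<WindowsSeparators>;

// The platform default is selected by typedef, as item 2 proposes.
#if defined(_WIN32)
using NativePath = WindowsPath;
#else
using NativePath = PosixPath;
#endif
```

Because the separator rules are a template parameter rather than a virtual base class, the virtual-call overhead asked about above does not arise in this design; the trade-off is that `PosixPath` and `WindowsPath` are unrelated types.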

 





[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Evan Martin

On Tue, Apr 28, 2009 at 1:39 PM, Greg Spencer gspen...@google.com wrote:
 1) I'd like to add some explicit routines for converting to/from UTF8 and
 UTF16.  While it's nice (and important) that FilePath uses the platform's
 native string, we've found that many third party libraries have made other
 assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
 regardless of platform, and converting a FilePath to and from those forms is
 a platform-dependent exercise which should be centralized into the class
 (i.e. adding ToUTF8 and ToWide functions to the class, and explicit
 constructors that take each type).

Can you give some examples of where this is needed?  We've
historically fought against this pretty hard, and as soon as accessors
are available users will get lazy about it.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 1:57 PM, Thomas Van Lenten thoma...@chromium.org wrote:

 On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer gspen...@google.com wrote:

 4) Make sure we handle case sensitivity vs case preservation correctly.
 It's unclear to me that FilePath does this correctly on the Mac -- Mac file
 names are case preserving, but case insensitive, Unix filenames are both
 (and windows filenames are neither :-).


 FYI - it's a drive format time option on the Mac, so they can be case
 preserving and case sensitive.


Thanks for pointing that out. In fact, NTFS is actually case sensitive,
whereas FAT32 is not (see http://support.microsoft.com/kb/100625).  So we have
issues there as well.  The real issue would be dealing with relative paths
that don't exist yet -- there would be no way to inspect the file location
to find out what mode it was in.  I think I would just punt and go with the
widely-used defaults (the ones I mentioned above), since most apps seem to
assume those limitations.  An alternative would be to have an API to specify
the desired mode, and default to the common case on each platform.
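Greg's fallback (an API for specifying the mode explicitly, defaulting to the common case on each platform) might look something like this sketch. `CaseMode`, `DefaultCaseMode` and `PathsEqual` are hypothetical names. Case folding here is ASCII-only via `std::tolower` in the default "C" locale; real filesystems use much larger tables (NTFS keeps an upcase table on the volume, and HFS+ also applies Unicode decomposition), so this shows the shape of the API rather than a correct comparison.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical comparison modes; a caller that knows how the target
// volume was formatted can pick one explicitly.
enum class CaseMode { kSensitive, kInsensitive };

// The common default per platform: Windows and Mac volumes are usually
// case-insensitive, Linux filesystems case-sensitive.
constexpr CaseMode DefaultCaseMode() {
#if defined(_WIN32) || defined(__APPLE__)
  return CaseMode::kInsensitive;
#else
  return CaseMode::kSensitive;
#endif
}

// Compares two path strings under the given mode (ASCII folding only).
bool PathsEqual(const std::string& a, const std::string& b,
                CaseMode mode = DefaultCaseMode()) {
  if (a.size() != b.size()) return false;
  if (mode == CaseMode::kSensitive) return a == b;
  for (std::string::size_type i = 0; i < a.size(); ++i) {
    const unsigned char ca = static_cast<unsigned char>(a[i]);
    const unsigned char cb = static_cast<unsigned char>(b[i]);
    if (std::tolower(ca) != std::tolower(cb)) return false;
  }
  return true;
}
```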

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Peter Kasting
On Tue, Apr 28, 2009 at 1:39 PM, Greg Spencer gspen...@google.com wrote:

 1) I'd like to add some explicit routines for converting to/from UTF8 and
 UTF16.  While it's nice (and important) that FilePath uses the platform's
 native string, we've found that many third party libraries have made other
 assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
 regardless of platform, and converting a FilePath to and from those forms is
 a platform-dependent exercise which should be centralized into the class
 (i.e. adding ToUTF8 and ToWide functions to the class, and explicit
 constructors that take each type).


I'm pretty strongly against this for the same reasons as Evan.  I think
consumers who need to convert should be doing the conversion using their own
routines (e.g. Chrome uses ones in our base/ module).

PK




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Amanda Walker

On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer gspen...@google.com wrote:
 1) I'd like to add some explicit routines for converting to/from UTF8 and
 UTF16.  While it's nice (and important) that FilePath uses the platform's
 native string, we've found that many third party libraries have made other
 assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
 regardless of platform, and converting a FilePath to and from those forms is
 a platform-dependent exercise which should be centralized into the class
 (i.e. adding ToUTF8 and ToWide functions to the class, and explicit
 constructors that take each type).

One thing many of us have found, across multiple projects, is that
wchar_t is fraught with complication as soon as more than one platform
is involved. wchar_t == UTF16 is a Windowsism (gcc defaults to 4
bytes, for example, and L"mumble" gets stored in UCS-4, not UTF-16).
Chrome started with more or less what you are suggesting, and we moved
off of it after much pain.
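Amanda's point about `wchar_t` is easy to observe: the same wide literal produces a different number of code units depending on the compiler's `wchar_t` width, while `char16_t` (added in C++11, after this thread) is UTF-16 everywhere. The function names below are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
// surrogate pair for it while UTF-32 needs a single code unit.
std::size_t WideUnitsForNonBmpChar() {
  // With a 2-byte wchar_t (Windows/MSVC) this literal is UTF-16 and has
  // two code units; with a 4-byte wchar_t (gcc/clang on Linux and Mac)
  // it is UTF-32/UCS-4 and has one.
  return std::wstring(L"\U0001F600").size();
}

std::size_t Utf16UnitsForNonBmpChar() {
  // char16_t strings are UTF-16 on every platform.
  return std::u16string(u"\U0001F600").size();
}
```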

--Amanda




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 2:31 PM, Peter Kasting pkast...@google.com wrote:

 On Tue, Apr 28, 2009 at 1:39 PM, Greg Spencer gspen...@google.com wrote:

 1) I'd like to add some explicit routines for converting to/from UTF8 and
 UTF16.  While it's nice (and important) that FilePath uses the platform's
 native string, we've found that many third party libraries have made other
 assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
 regardless of platform, and converting a FilePath to and from those forms is
 a platform-dependent exercise which should be centralized into the class
 (i.e. adding ToUTF8 and ToWide functions to the class, and explicit
 constructors that take each type).


 I'm pretty strongly against this for the same reasons as Evan.  I think
 consumers who need to convert should be doing the conversion using their own
 routines (e.g. Chrome uses ones in our base/ module).


So, I was unable to find the conversion utilities in base that do the
conversion to/from UTF8.  What are they called?  If I missed them (and I
looked for a while before I gave up), then maybe they need to be more
prominent?

What is the danger here of being lazy?  Is it that developers will
unwittingly do expensive conversions?  If so, I would expect that a member
function called ToUTF8 would be just as much of a performance warning as a
helper function called FilePathToUTF8, but be a heck of a lot more
convenient (since it would not require the developer to create a local
variable for use as a return value from the helper, and can be used as an
argument to another library's functions).  I can see the argument for not
having a casting constructor that isn't from the platform native form, but
in that case, a factory method called CreateFromUTF8 should be a
sufficient warning to the developer that it might be expensive.

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Peter Kasting
On Tue, Apr 28, 2009 at 2:48 PM, Greg Spencer gspen...@google.com wrote:

 So, I was unable to find the conversion utilities in base that do the
 conversion to/from UTF8.  What are they called?  If I missed them (and I
 looked for a while before I gave up), then maybe they need to be more
 prominent?


See base/string_util.h, UTF8ToUTF16() etc.

 What is the danger here of being lazy?  Is it that developers will
 unwittingly do expensive conversions?


Yes, partly because including dedicated helpers like this makes it sound as
if the class is somehow special-cased or fastpathed to deal better with
these than a generic converter would be.

The other argument is simply that converting utf8 to utf16 is a generic sort
of functionality that belongs in base/ or another similar general-purpose
location, rather than specifically in FilePath.
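Peter's second argument is that the conversion itself knows nothing about paths. As an illustration, a UTF-8 to UTF-16 decoder can be written as a free function with no FilePath dependency. `Utf8ToUtf16Sketch` is a hypothetical stand-in, not the base/string_util.h function named above, and unlike a production converter it simply reports failure on malformed input instead of substituting U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Decodes UTF-8 into UTF-16. Returns false on malformed input (bad lead
// byte, truncated sequence, overlong form, surrogate code point, or a
// value above U+10FFFF).
bool Utf8ToUtf16Sketch(const std::string& in, std::u16string* out) {
  out->clear();
  std::size_t i = 0;
  while (i < in.size()) {
    const unsigned char lead = static_cast<unsigned char>(in[i]);
    std::uint32_t cp;
    std::size_t len;
    if (lead < 0x80) { cp = lead; len = 1; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
    else return false;  // Stray continuation byte or invalid lead byte.
    if (i + len > in.size()) return false;  // Truncated sequence.
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char c = static_cast<unsigned char>(in[i + k]);
      if ((c & 0xC0) != 0x80) return false;  // Not a continuation byte.
      cp = (cp << 6) | (c & 0x3F);
    }
    // Smallest code point that legitimately needs each sequence length.
    static const std::uint32_t kMin[5] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;
    if (cp < 0x10000) {
      out->push_back(static_cast<char16_t>(cp));
    } else {
      cp -= 0x10000;  // Split into a surrogate pair.
      out->push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out->push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return true;
}
```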

PK




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Erik Kay
(resend - arg)

On Tue, Apr 28, 2009 at 2:47 PM, Greg Spencer gspen...@google.com wrote:

 On Tue, Apr 28, 2009 at 2:41 PM, Amanda Walker ama...@chromium.org wrote:


 On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer gspen...@google.com wrote:
  1) I'd like to add some explicit routines for converting to/from UTF8 and
  UTF16.  While it's nice (and important) that FilePath uses the platform's
  native string, we've found that many third party libraries have made other
  assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
  regardless of platform, and converting a FilePath to and from those forms is
  a platform-dependent exercise which should be centralized into the class
  (i.e. adding ToUTF8 and ToWide functions to the class, and explicit
  constructors that take each type).

 One thing many of us have found, across multiple projects, is that
 wchar_t is fraught with complication as soon as more than one platform
 is involved. wchar_t == UTF16 is a Windowsism (gcc defaults to 4
 bytes, for example, and L"mumble" gets stored in UCS-4, not UTF-16).
 Chrome started with more or less what you are suggesting, and we moved
 off of it after much pain.


 I understand those issues quite well (but I probably should call the
 conversion method ToUTF16, now that you mention it).  And char* isn't
 necessarily UTF8 on all platforms either.

 OK, so what's the currently recommended path for converting to UTF16 or
 UTF8 from a FilePath?


The biggest problem with this change is that it's not possible to do this
conversion on Linux in a safe way.  In Linux, there is no charset defined by
the filesystem.  Each filename is just a blob of bytes.  Apps are supposed
to respect an environment variable, but since this environment variable
could change over time and be different from user to user, there's no
reliable way to know what the charset is, so you can't convert from a
FilePath on Linux to UTF8 or UTF16 unless you were the one who created the
path to begin with.
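Erik's point can be made concrete: the same filename bytes mean different things under different charset assumptions, and nothing in the bytes says which one applies. The sketch below (hypothetical helper name; ISO-8859-1 is chosen only because its mapping to Unicode is trivial) reinterprets bytes as Latin-1 and re-encodes them as UTF-8. Applied to a name that was actually written in UTF-8, it silently produces mojibake rather than an error.

```cpp
#include <cassert>
#include <string>

// Interprets each byte as an ISO-8859-1 character (code point == byte
// value) and encodes it as UTF-8. Every byte sequence is valid Latin-1,
// so this never fails, even when the guess about the charset is wrong.
std::string Latin1ToUtf8(const std::string& bytes) {
  std::string out;
  for (char ch : bytes) {
    const unsigned char b = static_cast<unsigned char>(ch);
    if (b < 0x80) {
      out += static_cast<char>(b);
    } else {
      out += static_cast<char>(0xC0 | (b >> 6));
      out += static_cast<char>(0x80 | (b & 0x3F));
    }
  }
  return out;
}
```

A name stored as the bytes `caf\xE9` by a Latin-1 user and one stored as `caf\xC3\xA9` by a UTF-8 user are both legal Linux filenames; this helper turns the first into "café" and the second into "cafÃ©".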

Erik




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 3:19 PM, Greg Spencer gspen...@google.com wrote:

 On Tue, Apr 28, 2009 at 3:11 PM, Erik Kay erik...@google.com wrote:

 The biggest problem with this change is that it's not possible to do this
 conversion on Linux in a safe way.


And besides -- this problem isn't introduced by this change: it exists
already because currently there's no safe way to convert, regardless of the
API (since a consumer of a FilePath doesn't know what encoding it contains).

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Erik Kay
On Tue, Apr 28, 2009 at 3:19 PM, Greg Spencer gspen...@google.com wrote:

 On Tue, Apr 28, 2009 at 3:11 PM, Erik Kay erik...@google.com wrote:

 The biggest problem with this change is that it's not possible to do this
 conversion on Linux in a safe way.  In Linux, there is no charset defined by
 the filesystem.  Each filename is just a blob of bytes.  Apps are supposed
 to respect an environment variable, but since this environment variable
 could change over time and be different from user to user, there's no
 reliable way to know what the charset is, so you can't convert from a
 FilePath on Linux to UTF8 or UTF16 unless you were the one who created the
 path to begin with.


 But that's exactly the point.  FilePath is the class that created the path
 to begin with.  So it can know what the LC_*/LANG variables were when it
 was created, and do the right conversion when you ask the FilePath to
 convert to UTF16.  Also, if the developer calls something called
 FilePath::CreateFromUTF8, then it can know it was supposed to be UTF8 and
 remember that.



If you created it yourself, that's fine.  FilePaths aren't always created
manually by users.  They often are populated from system APIs where you
can't know.  See file_util* for some examples.  So the problem is that if
you add this API, people will mistakenly use the conversion functions when
they can't be safe.  I agree it sucks.  I just don't know of a reasonable
solution.
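One way to reconcile Greg's proposal with Erik's objection is to record, at construction time, whether the encoding is actually known, and to make the conversion fallible. A hypothetical sketch: `TaggedPath`, `FromUtf8`, `FromSystemBytes` and `ToUtf8` are illustrative names, not FilePath methods, and the code uses C++17's `std::optional`, which postdates this thread.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// A path that remembers whether its bytes are known to be UTF-8.
class TaggedPath {
 public:
  // The caller vouches for the encoding, e.g. a name typed into a UTF-8 UI.
  static TaggedPath FromUtf8(std::string utf8) {
    return TaggedPath(std::move(utf8), /*known_utf8=*/true);
  }

  // Bytes returned by a system API such as readdir(); on Linux their
  // charset cannot be determined reliably, so it is recorded as unknown.
  static TaggedPath FromSystemBytes(std::string bytes) {
    return TaggedPath(std::move(bytes), /*known_utf8=*/false);
  }

  // The native bytes are always available for passing back to the OS.
  const std::string& native() const { return bytes_; }

  // Succeeds only when the encoding is known, so callers must handle the
  // unsafe case instead of silently guessing.
  std::optional<std::string> ToUtf8() const {
    if (!known_utf8_) return std::nullopt;
    return bytes_;
  }

 private:
  TaggedPath(std::string bytes, bool known_utf8)
      : bytes_(std::move(bytes)), known_utf8_(known_utf8) {}

  std::string bytes_;
  bool known_utf8_;
};
```

With this shape, `FromSystemBytes(...).ToUtf8()` returns an empty optional rather than a guess, which addresses the concern that callers would convert paths they cannot safely convert.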

Erik




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 3:26 PM, Erik Kay erik...@chromium.org wrote:

 On Tue, Apr 28, 2009 at 3:19 PM, Greg Spencer gspen...@google.com wrote:

 But that's exactly the point.  FilePath is the class that created the path
 to begin with.  So it can know what the LC_*/LANG variables were when it
 was created, and do the right conversion when you ask the FilePath to
 convert to UTF16.  Also, if the developer calls something called
 FilePath::CreateFromUTF8, then it can know it was supposed to be UTF8 and
 remember that.


 If you created it yourself, that's fine.  FilePaths aren't always created
 manually by users.  They often are populated from system APIs where you
 can't know.  See file_util* for some examples.  So the problem is that if
 you add this API, people will mistakenly use the conversion functions when
 they can't be safe.  I agree it sucks.  I just don't know of a reasonable
 solution.


So there's currently no right way to do the conversion, but I still think
that the FilePath constructor is probably in the best position to inspect
LC_ALL, etc. and do as close to the right thing as possible.  I doubt most
Linux developers even think about this, and so the chances that they will
implement anything other than assuming that it's ASCII are slim -- this
would allow us to at least implement a baseline for them.  Or would that
just screw things up worse?
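Inspecting the locale, as Greg suggests, is straightforward on POSIX systems via `setlocale` and `nl_langinfo(CODESET)`. A minimal POSIX-only sketch with a hypothetical helper name, carrying the caveat raised in the thread: the answer describes the current process's environment, not the encoding used by whoever created a given file.

```cpp
#include <cassert>
#include <clocale>
#include <langinfo.h>
#include <string>

// Returns the character set named by the user's locale environment
// (LC_ALL, then LC_CTYPE, then LANG), e.g. "UTF-8" or "ISO-8859-1".
// POSIX-only. Note that this changes the process's LC_CTYPE locale.
std::string CurrentLocaleCodeset() {
  std::setlocale(LC_CTYPE, "");
  const char* codeset = nl_langinfo(CODESET);
  return codeset ? std::string(codeset) : std::string();
}
```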

Doesn't this mean that it's possible that the path manipulation routines
fail for sufficiently odd encodings? (JIS or something similar, where an
encoded char might include a '/'?)
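On the question of separator bytes inside multibyte characters: a byte-wise scan can indeed be fooled, although in Shift_JIS the problem byte is the backslash rather than '/'. Shift_JIS trail bytes lie in 0x40-0x7E and 0x80-0xFC, so 0x2F ('/') cannot occur inside a two-byte character, but 0x5C ('\') can; for example, the katakana ソ is encoded as 0x83 0x5C. The sketch below is our own illustration, not from the thread, and the function names are hypothetical. It uses the lead-byte ranges 0x81-0x9F and 0xE0-0xFC, as in the common Windows code page 932 variant.

```cpp
#include <cassert>
#include <string>

// Naive search that treats every 0x5C byte as a Windows path separator,
// ignoring multibyte character boundaries.
std::string::size_type NaiveFindBackslash(const std::string& bytes) {
  return bytes.find('\\');
}

// Lead bytes of two-byte Shift_JIS characters.
bool IsShiftJisLeadByte(unsigned char b) {
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Encoding-aware search: skips the trail byte of each two-byte character.
std::string::size_type ShiftJisFindBackslash(const std::string& bytes) {
  for (std::string::size_type i = 0; i < bytes.size(); ++i) {
    const unsigned char b = static_cast<unsigned char>(bytes[i]);
    if (IsShiftJisLeadByte(b)) {
      ++i;  // Skip the trail byte, which may legitimately be 0x5C.
    } else if (b == 0x5C) {
      return i;
    }
  }
  return std::string::npos;
}
```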

-Greg.
