Re: [Python-Dev] os.path.normcase rationale?

2010-10-08 Thread Chris Withers

On 05/10/2010 12:04, Steven D'Aprano wrote:

On Tue, 5 Oct 2010 07:21:15 pm Chris Withers wrote:

On 25/09/2010 04:25, Steven D'Aprano wrote:

1. Return the case of a filename in some canonical form which
depends on the file system?
2. Return the case of a filename as it is actually stored on disk?


How do 1 and 2 differ?


Case #1 imposes a particular canonical form, regardless of what is
actually stored on disk. It is similar to normpath, except that we
could have different canonical forms depending on what the file system
was. normpath merely generalises from the operating system, and never
looks at the file system.


Ah, okay, yeah, that's actually an anti-goal for me ;-)


Case #2 says to actually look at the file and see what the file system
considers its name to be. Consider an NTFS file system. By default it
is case-preserving and case-insensitive, although that can be changed.
(Just because a file system is NTFS doesn't mean that will be
case-insensitive. NTFS can also run in a POSIX mode which is
case-sensitive. But I digress.)


Yeah, this is definitely where I think the missing use case lies...


FWIW, the use case that setuptools has (and
for which it currently incorrectly uses normcase) is number 2.


4. Return the case of a filename in some arbitrarily-chosen
canonical form which does not depend on the file system?


This is what normcase does, but only if you're on Windows ;-)


Not quite. macpath.normcase() also lowercases the path. So does the
module for OS/2.


Interesting, since I develop on MacOS, Linux and Windows and only 
experienced the problem caused by setuptools normcase'ing distribution 
names on Windows. The MacOS case also isn't in the docs.



In any case, Windows is not a file system. It is quite possible to have
virtually any combination of case-destroying, case-preserving,
-sensitive and -insensitive file systems on the one Windows system. Say,
a FAT12 floppy, an NTFS partition, and an ext2 USB stick. Windows
doesn't ship with native support for ext2, but that doesn't mean it
can't be installed with third party drivers.


yes, exactly!


normcase pays no attention to any of this, and just lowercases the path.
At least that's cheap, and consistent, even if it solves the wrong
problem :)


...and creates a few more along the way ;-)

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] os.path.normcase rationale?

2010-10-08 Thread Michael Foord

 On 08/10/2010 09:41, Chris Withers wrote:

On 05/10/2010 12:04, Steven D'Aprano wrote:

On Tue, 5 Oct 2010 07:21:15 pm Chris Withers wrote:

On 25/09/2010 04:25, Steven D'Aprano wrote:
[snip...]
FWIW, the use case that setuptools has (and
for which it currently incorrectly uses normcase) is number 2.


4. Return the case of a filename in some arbitrarily-chosen
canonical form which does not depend on the file system?


This is what normcase does, but only if you're on Windows ;-)


Not quite. macpath.normcase() also lowercases the path. So does the
module for OS/2.


Interesting, since I develop on MacOS, Linux and Windows and only 
experienced the problem caused by setuptools normcase'ing distribution 
names on Windows. The MacOS case also isn't in the docs.




Unless you're using Mac OS 9 you will be using posixpath and not macpath 
though. :-)


Michael

--

http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies (”BOGUS AGREEMENTS”) that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.



Re: [Python-Dev] os.path.normcase rationale?

2010-10-08 Thread Ronald Oussoren
On 08 Oct, 2010, at 11:38 AM, Michael Foord fuzzy...@voidspace.org.uk wrote:

On 08/10/2010 09:41, Chris Withers wrote:

4. Return the case of a filename in some arbitrarily-chosen
canonical form which does not depend on the file system?

AFAIK this is what the function is supposed to do: return a
platform-dependent canonical form of the filename. And that is hopelessly
naive on modern systems: on both Linux and OS X some file systems are case
insensitive and others are not. The default for Linux is case sensitive,
but some filesystems are not (VFAT, CIFS), and the default on OS X is case
insensitive, but some filesystems are case sensitive (NFS, case-sensitive
HFS+).

Ronald


Re: [Python-Dev] os.path.normcase rationale?

2010-10-05 Thread Chris Withers

On 25/09/2010 04:25, Steven D'Aprano wrote:

1. Return the case of a filename in some canonical form which depends
on the file system?
2. Return the case of a filename as it is actually stored on disk?


How do 1 and 2 differ? FWIW, the use case that setuptools has (and for
which it currently incorrectly uses normcase) is number 2.



4. Return the case of a filename in some arbitrarily-chosen canonical
form which does not depend on the file system?


This is what normcase does, but only if you're on Windows ;-)
I still don't really get the use case of normcase in its current form,
at all...



Various people have posted links to recipes that solve case #2. Note
though that this necessarily demands that if the file doesn't exist, it
should raise an exception.


Fine by me, shame it seems to require iteration to find an answer though :-S


The very concept of canonical form for file names is troublesome.


I would have thought whatever is shown when doing an ls/dir/etc
(and don't be smart and think about mentioning that you can get dir to
output 8.3 as well as the full path ;-) )


Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk


Re: [Python-Dev] os.path.normcase rationale?

2010-10-05 Thread Chris Withers

On 25/09/2010 15:45, Guido van Rossum wrote:

The solution may well be OS specific. Solutions for Windows and OS X
have already been pointed out. If it can't be done for other Unix
versions, I think returning the input unchanged on those platforms is a
fine fallback (as it is for non-existent filenames).


Spot on, especially as the default of case preserving and case
sensitive will likely cover anything where an FS-specific solution can't
be found - I'd hazard a guess that the reason the FS-specific solution
can't be found in some cases is because for case preserving and case
sensitive situations, there really is no need for such an api ;-)


Chris


Re: [Python-Dev] os.path.normcase rationale?

2010-10-05 Thread Steven D'Aprano
On Tue, 5 Oct 2010 07:21:15 pm Chris Withers wrote:
 On 25/09/2010 04:25, Steven D'Aprano wrote:
  1. Return the case of a filename in some canonical form which
  depends on the file system?
  2. Return the case of a filename as it is actually stored on disk?

 How do 1 and 2 differ?

Case #1 imposes a particular canonical form, regardless of what is 
actually stored on disk. It is similar to normpath, except that we 
could have different canonical forms depending on what the file system 
was. normpath merely generalises from the operating system, and never 
looks at the file system.

Some file systems are case-preserving, and don't have a canonical form. 
We might choose to arbitrarily impose one, as normcase already does. 
Some are case-folding, in which case it might be sensible to choose the 
same canonical form as the file system actually uses. However, this may 
be implementation dependent e.g. under FAT12 or FAT16, the file system 
will take a file name like pArRoT.tXt and fold it to PARROT.TXT, or 
possibly parrot.txt, or Parrot.txt. Even if that's not the case for 
FAT12, it may be the case for other case-folding file systems. And the 
behaviour of FAT16 will differ according to whether or not it has been 
built with support for long file names.


Case #2 says to actually look at the file and see what the file system 
considers its name to be. Consider an NTFS file system. By default it
is case-preserving and case-insensitive, although that can be changed. 
(Just because a file system is NTFS doesn't mean that will be 
case-insensitive. NTFS can also run in a POSIX mode which is 
case-sensitive. But I digress.)

For simplicity, suppose you're on Windows using NTFS with the standard 
non-POSIX behaviour. You create a file named pArRoT.tXt. This will be 
stored on disk using the exact characters that you typed. The file 
system does no case-folding and merely uses whatever characters are fed 
to it, which in the case of Windows apps is likely to be whatever 
characters the user types. In this case, we don't try to impose a 
particular case on file names, but return whatever actually exists on 
disk.


 FWIW, the use case that setuptools has (and
 for which it currently incorrectly uses normcase) is number 2.

  4. Return the case of a filename in some arbitrarily-chosen
  canonical form which does not depend on the file system?

 This is what normcase does, but only if you're on Windows ;-)

Not quite. macpath.normcase() also lowercases the path. So does the 
module for OS/2.

In any case, Windows is not a file system. It is quite possible to have 
virtually any combination of case-destroying, case-preserving, 
-sensitive and -insensitive file systems on the one Windows system. Say, 
a FAT12 floppy, an NTFS partition, and an ext2 USB stick. Windows 
doesn't ship with native support for ext2, but that doesn't mean it 
can't be installed with third party drivers.

normcase pays no attention to any of this, and just lowercases the path.
At least that's cheap, and consistent, even if it solves the wrong 
problem :)



-- 
Steven D'Aprano


Re: [Python-Dev] os.path.normcase rationale?

2010-10-03 Thread Dan Villiom Podlaski Christiansen

On 3 Oct 2010, at 02:35, Nir Soffer wrote:


On Sat, Sep 25, 2010 at 1:36 AM, James Y Knight f...@fuhm.net wrote:


An OSX code sketch is available here (summary: call FSPathMakeRef to get an
FSRef from a path string, then FSRefMakePath to make it back into a path,
which will then have the correct case). And note that it only works if the
file actually exists.


http://stackoverflow.com/questions/370186/how-do-i-find-the-correct-case-of-a-filename

It would indeed be useful to have that be available in Python.



There is a much simpler way:


>>> from Carbon import File
>>> File.FSRef('/tmp/foo').as_pathname()
'/private/tmp/Foo'

Note that this is much slower compared to os.path.exists.


This won't work in py3k; the Carbon modules were removed in 3.0. A  
simpler alternative would probably be the F_GETPATH fcntl. An example:


Python 3.1.2 (r312:79147, Jul 11 2010, 18:21:56)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fcntl import fcntl
>>> from os.path import basename, exists
>>> from os import remove

>>> F_GETPATH = 50

>>> if exists('/tmp/å'):
...   remove('/tmp/å')
...
>>> open('/tmp/å', 'w').close()
>>> f = open(b'/tmp/A\xcc\x8a')

>>> a = f.name
>>> b = fcntl(f, F_GETPATH, b'\0' * 1024).rstrip(b'\0')

>>> a, b
(b'/tmp/A\xcc\x8a', b'/private/tmp/\xc3\xa5')
>>> a.decode('utf-8'), b.decode('utf-8')
('/tmp/Å', '/private/tmp/å')
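
The two names in that session differ in more than case: the one that was
opened is an uppercase A followed by a combining ring, while the stored one
is a single precomposed lowercase character. As a rough sketch of comparing
such names in pure Python, assuming (as a simplification) that NFC
normalisation plus str.casefold() approximates what the file system does.
The helper name same_name is made up for illustration, and real file
systems apply their own rules:

```python
import unicodedata


def same_name(a, b):
    """Compare two file names, ignoring case and Unicode normalisation form.

    Hypothetical helper: NFC + casefold only approximates what a
    case-insensitive, normalising file system would actually do.
    """
    def fold(name):
        return unicodedata.normalize('NFC', name).casefold()
    return fold(a) == fold(b)


opened = b'/tmp/A\xcc\x8a'.decode('utf-8')   # 'A' followed by a combining ring
stored = b'/tmp/\xc3\xa5'.decode('utf-8')    # precomposed lowercase a-ring

print(opened == stored)           # plain string comparison
print(same_name(opened, stored))  # comparison after folding
```

This only says whether two spellings would probably collide; it cannot say
which spelling is actually on disk, which is what F_GETPATH reports.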

--

Dan Villiom Podlaski Christiansen
dan...@gmail.com





Re: [Python-Dev] os.path.normcase rationale?

2010-10-03 Thread James Y Knight
On Oct 3, 2010, at 9:18 AM, Dan Villiom Podlaski Christiansen wrote:
 A simpler alternative would probably be the F_GETPATH fcntl. An example:

That requires that you have permission to open the file (and to actually do so 
which might have other effects), while the File Manager's FSRef method does not.

If Python adds a cross-platform function to do this canonicalization, users 
don't have to worry about how easy it is to invoke in pure-python...

James


Re: [Python-Dev] os.path.normcase rationale?

2010-10-02 Thread Nir Soffer
On Sat, Sep 25, 2010 at 1:36 AM, James Y Knight f...@fuhm.net wrote:

 An OSX code sketch is available here (summary: call FSPathMakeRef to get an
 FSRef from a path string, then FSRefMakePath to make it back into a path,
 which will then have the correct case). And note that it only works if the
 file actually exists.


 http://stackoverflow.com/questions/370186/how-do-i-find-the-correct-case-of-a-filename

 It would indeed be useful to have that be available in Python.


There is a much simpler way:

>>> from Carbon import File
>>> File.FSRef('/tmp/foo').as_pathname()
'/private/tmp/Foo'

Note that this is much slower compared to os.path.exists.


Re: [Python-Dev] os.path.normcase rationale?

2010-09-26 Thread Paul Moore
On 25 September 2010 23:57, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
 Paul Moore wrote:

 Windows has (I believe) user definable filesystems, too, but the OS
 has "get me the real filename" style calls,

 Does it really, though? The suggestions I've seen for doing
 this involve abusing the short/long filename translation
 machinery, and I'm not sure they're guaranteed to return the
 actual case rather than something that happens to work.

There's another call available. I've been too lazy to go and look it
up, but I'll do so sometime today.
Paul.


Re: [Python-Dev] os.path.normcase rationale?

2010-09-26 Thread Paul Moore
On 26 September 2010 09:01, Paul Moore p.f.mo...@gmail.com wrote:
 On 25 September 2010 23:57, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
 Paul Moore wrote:

 Windows has (I believe) user definable filesystems, too, but the OS
 has "get me the real filename" style calls,

 Does it really, though? The suggestions I've seen for doing
 this involve abusing the short/long filename translation
 machinery, and I'm not sure they're guaranteed to return the
 actual case rather than something that happens to work.

 There's another call available. I've been too lazy to go and look it
 up, but I'll do so sometime today.

Hmm, I can't find the one I was thinking of. GetLongFileName correctly
sets the case of all but the final part, and FindFile can be used to
find the last part, but that's not what I recall.

GetFinalPathNameByHandle works, and is documented to do so, but (a) it
works on an open file handle, so you need to open the file, and (b)
it's Vista and later only...

Paul.


Re: [Python-Dev] os.path.normcase rationale?

2010-09-26 Thread Dirkjan Ochtman
On Sun, Sep 26, 2010 at 13:36, Paul Moore p.f.mo...@gmail.com wrote:
 Hmm, I can't find the one I was thinking of. GetLongFileName correctly
 sets the case of all but the final part, and FindFile can be used to
 find the last part, but that's not what I recall.

 GetFinalPathNameByHandle works, and is documented to do so, but (a) it
 works on an open file handle, so you need to open the file, and (b)
 it's Vista and later only...

FWIW, here's what Mercurial uses to get the real path name on Windows:

http://hg.intevation.org/mercurial/crew/file/66a07fb76ceb/mercurial/util.py#l633

(I don't know much about that code or this topic, but maybe someone
finds it useful. It doesn't use any special Windows API, so if there
is any, it's something the hg hackers don't know about.)

Cheers,

Dirkjan


Re: [Python-Dev] os.path.normcase rationale?

2010-09-26 Thread James Y Knight

On Sep 26, 2010, at 7:36 AM, Paul Moore wrote:

 On 26 September 2010 09:01, Paul Moore p.f.mo...@gmail.com wrote:
 On 25 September 2010 23:57, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
 Paul Moore wrote:
 
 Windows has (I believe) user definable filesystems, too, but the OS
 has "get me the real filename" style calls,
 
 Does it really, though? The suggestions I've seen for doing
 this involve abusing the short/long filename translation
 machinery, and I'm not sure they're guaranteed to return the
 actual case rather than something that happens to work.
 
 There's another call available. I've been too lazy to go and look it
 up, but I'll do so sometime today.
 
 Hmm, I can't find the one I was thinking of. GetLongFileName correctly
 sets the case of all but the final part, and FindFile can be used to
 find the last part, but that's not what I recall.
 
 GetFinalPathNameByHandle works, and is documented to do so, but (a) it
 works on an open file handle, so you need to open the file, and (b)
 it's Vista and later only...

Were you thinking of SHGetFileInfo?

http://stackoverflow.com/questions/74451/getting-actual-file-name-with-proper-casing-on-windows

James


Re: [Python-Dev] os.path.normcase rationale?

2010-09-26 Thread Paul Moore
On 26 September 2010 13:37, James Y Knight f...@fuhm.net wrote:

 Were you thinking of SHGetFileInfo?

 http://stackoverflow.com/questions/74451/getting-actual-file-name-with-proper-casing-on-windows

It wasn't, but it looks possible. Only gives the last component,
though, so you still have to walk up the path components :-(

I suspect I was thinking of GetLongFileName, which puts everything
*but* the last component into the right case. I missed the problem
with the last component :-(

Paul.


Re: [Python-Dev] os.path.normcase rationale?

2010-09-26 Thread Brian Curtin
On Sun, Sep 26, 2010 at 06:36, Paul Moore p.f.mo...@gmail.com wrote:

 On 26 September 2010 09:01, Paul Moore p.f.mo...@gmail.com wrote:
  On 25 September 2010 23:57, Greg Ewing greg.ew...@canterbury.ac.nz
 wrote:
  Paul Moore wrote:
 
  Windows has (I believe) user definable filesystems, too, but the OS
  has "get me the real filename" style calls,
 
  Does it really, though? The suggestions I've seen for doing
  this involve abusing the short/long filename translation
  machinery, and I'm not sure they're guaranteed to return the
  actual case rather than something that happens to work.
 
  There's another call available. I've been too lazy to go and look it
  up, but I'll do so sometime today.

 GetFinalPathNameByHandle works, and is documented to do so, but (a) it
 works on an open file handle, so you need to open the file, and (b)
 it's Vista and later only...


FYI, this is currently exposed as nt._getfinalpathname, and is used for
os.path.samefile on Vista and beyond.


Re: [Python-Dev] os.path.normcase rationale?

2010-09-25 Thread Paul Moore
On 24 September 2010 23:43, Glenn Linderman v+pyt...@g.nevcal.com wrote:
 Hmm.  There is no need for the function on a case sensitive file system,
 because the name had better be spelled with matching case: that is, if it is
 spelled with non-matching case it is an attempt to reference a non-existent
 file (or at least a different file).

On Linux, I don't believe there's a way to ask "is this filesystem
case insensitive?"

In fact, with userfs, I believe it's possible to do massively
pathological things like having a filesystem which treats anagrams as
the same file ("foo" is the same file as "oof" or "ofo"). (More
realistically, MacOS does Unicode normalisation).

Windows has (I believe) user definable filesystems, too, but the OS
has "get me the real filename" style calls, which the filesystem
should support, so no matter how nasty a filesystem implementer gets,
he has to deal with his own mess :-)

Paul.


Re: [Python-Dev] os.path.normcase rationale?

2010-09-25 Thread Stephen J. Turnbull
Paul Moore writes:

  In fact, with userfs, I believe it's possible to do massively
  pathological things like having a filesystem which treats anagrams
  as the same file ("foo" is the same file as "oof" or "ofo"). (More
  realistically, MacOS does Unicode normalisation).

Nitpick: Mac OS X doesn't do Unicode normalization.  The default
filesystem implementation does.




Re: [Python-Dev] os.path.normcase rationale?

2010-09-25 Thread Guido van Rossum
On Fri, Sep 24, 2010 at 8:25 PM, Steven D'Aprano st...@pearwood.info wrote:
 On Sat, 25 Sep 2010 09:22:47 am Guido van Rossum wrote:

 I think that, like os.path.realpath(), it should not fail if the file
 does not exist.

 Maybe the API could be called os.path.unnormpath(), since it is in a
 sense the opposite of normpath() (which removes case) ? But I would
 want to write it so that even on Unix it scans the filesystem, in
 case the filesystem is case-preserving (like the default fs on OS X).

 It is not entirely clear to me what this function is meant to actually
 do? Should it:

 1. Return the case of a filename in some canonical form which depends
   on the file system?
 2. Return the case of a filename as it is actually stored on disk?

This one. This is actually useful (on case-preserving filesystems).
There is no doubt in my mind that this is the requested and needed
functionality.

 3. Something else?

 and just for completeness:

 4. Return the case of a filename in some arbitrarily-chosen canonical
   form which does not depend on the file system?

 These are not the same, either conceptually or in practice.

 If you want #4, you already have it in os.path.normcase.

 I think that the OP, Chris, wants #1, but it isn't entirely clear to me.

I don't think this is where the issue lies.

 It's possible that he wants #2.

 Various people have posted links to recipes that solve case #2. Note
 though that this necessarily demands that if the file doesn't exist, it
 should raise an exception.

No it needn't; realpath() uses the filesystem but leaves non-existing
parts alone. Also some of the path may exist (e.g. a parent
directory).

 In the case of #1, if the file system doesn't exist, we can't predict
 what the canonical form should be.

 The very concept of canonical form for file names is troublesome. If the
 file system is case-preserving, the file system doesn't define a
 canonical form: the case of the file name will depend on how the file
 is initially named. If the file system is case-destructive the
 behaviour will depend on the file system itself: e.g. FAT12 and ISO
 9660 both uppercase file names, but other file systems may make other
 choices. For some arbitrary path, where we don't know what file system
 it is, or if the path doesn't actually exist, we have no way of telling
 what the file system's canonical form will be, or even whether it will
 have one.

 Note that I've been talking about case preservation, not case
 sensitivity. That's because case preservation is orthogonal to
 sensitivity. You can see three of the four combinations, e.g.:

 Preserving + insensitive:  fat32, NTFS under Win32, normally HFS+
 Preserving + sensitive:  ext3, NTFS under POSIX, optionally HFS+
 Destructive + insensitive:  fat12, fat16 without long file name support

 To the best of my knowledge, destructive + sensitive doesn't exist. It
 could, in principle, but it would be silly to do so.

 Note that just knowing the file system type is not enough to tell what
 its behaviour will be. Given an arbitrary file system, there's no
 obvious way to determine what it will do to file names short of trying
 to create a file and see what happens.
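
A minimal sketch of the "create a file and see what happens" probe described
here, for case sensitivity only. The function name looks_case_sensitive is
hypothetical, and the probe does write to the directory, so it is an
illustration of the point rather than something to run on every lookup:

```python
import os
import tempfile


def looks_case_sensitive(directory):
    """Guess whether names in `directory` are matched case-sensitively.

    Hypothetical helper. It writes a temporary probe file, so it needs
    write permission in the directory being tested.
    """
    with tempfile.NamedTemporaryFile(prefix='CaseProbe', dir=directory) as probe:
        folder, name = os.path.split(probe.name)
        # If the swapped-case spelling also resolves, lookups ignore case.
        return not os.path.exists(os.path.join(folder, name.swapcase()))


print(looks_case_sensitive(tempfile.gettempdir()))
```

The answer holds only for the directory that was probed; another mount point
on the same machine may behave differently.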

This operation should not do any writes.

The solution may well be OS specific. Solutions for Windows and OS X
have already been pointed out. If it can't be done for other Unix
versions, I think returning the input unchanged on those platforms is a
fine fallback (as it is for non-existent filenames).
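
A portable fallback along these lines could be sketched with nothing more
than os.listdir. This is only an illustration under stated assumptions: the
name stored_case is made up, matching uses str.lower() rather than the file
system's own comparison rules, and it is not one of the OS-specific calls
mentioned in the thread. It reads directories but never writes, and it
leaves the remainder of the path alone as soon as a component cannot be
found:

```python
import os


def stored_case(path):
    """Return `path` with each existing component spelled as its directory lists it.

    Hypothetical helper: components that cannot be found (or whose parent
    cannot be listed) are returned unchanged.
    """
    path = os.path.abspath(path)
    drive, rest = os.path.splitdrive(path)
    parts = [p for p in rest.split(os.sep) if p]
    current = drive + os.sep
    for i, part in enumerate(parts):
        try:
            names = os.listdir(current)
        except OSError:
            # Missing or unreadable directory: keep the rest as given.
            return os.path.join(current, *parts[i:])
        if part in names:
            match = part  # an exact spelling always wins
        else:
            folded = part.lower()
            match = next((n for n in names if n.lower() == folded), None)
            if match is None:
                return os.path.join(current, *parts[i:])
        current = os.path.join(current, match)
    return current
```

The cost is one directory listing per path component, which is the
iteration complained about elsewhere in the thread.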

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] os.path.normcase rationale?

2010-09-25 Thread Greg Ewing

Paul Moore wrote:


Windows has (I believe) user definable filesystems, too, but the OS
has "get me the real filename" style calls,


Does it really, though? The suggestions I've seen for doing
this involve abusing the short/long filename translation
machinery, and I'm not sure they're guaranteed to return the
actual case rather than something that happens to work.

--
Greg


Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread Chris Withers

On 18/09/2010 23:36, Guido van Rossum wrote:

course, exists() and isdir() etc. do, and so does realpath(), but the
pure parsing functions don't.


Yes, but:

H:\>echo foo > TeSt.txt
...
>>> import os.path
>>> os.path.realpath('test.txt')
'H:\\test.txt'
>>> os.path.normcase('TeSt.txt')
'test.txt'

Both feel unsatisfying to me :-S

How can I get 'TeSt.txt' from 'test.txt' (which feels like the contract 
normcase *should* have...)



They can be used without a working
filesystem even. (E.g. you can import ntpath on a Unix box and happily
parse Windows paths.)


But what value does that add over just doing a .lower() on the path?
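
For reference, on the Windows flavour the two are not quite identical,
because normcase also rewrites the separators. A small sketch that uses
ntpath directly so it behaves the same on any platform (the example path is
made up):

```python
import ntpath

path = 'C:/Projects/TeSt.txt'

print(path.lower())           # only the letters change
print(ntpath.normcase(path))  # letters are lowered and '/' becomes '\\'
```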

Chris


Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread Ned Batchelder

 On 9/24/2010 6:13 AM, Chris Withers wrote:

On 18/09/2010 23:36, Guido van Rossum wrote:

course, exists() and isdir() etc. do, and so does realpath(), but the
pure parsing functions don't.


Yes, but:

H:\>echo foo > TeSt.txt
...
>>> import os.path
>>> os.path.realpath('test.txt')
'H:\\test.txt'
>>> os.path.normcase('TeSt.txt')
'test.txt'

Both feel unsatisfying to me :-S

How can I get 'TeSt.txt' from 'test.txt' (which feels like the 
contract normcase *should* have...)



http://stackoverflow.com/questions/3692261/in-python-how-can-i-get-the-correctly-cased-path-for-a-file

They can be used without a working
filesystem even. (E.g. you can import ntpath on a Unix box and happily
parse Windows paths.)


But what value does that add over just doing a .lower() on the path?

Chris


Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread R. David Murray
On Fri, 24 Sep 2010 11:13:46 +0100, Chris Withers ch...@simplistix.co.uk 
wrote:
 On 18/09/2010 23:36, Guido van Rossum wrote:
  course, exists() and isdir() etc. do, and so does realpath(), but the
  pure parsing functions don't.
 
 Yes, but:
 
 H:\>echo foo > TeSt.txt
 ...
 >>> import os.path
 >>> os.path.realpath('test.txt')
 'H:\\test.txt'
 >>> os.path.normcase('TeSt.txt')
 'test.txt'
 
 Both feel unsatisfying to me :-S
 
 How can I get 'TeSt.txt' from 'test.txt' (which feels like the contract 
 normcase *should* have...)

You can't, and you shouldn't be able to.  Normalization is something
that happens without reference to existing objects; the whole point
is to put the thing into standard form so that you can compare
strings obtained from different sources and know that they will
represent the same object on that filesystem.

  They can be used without a working
  filesystem even. (E.g. you can import ntpath on a Unix box and happily
  parse Windows paths.)
 
 But what value does that add over just doing a .lower() on the path?

It does what is appropriate for that...oh, yeah.  For that OS, not
for that filesystem.  (e.g. on Unix normcase does nothing since files
with different cases but the same letters are different files.) 
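
To make the OS-versus-filesystem point concrete, here is a small sketch of
the comparison use case. It selects the rule set by importing a specific
path module and never looks at any disk; the two paths are made up:

```python
import ntpath
import posixpath

a, b = 'Docs/ReadMe.TXT', 'docs/readme.txt'

# Windows rules: both spellings normalise to the same string.
print(ntpath.normcase(a) == ntpath.normcase(b))

# POSIX rules: normcase returns its argument unchanged, so they differ.
print(posixpath.normcase(a) == posixpath.normcase(b))
```

os.path is simply whichever of these modules matches the running platform,
regardless of which file system a given path lives on.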

Being OS specific rather than file system type specific is the usability bug.
But to fix it we'll need to introduce a 'filesystems' module enumerating
the different file systems we support, with tools for figuring out
what filesystem your program is talking to.  But normcase still
wouldn't (shouldn't) do what you want.

--
R. David Murray  www.bitdance.com


Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread Guido van Rossum
On Fri, Sep 24, 2010 at 5:17 AM, R. David Murray rdmur...@bitdance.com wrote:
 On Fri, 24 Sep 2010 11:13:46 +0100, Chris Withers ch...@simplistix.co.uk 
 wrote:
 On 18/09/2010 23:36, Guido van Rossum wrote:
  course, exists() and isdir() etc. do, and so does realpath(), but the
  pure parsing functions don't.

 Yes, but:

 H:\>echo foo > TeSt.txt
 ...
 >>> import os.path
 >>> os.path.realpath('test.txt')
 'H:\\test.txt'
 >>> os.path.normcase('TeSt.txt')
 'test.txt'

 Both feel unsatisfying to me :-S

 How can I get 'TeSt.txt' from 'test.txt' (which feels like the contract
 normcase *should* have...)

 You can't, and you shouldn't be able to.  Normalization happens without
 reference to existing objects; the whole point is to put the name into
 a standard form so that you can compare strings obtained from different
 sources and know whether they represent the same object on that
 filesystem.

Clearly there is another use case where people want to display the
filename back to the user with the correct case. This is a reasonable
request and I think it makes sense for us to add another API to
os.path that does this by looking up the path on the filesystem, or
making an OS-specific call.

  They can be used without a working
  filesystem even. (E.g. you can import ntpath on a Unix box and happily
  parse Windows paths.)

 But what value does that add over just doing a .lower() on the path?

 It does what is appropriate for that... oh, yeah.  For that OS, not
 for that filesystem.  (E.g. on Unix normcase does nothing, since files
 whose names differ only in case are different files.)

Yeah, which is wrong on Mac OS X -- that's Unix but the default
filesystem is case-preserving (though apparently it's possible to
mount case-sensitive filesystems too). I've heard that on Windows
there are also case-sensitive filesystems (part of a POSIX compliance
package?). And on Linux you can mount FAT32 filesystems which are
case-preserving.

 Being OS-specific rather than filesystem-specific is the usability bug.

Agreed.

 But to fix it we'll need to introduce a 'filesystems' module enumerating
 the different file systems we support, with tools for figuring out
 what filesystem your program is talking to.  And normcase still
 wouldn't (shouldn't) do what you want.

I don't think we should try to reimplement what the filesystem does. I
think we should just ask the filesystem (how exactly I haven't figured
out yet but I expect it will be more OS-specific than
filesystem-specific). It will have to be a new API -- normcase() at
least is *intended* to return a case-flattened name on OSes where
case-preserving filesystems are the default, and changing it to look
at the filesystem would break too much code. For a new use case we
need a new API.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread Paul Moore
On 24 September 2010 15:29, Guido van Rossum gu...@python.org wrote:
 I don't think we should try to reimplement what the filesystem does. I
 think we should just ask the filesystem (how exactly I haven't figured
 out yet but I expect it will be more OS-specific than
 filesystem-specific). It will have to be a new API -- normcase() at
 least is *intended* to return a case-flattened name on OSes where
 case-preserving filesystems are the default, and changing it to look
 at the filesystem would break too much code. For a new use case we
 need a new API.

I dug into this once, and as far as I could tell, it's possible to get
the information on Windows, but there's no way on Linux to ask the
filesystem. From my researches, the standard interfaces a filesystem
has to implement on Linux don't offer any means of asking this
question.

Of course, (a) I'm no Linux expert so what do I know, and (b) it may
well be possible to come up with a good enough solution by ignoring
pathologically annoying theoretical cases.

I'm happy to provide Windows code if someone needs it.
Paul

PS There were some places I'd have been glad of this feature (and from
what I recall, Mercurial could have used it too) so I'm +1 on the
idea.


Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread Greg Ewing

Paul Moore wrote:


I dug into this once, and as far as I could tell, it's possible to get
the information on Windows, but there's no way on Linux to ask the
filesystem.


Maybe we could use a heuristic such as:

1) Search the directory for an exact match to the name given,
return it if found.

2) Look for a match ignoring case. If one is found, test it to
see if it refers to the same file as the given path, and if so
return it.

3) Otherwise, raise an exception.
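
The three steps above could be sketched roughly as follows. The function
name actual_name is hypothetical, and comparing names with str.lower()
is a simplification of whatever case folding a real filesystem applies.

```python
import os


def actual_name(path):
    """Return `path` with its final component spelled as stored on disk.

    Hypothetical helper following the three-step heuristic; only the
    last path component is examined.
    """
    directory, name = os.path.split(path)
    entries = os.listdir(directory or os.curdir)

    # 1) An exact match wins outright.
    if name in entries:
        return os.path.join(directory, name)

    # 2) Otherwise look for a match ignoring case, and confirm that it
    #    really is the same file as the path we were given.
    for entry in entries:
        if entry.lower() == name.lower():
            candidate = os.path.join(directory, entry)
            try:
                if os.path.samefile(candidate, path):
                    return candidate
            except OSError:
                # The given spelling does not resolve at all, which is
                # what happens on a case-sensitive filesystem.
                pass

    # 3) Nothing matched.
    raise FileNotFoundError(path)
```

On a case-sensitive filesystem step 2 never succeeds, so a wrongly cased
name ends up at step 3, the same as a name that does not exist at all.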

--
Greg


Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread Glenn Linderman

 On 9/24/2010 3:10 PM, Greg Ewing wrote:

Paul Moore wrote:


I dug into this once, and as far as I could tell, it's possible to get
the information on Windows, but there's no way on Linux to ask the
filesystem.


Maybe we could use a heuristic such as:

1) Search the directory for an exact match to the name given,
return it if found.

2) Look for a match ignoring case. If one is found, test it to
see if it refers to the same file as the given path, and if so
return it.

3) Otherwise, raise an exception.



Hmm.  There is no need for the function on a case sensitive file system, 
because the name had better be spelled with matching case: that is, if 
it is spelled with non-matching case it is an attempt to reference a 
non-existent file (or at least a different file).


So the API could do the right thing for case preserving or case 
ignoring file systems, but for case sensitive file systems, at most an 
existence check would be warranted.


In other words, the API, should it be created, should answer the
question "What is the actual name of the file that matches this, if it
exists in the filesystem?"  So the first check is to see if it exists in
the file system (this may raise an exception if it doesn't exist), and
then, if it does, obtain the actual name on those filesystems for which
it might be different.



Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread Guido van Rossum
I think that, like os.path.realpath(), it should not fail if the file
does not exist.

Maybe the API could be called os.path.unnormpath(), since it is in a
sense the opposite of normpath() (which removes case) ? But I would
want to write it so that even on Unix it scans the filesystem, in case
the filesystem is case-preserving (like the default fs on OS X).
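
A rough sketch of what such a function might look like, assuming it
scans directories on every platform and leaves any component it cannot
find exactly as given. The name unnormpath is the one floated above; the
helper _same_file and the use of str.lower() for matching are
assumptions of this sketch, not an existing os.path API.

```python
import os


def _same_file(a, b):
    """True if both paths exist and refer to the same file."""
    try:
        return os.path.samefile(a, b)
    except OSError:
        return False


def unnormpath(path):
    """Hypothetical: respell each component of `path` as stored on disk.

    Components that cannot be found are kept as given, so the call does
    not fail for a file that does not exist.
    """
    path = os.path.normpath(path)
    drive, rest = os.path.splitdrive(path)
    result = drive + os.sep if rest.startswith(os.sep) else drive
    for part in rest.split(os.sep):
        if not part:
            continue
        match = part
        if part not in (os.curdir, os.pardir):
            try:
                entries = os.listdir(result or os.curdir)
            except OSError:
                entries = []  # parent missing or unreadable: keep as given
            if part not in entries:
                given = os.path.join(result, part)
                for entry in entries:
                    if (entry.lower() == part.lower()
                            and _same_file(os.path.join(result, entry), given)):
                        match = entry
                        break
        result = os.path.join(result, match)
    return result
```

The samefile check keeps the function from "correcting" a name on a
case-sensitive filesystem, where a differently cased spelling refers to
a different (or nonexistent) file.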

--Guido

On Fri, Sep 24, 2010 at 3:43 PM, Glenn Linderman v+pyt...@g.nevcal.com wrote:
  On 9/24/2010 3:10 PM, Greg Ewing wrote:

 Paul Moore wrote:

 I dug into this once, and as far as I could tell, it's possible to get
 the information on Windows, but there's no way on Linux to ask the
 filesystem.

 Maybe we could use a heuristic such as:

 1) Search the directory for an exact match to the name given,
 return it if found.

 2) Look for a match ignoring case. If one is found, test it to
 see if it refers to the same file as the given path, and if so
 return it.

 3) Otherwise, raise an exception.


 Hmm.  There is no need for the function on a case sensitive file system,
 because the name had better be spelled with matching case: that is, if it is
 spelled with non-matching case it is an attempt to reference a non-existent
 file (or at least a different file).

 So the API could do the right thing for case preserving or case ignoring
 file systems, but for case sensitive file systems, at most an existence
 check would be warranted.

 In other words, the API, should it be created, should answer the question
 "What is the actual name of the file that matches this, if it exists in the
 filesystem?"  So the first check is to see if it exists in the file system
 (this may raise an exception if it doesn't exist), and then, if it does,
 obtain the actual name on those filesystems for which it might be different.




-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread James Y Knight
On Sep 24, 2010, at 10:53 AM, Paul Moore wrote:
 On 24 September 2010 15:29, Guido van Rossum gu...@python.org wrote:
 I don't think we should try to reimplement what the filesystem does. I
 think we should just ask the filesystem (how exactly I haven't figured
 out yet but I expect it will be more OS-specific than
 filesystem-specific). It will have to be a new API -- normcase() at
 least is *intended* to return a case-flattened name on OSes where
 case-preserving filesystems are the default, and changing it to look
 at the filesystem would break too much code. For a new use case we
 need a new API.
 
 I dug into this once, and as far as I could tell, it's possible to get
 the information on Windows, but there's no way on Linux to ask the
 filesystem. From my researches, the standard interfaces a filesystem
 has to implement on Linux don't offer any means of asking this
 question.
 
 Of course, (a) I'm no Linux expert so what do I know, and (b) it may
 well be possible to come up with a good enough solution by ignoring
 pathologically annoying theoretical cases.
 
 I'm happy to provide Windows code if someone needs it.
 Paul

An OSX code sketch is available here (summary: call FSPathMakeRef to get an 
FSRef from a path string, then FSRefMakePath to make it back into a path, which 
will then have the correct case). And note that it only works if the file 
actually exists.

http://stackoverflow.com/questions/370186/how-do-i-find-the-correct-case-of-a-filename

It would indeed be useful to have that be available in Python.

James


Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread Greg Ewing

Guido van Rossum wrote:


Maybe the API could be called os.path.unnormpath(), since it is in a
sense the opposite of normpath() (which removes case) ?


Cute, but not very intuitive. Something like actualpath()
might be better -- although that's somewhat arbitrarily
different from realpath().

--
Greg


Re: [Python-Dev] os.path.normcase rationale?

2010-09-24 Thread Steven D'Aprano
On Sat, 25 Sep 2010 09:22:47 am Guido van Rossum wrote:

 I think that, like os.path.realpath(), it should not fail if the file
 does not exist.

 Maybe the API could be called os.path.unnormpath(), since it is in a
 sense the opposite of normpath() (which removes case) ? But I would
 want to write it so that even on Unix it scans the filesystem, in
 case the filesystem is case-preserving (like the default fs on OS X).

It is not entirely clear to me what this function is actually meant to
do. Should it:

1. Return the case of a filename in some canonical form which depends 
   on the file system?
2. Return the case of a filename as it is actually stored on disk?
3. Something else?

and just for completeness:

4. Return the case of a filename in some arbitrarily-chosen canonical 
   form which does not depend on the file system?

These are not the same, either conceptually or in practice.

If you want #4, you already have it in os.path.normcase.

I think that the OP, Chris, wants #1, but it isn't entirely clear to me. 
It's possible that he wants #2.

Various people have posted links to recipes that solve case #2. Note 
though that this necessarily demands that if the file doesn't exist, it 
should raise an exception.

In the case of #1, if the file system doesn't exist, we can't predict 
what the canonical form should be.

The very concept of canonical form for file names is troublesome. If the 
file system is case-preserving, the file system doesn't define a 
canonical form: the case of the file name will depend on how the file 
is initially named. If the file system is case-destructive the 
behaviour will depend on the file system itself: e.g. FAT12 and ISO 
9660 both uppercase file names, but other file systems may make other 
choices. For some arbitrary path, where we don't know what file system 
it is, or if the path doesn't actually exist, we have no way of telling 
what the file system's canonical form will be, or even whether it will 
have one.

Note that I've been talking about case preservation, not case 
sensitivity. That's because case preservation is orthogonal to 
sensitivity. You can see three of the four combinations, e.g.:

Preserving + insensitive:   FAT32, NTFS under Win32, normally HFS+
Preserving + sensitive:     ext3, NTFS under POSIX, optionally HFS+
Destructive + insensitive:  FAT12, FAT16 without long file name support

To the best of my knowledge, destructive + sensitive doesn't exist. It 
could, in principle, but it would be silly to do so.

Note that just knowing the file system type is not enough to tell what 
its behaviour will be. Given an arbitrary file system, there's no 
obvious way to determine what it will do to file names short of trying 
to create a file and see what happens.
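
The "create a file and see what happens" approach could be sketched like
this. The function name probe_case_behaviour is hypothetical, and the
sketch assumes the directory is writable, that the filesystem accepts
the scratch file's name, and that nobody else creates a clashing name
while the probe runs.

```python
import os
import tempfile


def probe_case_behaviour(directory):
    """Create a scratch file in `directory`; return (preserving, sensitive).

    Hypothetical helper. The result describes only the filesystem that
    holds `directory`, at the moment of the call.
    """
    # The mixed-case prefix and suffix guarantee that swapcase() below
    # yields a different spelling.
    fd, path = tempfile.mkstemp(prefix="CaseProbe_", suffix=".Tmp",
                                dir=directory)
    os.close(fd)
    try:
        name = os.path.basename(path)
        # Preserving: the directory listing shows the name as created.
        preserving = name in os.listdir(directory)
        # Sensitive: a differently cased spelling does not reach the file.
        swapped = os.path.join(directory, name.swapcase())
        sensitive = not os.path.exists(swapped)
        return preserving, sensitive
    finally:
        os.remove(path)
```

For example, probe_case_behaviour(os.getcwd()) would be expected to give
(True, True) on a typical ext3 mount and (True, False) on a default
NTFS or HFS+ volume, matching the combinations listed above.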



-- 
Steven D'Aprano


[Python-Dev] os.path.normcase rationale?

2010-09-18 Thread Chris Withers

Hi All,

I'm curious as to why, with a file called Foo.txt on a case-preserving
but case-insensitive filesystem, os.path.normcase('FoO.txt') will
return foo.txt rather than Foo.txt?


Yes, I know the behaviour is documented, but I'm wondering if anyone can 
remember the rationale for that behaviour?


cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk


Re: [Python-Dev] os.path.normcase rationale?

2010-09-18 Thread Guido van Rossum
On Sat, Sep 18, 2010 at 2:39 PM, Chris Withers ch...@simplistix.co.uk wrote:
 I'm curious as to why, with a file called Foo.txt on a case-preserving
 but case-insensitive filesystem, os.path.normcase('FoO.txt') will return
 foo.txt rather than Foo.txt?

 Yes, I know the behaviour is documented, but I'm wondering if anyone can
 remember the rationale for that behaviour?

Because normcase() and friends never look at the filesystem. Of
course, exists() and isdir() etc. do, and so does realpath(), but the
pure parsing functions don't. They can be used without a working
filesystem even. (E.g. you can import ntpath on a Unix box and happily
parse Windows paths.)
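
For instance, the following runs the same way on any platform, since the
ntpath functions used here only manipulate strings. The path itself is
made up for illustration.

```python
import ntpath

# Split a Windows-style path without any Windows filesystem present.
drive, rest = ntpath.splitdrive(r"C:\Users\Chris\TeSt.txt")
head, tail = ntpath.split(rest)

# normcase folds the case of the string; it cannot know how the file is
# actually spelled on disk, or whether it exists at all.
folded = ntpath.normcase(tail)
```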

-- 
--Guido van Rossum (python.org/~guido)