On Tuesday 30 September 2008, M.-A. Lemburg wrote:
> On 2008-09-30 08:00, Martin v. Löwis wrote:
> >> Change the default file system encoding to store bytes in Unicode is
> >> like introducing a new Python type: <fake Unicode for filename hacks>.
> >
> > Exactly. Seems like the best solution to me, despite your polemics.
>
> Not a bad idea... have os.listdir() return Unicode subclasses that work
> like file handles, ie. they have an extra buffer that holds the original
> bytes value received from the underlying C API.
Why does it have to be a Unicode subclass? In my eyes, a Unicode object promises a few things, in particular that it contains a Unicode string. If it suddenly contained bytes with no defined meaning, that promise would be broken.

What I wonder is what the requirements for path handling are. I'll try to list the ones I can see:

1. A path received from the system should be preserved, so it can be given back to the system later on. IOW, the internal representation should not lose any information compared to the one used by the OS.
2. Typical operations like joining two path segments or moving to the parent dir should be defined.
3. There must be a way to display the path to the user. IOW, there should be a way to turn the path into a string that the user can recognise, according to some encoding. Note that this is not always possible, so this can fail.
4. There must be a way to receive a path from the user, i.e. a way to get from a user-entered string to a path. Note that this, too, isn't always possible and can fail.
5. The conversion between a string and a path should be configurable, with defaults retrieved from the system, so that most operations just work and do what the user expects.
6. There should be a way to modify the path data itself. This of course requires knowledge of the internals, but gives full power to the programmer.

For requirement 3, I would say a lossy conversion to a string would be enough, i.e. try to convert the path to a Unicode string and use a question mark or some escaping to mark the parts that can't be decoded. That lets users recognise the decodable parts of the path, with hopefully only a few characters left undecoded.

For requirement 4, a failure to encode a string to a path must result in a loud failure, i.e. an exception. The user entered a path that we can't use, and any guess at what the user might have wanted is futile.

Are there any points to add?
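The six requirements above could be collected in a single path type that stores the OS-level bytes and converts only at the edges. What follows is a minimal sketch in present-day Python 3, not a proposal for a real API: the class name `BytesPath`, its methods `join`, `parent`, `display` and `from_user`, and the `default_encoding` attribute are all hypothetical (and unrelated to `pathlib.Path`). It assumes POSIX-style `/` separators and uses the `backslashreplace` error handler as one possible "escaping" for requirement 3.

```python
import posixpath
import sys
from typing import Optional


class BytesPath:
    """Hypothetical path type that keeps the OS-level bytes untouched."""

    # Req. 5: default encoding taken from the system, overridable by assignment.
    default_encoding: str = sys.getfilesystemencoding()

    def __init__(self, raw: bytes) -> None:
        if not isinstance(raw, bytes):
            raise TypeError("BytesPath expects bytes, got %s" % type(raw).__name__)
        self._raw = raw

    # Req. 1 and 6: the exact bytes are always available; modified bytes
    # can be wrapped in a new BytesPath by the programmer.
    @property
    def raw(self) -> bytes:
        return self._raw

    # Req. 2: join/parent work on the bytes directly (POSIX separators assumed).
    def join(self, other: "BytesPath") -> "BytesPath":
        return BytesPath(posixpath.join(self._raw, other._raw))

    def parent(self) -> "BytesPath":
        return BytesPath(posixpath.dirname(self._raw))

    # Req. 3: lossy display; undecodable bytes are shown as \xNN escapes.
    def display(self, encoding: Optional[str] = None) -> str:
        return self._raw.decode(encoding or self.default_encoding,
                                errors="backslashreplace")

    # Req. 4: strict conversion from user input; raises UnicodeEncodeError.
    @classmethod
    def from_user(cls, text: str, encoding: Optional[str] = None) -> "BytesPath":
        return cls(text.encode(encoding or cls.default_encoding, errors="strict"))

    def __repr__(self) -> str:
        return "BytesPath(%r)" % (self._raw,)


if __name__ == "__main__":
    p = BytesPath(b"/home/uli/caf\xe9")  # Latin-1 bytes, not valid UTF-8
    print(p.display("utf-8"))            # lossy: the undecodable byte shows as an escape
    print(p.parent().raw)                # the parent, still as bytes
    try:
        BytesPath.from_user("caf\u00e9", "ascii")
    except UnicodeEncodeError as exc:
        print("rejected:", exc)
```

The point of the design is that decoding never happens implicitly: the bytes are the only stored state, so nothing can be lost, and each conversion to or from text states its encoding and its failure mode (lossy for display, an exception for user input).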
Uli

--
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000