[EMAIL PROTECTED] wrote: > > On 06:07 am, [EMAIL PROTECTED] wrote: >> Most apps aren't file managers or ftp clients but when they interact >> with files (for instance, a file selection dialog) they need to be able >> to show the user all the relevant files. So on an app-by-app basis the >> need for this is high. > > While I tend to agree emphatically with this, the *real* solution here > is a path-abstraction library.
Why don't you send me some information offlist. I'm not sure I agree that a path-abstraction library can work correctly but if it can it would be nice to have that at a level higher than the file-dialog libraries that I was envisioning. [snip] >> ... but that still >> doesn't help me identify when someone would expect that asking python >> for a list of all files in a directory or a specific set of files in a >> directory should, without warning, return only a subset of them. In >> what situations is this appropriate behaviour? > > If you say listdir(unicode) on a POSIX OS, your program is saying "I > only know how to deal with unicode results from this function, so please > only give me those.". No. (explained below) > If your program is smart enough to deal with > bytes, then you would have asked for bytes, no? Yes (explained below) > Returning only > filenames which can be properly decoded makes sense. Otherwise everyone > needs to learn about this highly confusing issue, even for the simplest > scripts. > os.listdir(unicode) (currently) means that the *programmer* is asking that the stdlib return the decodable filenames from this directory. The question is whether the programmer understood that this is what they were asking for and whether it is what they most likely want. I would make the following statements WRT to this: 1) The programmer most likely does not want decodable filenames and only decodable filename. If they were, we'd see a lot of python2.x code that turns pathnames into unicode and discards everything that wasn't decodable. No one has given a use case for finding only the *decodable* subset of files. If I request to see all *.py files in a directory, I want to see all of the *.py files in the directory, decodable or not. If you can show how programmers intend "90%" of their calls to os.listdir()/glob.glob('*.txt') to show only the decodable subset of the results, then the foundation of my arguments is gone. So please, give examples to prove this wrong. - If this is true, a definition of os.listdir(<type 'str'>) that would better meet programmer expectation would be: "Give me all files in a directory with the output as str type". The definition of os.listdir(<type 'bytes'>) would be "Give me all files in a directory with the output as bytes type". Raising an exception when the filenames are undecodable is perfectly reasonable in this situation. 2) For the programmer to understand the difference between os.listdir(<type 'bytes'>) and os.listdir(<type 'str'>) they have to understand the "highly confusing issue" and what it means for their code. So the current method is forcing programmers to understand it even for the simplest scripts if their environment is not uniform with no clue from the interpreter that there is an issue. - Similarly, raising an exception on undecodable values means that the programmer can ignore the issue in any scripts in sane environments and will be told that they need to deal with it (via an exception) when their script runs in a non-sane environment. 3) The usage of unicode vs bytes is easy to miss for someone starting with py2.x or windows and moving to a multi-platform or unix project. Even simple testing won't reveal the problem unless the programmer knows that they have to test what happens when encodings are mixed. Once again, this is requiring the programmer to understand the encoding issue without help from the interpreter. > Skipping undecodable values is good enough that it will work 90% of the > time. You and Guido have now made this claim to defend not raising an exception but I still don't have a use case. Here are use cases that I see: * Bill is coding an application for use inside his company. His company only uses utf-8. His code naively uses os.listdir(<type 'str'>). - The code does not throw an exception whether we use the current os.listdir() or one that could throw an exception because the system admins have sanitised the environment. Bill did not need to understand the implications of encoding for his code to work in this script whether simple or complex. * Mary is coding an application for use inside her company. It finds all html files on a system and updates her company's copyright, privacy policy, and other legal boilerplate. Her expectation is that after her program runs every file will have been updated. Her environment is a mixture of different filename encodings due to having many legacy documents for users in different locales. Mary's code also naively uses os.listdir(<type 'str'>). Her test case checks that the code does the right thing on many languages but unfortunately doesn't check with different encodings because she'd have to already understand the encoding issue to check for that. - With the current approach, the code will silently do the wrong thing in production for years, until someone notices and alerts the company that something is wrong with certain files in certain locales. By then, Mary may no longer be involved with the company and there are thousands of users who thought they were operating under the old legal terms instead of the new ones. - With exceptions raised, Mary will be alerted of the problem when she tries to run the code in production for the first time. She can then do a little research and fix it to run correctly. The traceback that's issued can be googled and the line that it points to will show where the error is occurring. * Arthur's company has shipped some of his code in a product. The code uses os.listdir(<type 'str'>) to find images and movies in a directory subsequent to deciding if they contain pornography. A cron job runs the code and the messages it prints are sent by cron to the system admins to take action on. A customer calls to complain that the code did not detect that a recently fired employee had a 30 minute pornographic movie on his office computer. Arthur has to figure out why. - With the current code, Arthur might start with the algorithms that examines the movies, try to get samples of the pornography from the company, and look in many wrong places before finding out that the code that searches for files is not listing all the files in directories. - With tracebacks raised, the system admins, at least, will have received messages from cron stating that the undecodable filenames are causing errors that need to be addressed. They can call Arthur's company when they notice this and Arthur can fix it quickly because the traceback contains all the necessary information. -Toshio
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com