Re: escaping characters in filenames

2009-07-29 Thread J Kenneth King
Nobody writes:

> On Wed, 29 Jul 2009 09:29:55 -0400, J Kenneth King wrote:
>
>> I wrote a script to process some files using another program.  One thing
>> I noticed was that both os.listdir() and os.path.walk() will return
>> unescaped file names (ie: "My File With Spaces & Stuff" instead of "My\
>> File\ With\ Spaces\ \&\ Stuff").  I haven't had much success finding a
>> module or recipe that escapes file names and was wondering if anyone
>> could point me in the right direction.
>> 
>> As an aside, the script is using subprocess.call() with the "shell=True"
>> parameter.  There isn't really a reason for doing it this way (was just
>> the fastest way to write it and get a prototype working).  I was
>> wondering if Popen objects were sensitive to unescaped names like the
>> shell.  I intend to refactor the function to use Popen objects at some
>> point and thought perhaps escaping file names may not be entirely
>> necessary.
>
> Note that subprocess.call() is nothing more than:
>
>   def call(*popenargs, **kwargs):
>       return Popen(*popenargs, **kwargs).wait()
>
> plus a docstring. It accepts exactly the same arguments as Popen(), with
> the same semantics.
>
> If you want to run a command given a program and arguments, you
> should pass the command and arguments as a list, rather than trying to
> construct a string.
>
> On Windows the value of shell= is unrelated to whether the command is
> a list or a string; a list is always converted to string using the
> list2cmdline() function. Using shell=True simply prepends "cmd.exe /c " to
> the string (this allows you to omit the .exe/.bat/etc extension for
> extensions which are in %PATHEXT%).
>
> On Unix, a string is first converted to a single-element list, so if you
> use a string with shell=False, it will be treated as the name of an
> executable to be run without arguments, even if it contains spaces, shell
> metacharacters etc.
>
> The most portable approach seems to be to always pass the command as a
> list, and to set shell=True on Windows and shell=False on Unix.
>
> The only reason to pass a command as a string is if you're getting a
> string from the user and you want it to be interpreted using the
> platform's standard shell (i.e. cmd.exe or /bin/sh). If you want it to be
> interpreted the same way regardless of platform, parse it into a
> list using shlex.split().

I understand; I think I was headed towards subprocess.Popen() either
way.  It seems to handle the problem I posted about.  And I got to learn
a little something on the way.  Thanks!

Only now there's a new problem in that the output of the program is
different if I run it from Popen than if I run it from the command line.
The program in question is 'pdftotext'.  More investigation to ensue.

Thanks again for the helpful post.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: escaping characters in filenames

2009-07-29 Thread Nobody
On Wed, 29 Jul 2009 09:29:55 -0400, J Kenneth King wrote:

> I wrote a script to process some files using another program.  One thing
> I noticed was that both os.listdir() and os.path.walk() will return
> unescaped file names (ie: "My File With Spaces & Stuff" instead of "My\
> File\ With\ Spaces\ \&\ Stuff").  I haven't had much success finding a
> module or recipe that escapes file names and was wondering if anyone
> could point me in the right direction.
> 
> As an aside, the script is using subprocess.call() with the "shell=True"
> parameter.  There isn't really a reason for doing it this way (was just
> the fastest way to write it and get a prototype working).  I was
> wondering if Popen objects were sensitive to unescaped names like the
> shell.  I intend to refactor the function to use Popen objects at some
> point and thought perhaps escaping file names may not be entirely
> necessary.

Note that subprocess.call() is nothing more than:

    def call(*popenargs, **kwargs):
        return Popen(*popenargs, **kwargs).wait()

plus a docstring. It accepts exactly the same arguments as Popen(), with
the same semantics.
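
One way to check that equivalence is to compare return codes. This is only
a sketch: the child command (the current interpreter exiting with a chosen
status) and the helper name exit_codes are assumptions made for
illustration. Newer Python versions add timeout handling inside call(), but
the value returned should still be the child's exit status.

```python
import subprocess
import sys

def exit_codes(status):
    """Run the same child via call() and via Popen().wait(); return both codes."""
    # Assumed child for illustration: the current interpreter exiting with `status`.
    cmd = [sys.executable, "-c", "import sys; sys.exit(%d)" % status]
    via_call = subprocess.call(cmd)
    via_popen = subprocess.Popen(cmd).wait()
    return via_call, via_popen
```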

If you want to run a command given a program and arguments, you
should pass the command and arguments as a list, rather than trying to
construct a string.
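
A minimal sketch of the list form, assuming a stand-in child process (the
Python interpreter writing back its first argument) in place of the real
program; echo_argument is a hypothetical helper name. The file name is one
list element, so it needs no escaping.

```python
import subprocess
import sys

def echo_argument(filename):
    """Pass `filename` as one list element and return what the child received."""
    # Stand-in child (assumption): the interpreter writes its first argument back.
    cmd = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.argv[1])", filename]
    return subprocess.check_output(cmd, universal_newlines=True)
```

Calling echo_argument("My File With Spaces & Stuff") should hand the name to
the child unchanged, spaces and ampersand included.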

On Windows the value of shell= is unrelated to whether the command is
a list or a string; a list is always converted to string using the
list2cmdline() function. Using shell=True simply prepends "cmd.exe /c " to
the string (this allows you to omit the .exe/.bat/etc extension for
extensions which are in %PATHEXT%).
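
To see what string a list turns into, list2cmdline() can be called directly.
Treat this as a sketch: the function lives in the subprocess module and can
be imported on any platform, but as far as I can tell it is not listed in
the module's __all__, so relying on it is an assumption.
windows_command_line is a hypothetical wrapper name.

```python
import subprocess

def windows_command_line(args):
    """Return the single command-line string subprocess builds from a list."""
    # Arguments containing spaces or tabs are wrapped in double quotes.
    return subprocess.list2cmdline(args)
```

For example, windows_command_line(["pdftotext", "My File With Spaces &
Stuff.pdf"]) should quote the second argument and leave the first alone.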

On Unix, a string is first converted to a single-element list, so if you
use a string with shell=False, it will be treated as the name of an
executable to be run without arguments, even if it contains spaces, shell
metacharacters etc.
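
A small sketch of that behaviour, assuming a POSIX system;
runs_without_shell is a hypothetical helper name. With shell=False, a
string such as "echo hello world" is looked up as a program literally named
"echo hello world", which normally does not exist, so starting it raises an
OSError.

```python
import subprocess

def runs_without_shell(command):
    """Return True if `command` could be started with shell=False."""
    try:
        subprocess.call(command)
    except OSError:
        # On Unix a whole-string command is taken as the executable's name.
        return False
    return True
```

The same words passed as a list, for example ["echo", "hello", "world"], are
treated as a program plus two arguments.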

The most portable approach seems to be to always pass the command as a
list, and to set shell=True on Windows and shell=False on Unix.

The only reason to pass a command as a string is if you're getting a
string from the user and you want it to be interpreted using the
platform's standard shell (i.e. cmd.exe or /bin/sh). If you want it to be
interpreted the same way regardless of platform, parse it into a
list using shlex.split().
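
A minimal sketch of the shlex.split() route; parse_command is a hypothetical
helper name and the command lines are made-up examples. Note that
shlex.split() follows POSIX shell rules by default, so backslashes act as
escapes and Windows-style paths may need extra care.

```python
import shlex

def parse_command(line):
    """Split a user-supplied command line into an argument list."""
    # Quotes and backslash escapes are resolved here, not by a shell.
    return shlex.split(line)
```

The resulting list can then be handed to Popen() with shell=False, so "&"
inside a quoted file name stays an ordinary character.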



Re: escaping characters in filenames

2009-07-29 Thread Dave Angel

J Kenneth King wrote:

> I wrote a script to process some files using another program.  One thing
> I noticed was that both os.listdir() and os.path.walk() will return
> unescaped file names (ie: "My File With Spaces & Stuff" instead of "My\
> File\ With\ Spaces\ \&\ Stuff").  I haven't had much success finding a
> module or recipe that escapes file names and was wondering if anyone
> could point me in the right direction.
>
> As an aside, the script is using subprocess.call() with the "shell=True"
> parameter.  There isn't really a reason for doing it this way (was just
> the fastest way to write it and get a prototype working).  I was
> wondering if Popen objects were sensitive to unescaped names like the
> shell.  I intend to refactor the function to use Popen objects at some
> point and thought perhaps escaping file names may not be entirely
> necessary.
>
> Cheers

  
There are dozens of meanings for escaping characters in strings.  
Without some context, we're wasting our time.


For example, if the filename is to be interpreted as part of a URL, then 
spaces are escaped by using %20.   Exactly who is going to be using this 
string you think you have to modify?  I don't know of any environment 
which expects spaces to be escaped with backslashes.


Be very specific.  For example, if a Windows application is parsing its 
own command line, you need to know what that particular application is 
expecting -- Windows passes the entire command line as a single string.  
But of course you may be invoking that application using 
subprocess.Popen(), in which case some transformations happen to your 
arguments before the single string is built.  Then some more 
transformations may happen in the shell.  Then some more in the C 
runtime library of the new process (if it happens to be in C, and if it 
happens to use those libraries).


I'm probably not the one with the answer.  But until you narrow down 
your case, you probably won't attract the attention of whichever person 
has the particular combination of experience that you're hoping for.


DaveA



Re: escaping characters in filenames

2009-07-29 Thread MRAB

J Kenneth King wrote:

> I wrote a script to process some files using another program.  One thing
> I noticed was that both os.listdir() and os.path.walk() will return
> unescaped file names (ie: "My File With Spaces & Stuff" instead of "My\
> File\ With\ Spaces\ \&\ Stuff").  I haven't had much success finding a
> module or recipe that escapes file names and was wondering if anyone
> could point me in the right direction.


That's only necessary if you're building a command line and passing it
as a string.
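
If a command string really is needed for a POSIX shell, a sketch along these
lines may help. It assumes Python 3.3 or later, where shlex.quote() exists
(older versions had pipes.quote()); shell_command is a hypothetical helper
name. It does not apply to cmd.exe, which has different quoting rules.

```python
import shlex

def shell_command(program, filename):
    """Build a command string for a POSIX shell with the file name quoted."""
    # shlex.quote() wraps unsafe names in single quotes instead of using backslashes.
    return "%s %s" % (program, shlex.quote(filename))
```

The quoted form differs from the backslash style in the question, but a
POSIX shell should read it back as the same single argument.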


> As an aside, the script is using subprocess.call() with the "shell=True"
> parameter.  There isn't really a reason for doing it this way (was just
> the fastest way to write it and get a prototype working).  I was
> wondering if Popen objects were sensitive to unescaped names like the
> shell.  I intend to refactor the function to use Popen objects at some
> point and thought perhaps escaping file names may not be entirely
> necessary.


Pass the command line to Popen as a list of strings.
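
A rough sketch of how that could look for names coming from os.listdir().
The child here is an assumption (the Python interpreter writing back the
path it was given) standing in for the real program, and process_directory
is a hypothetical helper name.

```python
import os
import subprocess
import sys

def process_directory(directory):
    """Run a child once per entry in `directory`, one list element per path."""
    results = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Stand-in child (assumption); replace with the real program and its options.
        cmd = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.argv[1])", path]
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                universal_newlines=True)
        out, _ = proc.communicate()  # wait for the child and collect its output
        results[name] = out
    return results
```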