Mitko Haralanov wrote:
I have a question about finding out whether a string contains binary
data.
In my application, I am reading from a file which could contain
binary data. After I have read the data, I transfer it using xmlrpclib.
However, xmlrpclib has trouble unpacking XML which contains binary data
and my application throws an exception. The solution to this problem is
to use base64 encoding of the data but I don't know how to check
whether the encoding will be needed.
If I read in a string containing some binary data from the file, the
type of that string is <type 'str'> which is not different from any
other string, so I can't use that as a check.
The only other check that I can think of is to check every character in
the read-in string against string.printable but that will take a long
time.
Can anyone suggest a better way to handle the check? Thank you in
advance.
All the data is binary. But perhaps by "non-binary" you mean ASCII
(7 bits), or bytes in the range 0x20-0x7f, or something.
The way I'd tackle it is to build a translation table for your
definition of "binary." Then simply do something like:
    if data != data.translate(table):
        data = base64.b64encode(data)  # or whatever encoding you prefer
The translation table would be defined such that table[ch] == ch for
all ch that are "nonbinary" and table[ch] != ch for all ch that are
"binary." And naturally you only build the table once, and reuse it
on each buffer.
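
A minimal sketch of such a table, written for Python 3 bytes (the
original question was about Python 2 str, where string.maketrans would
play the same role). The choice of "nonbinary" as printable ASCII plus
tab, LF and CR is only an assumption, and the names TEXT_BYTES, TABLE
and needs_base64 are made up for illustration:

```python
# Assumption: "nonbinary" means printable ASCII (0x20-0x7e) plus tab, LF, CR.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

# Build the 256-entry table once. Text bytes map to themselves; every
# other byte maps to a space, which is guaranteed to differ from it
# because space itself is a text byte.
TABLE = bytes(ch if ch in TEXT_BYTES else 0x20 for ch in range(256))


def needs_base64(data: bytes) -> bool:
    """Return True if data contains at least one "binary" byte."""
    return data != data.translate(TABLE)
```

Mapping the "binary" bytes to a byte that is itself "nonbinary" is what
guarantees table[ch] != ch for all of them; mapping them to 0x00, say,
would leave a NUL byte undetected.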
This should be quicker than any for loop you could write, though there
may be other builtin functions that are even quicker. It's a start,
though.
Note that you will probably also be escaping the xml special
characters, such as &, <, and >. So you might get clever about letting
a single translate pass tell you whether the data can be stored
unmodified, then do a second translate to decide which way to modify
it. Whether this is worthwhile depends in part on how often the buffer
fits into which category.
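
One way the two-pass idea might look, again as a Python 3 sketch under
assumptions: "text" is taken to be printable ASCII plus tab, LF and CR,
and the names classify, SAFE_TABLE and TEXT_TABLE are hypothetical. The
first comparison answers "can this go out unmodified?", the second
picks between escaping and base64:

```python
# Assumption: "text" means printable ASCII (0x20-0x7e) plus tab, LF, CR.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
XML_SPECIAL = frozenset(b"&<>")


def _identity_table(keep):
    # Bytes in `keep` map to themselves; all others map to a space,
    # which is always in `keep` here and so always differs from them.
    return bytes(ch if ch in keep else 0x20 for ch in range(256))


SAFE_TABLE = _identity_table(TEXT_BYTES - XML_SPECIAL)  # needs no change
TEXT_TABLE = _identity_table(TEXT_BYTES)                # text, may need escaping


def classify(data: bytes) -> str:
    """Return "raw", "escape" or "base64" for a buffer."""
    if data == data.translate(SAFE_TABLE):
        return "raw"      # plain text with no XML special characters
    if data == data.translate(TEXT_TABLE):
        return "escape"   # text, but contains &, < or >
    return "base64"       # contains at least one "binary" byte
```

If most buffers turn out to be plain text, the second translate is
rarely reached, which is the case where this split pays off.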