On Nov 14, 2009, at 11:02 AM, Luca Fabbri wrote:
> Hi all.
> I'm looking for a way to load a generic file from the
> system and determine whether it is plain text.
> The mimetypes module has some nice methods, but for example it does
> not work for files without an extension.

Hi Luca,
You have to define what you mean by "text" file. It might seem
obvious, but it's not.
Do you mean just ASCII text? Or will you accept Unicode too? Unicode
text can be more difficult to detect because you have to guess the
file's encoding (unless it has a BOM; most don't).
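
To illustrate the BOM case, here is a minimal sketch that checks the first bytes of a file's contents against the BOM constants in the standard codecs module. The function name encoding_from_bom and the BOMS table are my own, purely for illustration. It only recognises files that actually start with a BOM; for everything else it returns None and you are back to guessing.

```python
import codecs

# UTF-32 BOMs are listed first because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

The returned names are codecs that consume the BOM themselves when decoding, so the result can be passed straight to bytes.decode().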
And do you need to verify that every single byte in the file is
"text"? What if the file is 1GB, do you still want to examine every
single byte?
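
If examining only the start of the file is acceptable, one possible heuristic looks like the sketch below. It assumes that "text" means ASCII or UTF-8 with no NUL bytes, which is only one of several reasonable definitions (it rejects UTF-16, for instance). The name looks_like_text and the 8192-byte default are my own choices, not anything standard.

```python
import codecs


def looks_like_text(path, sample_size=8192):
    """Guess whether a file is ASCII/UTF-8 text from its first bytes.

    Only the first sample_size bytes are read, so a large file is
    never scanned in full. An empty file counts as text here.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)

    # NUL bytes are rare in ASCII/UTF-8 text but common in binary data.
    if b"\x00" in sample:
        return False

    # An incremental decoder with final=False tolerates a multi-byte
    # character cut off at the end of the sample, but still raises
    # on byte sequences that are invalid UTF-8.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

Because only a sample is read, a file that is text for its first 8 KB and binary afterwards would still be reported as text; whether that matters depends on the problem being solved.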
If you give us your own (specific!) definition of what "text" means,
or perhaps a description of the problem you're trying to solve, then
maybe we can help you better.
Cheers
Philip