On Nov 14, 2009, at 11:02 AM, Luca Fabbri wrote:

Hi all.

I'm looking for a way to load an arbitrary file from the
system and determine whether it is plain text.
The mimetypes module has some nice functions, but, for example, it doesn't
work for files without an extension.

Hi Luca,
You have to define what you mean by "text" file. It might seem obvious, but it's not.

Do you mean just ASCII text? Or will you accept Unicode too? Unicode text can be more difficult to detect because you have to guess the file's encoding (unless it has a BOM; most don't).
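To make the ASCII-versus-Unicode question concrete, here is a rough sketch of one way to guess an encoding: look for a BOM first, then fall back to trial decoding. The function name guess_text_encoding and the short candidate list (ASCII, then UTF-8) are my own assumptions for illustration, not a standard recipe, and a successful decode only means the bytes are valid in that encoding, not that the file is readable text.

```python
import codecs

# Byte-order marks paired with a codec that consumes them.
# UTF-32 comes first because its little-endian BOM starts with the UTF-16 one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_text_encoding(data):
    """Return an encoding name that decodes `data` cleanly, or None.

    Hypothetical helper: only BOM-marked files, ASCII and UTF-8 are tried.
    """
    candidates = ("ascii", "utf-8")  # assumption: no legacy 8-bit encodings
    for bom, name in _BOMS:
        if data.startswith(bom):
            candidates = (name,)  # trust the BOM, but still verify below
            break
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

Note that this deliberately skips encodings such as Latin-1, which accept any byte sequence and so can never be ruled out by decoding alone.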

And do you need to verify that every single byte in the file is "text"? If the file is 1GB, do you still want to examine every single byte?
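If examining only part of the file is acceptable, a heuristic along these lines might be enough. It is only a sketch for Python 3: the name looks_like_text, the 8192-byte sample, the 5% threshold and the choice of "allowed" control characters are all arbitrary assumptions that you would tune to your own definition of text.

```python
def looks_like_text(path, sample_size=8192, max_control_ratio=0.05):
    """Guess whether a file is text by inspecting only its first bytes.

    Hypothetical heuristic: the defaults are illustrative, not standard.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)  # never reads more than sample_size bytes
    if not sample:
        return True  # assumption: an empty file counts as text
    if b"\x00" in sample:
        return False  # NUL bytes are rare in text files
    allowed = {9, 10, 12, 13}  # tab, line feed, form feed, carriage return
    control = sum(1 for b in sample if b < 32 and b not in allowed)
    return control / len(sample) <= max_control_ratio
```

A file whose binary content starts after the sampled prefix will be misclassified, which is the trade-off for not reading the whole thing. UTF-16 text will also be rejected because of its NUL bytes.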

If you give us your own (specific!) definition of what "text" means, or perhaps a description of the problem you're trying to solve, then maybe we can help you better.

Cheers
Philip