My initial post was about guessing whether the file the user pointed
at was likely to be formatted correctly, and I was just looking for
plain ASCII. I learned even more than expected!
On Jul 10, 2006, at 11:20 AM, Dar Scott wrote:
On Jul 9, 2006, at 12:59 AM, Scott Morrow wrote:
Does anyone have a method for determining whether a file is plain
text that they would be willing to share?
I don't think plain text or not is the right question. How sure do
you want to be? This can take a lot of processing.
Do you mean plain text vs binary? Plain text vs RTF? Plain text
ASCII vs plain text UTF-8?
For example: I have a function I use that tries to "guess" the
Unicode encoding form of a file. My approach is not to ask "is
this file in this format?" but "is this more likely this format than
the others under consideration?" (That gets hard in some perverse
cases of UTF-16BE vs UTF-16LE. Brag: My Unicode recognizer code
beats my Microsoft programs in encoding guessing.) I have a few
hard rules to handle the easy cases, but for the most part I build
up evidence points and then compare.
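To make the evidence-point idea concrete, here is a minimal sketch in Python rather than Revolution. It is not the recognizer described above: the function name guess_encoding, the four candidates, and every weight are assumptions chosen only for illustration. Byte order marks serve as the hard rules; everything else adds or subtracts points, and the highest score wins.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Pick the most likely of four candidates. Weights are illustrative only."""
    # Hard rules for the easy cases: a byte order mark settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if not data:
        return "ascii"

    scores = {"ascii": 0, "utf-8": 0, "utf-16-le": 0, "utf-16-be": 0}

    # Evidence: only 7-bit bytes and no NULs points to plain ASCII.
    if all(b < 0x80 for b in data) and 0 not in data:
        scores["ascii"] += 3

    # Evidence: decoding cleanly as UTF-8 is weak support; doing so while
    # containing bytes >= 0x80 is strong support. Failure counts against it.
    try:
        data.decode("utf-8")
        scores["utf-8"] += 1
        if any(b >= 0x80 for b in data):
            scores["utf-8"] += 3
    except UnicodeDecodeError:
        scores["utf-8"] -= 3

    # Evidence: in UTF-16 text that is mostly Latin, the zero high byte
    # lands on odd offsets for little-endian and even offsets for big-endian.
    even_nuls = data[0::2].count(0)
    odd_nuls = data[1::2].count(0)
    pairs = max(len(data) // 2, 1)
    if odd_nuls > even_nuls:
        scores["utf-16-le"] += 1 + (4 * odd_nuls) // pairs
    elif even_nuls > odd_nuls:
        scores["utf-16-be"] += 1 + (4 * even_nuls) // pairs

    # Compare the accumulated evidence; ties go to the earlier candidate.
    return max(scores, key=scores.get)
```

The NUL-position test only helps when the text is mostly Latin, which is one reason the perverse UTF-16BE vs UTF-16LE cases stay hard.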
Also, I don't look at the whole file (except in some special
cases). I look at only the characters near the end and near the
front. That puts an upper bound on determination time.
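Sampling only the two ends of a file might look like the following Python sketch. The name sample_ends and the 4096-byte default are assumptions for illustration, not values from the function described above.

```python
import os

def sample_ends(path: str, n: int = 4096) -> bytes:
    """Return at most 2*n bytes: the first n and the last n of the file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Small files are read whole so the two samples never overlap.
        if size <= 2 * n:
            return f.read()
        head = f.read(n)
        # Jump to n bytes before the end and read the tail.
        f.seek(-n, os.SEEK_END)
        tail = f.read(n)
    return head + tail
```

However large the file is, at most 2*n bytes are examined, which is where the upper bound on time comes from. One caveat: the tail can begin in the middle of a multi-byte character, so anything that scores the sample should tolerate a damaged character at the seam.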
Is the question "Should I dump this into a field or should I
convert to hex first?"
Dar Scott