On Jul 9, 2006, at 12:59 AM, Scott Morrow wrote:
Does anyone have a method for determining whether a file is plain text that they would be willing to share?
I don't think "plain text or not" is the right question. How sure do you want to be? Being very sure can take a lot of processing.
Do you mean plain text vs binary? Plain text vs RTF? Plain text ASCII vs plain text UTF-8?
For example: I have a function I use that tries to "guess" the Unicode encoding form of a file. My approach is not to ask "is this this format?" but "is this more likely this one than the others under consideration?". (That gets hard under some perverse cases of UTF-16BE vs UTF-16LE. Brag: My Unicode recognizer code beats my Microsoft programs in encoding guessing.) I have a few hard rules to handle the easy cases, but for the most part I build up evidence points and then compare.
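The evidence-point idea above can be sketched roughly as follows. This is an illustration in Python, not the recognizer described above: the function name guess_encoding, the three candidate encodings, and every point weight are assumptions chosen for the example. Byte order marks act as the hard rules, and everything else just adds or subtracts points before the candidates are compared.

```python
import codecs


def guess_encoding(sample: bytes) -> str:
    """Guess among 'utf-8', 'utf-16-le' and 'utf-16-be' by comparing evidence.

    Hypothetical sketch: the weights below are arbitrary illustrative values.
    """
    # Hard rules: a byte order mark settles the easy cases outright.
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if sample.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if sample.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    scores = {"utf-8": 0, "utf-16-le": 0, "utf-16-be": 0}

    # Evidence: a clean UTF-8 decode is mildly encouraging,
    # an invalid byte sequence counts heavily against UTF-8.
    try:
        sample.decode("utf-8")
        scores["utf-8"] += 2
    except UnicodeDecodeError:
        scores["utf-8"] -= 5

    # Evidence: in mostly-Latin UTF-16 text the high byte of each code unit
    # is zero. Zeros at even offsets suggest big-endian, at odd offsets
    # little-endian. NUL bytes are rare in real UTF-8 text.
    even_zeros = sample[0::2].count(0)
    odd_zeros = sample[1::2].count(0)
    scores["utf-16-be"] += even_zeros
    scores["utf-16-le"] += odd_zeros
    scores["utf-8"] -= even_zeros + odd_zeros

    # Evidence: an odd byte count cannot be whole 16-bit code units.
    if len(sample) % 2:
        scores["utf-16-le"] -= 3
        scores["utf-16-be"] -= 3

    # Compare: the answer is the most likely candidate, not a yes/no verdict.
    return max(scores, key=scores.get)
```

The zero-byte heuristic is exactly the kind of evidence that fails on the perverse UTF-16BE vs UTF-16LE cases, for instance text whose code units have nonzero high bytes, so a real recognizer would need more kinds of evidence than this.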
Also, I don't look at the whole file (except in some special cases). I look at only the characters near the end and near the front. That puts an upper bound on determination time.
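Sampling only the front and the end of a file might look something like this in Python. The function name read_head_and_tail and the 4096-byte default are assumptions made for the sketch. The two pieces are returned separately so that a caller checking 16-bit encodings does not have to worry about the tail starting at an odd offset.

```python
import os


def read_head_and_tail(path: str, chunk: int = 4096) -> tuple[bytes, bytes]:
    """Return (head, tail), reading at most 2 * chunk bytes from the file.

    Hypothetical helper: small files are returned whole as the head,
    with an empty tail.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= 2 * chunk:
            # Small file: cheap enough to look at all of it.
            return f.read(), b""
        head = f.read(chunk)
        # Jump to the last `chunk` bytes without reading the middle.
        f.seek(-chunk, os.SEEK_END)
        tail = f.read(chunk)
    return head, tail
```

However large the file is, the work is bounded by 2 * chunk bytes, which is what keeps the determination time bounded.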
Is the question really "Should I dump this into a field, or should I convert it to hex first?"
Dar Scott
