Thanks, all. That is a nice list of options.
I couldn't think of a "simple" way and you've confirmed my
suspicions. : )
-Scott
On Jul 9, 2006, at 5:45 PM, [EMAIL PROTECTED] wrote:
In a message dated 7/9/06 11:38:23 AM,
<[EMAIL PROTECTED]> writes:
Does anyone have a method for determining whether a file is plain
text that they would be willing to share?
This is not a simple question to answer. Consider that a *web page* is
plain text -- what makes it a "web page" is what a browser does with/to
it when you run across it in the course of your websurfing. So perhaps
it might be appropriate for you to explain what *you* mean when you say
"plain text"? Depending on your definition of "plain text", the method
of detecting it may well vary...
That said, here are a couple of possible methods which, even if they
don't do what you want, may help set you on the right road to finding
your answer...
# possible answer 1: what's the file extension?
function IsItText1 TheFilename
  # all we care about here is the *name* of the file
  set the itemDelimiter to "."
  put item -1 of TheFilename into Fred
  # "text" and "txt" are the most common extensions denoting
  # text files; if you know of any others, you can add them in, too
  put "text,txt" into TextExtensions
  # switch back to commas, or the list above won't split into items
  set the itemDelimiter to ","
  repeat for each item ThisExt in TextExtensions
    if Fred = ThisExt then return true
  end repeat
  return false
end IsItText1
# possible answer 2: does the file contain weird characters?
function IsItText2 TheText
  # assumes that you've already read the file from disc,
  # and are fiddling with the file's content
  put the length of TheText into OldLength
  # garden-variety ASCII text only has characters in it whose
  # ASCII code numbers are 127 *or less*. thus, if there's
  # anything in there with an ASCII code number *greater than 127*,
  # it's prolly not "plain text"
  repeat with K1 = 128 to 255
    put numToChar (K1) into BadChar
    replace BadChar with "" in TheText
  end repeat
  put the length of TheText into NewLength
  # if OldLength is the same as NewLength, this will return "true";
  # otherwise, it returns "false". since the only way NewLength *can*
  # be different from OldLength is if some characters got nuked
  # in the loop, you'll get The Right Answer here
  return (OldLength = NewLength)
end IsItText2
Neither of these functions is perfect; both of them can be fooled,
whether by intent or by accident. Suppose some joker slapped the name
"Budget2006.txt" onto an Excel spreadsheet file, for instance; the
IsItText1 function above would say "Yes, it's a text file, alright",
but IsItText2 would *not* be so fooled. As for IsItText2, *that*
function will turn up its nose at any file which contains curly-quotes
rather than straight-quotes, which means that yes, there are genuine,
honest-to-God *text files* which IsItText2 will *wrongly* deem
"not plain text".
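If your definition of "plain text" has room for curly-quotes and
accented letters, one possible variation (a rough sketch, not a
definitive test) is to leave the high characters alone and hunt for
control characters instead. The name IsItText3, and the choice of tab,
linefeed and carriage return as the only acceptable control characters,
are my assumptions, not anything Revolution dictates:

# possible answer 3 (hypothetical): does the file contain control characters?
function IsItText3 TheText
  # assumes, like IsItText2, that you've already read the file from disc
  repeat with K1 = 0 to 31
    # tab (9), linefeed (10) and carriage return (13) turn up in
    # perfectly ordinary text, so let those three through
    if K1 is among the items of "9,10,13" then next repeat
    # any other control character (null included) is a bad sign
    if numToChar (K1) is in TheText then return false
  end repeat
  return true
end IsItText3

This assumes the file's content was read as raw bytes, for instance
with put URL ("binfile:" & ThePath) into TheText, where ThePath is a
stand-in for your own file path. It can still be fooled: a binary file
which happens to contain no control characters will sail right through.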
Again, once you know what *you* consider a "plain text file" to be,
it'll be easier to come up with a solution.
Hope this helps...
_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution