Thanks, all. That is a nice list of options.
I couldn't think of a "simple" way and you've confirmed my suspicions. : )
-Scott
On Jul 9, 2006, at 5:45 PM, [EMAIL PROTECTED] wrote:

In a message dated 7/9/06 11:38:23 AM, <[EMAIL PROTECTED]> writes:
Does anyone have a method for determining whether a file is plain
text that they would be willing to share?
This is not a simple question to answer. Consider that a *web page* is plain text; what makes it a "web page" is what a browser does with/to it when you run across it in the course of your websurfing. So perhaps it might be appropriate for you to explain what *you* mean when you say "plain text"? Depending on your definition of "plain text", the method of detecting it may well vary...
That said, here are a couple of possible methods which, even if they don't do exactly what you want, may help set you on the right road to finding your answer...

# possible answer 1: what's the file extension?
function IsItText1 TheFilename
  # all we care about here is the *name* of the file

  set the itemDelimiter to "."
  put item -1 of TheFilename into Fred
  # "text" and "txt" are the most common extensions denoting
  # text files; if you know of any others, you can add them in, too
  put "text,txt" into TextExtensions
  repeat for each item ThisExt in TextExtensions
    if Fred = ThisExt then return true
  end repeat
  return false
end IsItText1

# possible answer 2: does the file contain weird characters?
function IsItText2 TheText
  # assumes that you've already read the file from disc,
  # and are fiddling with the file's content

  put the length of TheText into OldLength
  # garden-variety ASCII text only has characters in it whose
  # ASCII code numbers are 127 *or less*. thus, if there's
  # anything in there with an ASCII code number *greater than 127*,
  # it's prolly not "plain text"
  # start at 128: codes 0-127 are all legitimate ASCII
  repeat with K1 = 128 to 255
    put numToChar (K1) into BadChar
    replace BadChar with "" in TheText
  end repeat
  put the length of TheText into NewLength
  return (OldLength = NewLength)
  # if OldLength is the same as NewLength, this will return "true";
  # otherwise, it returns "false". since the only way NewLength *can*
  # be different from OldLength is if some characters got nuked
  # in the loop, you'll get The Right Answer here
end IsItText2
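Since IsItText2 wants the file's *contents* rather than its name, here's a rough sketch of a wrapper that reads the file and runs both checks. IsItPlainText is a made-up name, not something built into Revolution, and it assumes you want a file to pass *both* tests before you call it text. Reading through "binfile:" gets you the raw bytes without Revolution converting line endings on the way in. Passing the full path to IsItText1 is fine, since it only looks at whatever follows the last ".".

# hypothetical wrapper (not part of Revolution): the file passes only
# if it has a text extension AND contains no high-ASCII characters
function IsItPlainText ThePath
  if there is not a file ThePath then return false
  # "binfile:" reads the raw bytes, with no line-ending conversion
  put URL ("binfile:" & ThePath) into TheData
  return (IsItText1(ThePath) and IsItText2(TheData))
end IsItPlainText

You'd call it with something like answer IsItPlainText("/Users/scott/notes.txt"), where the path is just an example.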

Neither of these functions is perfect; both of them can be fooled, whether
by intent or by accident. Suppose some joker slapped the name
"Budget2006.txt" onto an Excel spreadsheet file, for instance; the IsItText1 function above would say "Yes, it's a text file, alright", but IsItText2 would *not* be so fooled. As for IsItText2, *that* function will turn up its nose at any file which contains curly-quotes rather than straight-quotes, which means that yes, there are genuine, honest-to-God *text files* which IsItText2 will *wrongly* deem "not plain text".
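If curly quotes and accented letters are the main worry, one possible middle ground (my own suggestion, not a standard Revolution function) is to ignore the 128-255 range entirely and only object to control characters, i.e. ASCII 0-31 other than tab (9), line feed (10) and carriage return (13). Binary files such as spreadsheets and images tend to contain those, especially NUL (0), while ordinary text rarely does. IsItText3 is a hypothetical name, and it can be fooled too: a UTF-16 text file is packed with NULs and will be rejected, and a binary file that happens to avoid those bytes will slip through.

# hypothetical IsItText3: tolerates high-ASCII characters (curly
# quotes etc.) but rejects control characters other than tab,
# line feed and carriage return
function IsItText3 TheText
  repeat with K1 = 0 to 31
    # tab, line feed and carriage return are normal in text files
    if K1 is among the items of "9,10,13" then next repeat
    if numToChar(K1) is in TheText then return false
  end repeat
  return true
end IsItText3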
Again, once you know what *you* consider a "plain text file" to be, it'll
be easier to come up with a solution.

   Hope this helps...
_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

