From: Trausti Thor Johannsson <[EMAIL PROTECTED]>
Date: Fri, 21 Jul 2006 15:53:41 +0200
Is there any way for me to check whether a text file is "safe to
display"? That is, that it is not a picture inside a text file, is
not encrypted, and is pretty much just a plain text file?
To complicate matters, the file would be Unicode and so forth.
Actually, that could simplify matters :)
Unicode has many disallowed code points, and some serialisations
(byte sequences) are disallowed as well. UTF-8 in particular is very
easy to check for well-formedness. UTF-16 much less so.
My ElfData plugin has a function .Scan_Verify, which returns the
position (as an integer) of the first bad byte in the UTF-8 string.
If the entire string is good, .Scan_Verify returns 0.
I'd imagine that almost no picture or other media file will validate
as UTF-8.
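
For anyone without the plugin, here is a rough sketch of the same idea
in Python. This is not ElfData's code; the function name
first_invalid_utf8_byte is made up for illustration, and I am assuming
1-based positions so that 0 can mean "all good", as described above.
It simply leans on Python's strict UTF-8 decoder.

```python
def first_invalid_utf8_byte(data: bytes) -> int:
    """Return the 1-based position of the first byte that breaks UTF-8
    decoding, or 0 if the whole buffer is well-formed UTF-8.

    Hypothetical helper: it mimics the return convention described for
    .Scan_Verify, but it is not that function's implementation.
    """
    try:
        # Strict decoding rejects stray continuation bytes, overlong
        # forms, encoded surrogates and truncated sequences.
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the 0-based index of the offending byte.
        return err.start + 1
    return 0


if __name__ == "__main__":
    print(first_invalid_utf8_byte("plain text".encode("utf-8")))
    print(first_invalid_utf8_byte(b"\x89PNG\r\n\x1a\n"))  # PNG signature
```

A PNG file fails on its very first byte, since 0x89 cannot start a
UTF-8 sequence.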
ElfData.Scan_Verify doesn't check for byte 0, however. Character 0 is
actually a valid Unicode character, although it is a non-textual
character. Unicode doesn't say that control codes can't be used in a
Unicode string.
.Scan_Verify only validates text according to the Unicode standard's
definition of a Unicode string. It doesn't validate it according to
what we think a piece of text should be. Text doesn't usually contain
control codes, except for LF, CR and TAB.
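
If you want the stricter "what we think text should be" test, a sketch
along these lines might do. Again this is only an illustration in
Python, not anything from the plugin: looks_like_plain_text and
ALLOWED_CONTROLS are names I made up, and treating every character in
Unicode category Cc (other than LF, CR and TAB) as non-text is my own
assumption about where to draw the line.

```python
import unicodedata

# Assumption: the only control codes that belong in ordinary text.
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def looks_like_plain_text(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 and contains no control
    characters other than LF, CR and TAB (hypothetical helper)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8 at all
    for ch in text:
        # Category "Cc" covers U+0000..U+001F and U+007F..U+009F,
        # so byte 0 is rejected here even though it is valid Unicode.
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            return False
    return True
```

Note that this also rejects things like ESC and the C1 controls, which
may or may not be what you want for your files.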
--
http://elfdata.com/plugin/