On Fri, Feb 13, 2009 at 11:11 AM, Alan Gauld <alan.ga...@btinternet.com> wrote:
>> {\rtf1\mac\ansicpg10000\cocoartf824\cocoasubrtf440
>>
>
> The problem is that this is an RTF format which is a binary format.
> You can process binary data in Python (see the box on the files
> topic page) but it is much more difficult than with normal plain text files.
>

I'd like to correct a misapprehension here: RTF files are not a "binary format" in the usual sense of that phrase. There are no non-ASCII characters, newlines are handled normally, and so on. They're text files, but with lots of funky formatting codes included, which are themselves all text. They're a lot like badly formatted, insanely complicated XML, actually.

They're full of gibberish-looking strings like the one above, and the curly braces can be nested dozens of levels deep. However, if you want to get a quick idea of what's going on in an RTF file, the fastest way is to unfold it (search-and-replace each "{" and "}" with "{\n" and "}\n" respectively) and then indent the contents of each opened curly brace. Then, if you want to give yourself a headache, you can sit down with Microsoft's RTF specification and figure out what all those codes do.

-- 
www.fsrtechnologies.com
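For anyone who'd rather not do the unfold-and-indent step by hand in an editor, here is a minimal Python sketch of it, under a few assumptions: the function name `unfold_rtf` and the four-space indent are arbitrary choices, not anything from the RTF spec. It goes one step beyond a plain search-and-replace by copying backslash escapes through unchanged, so escaped braces like `\{` and `\}` (literal text in RTF) are not treated as group delimiters, and a backslash at the end of a source line (common in Mac-generated RTF) simply ends the output line. The result is for reading only: it strips surrounding whitespace and reflows lines, so don't save it back out as RTF.

```python
def unfold_rtf(rtf: str, indent: str = "    ") -> str:
    """Return a readable dump of RTF source: each '{' and '}' on its own
    line, with the text between them indented by brace depth.

    Backslash escapes (e.g. \\{, \\}, \\\\) are copied through unchanged,
    so escaped braces are not mistaken for group delimiters.
    """
    lines: list[str] = []    # finished output lines
    current: list[str] = []  # pieces of the line being built
    depth = 0                # current brace nesting level

    def flush() -> None:
        # Emit the pending text (if any) at the current depth.
        text = "".join(current).strip()
        if text:
            lines.append(indent * depth + text)
        current.clear()

    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\" and i + 1 < len(rtf):
            nxt = rtf[i + 1]
            current.append("\\")
            if nxt in "\r\n":
                # Backslash at the end of a source line: keep it, end the output line.
                flush()
            else:
                current.append(nxt)  # e.g. \{ \} \\ copied through unchanged
            i += 2
            continue
        if ch == "{":
            flush()
            lines.append(indent * depth + "{")
            depth += 1
        elif ch == "}":
            flush()
            depth = max(depth - 1, 0)  # tolerate an unbalanced '}'
            lines.append(indent * depth + "}")
        elif ch in "\r\n":
            flush()  # existing line breaks just end the current output line
        else:
            current.append(ch)
        i += 1

    flush()
    return "\n".join(lines)
```

To look at a real file, something like `print(unfold_rtf(pathlib.Path("notes.rtf").read_text(encoding="latin-1")))` should work; `notes.rtf` is only a placeholder name, and `latin-1` is used because it decodes any byte without raising an error.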
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor