Does anyone know of an existing parser or library that can extract the user-entered text from a Cadkey .prt file?
I don't think it should be that hard to implement. Opening a .prt file in a text editor shows that the user-entered text fields appear to be stored as plain ASCII characters. Here's an image of a .prt file open in Notepad: [http://i.imgur.com/CPTU0.png]. I'm not sure about the file encoding (it may be UTF-8), but these readable characters are what I would like to extract.

I've experimented with writing a few parsers, registering them with org.apache.tika.parser.Parser, and adding them to tika-mimetypes.xml. Currently I have a dummy PRTParser that runs inside tika-app. Its magic-based MIME type detection seems accurate, using this rule: <match value="0M3C" type="string" offset="8"/>. However, the parser only outputs dummy data at the moment.

Can someone point me in the right direction for creating a real parser for these files?
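To make the goal concrete, here is a minimal sketch of the kind of extraction I have in mind, written in plain Java with no Tika dependency. It assumes the text really is stored as runs of printable ASCII bytes, which I have only inferred from looking at the file in Notepad, not from any format documentation. The class name PrtStrings, both method names, and the minimum run length are hypothetical choices of mine; the only value taken from the actual file is the "0M3C" marker at offset 8.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Tika or of any Cadkey tooling.
final class PrtStrings {

    // Returns every run of at least minLength printable ASCII bytes (0x20 to 0x7E).
    // Assumption: user text is stored as plain ASCII, as the Notepad view suggests.
    static List<String> extract(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        int start = -1; // index where the current run began, or -1 if no run is open
        // Iterate one step past the end so a run that reaches EOF is still closed.
        for (int i = 0; i <= data.length; i++) {
            // Java bytes are signed, so values above 0x7F are negative and fail this test.
            boolean printable = i < data.length && data[i] >= 0x20 && data[i] <= 0x7E;
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                if (i - start >= minLength) {
                    runs.add(new String(data, start, i - start, StandardCharsets.US_ASCII));
                }
                start = -1;
            }
        }
        return runs;
    }

    // Mirrors the magic rule from the post: the bytes "0M3C" at offset 8.
    static boolean looksLikePrt(byte[] data) {
        byte[] magic = "0M3C".getBytes(StandardCharsets.US_ASCII);
        if (data.length < 8 + magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[8 + i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Reading a file with java.nio.file.Files.readAllBytes and calling PrtStrings.extract(bytes, 4) would list each readable run. This approach will also pick up non-user strings such as the magic marker itself, so it is only a starting point. Inside a Tika parser, the same loop could presumably read from the InputStream passed to parse and hand each run to the content handler instead of collecting a list.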
