On Mon, 20 Jun 2011, Troy Witthoeft wrote:
I made some changes and brought it in line with the other Tika parser examples I have seen. I've looked over IOUtils, but I'm a bit rusty on my Java. By rusty I mean inept.
If you want, open a new JIRA issue and upload a small sample CADKEY file along with your code so far. I'll be happy to take a look and tweak it slightly when I next have a minute.
Note: I found a simpler prefix that delineates the start of user text: [0x01] [0x1F]
Unless we can figure out the file/record structure better, it might be safer to search for a longer sequence than that (e.g. all the 33s you mentioned in another email). 0x01 0x1F could potentially turn up elsewhere in a file, so we should aim for a more discriminating test.
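A minimal sketch of that kind of search, assuming the file contents have already been read into a byte array. The class and method names (MarkerSearch, indexOf, countOccurrences) are hypothetical and not part of Tika, and the marker is passed in as a parameter because the real longer sequence has not been pinned down here:

```java
/** Hypothetical helper for locating a byte marker in a buffer; not a Tika class. */
final class MarkerSearch {

    private MarkerSearch() {
    }

    /**
     * Returns the index of the first occurrence of pattern in data at or
     * after position from, or -1 if it does not occur (or pattern is empty).
     */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        if (pattern.length == 0) {
            return -1;
        }
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            boolean match = true;
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** Counts non-overlapping occurrences of pattern in data. */
    static int countOccurrences(byte[] data, byte[] pattern) {
        int count = 0;
        int pos = indexOf(data, pattern, 0);
        while (pos >= 0) {
            count++;
            // Skip past this match before searching again.
            pos = indexOf(data, pattern, pos + pattern.length);
        }
        return count;
    }
}
```

Running countOccurrences over a few sample files with the two-byte prefix and then with a longer candidate marker would give a rough idea of how often the short prefix matches in places the longer one does not.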
Nick
