I created TIKA-679 to continue this discussion. https://issues.apache.org/jira/browse/TIKA-679
On Tue, Jun 21, 2011 at 4:17 PM, Nick Burch <[email protected]> wrote:

> On Mon, 20 Jun 2011, Troy Witthoeft wrote:
>
>> I made some changes, and brought it in line with other Tika parser examples I
>> have seen. I've looked over IOUtils, however I'm a bit rusty on my Java. By
>> rusty I mean inept.
>
> If you want, open a new JIRA and upload a small sample cadkey file along
> with your code so far. I'll be happy to take a look and tweak it slightly
> when I next have a minute.
>
>> Note: I found a simpler prefix that delineates the start of user text.
>> [0x01] [0x1F]
>
> Unless we can figure out the file/record structure better, it might be
> safer to search for a longer sequence than that (e.g. all the 33s you
> mentioned in another email). 0x01 0x1f could potentially turn up elsewhere
> in a file, so we should aim for a more discriminating test.
>
> Nick
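
To make Nick's point about marker length concrete, here is a minimal sketch in plain Java (no Tika classes). The class name `MarkerSearch`, the sample bytes, and in particular the longer marker (four 0x33 bytes followed by 0x01 0x1F) are assumptions made up for illustration; the thread does not give the real sequence or the file layout.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: naive search for a byte marker.
class MarkerSearch {

    /** Returns the index of the first occurrence of marker in data, or -1 if absent. */
    static int indexOf(byte[] data, byte[] marker) {
        if (marker.length == 0) {
            return 0;
        }
        for (int i = 0; i + marker.length <= data.length; i++) {
            int j = 0;
            while (j < marker.length && data[i + j] == marker[j]) {
                j++;
            }
            if (j == marker.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The short prefix mentioned in the thread.
        byte[] shortMarker = {0x01, 0x1F};
        // ASSUMED longer marker: a run of 0x33 bytes before the prefix.
        byte[] longMarker = {0x33, 0x33, 0x33, 0x33, 0x01, 0x1F};

        // Made-up sample data: a stray 0x01 0x1F at index 1, then the
        // assumed long marker at index 4, then the ASCII text "Hi".
        byte[] data = {0x00, 0x01, 0x1F, 0x7A,
                       0x33, 0x33, 0x33, 0x33, 0x01, 0x1F,
                       0x48, 0x69};

        int shortHit = indexOf(data, shortMarker); // matches the stray bytes first
        int longHit = indexOf(data, longMarker);   // skips the stray bytes

        int textStart = longHit + longMarker.length;
        String text = new String(data, textStart, data.length - textStart,
                                 StandardCharsets.US_ASCII);

        System.out.println("short marker at " + shortHit);
        System.out.println("long marker at " + longHit);
        System.out.println("text after long marker: " + text);
    }
}
```

In this made-up sample the two-byte prefix matches a stray occurrence before the real text, while the longer marker does not. Whether real files behave this way depends on the actual record structure, which is still unknown.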
