Hello, I am trying to figure out whether POI's HDF stuff will do what I need and am hoping someone here has some experience/insight.
Background: I'm working on a web crawler in Java, and we're hoping to be able to get links out of Word documents (among other formats). Our primary concern is coverage: we want to get everything. Efficiency matters too, but to a lesser degree.

My basic question, and I apologize that it's not more specific (I blame the scant javadocs), is whether the HDF code is well suited to this at all and, even if it is, whether it might be overkill. For example, it seems like the Java equivalent of 'strings <file>' plus a regexp might be good enough, but that might miss things like relative links.

In the best case I'd have a class (or classes) that let me fetch an array of all the URIs in a Word document, which I could then iterate through.

Thanks in advance for any suggestions,
pt.

--
Parker Thompson
The Internet Archive
510.541.0125
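For comparison, here is a rough sketch of the "'strings <file>' plus a regexp" idea in plain Java, without POI. It is a minimal illustration, not a tested crawler component. `LinkSniffer`, `printableRuns` and `extractLinks` are hypothetical names. It assumes two things about .doc files. First, Word can store text either as single-byte characters or as UTF-16LE, and plain `strings` only sees the former, so both are scanned. Second, hyperlinks usually appear in the text as `HYPERLINK "target"` field codes. Matching those field codes is what would catch relative links that a plain http:// regexp misses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: a `strings file | grep` analogue for .doc files (not part of POI). */
public class LinkSniffer {
    // Shortest run kept, mirroring the default of the Unix `strings` tool.
    private static final int MIN_RUN = 4;

    // Absolute URLs anywhere in the text.
    private static final Pattern ABSOLUTE = Pattern.compile(
            "(?:https?|ftp)://[^\\s\"'<>]+|mailto:[^\\s\"'<>]+");

    // Assumption: links stored as Word HYPERLINK field codes, e.g. HYPERLINK "docs/page.html".
    // This also catches relative targets that ABSOLUTE cannot see.
    private static final Pattern FIELD = Pattern.compile("HYPERLINK\\s+\"([^\"]+)\"");

    /** Keep the current run if it is long enough, then reset it. */
    private static void flush(StringBuilder run, List<String> runs) {
        if (run.length() >= MIN_RUN) {
            runs.add(run.toString());
        }
        run.setLength(0);
    }

    /** Runs of printable ASCII stored either as 8-bit text or as UTF-16LE text. */
    static List<String> printableRuns(byte[] data) {
        List<String> runs = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        // Pass 1: one byte per character (what plain `strings` reports).
        for (byte b : data) {
            int c = b & 0xFF;
            if (c >= 0x20 && c < 0x7F) {
                run.append((char) c);
            } else {
                flush(run, runs);
            }
        }
        flush(run, runs);
        // Pass 2: UTF-16LE (ASCII byte followed by 0x00), tried at both byte
        // alignments because a text run may start at an odd file offset.
        for (int start = 0; start < 2; start++) {
            for (int i = start; i + 1 < data.length; i += 2) {
                int c = data[i] & 0xFF;
                if (data[i + 1] == 0 && c >= 0x20 && c < 0x7F) {
                    run.append((char) c);
                } else {
                    flush(run, runs);
                }
            }
            flush(run, runs);
        }
        return runs;
    }

    /** Distinct link targets (absolute or relative), in first-seen order. */
    static List<String> extractLinks(byte[] data) {
        Set<String> links = new LinkedHashSet<>();
        for (String run : printableRuns(data)) {
            Matcher field = FIELD.matcher(run);
            while (field.find()) {
                links.add(field.group(1));
            }
            Matcher abs = ABSOLUTE.matcher(run);
            while (abs.find()) {
                links.add(abs.group());
            }
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String link : extractLinks(data)) {
            System.out.println(link);
        }
    }
}
```

Run it as `java LinkSniffer some.doc`. Relative targets come back unresolved, so the crawler would still need to resolve them against the document's own URL. The byte scan can also produce false positives (for example, stale text left in the file), which is the usual price of favouring coverage over a real parse.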
