Hi everybody! I'm trying to use the class org.apache.poi.hwpf.extractor.WordExtractor, which I downloaded as part of Apache POI <http://poi.apache.org/download.html>.
*Could somebody please* help me resolve this little issue? My goal is to get the contents of an MS Word file as one single String, including all control characters. I need it for further (hand-made!) splitting of the text into paragraphs, words, etc.

When I pass both TestDoc1.doc and TestDoc2.doc, one after the other, to my program <https://gist.github.com/1589465>, I get the same result, although TestDoc2.doc has one additional tab <http://en.wikipedia.org/wiki/Tab_key> before the text. Could you please advise me how to get the full contents of a .doc file, including all control characters?

Looking forward to your reply. Thanks in advance.

--
*Best Regards*
Andrei
Your Java programming colleague
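
P.S. In case it helps anyone reproduce this: below is a minimal sketch of how the two extracted strings could be compared with their control characters made visible. The class and method names (`ControlChars`, `escape`) are hypothetical and not part of Apache POI, and the two sample strings are only stand-ins for whatever the extractor returns for TestDoc1.doc and TestDoc2.doc.

```java
class ControlChars {

    /**
     * Returns a copy of the text in which tabs, carriage returns and
     * line feeds are shown as backslash escapes, and any other ISO
     * control character is shown as a two-digit hex code.
     */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\t': sb.append("\\t"); break;
                case '\r': sb.append("\\r"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (Character.isISOControl(c)) {
                        // Remaining control characters: print the hex code
                        sb.append(String.format("\\x%02x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed stand-ins for the text extracted from the two files
        String doc1 = "Some text\r";
        String doc2 = "\tSome text\r";
        System.out.println(escape(doc1));
        System.out.println(escape(doc2));
    }
}
```

If both lines printed this way are identical for the real files, the leading tab is already missing from the extracted String itself and is not just invisible on the console.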
