Tab symbols parsing in WORD document issue: org.apache.poi.hwpf.extractor.WordExtractor

Andrei Khveras Tue, 10 Jan 2012 07:51:48 -0800

Hi everybody!

I'm trying to use the class org.apache.poi.hwpf.extractor.WordExtractor, which
I downloaded as part of Apache POI <http://poi.apache.org/download.html>.


*Could somebody please* help me resolve this little issue? My goal is to get
the contents of an MS Word file as one single String that contains all control
characters. I need it for further (hand-made!) splitting of the text into
paragraphs, words, etc. When I pass TestDoc1.doc and then TestDoc2.doc to
MY PROGRAM <https://gist.github.com/1589465>, I get the same result for both,
although TestDoc2.doc has one additional tab
<http://en.wikipedia.org/wiki/Tab_key> before the text.

Could you please advise me how to get the full contents of a .doc file,
including all control characters?
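
To make the problem easier to see, here is a minimal sketch of how the
extracted String could be checked for the tab. It uses only the standard
library. The class ControlCharDump and its methods escape and paragraphs are
hypothetical names invented for this illustration and are not part of Apache
POI. The two sample strings are made up and only stand in for whatever the
extractor returns for TestDoc1.doc and TestDoc2.doc. I am also assuming that
paragraphs end with a carriage return, a line feed, or both.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Apache POI.
final class ControlCharDump {

    // Make control characters visible so two extracted strings can be compared by eye.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\t': out.append("\\t"); break;
                case '\r': out.append("\\r"); break;
                case '\n': out.append("\\n"); break;
                default:
                    if (Character.isISOControl(c)) {
                        // Any other control character is printed as a four-digit hex escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Split into paragraphs without trimming, so a leading tab stays in place.
    // Assumption: paragraphs are terminated by CR, LF, or CR LF.
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        for (String p : text.split("\r\n|\r|\n")) {
            if (!p.isEmpty()) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up stand-ins for the text extracted from the two test documents.
        String doc1 = "Some text\r\n";
        String doc2 = "\tSome text\r\n";
        System.out.println(escape(doc1));
        System.out.println(escape(doc2));
        System.out.println(paragraphs(doc2).get(0).startsWith("\t"));
    }
}
```

If the escaped output for both documents is identical, the tab is already
missing from the String that the extractor returns, so it is not lost later in
my own splitting code.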

Looking forward to your reply. Thanks in advance.

-- 
Best Regards,
Andrei
Your Java programming colleague
