I tried what you suggested:
WordExtractor wordExt = new WordExtractor(is);
String bodyText = WordExtractor.stripFields(wordExt.getText());
But the 0x15 character is still in the text.
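One possible workaround, offered only as a sketch: post-process the extracted string in plain Java and replace stray control characters with spaces, so that a whitespace-based tokenizer splits on them. The class name ControlCharCleaner and the method replaceControlChars are hypothetical names made up for this illustration; they are not part of POI or Lucene. The choice to keep tab, line feed, and carriage return is also an assumption.

```java
public final class ControlCharCleaner {
    private ControlCharCleaner() {}

    /**
     * Returns a copy of text in which every ISO control character,
     * except tab, line feed, and carriage return, is replaced by a space.
     * Assumption: a space is an acceptable stand-in for the marker.
     */
    public static String replaceControlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Ordinary whitespace controls are kept as they are.
            boolean keep = c == '\t' || c == '\n' || c == '\r';
            out.append(!keep && Character.isISOControl(c) ? ' ' : c);
        }
        return out.toString();
    }
}
```

With the snippet above, the call would become String bodyText = ControlCharCleaner.replaceControlChars(wordExt.getText()); this still loops over the returned text, but only once and in one place.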
maxSchlein wrote:
>
> It appears that when I use WordExtractor.getText() and there are tables
> in the document, it returns a 0x15 character for every table column. Is
> there a way to have this filtered out other than looping through the
> returned text? Or is there something else I should be doing? Thanks in
> advance for the help...
>
>
> The reason this is an issue is that I am using Lucene's WhitespaceAnalyzer,
> and it does not treat this character as whitespace, so a search for a
> given word/phrase that happens to be next to one of these characters
> finds nothing.
>
>
>
--
View this message in context:
http://old.nabble.com/WordExtractor.getText%28%29-returns-%15-on-word-docs.-tp27111308p27112657.html
Sent from the POI - User mailing list archive at Nabble.com.