I accept that this is far from an ideal solution, but it could offer you a short-term fix. Simply use the replace() method defined on the java.lang.String class to replace each of those characters (or character sequences) with something that makes more sense in this case; I would guess an empty string.
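As a rough sketch of that workaround, something like the following might do. It assumes the stray character is the control character 0x15 (going by the subject line of this thread) and that you already hold the text returned by WordExtractor.getText() in a String. The class and method names (FieldMarkCleaner, strip, toSpace) are made up for illustration and are not part of POI.

```java
// Hypothetical helper, not part of POI: cleans text that has already been
// returned by WordExtractor.getText().
final class FieldMarkCleaner {

    // Assumption: the unwanted character is the control character 0x15,
    // as the thread's subject line suggests.
    static final char MARK = 0x15;

    private FieldMarkCleaner() {
    }

    // Removes every occurrence of the mark (replaces it with nothing).
    static String strip(String text) {
        return text.replace(String.valueOf(MARK), "");
    }

    // Replaces every occurrence of the mark with a single space, so that a
    // whitespace-based tokenizer still sees a boundary at that position.
    static String toSpace(String text) {
        return text.replace(MARK, ' ');
    }

    public static void main(String[] args) {
        String raw = "cell one" + MARK + "cell two";
        System.out.println(strip(raw));   // prints "cell onecell two"
        System.out.println(toSpace(raw)); // prints "cell one cell two"
    }
}
```

Replacing with nothing joins the text on either side of the mark, so if the aim is to have a whitespace-based analyzer split on it, replacing with a space may suit better.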
Yours

Mark B

maxSchlein wrote:
>
> I tried what you suggested:
>
> WordExtractor wordExt = new WordExtractor(is);
> String bodyText = WordExtractor.stripFields(wordExt.getText());
>
> But the 0x15 character is still in the text.
>
>
> maxSchlein wrote:
>>
>> It appears that when I use WordExtractor.getText(), and there are tables
>> in the document, it returns a 0x15 character for every table column. Is
>> there a way to have this filtered out other than looping through the
>> returned text? Or is there something else I should be doing? Thanks in
>> advance for the help...
>>
>>
>> The reason this is an issue is that I am using Lucene's WhitespaceAnalyzer
>> and it is not treating this character as whitespace, so a search for a
>> given word/phrase that happens to be next to one of these 0x15 characters
>> is not found.
>>
>
