I can, and will, create a bug for this, but I would think that someone else
out there has already run into this issue with POI.
In the meantime, here is the workaround I am using. It relies on StringUtils
from Apache Commons Lang:
WordExtractor wordExt = new WordExtractor(is);
String bodyText = WordExtractor.stripFields(wordExt.getText());
StringBuffer dirtyString = new StringBuffer(bodyText);
// Repeat until only printable ASCII (32-126) is left. Note that this
// also turns tabs and line breaks into spaces.
while (!StringUtils.isAsciiPrintable(dirtyString.toString()))
{
    // Find the first character that is not printable ASCII.
    int index = 0;
    char c = dirtyString.charAt(index);
    while (StringUtils.isAsciiPrintable(String.valueOf(c)))
    {
        index++;
        c = dirtyString.charAt(index);
    }
    // Replace every occurrence of it with a space. replace(char, char)
    // avoids treating the character as a regular expression.
    dirtyString = new StringBuffer(dirtyString.toString().replace(c, ' '));
}
return dirtyString.toString();
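For what it's worth, the same cleanup can probably be done in a single pass
without Commons Lang. This is only a sketch: the class name ControlCharCleaner
and the method replaceNonPrintable are made up for illustration (they are not
part of POI), and it assumes the same rule as the loop above, i.e. anything
outside ASCII 32-126 becomes a space, including tabs and line breaks.

```java
public class ControlCharCleaner {

    // Hypothetical helper, not part of POI or Commons Lang.
    // Replaces every character outside printable ASCII (32..126)
    // with a single space, in one pass over the text.
    public static String replaceNonPrintable(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(c >= 32 && c < 127 ? c : ' ');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample text with control characters (0x13, 0x14, 0x15) mixed in.
        String sample = "Page \u0013PAGE\u0014 1\u0015 of 3";
        System.out.println(replaceNonPrintable(sample));
    }
}
```

With the extractor code above, this would be called as
ControlCharCleaner.replaceNonPrintable(bodyText). If line breaks should be
kept, the condition would need an extra check for '\n' and '\r'.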
Nick Burch-11 wrote:
>
> On Mon, 11 Jan 2010, maxSchlein wrote:
>> I tried what you suggested:
>>
>> WordExtractor wordExt = new WordExtractor(is);
>> String bodyText = WordExtractor.stripFields(wordExt.getText());
>>
>> But the 0x15 character is still in the text.
>
> Can you create a new bug on bugzilla, and upload a sample file that shows
> this behaviour? In the meantime, you'll need to go with Mark's suggestion
> of manually removing them, though.
>
> Cheers
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
>
--
View this message in context:
http://old.nabble.com/WordExtractor.getText%28%29-returns-%15-on-word-docs.-tp27111308p27113524.html
Sent from the POI - User mailing list archive at Nabble.com.