Re: No-break space and middle dot in String produced with WordExtractor

Nick Burch Fri, 14 Jul 2006 02:36:31 -0700

On Thu, 13 Jul 2006, [EMAIL PROTECTED] wrote:

I used WordExtractor to extract text from MS Word documents. The documents have many non-text characters that display as squares, and sometimes as lines. However, most of the text appears clearly. I did hex dumps of the text and found that some squares have the value A0 and some have B7. I tried to remove them using the String method "String replace(char oldChar, char newChar)", but it does not remove them.

That sounds like a string replacement issue, not a POI issue. My guess is that you're not correctly identifying the codes for the characters. Any good book for learning Java should help you there.


Nick
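A minimal Java sketch of the replacement described above, assuming the hex values A0 and B7 are the Unicode code points U+00A0 (no-break space) and U+00B7 (middle dot). In ISO-8859-1 and Windows-1252, those single bytes encode exactly those characters. The class name `TextCleaner` and its `clean` method are hypothetical. The sketch also illustrates one common pitfall: Java strings are immutable, so `replace` returns a new `String`, and the call has no effect unless its result is assigned.

```java
// Hypothetical helper class; the name TextCleaner is illustrative only.
public class TextCleaner {

    // Assumes the hex dump values A0 and B7 are the Unicode code points
    // U+00A0 (NO-BREAK SPACE) and U+00B7 (MIDDLE DOT).
    private static final char NO_BREAK_SPACE = '\u00A0';
    private static final char MIDDLE_DOT = '\u00B7';

    public static String clean(String text) {
        // String is immutable: replace() returns a new String and leaves
        // the original untouched, so the result must be assigned.
        String result = text.replace(NO_BREAK_SPACE, ' ');
        // Drop middle dots entirely by replacing them with the empty string.
        result = result.replace(String.valueOf(MIDDLE_DOT), "");
        return result;
    }

    public static void main(String[] args) {
        String extracted = "Hello\u00A0world\u00B7";

        // No effect: the new String returned by replace() is discarded.
        extracted.replace(NO_BREAK_SPACE, ' ');
        System.out.println(extracted.indexOf(NO_BREAK_SPACE)); // still found at index 5

        String cleaned = clean(extracted);
        System.out.println(cleaned); // Hello world
    }
}
```

Running `main` prints `5`, because the no-break space is still present after the discarded `replace` call, followed by `Hello world`.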
