Hi All
I'm thinking about adding a simple text extractor utility to hwpf, since
everyone is currently rolling their own, and that's not a very
efficient use of programmer time!
When I get text out, I normally use something like:
    StringBuffer text = new StringBuffer();
    Range r = wdoc.getRange();
    for (int i = 0; i < r.numParagraphs(); i++) {
        Paragraph p = r.getParagraph(i);
        text.append(p.text());
    }
However, I've also seen people advocate an approach like:
    StringBuffer text = new StringBuffer();
    Iterator textPieces = doc.getTextTable().getTextPieces().iterator();
    while (textPieces.hasNext()) {
        TextPiece piece = (TextPiece) textPieces.next();
        String encoding = "Cp1252";
        if (piece.usesUnicode()) {
            encoding = "UTF-16LE";
        }
        text.append(new String(piece.getRawBytes(), encoding));
    }
(normally accompanied by some stripping out of macros)
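For anyone who wants to try the encoding part of that second approach without a Word file to hand, here is a rough standalone sketch. It uses no hwpf classes at all: the PieceDecoder class, the decodePiece helper and the sample bytes are made up for illustration, and only the Cp1252 / UTF-16LE choice comes from the snippet above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the per-piece decoding step; not part of hwpf.
class PieceDecoder {
    // "Cp1252" is the JDK alias for the windows-1252 charset.
    private static final Charset CP1252 = Charset.forName("Cp1252");

    // Decode one piece's raw bytes: UTF-16LE if the piece is flagged
    // as unicode, otherwise single-byte Cp1252.
    static String decodePiece(byte[] rawBytes, boolean usesUnicode) {
        Charset charset = usesUnicode ? StandardCharsets.UTF_16LE : CP1252;
        return new String(rawBytes, charset);
    }

    public static void main(String[] args) {
        byte[] ansi = {0x48, 0x69};                 // "Hi", one byte per char
        byte[] unicode = {0x48, 0x00, 0x69, 0x00};  // "Hi", two bytes per char

        StringBuilder text = new StringBuilder();
        text.append(decodePiece(ansi, false));
        text.append(decodePiece(unicode, true));
        System.out.println(text); // prints "HiHi"
    }
}
```

Passing a Charset object rather than an encoding name also avoids the checked UnsupportedEncodingException that new String(bytes, String) declares.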
Is there any reason why I shouldn't use the first version?
Nick