I've created a program to read the text of .doc and .docx files. I now want to search for all newline characters (the ones created with Shift+Enter in Word) and replace them with "<br>". For some reason, however, these newline characters aren't being read properly in either HWPF or XWPF.
I use the following to read .doc:

    WordExtractor wx = new WordExtractor(document);
    String docText = wx.getText();

I use the following to read .docx:

    XWPFWordExtractor wx = new XWPFWordExtractor(document);
    String docxText = wx.getText();

Let's say I'm reading a Word document formatted as follows:

    Bojo<br>the clown<p>Funny

(Assume that instead of <br>, Word has the Shift+Enter line feed/new line, and instead of <p>, Word has the regular Enter carriage return.)

Using HWPF, docText prints (using System.out.println):

    Bojo the clown
    Funny

Using XWPF, docxText prints:

    Bojothe clown
    Funny

Notice that neither HWPF nor XWPF shows the Shift+Enter return, but both reflect the normal Enter return. Also notice that XWPF doesn't even show an empty space for the Shift+Enter return, unlike HWPF, which at least shows a whitespace character.

What is going on? Why can't I display the Shift+Enter character?
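One way to narrow this down is to dump the character codes of the extracted string and see what actually sits between "Bojo" and "the clown". The sketch below is only an illustration: it assumes (not confirmed in this thread) that HWPF hands back the Shift+Enter break as a vertical tab, U+000B, which a console would render as whitespace. The names LineBreakProbe, charCodes and toHtmlBreaks are hypothetical, and the hard-coded sample string stands in for the result of wx.getText().

```java
public class LineBreakProbe {

    /** Lists each char of the text as U+XXXX, separated by single spaces. */
    public static String charCodes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    /**
     * Replaces every vertical tab (U+000B) with "<br>".
     * ASSUMPTION: the extractor represents the Shift+Enter break as U+000B.
     */
    public static String toHtmlBreaks(String text) {
        return text.replace("\u000B", "<br>");
    }

    public static void main(String[] args) {
        // Stand-in for wx.getText(); a real run would use the extracted text.
        String sample = "Bojo\u000Bthe clown\nFunny";

        // Shows which character separates "Bojo" from "the clown".
        System.out.println(charCodes("o\u000Bt")); // prints U+006F U+000B U+0074

        // Prints "Bojo<br>the clown" and then "Funny" on the next line.
        System.out.println(toHtmlBreaks(sample));
    }
}
```

If the dump of the real docText shows some other code between the two words, that code would be the one to replace instead. For docxText, where nothing at all appears between the words, a string replacement of this kind cannot help, since the break is apparently dropped before the text reaches the string.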
