I've been having a lot of success reading Word docs with the XWPF set of
classes. I am having a problem with one document and I need to know if there
is a way to get the one word that is missing.

 

The document was originally a .doc and I converted it to a .docx using my
current copy of Word. (This was done because I didn't want to have to fight
my way through the HWPF stuff having already used XWPF.)

 

I am using POI 3.8 beta 4 for the poi, poi-ooxml and poi-ooxml-schemas jars.

 

I open the document with:

                XWPFDocument doc = new XWPFDocument(is); // is: a FileInputStream on top of a File

 

I iterate over the list of BodyElements and use a combination of table
header rows and paragraphs with their style to find the tables I need to
harvest.

 

One table has a problem with a cell that shows up in the Word document as:

Phase transition:

Departure to Mission

 

The word "Mission" is missing from the text returned by XWPFTableCell's
getText() method, and it is also missing when I walk the cell's paragraphs
or runs.

 

In browsing around while tracing, I noticed the following hierarchy:

 

1)      XWPFTableCell has field

2)      paragraphs - an ArrayList<E>

3)      elementData Object[10]

4)      [0] XWPFParagraph

5)      paragraph CTPimpl


When I select the paragraph field, the following XML shows up in the box
below the variables (in Eclipse debug).

 

<xml-fragment w:rsidR="00712B88" w:rsidRDefault="00712B88"
    xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">

  <w:pPr>

    <w:rPr>

      <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>

      <w:sz w:val="18"/>

      <w:szCs w:val="18"/>

    </w:rPr>

  </w:pPr>

  <w:r>

    <w:rPr>

      <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>

      <w:sz w:val="18"/>

      <w:szCs w:val="18"/>

    </w:rPr>

    <w:t>Phase transition:</w:t>

  </w:r>

  <w:r>

    <w:rPr>

      <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>

      <w:sz w:val="18"/>

      <w:szCs w:val="18"/>

    </w:rPr>

    <w:br/>

    <w:t xml:space="preserve">Departure to</w:t>

  </w:r>

  <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="City">

    <w:r>

      <w:rPr>

        <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>

        <w:sz w:val="18"/>

        <w:szCs w:val="18"/>

      </w:rPr>

      <w:t>Mission</w:t>

    </w:r>

  </w:smartTag>

</xml-fragment>

 

Note that the three <w:t> elements together hold the whole block of text.
The word "Mission", however, is not returned. I see that its run is wrapped
in a <w:smartTag> element with an element name of City.
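
To make clear what I am after, here is a minimal sketch of the result I would
like, written against the raw XML rather than the XWPF classes. It uses only
the JDK's DOM parser and collects every <w:t> in document order, no matter how
deeply it is nested, so a run inside a <w:smartTag> is included. The class
name SmartTagText, the method allText() and the trimmed-down SAMPLE string are
mine and purely illustrative. I am also assuming the paragraph XML can be
obtained as a string (for example via paragraph.getCTP().xmlText()) and that
"Departure to" really ends with a space in the file, which the xml:space
attribute suggests.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SmartTagText {
    // WordprocessingML main namespace, as declared in the fragment above.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Illustrative, trimmed-down copy of the paragraph (run properties omitted).
    // The trailing space after "Departure to" is an assumption.
    static final String SAMPLE =
        "<xml-fragment xmlns:w=\"" + W_NS + "\">"
        + "<w:r><w:t>Phase transition:</w:t></w:r>"
        + "<w:r><w:br/><w:t xml:space=\"preserve\">Departure to </w:t></w:r>"
        + "<w:smartTag w:uri=\"urn:schemas-microsoft-com:office:smarttags\""
        + " w:element=\"City\">"
        + "<w:r><w:t>Mission</w:t></w:r>"
        + "</w:smartTag>"
        + "</xml-fragment>";

    // Parses the paragraph XML and returns all of its text in document order.
    static String allText(String paragraphXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match on namespace + local name
        Document dom = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(paragraphXml)));
        StringBuilder out = new StringBuilder();
        collect(dom.getDocumentElement(), out);
        return out.toString();
    }

    // Depth-first walk: append w:t text, turn w:br into a newline, and
    // descend into every other element (w:r, w:smartTag, ...).
    static void collect(Node node, StringBuilder out) {
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and other non-element nodes
            }
            if (W_NS.equals(child.getNamespaceURI())) {
                String name = child.getLocalName();
                if ("t".equals(name)) {
                    out.append(child.getTextContent());
                    continue;
                }
                if ("br".equals(name)) {
                    out.append('\n');
                    continue;
                }
            }
            collect(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Prints "Phase transition:" and "Departure to Mission" on two lines.
        System.out.println(allText(SAMPLE));
    }
}
```

This is only meant to show the output I expect from the cell; I would still
prefer to get the same text through the XWPF API if that is possible.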

 

Is there a way to read the content of this block? I'm sure the creator of
this document would not know what to do if I said "Please recreate it
without using a smarttag for 'Mission'." (I would not know what to do
either.)


Any light you can shine on this would be appreciated.

 

David R. Patterson

Senior Software Engineer


Intelligent Automation, Inc.

(301) 294-4632

 


