Looking at the XML markup, the good news is that this fix should be possible,
though it might take a day or so. Without going into too much detail,
bookmarks appear in the XML markup as a matching pair of bookmarkStart and
bookmarkEnd tags. The two tags of a pair are matched by the value of the id
attribute, and typically you will see something like this in a document:
<w:bookmarkStart w:id="2" w:name="EDMS_Bookmark3"/>
<w:r>
<w:t>Sample3</w:t>
</w:r>
<w:bookmarkEnd w:id="2"/>
This, as you can see, is an example I took from your first test document.
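To make the pairing concrete, here is a minimal sketch of stepping from a
bookmarkStart to the bookmarkEnd with the same id and collecting the text in
between. It uses only the JDK's DOM parser rather than POI, the class and
method names (BookmarkText, textBetween, isBookmarkEnd) are hypothetical, and
it assumes both tags are siblings under the same parent, as in the example
above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookmarkText {
    // WordprocessingML main namespace, bound to the "w" prefix in the markup.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * Returns the text between the named bookmarkStart and the sibling
     * bookmarkEnd carrying the same w:id, or null if no such pair is found.
     */
    static String textBetween(String xml, String bookmarkName) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        NodeList starts = doc.getElementsByTagNameNS(W_NS, "bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            Element start = (Element) starts.item(i);
            if (!bookmarkName.equals(start.getAttributeNS(W_NS, "name"))) {
                continue;
            }
            String id = start.getAttributeNS(W_NS, "id");
            StringBuilder text = new StringBuilder();
            // Step through the following siblings until the matching end tag.
            for (Node n = start.getNextSibling(); n != null; n = n.getNextSibling()) {
                if (isBookmarkEnd(n, id)) {
                    return text.toString();
                }
                text.append(n.getTextContent());
            }
            return null; // start found, but no matching end among its siblings
        }
        return null; // no bookmark with that name
    }

    /** True when the node is a w:bookmarkEnd whose w:id equals the given id. */
    static boolean isBookmarkEnd(Node n, String id) {
        return n instanceof Element
                && W_NS.equals(n.getNamespaceURI())
                && "bookmarkEnd".equals(n.getLocalName())
                && id.equals(((Element) n).getAttributeNS(W_NS, "id"));
    }

    public static void main(String[] args) {
        String xml = "<w:p xmlns:w=\"" + W_NS + "\">"
                + "<w:bookmarkStart w:id=\"2\" w:name=\"EDMS_Bookmark3\"/>"
                + "<w:r><w:t>Sample3</w:t></w:r>"
                + "<w:bookmarkEnd w:id=\"2\"/>"
                + "</w:p>";
        System.out.println(textBetween(xml, "EDMS_Bookmark3"));
    }
}
```

The sketch returns null when the end tag is not among the start tag's
siblings, so it only covers bookmarks laid out like this example.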
Bookmarked table cells are a different beast altogether. There is no neatly
matched start and end tag: the start tag appears within the markup for the
table cell, as you might expect, but its end tag is placed after the end of
the row element which, I must admit, surprised me. As a result, matching up
the start and end tags and stepping between them, as the code did before, is
not the answer to the problem, and it is the reason why the existing code
failed.
Luckily, the bookmarkStart tag itself is different for table cells. It
includes attributes that identify the location of the cell. Again using an
example from your latest test document, the XML looks like this:
<w:bookmarkStart w:id="0" w:name="BookmarkForCell" w:colFirst="1"
w:colLast="1"/>
As a result, I am going to modify the existing code so that the first thing
it does when encountering a bookmarkStart tag is to look for the colFirst
and colLast attributes. If it finds them, it will assume that the bookmark
relates to a table cell which it must update. With luck, there will already
be support in the API to handle this process, that is, inserting a run into
a table cell, and if this is the case then the fix ought to be fairly quick.
However, there appears to be no way to identify one table in the document
from another and, as a result, I will need to change the way the code
processes tables. Currently, the focus is all on paragraphs. All the code
does is get a table cell, recover any paragraphs it contains and then process
them. It passes that list of paragraphs to exactly the same piece of code
that processes the list of paragraphs recovered from the document. I think I
need an overloaded method that can be passed a reference to the table cell as
well as the list of paragraphs. That way, if the bookmark does relate to that
specific table cell, the cell object is available for update.
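The check for colFirst and colLast could look something like the sketch
below. Again it uses only the JDK's DOM parser, not POI, and the names
(CellBookmarkCheck, columnSpan) are hypothetical. It only decides whether a
bookmarkStart tag should be treated as a table cell bookmark; it does not
perform the update itself.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CellBookmarkCheck {
    // WordprocessingML main namespace, bound to the "w" prefix in the markup.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * Returns {colFirst, colLast} when the bookmarkStart element carries both
     * attributes, or null when it is an ordinary bookmark.
     */
    static int[] columnSpan(Element bookmarkStart) {
        if (!bookmarkStart.hasAttributeNS(W_NS, "colFirst")
                || !bookmarkStart.hasAttributeNS(W_NS, "colLast")) {
            return null;
        }
        return new int[] {
            Integer.parseInt(bookmarkStart.getAttributeNS(W_NS, "colFirst")),
            Integer.parseInt(bookmarkStart.getAttributeNS(W_NS, "colLast"))
        };
    }

    /** Convenience overload that parses a single bookmarkStart tag first. */
    static int[] columnSpan(String bookmarkStartXml) {
        Element tag;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            tag = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(bookmarkStartXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return columnSpan(tag);
    }

    public static void main(String[] args) {
        String cellTag = "<w:bookmarkStart xmlns:w=\"" + W_NS + "\" w:id=\"0\""
                + " w:name=\"BookmarkForCell\" w:colFirst=\"1\" w:colLast=\"1\"/>";
        String plainTag = "<w:bookmarkStart xmlns:w=\"" + W_NS + "\" w:id=\"2\""
                + " w:name=\"EDMS_Bookmark3\"/>";
        System.out.println(Arrays.toString(columnSpan(cellTag)));
        System.out.println(Arrays.toString(columnSpan(plainTag)));
    }
}
```

A non-null result would be the signal to call the overloaded method that
receives the table cell as well as its paragraphs.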
I do have one question that I would ask you to put to your users, if you
could please, assuming of course that you are dealing with an experienced
body of Word users. As you can see from the bookmark tag above, it is
possible for the bookmark to relate to more than one table cell. If this is
the case, say <w:bookmarkStart w:id="0" w:name="BookmarkForCell"
w:colFirst="1" w:colLast="3"/>, what would they expect to see in the
resulting document? Assuming that such a tag would not relate to a document
where the cells in columns 1 to 3 were merged, what ought the code to do?
Should it merge the cells in that row together? Should it copy the value
into each of those three cells? Also, I am going to guess that the tag could
include rowFirst and rowLast attributes, although I still need to check this.
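Purely to illustrate the second option, copying the value into every cell
the bookmark covers, here is a small sketch. It models a table row as a plain
list of strings rather than using POI, the names (MultiCellOption,
copyIntoCells) are hypothetical, and it assumes that colFirst and colLast are
zero-based and inclusive, which I have not verified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MultiCellOption {
    /**
     * Returns a copy of the row with the value written into every column
     * from colFirst to colLast inclusive (assumed zero-based).
     */
    static List<String> copyIntoCells(List<String> row, int colFirst,
                                      int colLast, String value) {
        if (colFirst < 0 || colLast < colFirst || colLast >= row.size()) {
            throw new IllegalArgumentException(
                    "Column range " + colFirst + ".." + colLast
                    + " does not fit a row of " + row.size() + " cells");
        }
        List<String> updated = new ArrayList<>(row);
        for (int col = colFirst; col <= colLast; col++) {
            updated.set(col, value);
        }
        return updated;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(copyIntoCells(row, 1, 3, "X"));
    }
}
```

Merging the cells instead would be a different operation altogether, which
is why I would like to know what your users expect.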
I will post if and when I make any progress. For the sake of simplicity, the
first iteration of the code will assume the bookmark relates to just a
single cell in a table.
Yours
Mark B