Re: Deleted text in Word document

Nick Burch Fri, 27 Aug 2021 08:10:20 -0700

On Fri, 27 Aug 2021, Peter Kronenberg wrote:

When Tika extracts from a Microsoft Word document, deleted text is extracted, with no indication that it is deleted. In fact, if a word was deleted and replaced by another word, both words just show up side-by-side. Is there a way to get some sort of annotation that indicates the status of the text? Or extract it in some sort of structured (e.g., XML) format?

How are you calling Tika? Is the XHTML output sufficiently marked-up to let you spot it?


Nick
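
One way to follow up on the XHTML question is to scan the XHTML for change-tracking markup. The following is only a minimal sketch using the JDK's SAX parser. It assumes that deleted and inserted runs would appear wrapped in <del> and <ins> elements; this thread does not confirm that Tika emits such elements, and it may depend on the Tika version and parser configuration. The class name TrackedChangeScan and the sample string are hypothetical stand-ins for real Tika output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: not part of Tika, JDK classes only.
final class TrackedChangeScan {

    /** Returns the text found inside every element named tagName (e.g. "del"). */
    static List<String> textInside(String xhtml, String tagName) throws Exception {
        List<String> found = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int[] depth = {0}; // greater than zero while inside a matching element

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals(tagName)) {
                    depth[0]++;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth[0] > 0) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                // Emit the collected text when the outermost matching element closes.
                if (qName.equals(tagName) && --depth[0] == 0) {
                    found.add(current.toString());
                    current.setLength(0);
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Assumed shape of the markup; replace with the XHTML Tika actually produces.
        String sample =
                "<html><body><p>The <del>quick</del><ins>slow</ins> fox</p></body></html>";
        System.out.println("deleted:  " + textInside(sample, "del"));
        System.out.println("inserted: " + textInside(sample, "ins"));
    }
}
```

If a scan like this finds nothing in the real output, then the XHTML is not marked up enough to tell deleted text from the rest, which answers the second question above.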
