No, the XHTML output doesn't appear to mark it. Here's what I get:
<p class="list_Paragraph">12.2 <u>Time of EssenceImportance</u>. Time is of the essenceimportance with respect <REDACTED>.</p>

Peter Kronenberg | Senior AI Analytic Engineer

-----Original Message-----
From: Nick Burch <[email protected]>
Sent: Friday, August 27, 2021 11:10 AM
To: [email protected]
Subject: Re: Deleted text in Word document

On Fri, 27 Aug 2021, Peter Kronenberg wrote:
> When Tika extracts from a Microsoft Word document, deleted text is
> extracted, with no indication that it is deleted. In fact, if a word
> was deleted and replaced by another word, both words just show up
> side-by-side. Is there a way to get some sort of annotation that
> indicates the status of the text? Or extract it in some sort of
> structured (e.g., XML) format?

How are you calling Tika? Is the XHTML output sufficiently marked-up to
let you spot it?

Nick
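
For context on the "essenceimportance" output above: in a .docx file, tracked deletions and insertions are stored as separate elements in word/document.xml (w:del wrapping runs with w:delText, and w:ins wrapping runs with w:t), so the status of each word is recoverable from the file itself. The following is only a rough sketch of reading that markup directly with the Python standard library. It is not Tika's API and not a Tika feature. The sample XML, the annotate function, and the [-...-] / [+...+] markers are all made up for illustration, and it only handles w:del and w:ins that sit directly inside a paragraph.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, hand-written fragment imitating a tracked replacement
# of "essence" by "importance" (not taken from the real document).
SAMPLE = f'''<w:document xmlns:w="{W}"><w:body><w:p>
<w:r><w:t xml:space="preserve">Time is of the </w:t></w:r>
<w:del w:id="1" w:author="a"><w:r><w:delText>essence</w:delText></w:r></w:del>
<w:ins w:id="2" w:author="a"><w:r><w:t>importance</w:t></w:r></w:ins>
</w:p></w:body></w:document>'''


def annotate(document_xml):
    """Return paragraph text with deletions as [-x-] and insertions as [+x+].

    Simplification: only looks at direct children of each w:p element.
    """
    root = ET.fromstring(document_xml)
    parts = []
    for para in root.iter(f"{{{W}}}p"):
        for child in para:
            if child.tag == f"{{{W}}}del":
                # Deleted runs keep their text in w:delText, not w:t
                text = "".join(t.text or "" for t in child.iter(f"{{{W}}}delText"))
                parts.append(f"[-{text}-]")
            elif child.tag == f"{{{W}}}ins":
                text = "".join(t.text or "" for t in child.iter(f"{{{W}}}t"))
                parts.append(f"[+{text}+]")
            elif child.tag == f"{{{W}}}r":
                # Ordinary, unrevised run
                parts.append("".join(t.text or "" for t in child.iter(f"{{{W}}}t")))
    return "".join(parts)


print(annotate(SAMPLE))  # Time is of the [-essence-][+importance+]
```

For a real file, the same function could be fed the bytes returned by zipfile.ZipFile(path).read("word/document.xml"), where path is whatever .docx you are testing with.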
