No, it doesn't appear to. Here's what I get:


<p class="list_Paragraph">12.2 <u>Time of EssenceImportance</u>. Time is of the 
essenceimportance with respect  <REDACTED>.</p>
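
To illustrate what kind of annotation would help here, below is a minimal sketch that bypasses Tika and reads the tracked-change markup directly. It assumes the file is a .docx (OOXML) rather than a legacy .doc, and that the revisions are stored the usual WordprocessingML way: deleted text in w:delText elements, inserted text in w:t elements nested under w:ins. The function names tracked_runs and tracked_runs_from_docx are hypothetical, chosen only for this example; this is not how Tika itself handles the document.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's "{uri}" tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tracked_runs(document_xml):
    """Yield (status, text) pairs from the bytes of word/document.xml.

    status is "deleted", "inserted", or "unchanged".
    """
    root = ET.fromstring(document_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}

    for node in root.iter():  # document order
        if node.tag == W + "delText":
            # Text removed under track changes.
            yield ("deleted", node.text or "")
        elif node.tag == W + "t":
            # Ordinary text; it counts as inserted if any ancestor is w:ins.
            status = "unchanged"
            ancestor = parents.get(node)
            while ancestor is not None:
                if ancestor.tag == W + "ins":
                    status = "inserted"
                    break
                ancestor = parents.get(ancestor)
            yield (status, node.text or "")


def tracked_runs_from_docx(path):
    """Hypothetical helper: read word/document.xml out of a .docx file."""
    with zipfile.ZipFile(path) as docx:
        return list(tracked_runs(docx.read("word/document.xml")))
```

Run against a paragraph like the one above, this would report the deleted word and its replacement as separate, labeled pieces instead of one fused string. Which of the two words was the deletion cannot be told from the extracted XHTML alone.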

Peter Kronenberg  |  SENIOR AI ANALYTIC ENGINEER 
C: 703.887.5623

4303 W. 119th St., Leawood, KS 66209
WWW.TORCH.AI


-----Original Message-----
From: Nick Burch <[email protected]> 
Sent: Friday, August 27, 2021 11:10 AM
To: [email protected]
Subject: Re: Deleted text in Word document

On Fri, 27 Aug 2021, Peter Kronenberg wrote:
> When Tika extracts from a Microsoft Word document, deleted text is 
> extracted, with no indication that it is deleted.  In fact, if a word 
> was deleted and replaced by another word, both words just show up 
> side-by-side.  Is there a way to get some sort of annotation that 
> indicates the status of the text?  Or extract it in some sort of 
> structured (e.g., XML) format?

How are you calling Tika? Is the XHTML output sufficiently marked-up to let you 
spot it?

Nick
