When Tika extracts text from a Microsoft Word document, deleted text is extracted too, with no indication that it was deleted. In fact, if a word was deleted and replaced by another word, both words just show up side by side. Is there a way to get some sort of annotation that indicates the status of the text, or to extract it in some structured (e.g., XML) format? The same question applies to highlighted text and other mark-up. Is there any way to get that?
For example, "Time of Essence" was changed to "Time of Importance".

Peter Kronenberg | Senior AI Analytic Engineer
C: 703.887.5623
Torch AI, 4303 W. 119th St., Leawood, KS 66209
http://www.torch.ai/
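
P.S. To make the kind of output I am after more concrete, here is a rough sketch that bypasses Tika entirely and reads the document XML directly. It assumes a .docx file (not legacy .doc), looks only at the main body in word/document.xml, and relies on WordprocessingML marking tracked deletions with w:del / w:delText and tracked insertions with w:ins. The function name tracked_runs and the status labels are my own, not anything from Tika, and highlighting is not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by .docx files
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_runs(docx_file):
    """Return (status, text) pairs from the main body of a .docx file.

    status is "deleted", "inserted" or "unchanged" (labels chosen for
    this sketch). docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    def walk(elem, status):
        if elem.tag == f"{{{W}}}del":
            status = "deleted"      # tracked deletion wrapper
        elif elem.tag == f"{{{W}}}ins":
            status = "inserted"     # tracked insertion wrapper
        elif elem.tag in (f"{{{W}}}t", f"{{{W}}}delText"):
            # Text nodes: w:t for live text, w:delText for deleted text
            if elem.text:
                yield status, elem.text
            return
        for child in elem:
            yield from walk(child, status)

    return list(walk(root, "unchanged"))
```

For the example above I would hope to see something like "Time of " as unchanged, "Essence" as deleted, and "Importance" as inserted, rather than the two words run together.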
