When Tika extracts text from a Microsoft Word document, deleted text is included 
with no indication that it was deleted.  In fact, if a word was deleted and 
replaced by another word, both words simply show up side by side.  Is there a way 
to get some sort of annotation that indicates the status of the text, or to 
extract it in some sort of structured (e.g., XML) format?  Likewise, is there any 
way to get highlighted text or other mark-up?

For example

"Time of Essence" was changed to "Time of Importance"

Peter Kronenberg  |  Senior AI Analytic ENGINEER
C: 703.887.5623
4303 W. 119th St., Leawood, KS 66209
WWW.TORCH.AI<http://www.torch.ai/>

