RE: Deleted text in Word document

2021-08-27 Thread Peter Kronenberg
No, it doesn't appear to. Here's what I get 12.2 Time of EssenceImportance. Time is of the essenceimportance with respect .

Re: Deleted text in Word document

2021-08-27 Thread Nick Burch
On Fri, 27 Aug 2021, Peter Kronenberg wrote: When Tika extracts from a Microsoft Word document, deleted text is extracted, with no indication that it is deleted. In fact, if a word was deleted and replaced by another word, both words just show up side-by-side. Is there a way to get some sort

Form fields and other issues with PDF files

2021-08-27 Thread Peter Kronenberg
* When extracting text from PDF files (no OCR), there doesn't seem to be any way to link the text that was filled in with the name of the form field. For example, if there is a field marked 'First Name' and the user fills that in, they likely appear on different lines and different

Deleted text in Word document

2021-08-27 Thread Peter Kronenberg
When Tika extracts from a Microsoft Word document, deleted text is extracted, with no indication that it is deleted. In fact, if a word was deleted and replaced by another word, both words just show up side-by-side. Is there a way to get some sort of annotation that indicates the status of
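
To illustrate the kind of annotation being asked about, here is a minimal sketch that reads a .docx directly with the Python standard library. It does not use Tika and is not Tika's API. It relies only on the WordprocessingML convention that tracked deletions are wrapped in `w:del` (with the text in `w:delText`) and tracked insertions in `w:ins`. The function name `annotate_revisions` and the `[-...-]` / `[+...+]` markers are hypothetical choices made for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _walk(elem, status, parts):
    """Collect text under elem, tagging it with the enclosing revision status."""
    for child in elem:
        if child.tag == f"{{{W}}}del":
            _walk(child, "del", parts)   # tracked deletion
        elif child.tag == f"{{{W}}}ins":
            _walk(child, "ins", parts)   # tracked insertion
        elif child.tag in (f"{{{W}}}t", f"{{{W}}}delText"):
            text = child.text or ""
            if status == "del":
                parts.append(f"[-{text}-]")
            elif status == "ins":
                parts.append(f"[+{text}+]")
            else:
                parts.append(text)
        else:
            _walk(child, status, parts)  # runs, hyperlinks, etc.


def annotate_revisions(docx_bytes):
    """Return one string per paragraph, with deletions and insertions marked.

    Hypothetical helper for illustration; ignores headers, footnotes,
    and nested paragraphs such as text boxes.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W}}}p"):
        parts = []
        _walk(para, None, parts)
        paragraphs.append("".join(parts))
    return paragraphs
```

With a paragraph like the one quoted in this thread, where "essence" was deleted and "importance" inserted, the two words would come out separately marked instead of running together.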