Hi Deepak,

If you're using a later version of ES, you can just add the Ingest Plugin to ES. Alternatively, add a field name for the Content field in the MFC ES configuration. I'll check it when I get back.

Steph

-----Original Message-----
From: "Dileepa Jayakody" <[email protected]>
Sent: 06/10/2017 07:39
To: "[email protected]" <[email protected]>
Subject: Re: How to extract text content and index in elastic-search

Guys,

I'm using the latest 2.8.1 release.

Thanks

On Fri, Oct 6, 2017 at 6:05 PM, Dileepa Jayakody <[email protected]> wrote:

Hi All,

I'm trying out a small demo with a file system repository connector and an Elasticsearch output connector to extract and index spreadsheet documents. I've also added a Tika transform connector to the job.

When I run the job, the documents get indexed in Elasticsearch, but the content is being indexed as binary. See the indexed content in ES below.

How can I extract the spreadsheet content as text here? Even for a text file, I see the content being indexed as binary. Is there a configuration I need to set to get the text content extracted and indexed in ES?

{
  "_index": "test",
  "_type": "generictype",
  "_id": "file:/home/dileepa/Documents/hackathon/test_data/MI%20-%20Project2%20-%20Estimation%20v1.0.xlsx",
  "_score": 1,
  "_source": {
    "stream_size": "101613",
    "X-Parsed-By": "org.apache.tika.parser.DefaultParser",
    "stream_name": "MI - Project2 - Estimation v1.0.xlsx",
    "protected": "false",
    "resourceName": "MI - Project2 - Estimation v1.0.xlsx",
    "uri": "/home/dileepa/Documents/hackathon/test_data/MI - Project2 - Estimation v1.0.xlsx",
    "Content-Type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "allow_token_document": "__nosecurity__",
    "deny_token_document": "__nosecurity__",
    "allow_token_share": "__nosecurity__",
    "deny_token_share": "__nosecurity__",
    "allow_token_parent": "__nosecurity__",
    "deny_token_parent": "__nosecurity__",
    "file": {
      "_content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
      "_name": "MI - Project2 - Estimation v1.0.xlsx",
      "_content": "RGV2ZWxvcG1lbnQgRXN0aW1hdGVzCglTZWN0aW9uCUZlYXR1cmUJQXNzdW1wdGlvbnMgYW5kIHNjb3BlCUFkZGl0aW9uYWwgaJlYWxpMAkwCTAJ....."
    }
  }
}
]
}
}

Thanks,
Dileepa

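A quick check that may help narrow this down: the "_content" value quoted in the message looks like Base64 text rather than raw binary. The sketch below decodes only the first 28 characters of that value, since the rest is truncated in the thread. The names CONTENT_PREFIX and decode_prefix are made up for illustration and are not part of any connector API.

```python
import base64

# Leading 28 characters of the "_content" value quoted in the message.
# The full value is truncated in the thread, so only this prefix is used;
# 28 is a multiple of 4, so it decodes cleanly on its own.
CONTENT_PREFIX = "RGV2ZWxvcG1lbnQgRXN0aW1hdGVz"


def decode_prefix(encoded: str) -> str:
    """Decode a Base64 string and interpret the resulting bytes as UTF-8."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    print(decode_prefix(CONTENT_PREFIX))  # prints: Development Estimates
```

If that reading is right, the text extraction itself appears to have worked, and what is stored in the "file" field is the Base64-encoded form of the extracted text. This is an inference from the quoted sample only, not something confirmed elsewhere in the thread.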