Guys, I'm using the latest release, 2.8.1.

Thanks
On Fri, Oct 6, 2017 at 6:05 PM, Dileepa Jayakody <[email protected]> wrote:
> Hi All,
>
> I'm trying out a small demo with a file system repository connector and
> an Elasticsearch output connector to extract spreadsheet documents and index them.
> I've also added the Tika transformation connector to the job.
>
> When I run the job, the documents get indexed in Elasticsearch, but the content
> is indexed as binary.
>
> The indexed document in ES is below. How can I get the spreadsheet
> content extracted as text here?
> Even for a plain text file, I see the content indexed as binary.
> Is there some configuration I need to do to get the text content
> extracted and indexed in ES?
>
> {
>   "_index": "test",
>   "_type": "generictype",
>   "_id": "file:/home/dileepa/Documents/hackathon/test_data/MI%20-%20Project2%20-%20Estimation%20v1.0.xlsx",
>   "_score": 1,
>   "_source": {
>     "stream_size": "101613",
>     "X-Parsed-By": "org.apache.tika.parser.DefaultParser",
>     "stream_name": "MI - Project2 - Estimation v1.0.xlsx",
>     "protected": "false",
>     "resourceName": "MI - Project2 - Estimation v1.0.xlsx",
>     "uri": "/home/dileepa/Documents/hackathon/test_data/MI - Project2 - Estimation v1.0.xlsx",
>     "Content-Type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
>     "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
>     "allow_token_document": "__nosecurity__",
>     "deny_token_document": "__nosecurity__",
>     "allow_token_share": "__nosecurity__",
>     "deny_token_share": "__nosecurity__",
>     "allow_token_parent": "__nosecurity__",
>     "deny_token_parent": "__nosecurity__",
>     "file": {
>       "_content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
>       "_name": "MI - Project2 - Estimation v1.0.xlsx",
>       "_content": "RGV2ZWxvcG1lbnQgRXN0aW1hdGVzCglTZWN0aW9uCUZlYXR1cmUJQXNzdW1wdGlvbnMgYW5kIHNjb3BlCUFkZGl0aW9uYWwgaJlYWxpMAkwCTAJ....."
>     }
>   }
> ]
> }
> }
>
> Thanks,
> Dileepa
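The `_content` value in the hit above looks like base64 text rather than raw binary. A minimal sketch, assuming it is standard base64 of UTF-8 text, decodes the intact prefix of that value (line wraps removed, and stopping before the damaged `aJlY...` segment and the elided `.....`). The constant and function names are hypothetical, chosen for illustration only.

```python
import base64

# Intact prefix of the "_content" value from the quoted hit, with the mail's
# line wraps removed. The remainder was elided ("....." in the message).
CONTENT_PREFIX = (
    "RGV2ZWxvcG1lbnQgRXN0aW1hdGVzCglTZWN0aW9uCUZlYXR1cmUJ"
    "QXNzdW1wdGlvbnMgYW5kIHNjb3BlCUFkZGl0aW9uYWwg"
)


def decode_content(value: str) -> str:
    """Decode a base64 string (rejecting non-alphabet characters) as UTF-8 text."""
    return base64.b64decode(value, validate=True).decode("utf-8")


if __name__ == "__main__":
    # Prints the tab-separated header text of the spreadsheet's first sheet.
    print(repr(decode_content(CONTENT_PREFIX)))
```

Running it prints `'Development Estimates\n\tSection\tFeature\tAssumptions and scope\tAdditional '`, which suggests the Tika-extracted text is present in the index, just base64-encoded in that field.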
