Hi Deepak
If you're using a recent version of ES (5.x or later), you can just install the Ingest Attachment plugin in ES.
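
For the plugin route, here is a minimal sketch of an ingest pipeline definition, assuming the attachment processor from that plugin is available. The pipeline id `attachment` and the source field `data` are placeholders I made up for illustration, not values from your setup; the field would need to match wherever the Base64 content actually ends up in your documents.

```python
import json

# Hypothetical names: the pipeline id "attachment" and the source field "data"
# are placeholders, not values taken from this thread.
pipeline_id = "attachment"
pipeline_body = {
    "description": "Extract text from Base64-encoded attachments",
    "processors": [
        {"attachment": {"field": "data"}}
    ],
}

# Print the request that would define the pipeline; nothing is sent to ES here.
print(f"PUT _ingest/pipeline/{pipeline_id}")
print(json.dumps(pipeline_body, indent=2))
```

The printed body can be pasted into any REST client pointed at the cluster.
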
Alternatively, add a field name for the Content field in the MCF ES 
configuration.
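
Incidentally, the `_content` value in your output looks like Base64-encoded text rather than raw binary, which would suggest the text extraction itself has already happened. A minimal Python check, using only the first 28 characters of that value (the rest is truncated in your message):

```python
import base64

# First 28 characters of the "_content" value from the indexed document.
# 28 is a multiple of 4, so this prefix decodes cleanly without padding.
content_prefix = "RGV2ZWxvcG1lbnQgRXN0aW1hdGVz"

decoded = base64.b64decode(content_prefix).decode("utf-8")
print(decoded)
```

This prints `Development Estimates`, which looks like the start of the spreadsheet text.
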
I'll check it when I get back.
Steph

-----Original Message-----
From: "Dileepa Jayakody" <[email protected]>
Sent: 06/10/2017 07:39
To: "[email protected]" <[email protected]>
Subject: Re: How to extract text content and index in elastic-search

Guys, I'm using the latest 2.8.1 release.


Thanks



On Fri, Oct 6, 2017 at 6:05 PM, Dileepa Jayakody <[email protected]> 
wrote:

Hi All,


I'm trying out a small demo with a file system repository connector and an 
Elasticsearch output connector to extract spreadsheet documents and index them.

I've also added a Tika transformation connector to the job.



When I run the job, the documents get indexed in Elasticsearch, but the content 
is being indexed as binary.


See below the indexed content in ES. Can I please know how to extract the 
spreadsheet content to text format here? 

Even for a text file, I see the content is being indexed as binary. 

Is there a configuration I need to do here to get the text content extracted 
and indexed in ES?


{
        "_index": "test",
        "_type": "generictype",
        "_id": "file:/home/dileepa/Documents/hackathon/test_data/MI%20-%20Project2%20-%20Estimation%20v1.0.xlsx",
        "_score": 1,
        "_source": {
          "stream_size": "101613",
          "X-Parsed-By": "org.apache.tika.parser.DefaultParser",
          "stream_name": "MI - Project2 - Estimation v1.0.xlsx",
          "protected": "false",
          "resourceName": "MI - Project2 - Estimation v1.0.xlsx",
          "uri": "/home/dileepa/Documents/hackathon/test_data/MI - Project2 - Estimation v1.0.xlsx",
          "Content-Type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
          "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
          "allow_token_document": "__nosecurity__",
          "deny_token_document": "__nosecurity__",
          "allow_token_share": "__nosecurity__",
          "deny_token_share": "__nosecurity__",
          "allow_token_parent": "__nosecurity__",
          "deny_token_parent": "__nosecurity__",
          "file": {
            "_content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "_name": "MI - Project2 - Estimation v1.0.xlsx",
            "_content": "RGV2ZWxvcG1lbnQgRXN0aW1hdGVzCglTZWN0aW9uCUZlYXR1cmUJQXNzdW1wdGlvbnMgYW5kIHNjb3BlCUFkZGl0aW9uYWwgaJlYWxpMAkwCTAJ....."
        }
      }
    ]
  }
}


Thanks,

Dileepa
