Johny,

I have a template that I created before the Elasticsearch processors were available. It uses InvokeHttp to run a query, then uses InvokeHttp again to get the individual documents (if you didn't ask for the full doc text to be returned by the query). The second InvokeHttp can be replaced with FetchElasticsearch or FetchElasticsearchHttp, and after 1.1.0 comes out, the first one can be replaced by either QueryElasticsearchHttp or ScrollElasticsearchHttp (depending on how you want to page the results). For now, it sounds like you want the first part of the flow: create a flow file, configure the InvokeHttp processor to query an ES index, then parse the JSON results.
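To make the query-then-parse step concrete, here is a rough sketch outside of NiFi, in Python. It is not the template itself: it only shows the kind of URL InvokeHttp would be pointed at and how the document identifiers can be pulled out of the JSON that comes back. The host, the index name, and the sample response are made-up placeholders, and the response shape assumed is the usual Elasticsearch search result with the matches under hits.hits.

```python
import json
from urllib.parse import quote, urlencode


def build_search_url(base_url, index, size=10):
    """Build a URI-search URL that matches every document in one index."""
    query = urlencode({"q": "*:*", "size": size})
    return f"{base_url.rstrip('/')}/{quote(index, safe='')}/_search?{query}"


def extract_document_ids(response_body):
    """Return the _id of every hit in an Elasticsearch search response."""
    response = json.loads(response_body)
    return [hit["_id"] for hit in response.get("hits", {}).get("hits", [])]


# Hypothetical response body, shaped like an Elasticsearch search result.
# The index name, type, ids, and messages are placeholders.
sample_response = json.dumps({
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "syslog-2016.10.26", "_type": "syslog",
             "_id": "AVgA1", "_source": {"message": "first line"}},
            {"_index": "syslog-2016.10.26", "_type": "syslog",
             "_id": "AVgA2", "_source": {"message": "second line"}},
        ],
    },
})

# Assumed host and index name; substitute your own.
print(build_search_url("http://localhost:9200", "syslog-2016.10.26", size=2))
print(extract_document_ids(sample_response))
```

Each value in the returned list is what would go into the Document Identifier property of FetchElasticsearch(Http) if the query did not return the full document text.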
I put the template up as a gist: https://gist.github.com/mattyb149/f612d052adb07434c975e4f930a995eb

Regards,
Matt

On Wed, Oct 26, 2016 at 12:50 PM, johny casanova <[email protected]> wrote:
> Matt,
>
> I'm trying out the 1.0 version of nifi. I'm trying to get documents using
> the FetchElasticSearch(Http). Maybe that's the problem I'm having. I was not
> aware or noticed in the docs mentioning to use the invokehttp. So basically
> what I'm trying to do is get all the syslogs in a specific index using nifi
> then store them on HDFS.
>
> On Tue, Oct 25, 2016 at 6:34 PM, Matt Burgess <[email protected]> wrote:
>>
>> Johny,
>>
>> What version of NiFi are you using? Also are you trying to get
>> documents from ES using FetchElasticSearch(Http) or put docs to it
>> using PutElasticsearch(Http)? For Fetching, the Document Identifier
>> is the _id of the document you want to retrieve. If you're looking to
>> do a search on documents from a given index, type, etc. then (before
>> NiFi 1.1.0 comes out) you'd have to use InvokeHttp to interact with
>> the Elasticsearch REST API, then parse the response to get the
>> document identifiers for each of the results and put that into
>> FetchElasticsearch. NiFi 1.1.0 will have QueryElasticsearchHttp and
>> ScrollElasticsearchHttp [1], which are made for getting results from
>> searches vs direct "gets" (via FetchES). Out of curiosity, what REST
>> endpoint are you using with curl?
>>
>> If you are trying to put docs into ES, then the field is named
>> Document Identifier Attribute, and that refers to the name of a
>> FlowFile attribute whose value is the identifier you want to use for
>> the document (whose body is the content of the FlowFile).
>> PutElasticsearchHttp supports leaving that field blank when adding to
>> an index (the ID will be auto-generated), but it is an open issue [2]
>> to support auto-generation in PutElasticsearch.
>>
>> Does this answer your question? If not please let me know and I can
>> provide more info.
>>
>> Regards,
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-2417
>> [2] https://issues.apache.org/jira/browse/NIFI-1576
>>
>> On Tue, Oct 25, 2016 at 2:36 PM, johny casanova
>> <[email protected]> wrote:
>> >
>> > Hello,
>> >
>> > Do you guys have an example config of how this processor should look? I
>> > have a regular elasticsearch install that is only receiving syslogs. I'm
>> > trying to figure out how to find or what to put for document identifier.
>> > I did a curl in elasticsearch and saw a field "id" but, it does not look
>> > like that works.
