Re: Proposal for ElasticSearch support

2019-02-20 Thread Mike Thomsen
Matt, I think we could proceed like this:
1. Deprecate v2 and v5 in 1.10.
2. Remove v5 from the assembly in 1.11.
3. Announce that v2 will be removed from the assembly in 1.13 or 1.14, whichever coincides with the end of Q4/early Q1 2020.
That gives us a whole year to work on the REST API bundle

Re: Proposal for ElasticSearch support

2019-02-20 Thread Mike Thomsen
Normally, I would agree with that, but ES is very popular, so I wanted to give users a chance to shout, "Hey, don't do that because [reason]." On Wed, Feb 20, 2019 at 10:27 AM Joe Witt wrote: > ...probably for dev thread but I am starting to think we should just start removing certain nars from

Re: How do you use ElasticSearch with NiFi?

2019-02-20 Thread Mike Thomsen
> I haven't used any special processor dedicated to ES, just HTTP Request processor, Why did you decide to do that? Thanks, Mike On Wed, Feb 20, 2019 at 11:55 AM Luis Carmona wrote: > Hi everyone, > > I've been using Nifi for the last 6 months interacting with ES 6.X. Made > queries, read

Re: How do you use ElasticSearch with NiFi?

2019-02-20 Thread Luis Carmona
Hi everyone, I've been using NiFi for the last 6 months to interact with ES 6.x. I've made queries and read and written data, all of it in production environments. I haven't used any special processor dedicated to ES, just the HTTP request processor, and everything has worked nicely. In terms of

Re: How do you use ElasticSearch with NiFi?

2019-02-20 Thread Mike Thomsen
I've got a PR for a new bulk ingest processor, so I could easily add batching of the record ingest to that, plus something like your PR. I think it might be useful to have some enforcement mechanisms that prevent a request from being way too big. The last documentation I saw suggested about 32 MB per payload. What
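As a rough illustration of the size-enforcement idea above, here is a minimal sketch, not the processor in the PR, that packs records into Elasticsearch `_bulk` request bodies (newline-delimited JSON: one action line followed by one source line per document) and starts a new body before it would exceed a byte cap. The 32 MB default simply mirrors the figure quoted in the thread and is an assumption, not an official limit. The function name `bulk_batches` and the index name used in the test are hypothetical.

```python
import json
from typing import Iterable, Iterator, List


def bulk_batches(records: Iterable[dict], index: str,
                 max_bytes: int = 32 * 1024 * 1024) -> Iterator[bytes]:
    """Yield _bulk request bodies (NDJSON), each at most max_bytes long.

    Hypothetical helper: every record becomes an "index" action line plus
    its source line. A record whose pair alone exceeds max_bytes is
    rejected rather than sent as an oversized request.
    """
    batch: List[bytes] = []
    size = 0
    for rec in records:
        action = json.dumps({"index": {"_index": index}})
        source = json.dumps(rec)
        # The _bulk format requires every line, including the last, to end in "\n".
        pair = (action + "\n" + source + "\n").encode("utf-8")
        if len(pair) > max_bytes:
            raise ValueError("a single record exceeds max_bytes")
        # Flush the current batch if adding this pair would break the cap.
        if batch and size + len(pair) > max_bytes:
            yield b"".join(batch)
            batch, size = [], 0
        batch.append(pair)
        size += len(pair)
    if batch:
        yield b"".join(batch)
```

Rejecting an oversized single record, rather than silently truncating it, fits the "route to failure" style NiFi processors use. Each yielded body would then be POSTed to the cluster's `_bulk` endpoint.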

Re: How do you use ElasticSearch with NiFi?

2019-02-20 Thread Joe Percivall
Hey Mike, As a data point, we're ingesting into ES v6 using PutElasticsearchHttp and PutElasticsearchHttpRecord. We do almost no querying of anything in ES using NiFi. Continued improvement around ingesting into ES would be our core use-case. One item that frustrated me was the issue around

Re: Versioned flows not maintaining Load Balance Strategy

2019-02-20 Thread Chad Woodhead
Bryan, Wanted to update you. Spoke with Hortonworks and they informed me the version definitions included in the HDF 3.3.x versions used NiFi Registry 0.3.0 because 0.3.0 was supposed to go in those releases. Unfortunately it was discovered that NiFi Registry was pointed at the wrong repo and

Re: merging flowfiles?

2019-02-20 Thread Joe Witt
Hello, you could have your two JSONPath processors both send to a funnel, then connect the funnel to MergeContent. That at least addresses the comment about multiple input paths. Whether your data can simply be merged like this is possibly another matter, but I presume you have that in hand. Thanks On

Re: merging flowfiles?

2019-02-20 Thread Jerry Vinokurov
As pictured, your setup will not work because MergeContent will not bin the two connections together. What you'll want to do is to route both connections through a funnel, which will turn your two connections into one. Then route the output of the funnel to MergeContent. On Wed, Feb 20, 2019 at

Re: Proposal for ElasticSearch support

2019-02-20 Thread Otto Fowler
Maybe this would be a nice first use case for that strategy, we can wrap it up from top to bottom. On February 20, 2019 at 10:27:40, Joe Witt (joe.w...@gmail.com) wrote: ...probably for dev thread but I am starting to think we should just start removing certain nars from the convenience

Re: Proposal for ElasticSearch support

2019-02-20 Thread Joe Witt
...probably for dev thread but I am starting to think we should just start removing certain nars from the convenience build/assembly and documenting it in the migration guide for users that need those. We can then show them how to use the hot loading, etc. Thanks On Wed, Feb 20, 2019 at 10:04

Re: Proposal for ElasticSearch support

2019-02-20 Thread Matt Burgess
+1 for deprecating both the v2 and v5 (those using a transport client) components in 1.10, to be removed later. What do you think about refactoring the top-level ES bundle into 3 (v2, v5, REST) and creating profiles (deactivated by default) for the v2 and v5 bundles? I guess that could wait until

Re: Proposal for ElasticSearch support

2019-02-20 Thread Otto Fowler
I think that there should be specific documentation guidance around this, like “Picking the right Elasticsearch Processors” to avoid issues. On February 20, 2019 at 08:02:18, Mike Thomsen (mikerthom...@gmail.com) wrote: I would like to mark the v5 Elastic bundle as deprecated in 1.10. Per

Re: Avoid duplicate rows when inserting into table

2019-02-20 Thread Mike Thomsen
I think the SQL processors other than PutDatabaseRecord also support "upsert" functionality, so that might also help. On Wed, Feb 20, 2019 at 8:33 AM Mike Thomsen wrote: > The easiest way to do this would be to create a UNIQUE constraint on the > project name and just send one insert at a time.

Re: Avoid duplicate rows when inserting into table

2019-02-20 Thread Mike Thomsen
The easiest way to do this would be to create a UNIQUE constraint on the project name and just send one insert at a time. Then each individual failed insert will get routed to failure. For the sake of safety here, if you have multiple flows that feed into a common SQL ingest point, you might want
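The UNIQUE-constraint approach above can be sketched outside NiFi. The following minimal example uses Python's built-in `sqlite3` as a stand-in for whatever database the flow writes to. The table `project` and its columns are hypothetical. Each insert is committed on its own, so a duplicate project name raises an integrity error for that row only. The row is then collected as a "failure", which is roughly what routing to the failure relationship means here.

```python
import sqlite3
from typing import List, Tuple


def insert_projects(conn: sqlite3.Connection,
                    names: List[str]) -> Tuple[List[str], List[str]]:
    """Insert names one at a time; return (succeeded, failed) lists."""
    succeeded: List[str] = []
    failed: List[str] = []
    for name in names:
        try:
            # "with conn" commits on success and rolls back on error,
            # so one duplicate never undoes the rows before it.
            with conn:
                conn.execute("INSERT INTO project (name) VALUES (?)", (name,))
            succeeded.append(name)
        except sqlite3.IntegrityError:
            # The UNIQUE constraint rejected a duplicate project name.
            failed.append(name)
    return succeeded, failed


# Hypothetical table: the UNIQUE constraint lets the database enforce uniqueness.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
ok, failed = insert_projects(conn, ["alpha", "beta", "alpha"])
```

Letting the database enforce uniqueness still holds up when several flows feed the same table, because no flow has to check the table first.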

How do you use ElasticSearch with NiFi?

2019-02-20 Thread Mike Thomsen
I'm looking for feedback from ElasticSearch users on how they use and how they **want** to use ElasticSearch v5 and newer with NiFi. So please respond with some use cases and what you want, what frustrates you, etc. so I can prioritize Jira tickets for the ElasticSearch REST API bundle. (Note:

Proposal for ElasticSearch support

2019-02-20 Thread Mike Thomsen
I would like to mark the v5 Elastic bundle as deprecated in 1.10. Per Elastic's official guidelines, the transport API--which it uses--is deprecated in Elastic 7 and is to be removed, at least from public accessibility, in Elastic 8.

Re: Avoid duplicate rows when inserting into table

2019-02-20 Thread Adam Fisher
Maybe you could use something like SplitRecord, DetectDuplicate, MergeRecord to get the file how you want it. This would split it into smaller FlowFiles, check if the record has been seen before and keep only unique ones and then merge them back into one file. I'm actually collaborating with
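The split/detect/merge pattern above can be approximated in plain Python. This is a minimal sketch, not NiFi code. It assumes the file is a JSON array of records, and `dedupe_records` and the `key` function are hypothetical names. The in-memory `set` plays the role DetectDuplicate's cache plays in NiFi. Note that DetectDuplicate relies on a distributed cache service that outlives a single file, while this set covers only one payload.

```python
import json
from typing import Callable, List, Set


def dedupe_records(payload: str, key: Callable[[dict], str]) -> str:
    """Keep the first record for each key and drop later duplicates."""
    records: List[dict] = json.loads(payload)  # like SplitRecord: handle records one by one
    seen: Set[str] = set()                     # stands in for DetectDuplicate's cache
    unique: List[dict] = []
    for rec in records:
        k = key(rec)
        if k in seen:
            continue                           # duplicate: dropped
        seen.add(k)
        unique.append(rec)
    return json.dumps(unique)                  # like MergeRecord: back into one file


# Example: treat records with the same "name" as duplicates.
deduped = dedupe_records('[{"name": "a"}, {"name": "b"}, {"name": "a"}]',
                         key=lambda r: r["name"])
```

The first occurrence wins. This matches keeping only records that have not been "seen before".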