Re: AVRO is the only output format with ExecuteSQL

2018-08-13 Thread Boris Tyukin
great, thanks all! nice tool, Otto! On Mon, Aug 13, 2018 at 9:15 AM Otto Fowler wrote: > This script: > https://github.com/ottobackwards/Metron-and-Nifi-Scripts/blob/master/nifi/checkout-nifi-pr > will let you checkout any NIFI PR to a local directory and build it. > > Just: > cd tmp >

Re: AVRO is the only output format with ExecuteSQL

2018-08-13 Thread Otto Fowler
This script:
https://github.com/ottobackwards/Metron-and-Nifi-Scripts/blob/master/nifi/checkout-nifi-pr
will let you checkout any NIFI PR to a local directory and build it.

Just:
cd tmp
checkout-nifi-pr 2945

Maybe useful.

On August 13, 2018 at 08:36:04, Boris Tyukin (bo...@boristyukin.com)

Re: AVRO is the only output format with ExecuteSQL

2018-08-13 Thread Matt Burgess
Haha thanks, but I can't take credit for that much throughput ;) I moved 99% of ExecuteSQL out to a base class, since the main difference was a line or two of code to do the actual write and update the attributes, then the two processors just contain the differences in logic and properties between
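The refactor Matt describes (shared logic in a base class, with each processor supplying only its own write step) can be pictured with a small sketch. This is not the NiFi code from the PR: the class names, the `row.count` attribute, and the use of Python with `sqlite3` are all assumptions made purely for illustration.

```python
import csv
import io
import json
import sqlite3
from abc import ABC, abstractmethod


class AbstractExecuteSQL(ABC):
    """Hypothetical base class that owns the logic shared by both variants."""

    def run(self, conn, sql):
        # Shared part: execute the query and collect column names and rows.
        cursor = conn.execute(sql)
        columns = [d[0] for d in cursor.description]
        rows = cursor.fetchall()
        # Variant-specific part: serialize the result set.
        content = self.write(columns, rows)
        # Shared part: update attributes (attribute name is made up here).
        attributes = {"row.count": str(len(rows))}
        return content, attributes

    @abstractmethod
    def write(self, columns, rows):
        """Serialize the rows; the only step the subclasses differ in."""


class JsonExecuteSQL(AbstractExecuteSQL):
    def write(self, columns, rows):
        return json.dumps([dict(zip(columns, row)) for row in rows])


class CsvExecuteSQL(AbstractExecuteSQL):
    def write(self, columns, rows):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(rows)
        return buf.getvalue()
```

Usage would look like `JsonExecuteSQL().run(conn, "SELECT id, name FROM t")`, which returns the serialized content together with the attribute map.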

Re: AVRO is the only output format with ExecuteSQL

2018-08-13 Thread Mike Thomsen
Boris, Yeah, you can fork either his branch or his entire repo and try it out. Also, usual caveat: user beware until it passes code review... Mike On Mon, Aug 13, 2018 at 8:36 AM Boris Tyukin wrote: > Matt, you are awesome! 15 files changes and 3k lines of code - man, do not > tell me you did

Re: AVRO is the only output format with ExecuteSQL

2018-08-13 Thread Boris Tyukin
Matt, you are awesome! 15 files changed and 3k lines of code - man, do not tell me you did that in just a few days :) since it has not been merged yet with the master, can I just use your personal branch to compile entire nifi? or is it better to cherry pick your commit into master? I would like

Re: AVRO is the only output format with ExecuteSQL

2018-08-10 Thread Matt Burgess
Boris et al, I put up a PR [1] to add ExecuteSQLRecord and QueryDatabaseTableRecord under NIFI-4517, in case anyone wants to play around with it :) Regards, Matt [1] https://github.com/apache/nifi/pull/2945 On Tue, Aug 7, 2018 at 8:30 PM Boris Tyukin wrote: > > Matt, you rock!! thank you!! > >

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Boris Tyukin
Matt, you rock!! thank you!! On Tue, Aug 7, 2018 at 5:16 PM Matt Burgess wrote: > Sounds good, it makes the underlying code a bit more complicated but I see > from y’all’s points that a “separate” processor is a better user > experience. I’m knee deep in it as we speak, hope to have a PR up in

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Matt Burgess
Sounds good, it makes the underlying code a bit more complicated but I see from y’all’s points that a “separate” processor is a better user experience. I’m knee deep in it as we speak, hope to have a PR up in a few days. Thanks, Matt > On Aug 7, 2018, at 5:07 PM, Andrew Grande wrote: > >

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Andrew Grande
I'd really like to see the Record suffix on the processor for discoverability, as already mentioned. Andrew On Tue, Aug 7, 2018, 2:16 PM Matt Burgess wrote: > Yeah that's definitely doable, most of the logic for writing a > ResultSet to a Flow File is localized (currently to JdbcCommon but >

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Matt Burgess
Yeah that's definitely doable, most of the logic for writing a ResultSet to a Flow File is localized (currently to JdbcCommon but also in ResultSetRecordSet), so I wouldn't think it would be too much of a refactor. What are folks' thoughts on whether to add a Record Writer property to the existing

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Boris Tyukin
now this is really slick! thanks Mark for educating me! On Tue, Aug 7, 2018 at 1:15 PM Mark Payne wrote: > Boris, > > Using a Record-based processor does not mean that you need to define a > schema upfront. This is > necessary if the source itself cannot provide a schema. However, since it > is

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Andy LoPresto
Matt, Would extending the core ExecuteSQL processor with an ExecuteSQLRecord processor also work? I wonder about discoverability if only one processor is present and in other places we explicitly name the processors which handle records as such. If the ExecuteSQL processor handled all the SQL

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Andrew Grande
As a side note, one has to have a serious justification _not_ to use record-based processors. The benefits, including performance, are too numerous to call out here. Andrew On Tue, Aug 7, 2018, 1:15 PM Mark Payne wrote: > Boris, > > Using a Record-based processor does not mean that you need to

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Mark Payne
Boris, Using a Record-based processor does not mean that you need to define a schema upfront. This is necessary if the source itself cannot provide a schema. However, since it is pulling structured data and the schema can be inferred from the database, you wouldn't need to. As Matt was saying,
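Mark's point that the schema can come from the database itself can be illustrated with a rough sketch. It assumes SQLite via Python's `sqlite3` module and a hand-written, incomplete mapping from declared SQL types to Avro-style type names; the function `infer_schema` and the mapping table are hypothetical and are not how NiFi derives schemas.

```python
import sqlite3

# Assumed, deliberately incomplete mapping from declared SQL types to Avro types.
SQL_TO_AVRO = {"INTEGER": "long", "TEXT": "string", "REAL": "double", "BLOB": "bytes"}


def infer_schema(conn, table):
    """Build an Avro-style record schema from the table's own metadata.

    `table` must be a trusted identifier: PRAGMA does not accept bound parameters.
    """
    fields = []
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    for _cid, name, sql_type, notnull, _default, _pk in conn.execute(
        f"PRAGMA table_info({table})"
    ):
        avro_type = SQL_TO_AVRO.get(sql_type.upper(), "string")
        # Nullable columns become a union with "null".
        fields.append({"name": name, "type": avro_type if notnull else ["null", avro_type]})
    return {"type": "record", "name": table, "fields": fields}
```

No schema is written by hand here: the column names, types, and nullability all come from the database's metadata.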

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Boris Tyukin
thanks for all the responses! it means I am not the only one interested in this topic. Record-aware version would be really nice, but a lot of times I do not want to use record-based processors since I need to define a schema for input/output upfront and just want to run SQL query and get

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Matt Burgess
I'm definitely interested in supporting a record-aware version as well (I wrote the Jira up last year [1] but haven't gotten around to implementing it), however I agree with Peter's comment on the Jira. Since ExecuteSQL is an oft-touched processor, if we had two processors that only differed in

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Bryan Bende
I would also add that the pattern of splitting to 1 record per flow file was common before the record processors existed, and generally this can/should be avoided now in favor of processing/manipulating records in place, and keeping them together in large batches. On Tue, Aug 7, 2018 at 9:10
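The contrast Bryan draws between splitting and in-place batch processing can be sketched as follows. This is a toy model in Python, not NiFi code: a "flow file" is represented as a JSON string, and both function names are made up for illustration.

```python
import json


def split_per_record(batch):
    """Older pattern: one output (one "flow file") per record."""
    return [json.dumps(record) for record in batch]


def update_in_place(batch, transform):
    """Record-oriented pattern: transform every record, keep one batch."""
    return json.dumps([transform(record) for record in batch])
```

With a batch of N records, the first function produces N separate outputs to track and route, while the second keeps a single output containing all N transformed records.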

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Andrew Grande
Careful, that makes too much sense, Joe ;) On Tue, Aug 7, 2018, 8:45 AM Joe Witt wrote: > i think we just need to make an ExecuteSqlRecord processor. > > thanks > > On Tue, Aug 7, 2018, 8:41 AM Mike Thomsen wrote: > >> My guess is that it is due to the fact that Avro is the only record type >>

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Joe Witt
i think we just need to make an ExecuteSqlRecord processor. thanks On Tue, Aug 7, 2018, 8:41 AM Mike Thomsen wrote: > My guess is that it is due to the fact that Avro is the only record type > that can match sql pretty closely feature to feature on data types. > On Tue, Aug 7, 2018 at 8:33 AM

Re: AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Mike Thomsen
My guess is that it is due to the fact that Avro is the only record type that can match sql pretty closely feature to feature on data types. On Tue, Aug 7, 2018 at 8:33 AM Boris Tyukin wrote: > I've been wondering since I started learning NiFi why ExecuteSQL processor > only returns AVRO

AVRO is the only output format with ExecuteSQL

2018-08-07 Thread Boris Tyukin
I've been wondering since I started learning NiFi why ExecuteSQL processor only returns AVRO formatted data. All community examples I've seen then convert AVRO to json and pretty much all of them then split json to multiple flows. I found myself doing the same thing over and over and over again.