Carlos,

If you are comfortable with Groovy, I would suggest using the 
ExecuteScript [1] processor to prototype what you want the processor to do. 
That processor takes a Groovy script (inline or read from a file) and 
executes it within the processor lifecycle. Matt Burgess has written some 
excellent blog posts on getting started with it [2][3].

Once you have that behaving the way you like (and feel free to continue to ask 
questions here), another developer would probably be able to help you convert 
it to a "real" custom processor.

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.ExecuteScript/index.html
[2] https://funnifi.blogspot.com/2016/02/executescript-processor-hello-world.html
[3] https://funnifi.blogspot.com/2016/02/writing-reusable-scripted-processors-in.html


Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Oct 7, 2016, at 7:20 AM, João Henrique Freitas <[email protected]> wrote:
> 
> Hi.
> 
> Maybe a linkedin/databus client processor could be created to handle ETL.
> 
> 
> On 06/10/2016 10:39, "Carlos Manuel Fernandes (DSI)" 
> <[email protected] <mailto:[email protected]>> 
> wrote:
> Hi Uwe,
> 
> 
> 
> I see you have developed an approach similar to mine. Joe Witt launched a 
> challenge to build a processor based on the JSON structure I proposed.
> 
> 
> 
> I think we can use the code of the ConvertJSONToSQL processor as a template 
> for this new processor. The new processor will belong to the JSONtoSQL 
> category (ConvertJSONToSQL being the first one).
> 
> 
> 
> We can work together to reach this goal, but first we must agree on the JSON 
> structure for the input.
> 
> 
> 
> What do you think? You can contact me directly.
> 
> 
> 
> Thanks
> 
> 
> 
> Carlos
> 
> 
> 
> From: Uwe Geercken [mailto:[email protected] <mailto:[email protected]>]
> Sent: Tuesday, October 4, 2016 14:42
> To: [email protected] <mailto:[email protected]>
> Subject: Aw: Re: ELT on Nifi
> 
> 
> 
> Carlos,
> 
> 
> 
> I think that is a good point.
> 
> 
> 
> But I would like to bring up a little different view to it:
> 
> 
> 
> I have developed a business rule engine (open source), written in Java, which 
> is now in production at at least two larger companies; both use 
> the Pentaho ETL tool together with the rule engine. You can use the rules to 
> filter/evaluate conditions, and there are also actions which execute or 
> transform data. The advantage is that within Pentaho it is just a plugin, and 
> the business logic (or, if you will, also IT logic) is managed externally 
> (through a web interface, possibly by users or superusers themselves and 
> not by IT). This keeps a proper separation of responsibilities between business 
> logic and IT logic, and the ETL process itself is much, much cleaner.
> 
> 
> 
> Likewise, one could think of creating a plugin for NiFi which takes a similar 
> approach: you have a processor that calls the rule engine in the background. 
> It runs and delivers the results back to the process. Instead of having 
> complex connections between transformation processors, which clutter the NiFi 
> desktop, there would be one processor for the rule engine (or, of course, 
> several).
> 
> 
> 
> In one of my more recent projects I implemented the complete invoicing process 
> for the company I work for using the rule engine. The ETL is very clean and 
> contains only IT logic (formatting of fields, splitting of fields, renaming, 
> etc.), and the rest is in external rule projects which contain the business 
> logic.
> 
> 
> 
> My thinking is that the division of responsibilities for the logic, and a 
> clean ETL (or, in the NiFi case, a clean flow diagram), is a very strong 
> argument for this approach.
> 
> 
> 
> Of course there is nothing to say against a mixed approach - custom 
> processors and rule engine - I just wanted to explain my point a little bit. 
> Everything is available on github.com/uwegeercken.
> 
> 
> 
> I could write the NiFi code for the processor, I guess, but I will need some 
> help with testing, documentation, and packaging the NAR file (I am not 
> used to Maven and have struggled in the past to create a proper NAR archive).
> 
> 
> 
> Greetings,
> 
> 
> 
> Uwe
> 
> 
> 
> Sent: Tuesday, October 4, 2016 at 04:48
> From: "Matt Burgess" <[email protected] <mailto:[email protected]>>
> To: [email protected] <mailto:[email protected]>
> Subject: Re: ELT on Nifi
> 
> Carlos,
> 
> 
> 
> The extensible nature of NiFi, whether the overall architecture was intended 
> for ETL/ELT and/or RDBMS/DW concepts or not, means that many of these kinds 
> of operations are welcome (but possibly not yet present) in NiFi. Some might 
> warrant framework changes, but for a good portion, many RDBMS/DW processors 
> are possible but just haven't been added/contributed yet. In my experience, 
> ETL/ELT tools have focused mainly on this kind of "processor" and in contrast 
> can't handle the level of throughput, data formats, provenance/lineage, 
> security, and/or data integrity that NiFi can. In exchange, NiFi doesn't have 
> as many of the RDBMS/DW-specific processors available at this time. I see a 
> few categories (please feel free to add/change/delete/discuss), mostly having 
> to do with tabular (row-oriented, character-delimited) data:
> 
> 
> 
> 1) Row-level operations. This includes projections (select fields from row), 
> alter fields (change timestamp of column 'last_updated', e.g.), add 
> column(s), replace-with-lookup, etc.
> 
> 2) Table-level operations. This includes joins, grouping/aggregates, 
> transposition, etc.
> 
> 3) Composition/Application of the other two. This includes normalization & 
> denormalization (star/snowflake schemas, e.g.), dimension updates (Kimball's 
> SCD Type 2, e.g.), etc.
> 
> 4) Bulk Loading. These usually involve custom code (although in many cases 
> for NiFi you can deploy a command-line tool for bulk loading to a DB and use 
> ExecuteProcess or ExecuteStreamCommand to make it happen). These are usually 
> native processes for getting lots of data into the DB using an end-run around 
> their own interfaces, possibly bypassing mechanisms that NiFi embraces, such 
> as provenance. But they are often faster than their SQL interface 
> counterparts for large data ingest.
> 
> 5) Transactions. This involves executing a number of SQL statements as an 
> atomic group (i.e. BEGIN, a bunch of INSERTs, COMMIT). Not all DBs support 
> this (and many have their own dialects for such things).
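
To make item 5 concrete, here is a minimal sketch of running a group of statements as one atomic unit. It uses Python's built-in sqlite3 module purely as a stand-in database; the helper name run_atomically, the demo function, and the table foo are assumptions for illustration and are not part of NiFi or of any existing processor.

```python
import sqlite3


def run_atomically(conn, statements):
    """Run all statements in one transaction; undo everything if any fails."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        for stmt in statements:
            cur.execute(stmt)
        cur.execute("COMMIT")
        return True
    except sqlite3.Error:
        # Any failure discards the work done since BEGIN.
        cur.execute("ROLLBACK")
        return False


def demo():
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT/ROLLBACK above are the only transaction control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE foo (c1 TEXT, c2 INTEGER)")
    ok = run_atomically(conn, [
        "INSERT INTO foo VALUES ('a', 1)",
        "INSERT INTO foo VALUES ('b', 2)",
    ])
    # The second statement targets a table that does not exist, so the
    # first insert of this group is rolled back as well.
    failed = run_atomically(conn, [
        "INSERT INTO foo VALUES ('c', 3)",
        "INSERT INTO missing_table VALUES (1)",
    ])
    rows = conn.execute("SELECT c1, c2 FROM foo ORDER BY c1").fetchall()
    return ok, failed, rows
```

Calling demo() should leave only the rows from the first group in foo. Other databases differ in dialect and in what they allow inside a transaction, so treat this as an illustration of the idea only.
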
> 
> 
> 
> That's a lot of feature surface to cover! Luckily we have an ever-growing 
> community filled with folks representing a whole spectrum of experience and a 
> shared passion for data :)  I am very interested in your thoughts on where 
> NiFi could improve on these (or other) fronts with respect to ETL/ELT; I 
> think we can get some good discussions (and code contributions!) going on 
> this. Alternatively, if you'd like to pursue a discussion on how to offload 
> data transformations, I'm sure the community has thoughts on that as well.
> 
> 
> 
> Regards,
> 
> Matt
> 
> 
> 
> P.S. I didn't include push-down optimization on the list because of its 
> complexity, and because in NiFi terms it involves things like dynamic flow 
> rewrites and other magic that IMHO is against the design principles of NiFi 
> itself (simplicity, accountability, e.g.).
> 
> 
> 
> On Mon, Oct 3, 2016 at 2:25 PM, Carlos Manuel Fernandes (DSI) 
> <[email protected] <mailto:[email protected]>> 
> wrote:
> 
> Hi all,
> 
> 
> 
> When I saw NiFi for the first time, I tried to build a classical ETL/ELT 
> flow, and this question is a recurrent one for new users.
> 
> 
> 
> NiFi has very good processors for Extract and Load; the problem arises with 
> Transform, because in ETL/ELT tools there are specific "processors" (e.g. 
> map, SCD, etc.) bound to DW concepts and sometimes bound to a specific 
> database (e.g. SCDNetezza). The transform processors in NiFi are general 
> purpose and not tied to these concepts. The immediate solution is to 
> create a lot of custom script processors, but then the ELT metadata (SQL) 
> ends up as processor attributes or code, which is not an ideal solution.
> 
> 
> 
> But if we put the Transform logic outside of NiFi, for example in a JSON 
> structure, then it is relatively easy to construct an ELT NiFi template 
> capable of running generic ELT flows.
> 
> 
> 
> Example of an ELT JSON structure (the "steps" inside each "flow" are to be 
> executed by PutSQL in the same transaction):
> 
> {
>        "Transformer": [{
>              "name": "foo1",
>              "type": "Map",
>              "description": "Summarize the table foo from table bar",
>              "flow": [{
>                     "step": 1,
>                     "description": "delete all data",
>                     "stmt": "delete from foo"
>              }, {
>                     "step": 2,
>                     "description": "Count f2 by f1",
>                     "stmt": "insert into foo(c1, c2) select c1,sum(c2) from bar group by c1"
>              }]
>        }, {
>              "name": "foo2",
>              "type": "SCD- Slowly change Dimensions type 1",
>              "description": "Update a prod table based on stage table",
>              "flow": [{
>                     "step": 1,
>                     "description": "Process type 1",
>                     "stmt": "Update Prod Set Prod.columns = Stage.Columns From Stage Inner Join Prod on Stage.key = Prod.key Where Stage.IsType1 = 1 "
>              }]
>        }]
> }
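
As a rough illustration of how a script might consume this structure, here is a minimal sketch that parses it and returns the SQL statements of each transformer in "step" order. It is plain Python and does not use any NiFi API; the function name statements_by_transformer and the shortened sample document ELT_JSON are assumptions made for the example. The field names ("Transformer", "name", "flow", "step", "stmt") come from the structure above.

```python
import json

# Shortened, hypothetical sample in the shape proposed above. The steps are
# listed out of order on purpose to show that the sort matters.
ELT_JSON = """
{
  "Transformer": [{
    "name": "foo1",
    "type": "Map",
    "flow": [
      {"step": 2, "stmt": "insert into foo(c1, c2) select c1, sum(c2) from bar group by c1"},
      {"step": 1, "stmt": "delete from foo"}
    ]
  }]
}
"""


def statements_by_transformer(text):
    """Map each transformer name to its SQL statements, ordered by step."""
    doc = json.loads(text)
    result = {}
    for transformer in doc["Transformer"]:
        steps = sorted(transformer["flow"], key=lambda s: s["step"])
        result[transformer["name"]] = [s["stmt"] for s in steps]
    return result
```

Each list returned this way is what would then be handed to the database as a single transaction.
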
> 
> 
> 
> Example of a NiFi template that executes that JSON structure:
> 
> 
> 
> <image001.png>
> 
> 
> 
> 
> 
> Does this make sense? Please give me feedback.
> 
> 
> 
> Carlos
> 
> 
> 
> 
> 
> 
> 

Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail
