Andy,

Good suggestion, I will do that. I have created several ExecuteScript processors (in Groovy) before.
Thanks
Carlos

From: Andy LoPresto [mailto:alopre...@apache.org]
Sent: Friday, 7 October 2016 18:21
To: users@nifi.apache.org
Subject: Re: ELT on Nifi

Carlos,

If you are comfortable with Groovy, I would suggest you look at using the ExecuteScript [1] processor to prototype what you want the processor to do. That processor will take a Groovy script (inline or read from a file) and execute it within the processor lifecycle. Matt Burgess has written some excellent blog posts on getting started with it [2][3]. Once you have that behaving the way you like (and feel free to continue to ask questions here), another developer would probably be able to help you convert it to a "real" custom processor.

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.ExecuteScript/index.html
[2] https://funnifi.blogspot.com/2016/02/executescript-processor-hello-world.html
[3] https://funnifi.blogspot.com/2016/02/writing-reusable-scripted-processors-in.html

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

On Oct 7, 2016, at 7:20 AM, João Henrique Freitas <joa...@gmail.com> wrote:

Hi.

Maybe a linkedin/databus client processor could be created to handle ETL.

On 06/10/2016 10:39, "Carlos Manuel Fernandes (DSI)" <carlos.antonio.fernan...@cgd.pt> wrote:

Hi Uwe,

I saw that you have developed an approach similar to mine. Joe Witt launched a challenge to build a processor based on the JSON structure I proposed. I think we can use the code of the ConvertJSONToSQL processor as a template for this new processor. This new processor will belong to the category JSONtoSQL (ConvertJSONToSQL being the first one). We can work together to reach this goal, but first we must agree on the JSON structure for the input. What do you think? You can contact me directly.
Thanks
Carlos

From: Uwe Geercken [mailto:uwe.geerc...@web.de]
Sent: Tuesday, 4 October 2016 14:42
To: users@nifi.apache.org
Subject: Aw: Re: ELT on Nifi

Carlos,

I think that is a good point. But I would like to bring a slightly different view to it:

I have developed a business rule engine (open source) written in Java, and it is now in production at at least two larger companies. They both use the Pentaho ETL tool together with the rule engine. You can use the rules to filter or evaluate conditions, and there are also actions which execute or transform data. The advantage is that within Pentaho it is just a plugin, and the business logic (or, if you will, also IT logic) is managed externally (through a web interface, possibly by users or superusers themselves rather than by IT). This keeps a proper separation of responsibilities between business logic and IT logic, and the ETL process itself is much, much cleaner.

Likewise, one could think of creating a plugin for NiFi which takes a similar approach: you have a processor that calls the rule engine in the background. It runs and delivers the results back to the process. Instead of having complex connections between transformation processors, which clutter the NiFi desktop, there would be one processor for the rule engine (or, of course, several of them).

In one of my later projects I implemented the complete invoicing process for the company I work for using the rule engine. The ETL is very clean and contains only IT logic (formatting of fields, splitting of fields, renaming, etc.), and the rest is in external rule projects which contain the business logic.

My thinking is that the division of responsibilities for the logic, together with a clean ETL (or, in the NiFi case, a clean flow diagram), is a very strong argument for this approach.
Of course there is nothing to be said against a mixed approach (custom processors and rule engine); I just wanted to explain my point a little bit. Everything is available on github.com/uwegeercken.

I could write the NiFi code for the processor, I guess, but I will need some help with testing, documentation and also packaging the nar file (I am not used to Maven and have struggled in the past to create a proper nar archive).

Greetings,
Uwe

Sent: Tuesday, 4 October 2016 at 04:48
From: "Matt Burgess" <mattyb...@apache.org>
To: users@nifi.apache.org
Subject: Re: ELT on Nifi

Carlos,

The extensible nature of NiFi, whether the overall architecture was intended for ETL/ELT and/or RDBMS/DW concepts or not, means that many of these kinds of operations are welcome (but possibly not yet present) in NiFi. Some might warrant framework changes, but a good portion of the RDBMS/DW processors are possible and just haven't been added or contributed yet. In my experience, ETL/ELT tools have focused mainly on this kind of "processor" and, in contrast, can't handle the level of throughput, data formats, provenance/lineage, security, and/or data integrity that NiFi can. In exchange, NiFi doesn't have as many of the RDBMS/DW-specific processors available at this time.

I see a few categories (please feel free to add/change/delete/discuss), mostly having to do with tabular (row-oriented, character-delimited) data:

1) Row-level operations. This includes projections (select fields from a row), altering fields (e.g. change the timestamp of column 'last_updated'), adding column(s), replace-with-lookup, etc.
2) Table-level operations. This includes joins, grouping/aggregates, transposition, etc.
3) Composition/Application of the other two. This includes normalization & denormalization (e.g. star/snowflake schemas), dimension updates (e.g. Kimball's SCD Type 2), etc.
4) Bulk Loading.
These usually involve custom code (although in many cases for NiFi you can deploy a command-line tool for bulk loading to a DB and use ExecuteProcess or ExecuteStreamCommand to make it happen). These are usually native processes for getting lots of data into the DB using an end-run around their own interfaces, possibly bypassing mechanisms that NiFi embraces, such as provenance. But they are often faster than their SQL interface counterparts for large data ingest.
5) Transactions. This involves executing a number of SQL statements as an atomic group (i.e. BEGIN, a bunch of INSERTs, COMMIT). Not all DBs support this (and many have their own dialects for such things).

That's a lot of feature surface to cover! Luckily we have an ever-growing community filled with folks representing a whole spectrum of experience and a shared passion for data :)

I am very interested in your thoughts on where NiFi could improve on these (or other) fronts with respect to ETL/ELT; I think we can get some good discussions (and code contributions!) going on this. Alternatively, if you'd like to pursue a discussion on how to offload data transformations, I'm sure the community has thoughts on that as well.

Regards,
Matt

P.S. I didn't include push-down optimization on the list because of its complexity, and because in NiFi terms it involves things like dynamic flow rewrites and other magic that IMHO is against the design principles of NiFi itself (e.g. simplicity, accountability).

On Mon, Oct 3, 2016 at 2:25 PM, Carlos Manuel Fernandes (DSI) <carlos.antonio.fernan...@cgd.pt> wrote:

Hi all,

When I saw NiFi for the first time, I tried to build a classical ETL/ELT flow, and this question is a recurring one for new users. NiFi has very good processors for Extract and Load; the problem arises with Transform, because in ETL/ELT tools there are specific "processors" (e.g. map, SCD, etc.) bound to DW concepts and sometimes bound to a specific database (e.g. SCDNetezza).
The transform processors in NiFi are general purpose and not tied to these concepts. The immediate solution is to create a lot of custom script processors, but then the ELT metadata (SQL) ends up as attributes or code of processors, which is not an ideal solution.

But if we put the Transform logic outside of NiFi, for example in some JSON structure, then it is relatively easy to construct an ELT NiFi template capable of running generic ELT flows.

Example of an ELT JSON structure (the "steps" inside the "flow" are to be executed by PutSQL in the same transaction):

    {
      "Transformer": [{
        "name": "foo1",
        "type": "Map",
        "description": "Summarize the table foo from table bar",
        "flow": [{
          "step": 1,
          "description": "delete all data",
          "stmt": "delete from foo"
        }, {
          "step": 2,
          "description": "Sum c2 by c1",
          "stmt": "insert into foo(c1, c2) select c1,sum(c2) from bar group by c1"
        }]
      }, {
        "name": "foo2",
        "type": "SCD - Slowly Changing Dimensions type 1",
        "description": "Update a prod table based on stage table",
        "flow": [{
          "step": 1,
          "description": "Process type 1",
          "stmt": "Update Prod Set Prod.columns = Stage.Columns From Stage Inner Join Prod on Stage.key = Prod.key Where Stage.IsType1 = 1"
        }]
      }]
    }

Example of a NiFi template that executes that JSON structure:

<image001.png>

Does this make sense? Please give me feedback.

Carlos
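
[Editor's illustration] The idea in the original post (SQL steps kept in a JSON structure and executed in order inside one transaction) can be tried outside NiFi before building any processor. The following is only a minimal sketch under stated assumptions: plain Python with an in-memory SQLite database stands in for the NiFi flow and the target database, only the "foo1" transformer is used (the "foo2" statement relies on an UPDATE ... FROM ... JOIN dialect that is not assumed to run here), and `run_transformers` is a hypothetical helper name, not a NiFi API.

```python
import json
import sqlite3

# Trimmed copy of the structure from the thread: only the "foo1" transformer,
# with the per-step descriptions left out.
ELT_JSON = """
{
  "Transformer": [{
    "name": "foo1",
    "type": "Map",
    "description": "Summarize the table foo from table bar",
    "flow": [
      {"step": 1, "stmt": "delete from foo"},
      {"step": 2, "stmt": "insert into foo(c1, c2) select c1, sum(c2) from bar group by c1"}
    ]
  }]
}
"""


def run_transformers(conn, spec):
    """Hypothetical helper: run each transformer's steps, ordered by "step",
    with all steps of one transformer inside a single transaction."""
    executed = []
    for transformer in spec["Transformer"]:
        steps = sorted(transformer["flow"], key=lambda s: s["step"])
        # "with conn" commits if every statement succeeds, otherwise rolls back.
        with conn:
            for step in steps:
                conn.execute(step["stmt"])
                executed.append((transformer["name"], step["step"]))
    return executed


# Assumed sample tables and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table bar (c1 text, c2 integer)")
conn.execute("create table foo (c1 text, c2 integer)")
conn.executemany("insert into bar values (?, ?)", [("a", 1), ("a", 2), ("b", 5)])
conn.execute("insert into foo values ('stale', 0)")
conn.commit()

executed = run_transformers(conn, json.loads(ELT_JSON))
rows = conn.execute("select c1, c2 from foo order by c1").fetchall()
print(rows)
```

Grouping the steps of one transformer into a single transaction means a failing step leaves the target table as it was before step 1, which is the behaviour the post asks of PutSQL. How that maps onto actual NiFi processor configuration is not shown here.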