Hi Joe,

You are absolutely right: what we are really looking for is a "job oriented" model.
I understand NiFi was not designed with this use case in mind, but is this something that might be worked on in the future? One of the things we would like to see (coming from a job scheduler such as Azkaban) is a DAG or tree view for a given process group hierarchy.

Having said that, I am wondering what the "best practices" are (if any) for organizing process groups in NiFi, or how best to implement ETL/ELT "job oriented" workflows in NiFi. I am keen to know your thoughts, and whether you can expand on the current capabilities of NiFi that we can leverage for this use case.

Regards,
Roger

-----Original Message-----
From: Joe Witt [mailto:[email protected]]
Sent: Tuesday, 4 October 2016 12:08 PM
To: [email protected]
Subject: Re: NiFi operational interface

Roger,

For people coming to NiFi from traditional ETL/ELT systems we're seeing a couple of patterns emerge. They tend to want to view each flow discretely, have used other systems that way, and are trying to use NiFi that way. And they tend to want a variety of relational data/database type constructs. We'll focus on the first one here since that is what this thread is about.

The model you're talking about sounds like a 'job oriented' model. We've had some people ask about the process-group-per-job-type approach, and we've also had people ask about giving each job a completely isolated canvas altogether. This isn't exactly the model under which NiFi was designed and, as you've rightly pointed out, the user experience doesn't really match it.

That said, NiFi has been used to handle hundreds or even thousands of different dataflows all through the same system/cluster. But the flows weren't built on a per-job basis. Common steps among the various flows were factored into things like process groups, and those groups were connected as necessary. The data from these many different flows was routed through common sections/groups as appropriate.
Given that NiFi itself allows you to use context, content, and metadata we have a lot of flexibility in how data is acquired, routed, transformed, and delivered. It is just a different way of approaching these cases.

So, that said I'd like to keep the conversation going to explore what sorts of things we could/should be doing to better align with the model you'd like to see. But, I'd also like to talk with you about whether there are opportunities to approach the problem you're trying to solve in a different way as well. I suspect the right answer is going to be some combination of these different perspectives.

Thanks
Joe

On Mon, Oct 3, 2016 at 10:23 AM, Corey Flowers <[email protected]> wrote:
> I think that if you were able to stop/start/configure via the summary
> page, it would give you the UI functionality that you are looking for,
> in terms of the stopping and starting of process groups from the UI. I
> also believe this would help for faster change times of individual processors.
>
> On Mon, Oct 3, 2016 at 5:52 AM, Pierre Villard
> <[email protected]>
> wrote:
>>
>> In the Summary view, you have a tab "Process Group" but given the
>> number of process groups you have it may not be ideal.
>> In your situation, I guess, leveraging the REST API is probably the
>> best way to go.
>>
>> Pierre
>>
>> 2016-10-03 10:19 GMT+02:00 Roger Marin
>> <[email protected]>:
>>>
>>> Hi Pierre,
>>>
>>> Yes, the search toolbar in the canvas works but it seems to be
>>> searching across processors whereas what we really want to see is an
>>> interface centered around process groups.
>>>
>>> We encapsulate all of our etl/elt jobs in individual process groups
>>> based around a specific data source, hence why we really need a
>>> "process group" focused UI that can allow us to search for and
>>> visualize process groups as well as to quickly start, stop and
>>> schedule a process group, we can get away with going to the canvas
>>> view to see the details of a specific processor.
>>>
>>> I'm open to any ideas or comments from anyone doing similar things
>>> in production, from what we have seen so far (and please correct me
>>> if I'm wrong) when we are talking about having hundreds or even
>>> thousands of process groups running, the canvas view is perhaps not
>>> the best tool for someone who needs to monitor and support a
>>> production ingestion batch on a per source basis.
>>>
>>> Regards,
>>> Roger
>>>
>>> -------- Original message --------
>>> From: Pierre Villard <[email protected]>
>>> Date: 3/10/2016 19:06 (GMT+10:00)
>>> To: [email protected]
>>> Subject: Re: NiFi operational interface
>>>
>>> Hi Roger,
>>>
>>> I don't know if it answers your needs but you have a search tool bar
>>> to find any element on the canvas and quickly access it. In terms of
>>> monitoring, you have the "Summary" panel available from the menu. It
>>> gives you a way to quickly display information about everything on the
>>> canvas.
>>>
>>> Pierre
>>>
>>> 2016-10-03 5:26 GMT+02:00 Roger Marin
>>> <[email protected]>:
>>>>
>>>> Hi all,
>>>>
>>>> We are looking at using NiFi to replace some of our existing
>>>> (batch, microbatch) Data Ingestion & ETL/ELT processes but we have
>>>> a few concerns around the “Canvas” interface.
>>>>
>>>> From a developer perspective the Canvas view is OK but we feel that
>>>> it’s not really suitable from an operational point of view.
>>>> For instance, we are looking at eventually having hundreds, perhaps
>>>> even thousands of Process Groups running in production, we really
>>>> need a way to easily search and monitor any running process groups
>>>> at any given time.
>>>>
>>>> We were initially thinking of leveraging things like the ELK stack
>>>> to build some kind of dashboard but that doesn’t give us the
>>>> flexibility we need, e.g. if we need to stop a given process group
>>>> or restart it we would still need to use the canvas view for that,
>>>> given this we are looking at leveraging the NiFi API to build a new
>>>> “operational” interface that we could use to monitor and perform
>>>> basic actions on running process groups (start, stop, schedule), as
>>>> well as quickly search and visualize running processors and process
>>>> groups.
>>>>
>>>> I am not sure if this is something that’s already being worked on
>>>> (I couldn’t find anything in JIRA) but I would like to know how
>>>> other NiFi users in production are getting around this (if it’s
>>>> even an issue for anyone else at all…).
>>>>
>>>> Regards,
>>>>
>>>> Roger
>>>>
>>>
>>
>
> --
> Corey Flowers
> Vice President, Onyx Point, Inc
> (410) 541-6699
> [email protected]
>
> -- This account not approved for unencrypted proprietary information
> --
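
On the REST API route suggested earlier in this thread, here is a minimal Python sketch of starting or stopping a whole process group. It assumes NiFi 1.x, an unsecured instance at http://localhost:8080, and the `PUT /flow/process-groups/{id}` scheduling endpoint. The base URL and the helper names `build_schedule_request` and `set_group_state` are illustrative assumptions, not something anyone in the thread described, and a secured cluster would also need authentication, which is omitted here.

```python
import json
import urllib.request

# Assumption: an unsecured NiFi 1.x instance reachable at this base URL.
BASE_URL = "http://localhost:8080/nifi-api"


def build_schedule_request(group_id, state, base_url=BASE_URL):
    """Build (but do not send) a request that starts or stops a process group.

    `state` is "RUNNING" to start the group's components, "STOPPED" to stop them.
    """
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError("state must be 'RUNNING' or 'STOPPED'")
    body = json.dumps({"id": group_id, "state": state}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/flow/process-groups/{group_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def set_group_state(group_id, state, base_url=BASE_URL):
    """Send the scheduling request and return the decoded JSON response."""
    request = build_schedule_request(group_id, state, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

An operational dashboard of the kind discussed above could call `set_group_state(group_id, "STOPPED")` behind a per-source "stop" button. Building the request separately from sending it keeps the payload easy to inspect without a live NiFi instance.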
