Re: UpdateAttribute Failure Relationship

2024-02-09 Thread Lucas Ottersbach
I think that's a good approach which actually addresses the underlying
issue. Thank you Joe, Mark and all others.

As far as I know, the default last-resort behaviour of rollback + yield,
which a lot of processors exhibit, is due to them being based on
AbstractProcessor.

Does it make sense to incorporate the outlined approach into
AbstractProcessor instead of UpdateAttribute?
This way, other processors could undergo the same (opt-in) behavioural change
without having to re-implement it on a per-processor basis.
Each processor in question would only need to add the property declared by
AbstractProcessor to its list of properties. Processors that do not include
the property would not allow configuring the behaviour and would thus keep
the current behaviour.

I think that would provide a good middle ground, both for users who want an
explicit failure relationship and for those who would rather keep a simpler flow
and prevent those kinds of errors by other means.

What do you think?

Lucas

Joe Witt wrote on Sat, 10 Feb 2024, 00:06:

> Lots of good commentary and great focus on minimizing impact to the users
> while fixing what is admittedly not the most desired behavior for some
> cases as it relates to the very popular UpdateAttribute.
>
> We do not want to enforce that all processors have failure relationships.
> Presumably that notion is specific to processors which take an input
> flowfile.  Even then though we have many examples where having a failure
> relationship does not add value.  A few examples such as DistributeLoad,
> DuplicateFlowFile and others which have concepts such as 'unmatched' etc..
> Certainly most things can and should but we don't need a strict policy
> here.  We should still let people building processors be thoughtful about
> what is best.
>
> UpdateAttribute specifically... There is history to why it works the way it
> does.  Things changed and it didn't evolve or couldn't because its use was
> so widespread and we didn't want to create too much pain for users.  But
> because of 2.0 and some improvements like the ability to code up migration
> behaviors on a per extension basis we can work our way out of this without
> causing pain for the users.
>
> For NiFi 1.x it should stay as is.
> For NiFi 2.x a solution is outlined in NIFI-6344.  It reads:
> Given the new capabilities for migrating configs in NiFi 2.0 we can fix
> this.
>
> Add a property to UpdateAttribute that is 'Failure Strategy' and the
> options are 'rollback' or 'route to failure'. If that property is set to
> rollback it behaves like it does now and I recommend that remain the
> default. If that property is set to 'route to failure' then we add a
> relationship which needs to be set which is of course called 'failure'. For
> flows being migrated from a version before this behavior was available to a
> version that has this capability we just set the value of this parameter to
> our default.
>
> This lets existing flows migrate over just fine. It lets us give users a
> failure path for the cases they want one. It lets us keep the vast majority
> of flows and uses of this where failure is not relevant stay clean. And it
> handles migration.
>
> The processor needs to be updated to catch the exceptions and then follow
> this logic. Today it just lets it fly to the framework which causes the
> processor to yield and penalizes the flowfile for the default time. When
> we now catch the problem, we should just avoid yielding and instead penalize
> the specific offending flowfile, which lets everything else operate super
> fast.
>
> Thanks to Mark Payne for the chat on this.
>
> This can be done at any time by anyone that wants to take it on.  It is not
> a blocker for NiFi 2.x.  The migration capabilities give us really nice
> options for many cases we've hit over the years going forward.
>
> Thanks
>
>
> On Fri, Feb 9, 2024 at 2:53 PM Adam Taft  wrote:
>
> > I mean, this really speaks to the principle that (in my humble opinion)
> > All Processors Shalt Have Failure Relationships as best practice.
> >
> > So I think the decision is really, how much pain do we want to impose on
> > NiFi 2.0 adopters. UpdateAttribute and RouteOnAttribute are used literally
> > everywhere (especially on large installations), and it would be quite the
> > chore(!) to update these if we were to make a non-compatible change for
> > 2.0.  But 2.0 is also a very logical time/place to make such backwards
> > breaking changes to UpdateAttribute, RouteOnAttribute and/or other
> > processors that are missing failure relationships.
> >
> > I just fundamentally have never liked the lack of control for flowfiles
> > getting requeued if a processor exception is left uncaught. You are almost
> > always just going to repeat the failure condition; it's rarely useful to
> > retry without some sort of deliberate flow manager action. The enhancements
> > to relationships (to enable retries, backoffs, etc.) even reinforce my
> > point of 
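Joe's outline in this thread (catch the exception, then either keep today's rollback or penalize only the offending FlowFile and route it to 'failure', plus a migration step that fills in the default for older flows) can be modelled without the NiFi API. This is a hypothetical sketch: Strategy, Item, BatchResult and FailureStrategySketch are illustrative stand-ins for the property, FlowFiles, the session and the processor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical, framework-free model of the proposed behaviour; not NiFi API.
enum Strategy { ROLLBACK, ROUTE_TO_FAILURE }

final class Item {
    final String id;
    boolean penalized;

    Item(String id) {
        this.id = id;
    }
}

final class BatchResult {
    final List<Item> success = new ArrayList<>();
    final List<Item> failure = new ArrayList<>();
    boolean rolledBack;
    boolean yielded;
}

final class FailureStrategySketch {
    static final String PROPERTY = "Failure Strategy";

    // Migration step: configs saved before the property existed get the default.
    static Map<String, String> migrate(Map<String, String> properties) {
        Map<String, String> migrated = new HashMap<>(properties);
        migrated.putIfAbsent(PROPERTY, Strategy.ROLLBACK.name());
        return migrated;
    }

    static BatchResult process(List<Item> batch, Strategy strategy, UnaryOperator<Item> update) {
        BatchResult result = new BatchResult();
        for (Item item : batch) {
            try {
                result.success.add(update.apply(item));
            } catch (RuntimeException e) {
                if (strategy == Strategy.ROLLBACK) {
                    // Current behaviour: nothing is transferred, the items are
                    // penalized and returned to the queue, and the processor yields.
                    result.success.clear();
                    batch.forEach(i -> i.penalized = true);
                    result.rolledBack = true;
                    result.yielded = true;
                    return result;
                }
                // Proposed behaviour: penalize only the offending item, route it to
                // 'failure', and keep processing the rest without yielding.
                item.penalized = true;
                result.failure.add(item);
            }
        }
        return result;
    }
}
```

In a real processor the per-FlowFile branch would presumably map to session.penalize(flowFile) followed by session.transfer(flowFile, REL_FAILURE); treat that mapping as an assumption.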

Re: [DISCUSS] Preparing for NiFi 2.0.0-M2

2024-01-16 Thread Lucas Ottersbach
Hey team,

looking forward to the upcoming releases.

Is there a chance we can land PR#8244 [1] / NIFI-12595 [2] or a variant
thereof in the next 1.x release?
I appreciate any feedback on the proposed extensions.

[1] https://github.com/apache/nifi/pull/8244
[2] https://issues.apache.org/jira/browse/NIFI-12595

Thanks,

Lucas


Pierre Villard wrote on Tue, 16 Jan 2024, 08:11:

> Sounds good to me David. As discussed on another thread, I am happy to take
> care of a 1.25 release as well to have some of the fixes we added there for
> migrating to 2.0.
>
> Thanks,
> Pierre
>
> On Tue, 16 Jan 2024 at 00:44, Joe Witt wrote:
>
> > Wonderful progress. Definitely time and an M2 seems like a good idea for
> > the reasons noted.
> >
> > Happy to help take RM if you get squeezed on time.
> >
> > Thanks
> >
> > On Mon, Jan 15, 2024 at 12:46 PM David Handermann <
> > exceptionfact...@apache.org> wrote:
> >
> > > Team,
> > >
> > > We have had some great feedback and many improvements since the
> > > release of NiFi 2.0.0-M1 at the end of November. With over 160 [1]
> > > Jira issues already resolved for the next iteration, we are in a good
> > > position to prepare for another milestone release version.
> > >
> > > The main branch now includes significant framework dependency upgrades
> > > such as Spring 6 and Jetty 12, along with several new components and
> > > other bug fixes. Considering the significant number of changes to
> > > framework libraries already in place, it would be useful to release a
> > > second milestone version before a full general availability release.
> > >
> > > There is a pull request for NIFI-9458 [2] that includes impactful
> > > lower-level changes to date and time parsing and formatting, which is
> > > part of the NiFi 2.0 Release Goals [3]. Following the review and
> > > incorporation of these changes, I would be glad to handle Release
> > > Manager responsibilities for a NiFi 2.0.0-M2 version, ideally at some
> > > point this week.
> > >
> > > Regards,
> > > David Handermann
> > >
> > > [1] https://issues.apache.org/jira/projects/NIFI/versions/12353861
> > > [2] https://github.com/apache/nifi/pull/8248
> > > [3] https://cwiki.apache.org/confluence/display/NIFI/NiFi+2.0+Release+Goals
> > >
> >
>


Re: New Project Website Design Staged for Deployment

2024-01-03 Thread Lucas Ottersbach
The new site looks refreshing.

Thank you James, David and all others contributing to this.

While browsing the site on mobile, I noticed a small area for future
improvement. I added more details to the GitHub issue linked by David.
It is something we may want to improve on later, but it
shouldn't keep us from releasing the updated site.

Joe Witt wrote on Thu, 4 Jan 2024, 05:00:

> Super appreciative David and James.  A good step forward to refresh and
> retool and grow from.
>
> Thanks!
>
> On Wed, Jan 3, 2024 at 8:39 PM David Handermann <exceptionfact...@apache.org>
> wrote:
>
> > Team,
> >
> > Thanks to background work from several designers, and significant
> > recent effort from James Mingardi-Elliott [1], we have a refreshed
> > project website design deployed to staging and ready for production.
> >
> > The combined set of changes can be reviewed in the following pull request:
> >
> > https://github.com/apache/nifi-site/pull/82
> >
> > The staging site provides a functional way to view the new design,
> > based on the main-staging branch [2] of the nifi-site repository.
> >
> > https://nifi.staged.apache.org
> >
> > The new design includes a modernized home page and streamlined
> > navigation, with a focus on making the most popular pages quickly
> > accessible.
> >
> > There is room for improvement in areas like generated project
> > documentation, but the primary goal of this new design is to provide a
> > fresh foundation for future improvements.
> >
> > The pull request provides an opportunity for correcting any notable
> > problems, with the goal of pushing the new version to production early
> > next week.
> >
> > Regards,
> > David Handermann
> >
> > [1] https://github.com/james-elliott
> > [2] https://github.com/apache/nifi-site/tree/main-staging
> >
>


Re: Use of attribute uuid and other "native" attributes

2023-07-18 Thread Lucas Ottersbach
That was my impression as well. Thank you for the quick response and the
clarification.


Best regards

Lucas

Mark Payne wrote on Tue, 18 Jul 2023, 21:56:

> Lucas,
>
> You cannot control the UUID. It’s automatically generated by the
> framework. If you attempt to use ProcessSession.putAllAttributes or
> ProcessSession.putAttribute, it’ll ignore the “uuid” key.
>
> Thanks
> -Mark
>
>
> > On Jul 18, 2023, at 3:51 PM, Lucas Ottersbach <
> > lucas.ottersb...@gmail.com> wrote:
> >
> > Hey Matt,
> >
> > you wrote that both `Session.create` and `Session.clone` assign a new
> > FlowFile UUID to the resulting FlowFile. That sounds as if there might be
> > an alternative in which the UUID is not controlled by the framework itself?
> >
> > I've got a different use case from Russell's, but was wondering whether it
> > is even possible for a Processor developer to control the FlowFile UUID. I've
> > got a processor pair for inter-cluster transfer of FlowFiles (where
> > Site-to-Site is not applicable). As of now, the UUID on the receiving side
> > differs from the original on the origin cluster, because I'm using
> > `Session.create`.
> > Is there a way to control the UUID of new FlowFiles?
> >
> >
> > Best regards,
> >
> > Lucas
> >
> > Matt Burgess wrote on Tue, 18 Jul 2023, 20:23:
> >
> >> In general I recommend only sending on those attributes that will be
> >> used at some point downstream (unless you have an "original"
> >> relationship that should maintain the original state with respect to
> >> provenance). If you don't know that ahead of time you'll probably need
> >> to send all/most of the attributes just in case.
> >>
> >> Are you using session.create() or session.clone()? They both set a new
> >> "uuid" attribute on the created FlowFile, with at least the latter
> >> setting some other attributes as well (see the Developer Guide [1] for
> >> more details).
> >>
> >> Regards,
> >> Matt
> >>
> >> [1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
> >>
> >> On Tue, Jul 18, 2023 at 12:25 PM Russell Bateman
> >> wrote:
> >>>
> >>> I have a custom processor, /SplitHl7v4Resources/, that splits out
> >>> individual FHIR resources (Patients, Observations, Encounters, etc.)
> >>> from large Bundle flowfiles. So, for a given flowfile, it's split into
> >>> hundreds of smaller ones.
> >>>
> >>> When I do this, I leave the existing NiFi attributes as they were on the
> >>> original flowfile.
> >>>
> >>> As I contemplate the uuid attribute, it occurs to me that I should find
> >>> out what its *significance is for provenance and other potential
> >>> debugging/tracing concerns*. I never really look at it, but, if there
> >>> were some kind of melt-down in a production environment, would I care
> >>> that it multiplied across hundreds of flowfiles besides the original one?
> >>>
> >>> Also these two other NiFi attributes remain unchanged:
> >>>
> >>>     filename
> >>>     path
> >>>
> >>>
> >>> I do garnish each flowfile with many pointed/significant new attributes
> >>> like resource.type that are my own. In my processing, I don't care about
> >>> NiFi's original attributes, but should I?
> >>>
> >>> Thanks,
> >>> Russ
> >>
>
>
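Mark's description can be modelled in a few lines: the framework assigns "uuid", and putAttribute/putAllAttributes silently drop that key. The origin.uuid attribute is a hypothetical workaround for Lucas's inter-cluster case (carry the sender's UUID under a different name); ModelFlowFile and ReceivingSide are illustrative stand-ins, not NiFi classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative model of a FlowFile whose "uuid" is owned by the framework.
final class ModelFlowFile {
    private final Map<String, String> attributes = new HashMap<>();

    ModelFlowFile() {
        // The framework assigns a fresh UUID on creation.
        attributes.put("uuid", UUID.randomUUID().toString());
    }

    void putAttribute(String key, String value) {
        if ("uuid".equals(key)) {
            return; // silently ignored, as described in the reply above
        }
        attributes.put(key, value);
    }

    void putAllAttributes(Map<String, String> values) {
        values.forEach(this::putAttribute);
    }

    Map<String, String> attributes() {
        return Collections.unmodifiableMap(attributes);
    }
}

final class ReceivingSide {
    // Hypothetical workaround: keep the sender's UUID under a separate key.
    static ModelFlowFile receive(Map<String, String> sentAttributes) {
        ModelFlowFile created = new ModelFlowFile();
        created.putAllAttributes(sentAttributes); // "uuid" is dropped here
        String originUuid = sentAttributes.get("uuid");
        if (originUuid != null) {
            created.putAttribute("origin.uuid", originUuid);
        }
        return created;
    }
}
```

The design choice is simply to stop fighting the framework: the receiving FlowFile keeps its own identity, and correlation with the origin cluster happens through an ordinary attribute.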


Re: Use of attribute uuid and other "native" attributes

2023-07-18 Thread Lucas Ottersbach
Hey Matt,

you wrote that both `Session.create` and `Session.clone` assign a new FlowFile
UUID to the resulting FlowFile. That sounds as if there might be an
alternative in which the UUID is not controlled by the framework itself?

I've got a different use case from Russell's, but was wondering whether it is
even possible for a Processor developer to control the FlowFile UUID. I've
got a processor pair for inter-cluster transfer of FlowFiles (where
Site-to-Site is not applicable). As of now, the UUID on the receiving side
differs from the original on the origin cluster, because I'm using
`Session.create`.
Is there a way to control the UUID of new FlowFiles?


Best regards,

Lucas

Matt Burgess wrote on Tue, 18 Jul 2023, 20:23:

> In general I recommend only sending on those attributes that will be
> used at some point downstream (unless you have an "original"
> relationship that should maintain the original state with respect to
> provenance). If you don't know that ahead of time you'll probably need
> to send all/most of the attributes just in case.
>
> Are you using session.create() or session.clone()? They both set a new
> "uuid" attribute on the created FlowFile, with at least the latter
> setting some other attributes as well (see the Developer Guide [1] for
> more details).
>
> Regards,
> Matt
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
>
> On Tue, Jul 18, 2023 at 12:25 PM Russell Bateman 
> wrote:
> >
> > I have a custom processor, /SplitHl7v4Resources/, that splits out
> > individual FHIR resources (Patients, Observations, Encounters, etc.)
> > from large Bundle flowfiles. So, for a given flowfile, it's split into
> > hundreds of smaller ones.
> >
> > When I do this, I leave the existing NiFi attributes as they were on the
> > original flowfile.
> >
> > As I contemplate the uuid attribute, it occurs to me that I should find
> > out what its *significance is for provenance and other potential
> > debugging/tracing concerns*. I never really look at it, but, if there
> > were some kind of melt-down in a production environment, would I care
> > that it multiplied across hundreds of flowfiles besides the original one?
> >
> > Also these two other NiFi attributes remain unchanged:
> >
> > filename
> > path
> >
> >
> > I do garnish each flowfile with many pointed/significant new attributes
> > like resource.type that are my own. In my processing, I don't care about
> > NiFi's original attributes, but should I?
> >
> > Thanks,
> > Russ
>
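Matt's advice (send on only the attributes needed downstream) and the fact that each newly created child gets its own "uuid" can be sketched for Russell's splitting case. The fragment.identifier, fragment.index and fragment.count keys mirror attributes that several NiFi split processors write, but SplitSketch, the keepKeys parameter and the 0-based index are assumptions of this sketch, and the UUID generation merely stands in for what the framework does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

final class SplitSketch {
    // Splits one parent's attribute map into one child map per resource type.
    static List<Map<String, String>> split(Map<String, String> parent,
                                           List<String> resourceTypes,
                                           Set<String> keepKeys) {
        String fragmentId = UUID.randomUUID().toString();
        List<Map<String, String>> children = new ArrayList<>();
        for (int i = 0; i < resourceTypes.size(); i++) {
            Map<String, String> child = new HashMap<>();
            // Copy only the attributes something downstream actually needs.
            for (String key : keepKeys) {
                if (!key.equals("uuid") && parent.containsKey(key)) {
                    child.put(key, parent.get(key));
                }
            }
            // Stand-in for the framework: every child gets its own fresh UUID.
            child.put("uuid", UUID.randomUUID().toString());
            // Correlation attributes shared by all siblings of one split.
            child.put("fragment.identifier", fragmentId);
            child.put("fragment.index", Integer.toString(i));
            child.put("fragment.count", Integer.toString(resourceTypes.size()));
            child.put("resource.type", resourceTypes.get(i));
            children.add(child);
        }
        return children;
    }
}
```

With shared fragment attributes, the hundreds of children from one Bundle remain traceable to each other even though none of them carries the parent's UUID.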


Re: PostHTTP Deprecation Concerns

2023-02-12 Thread Lucas Ottersbach
Hi Adam,

I know this thread was opened over a month ago, but we recently had to
move FlowFiles, including both attributes and content, from one NiFi
cluster to another and could not build upon the built-in Site-to-Site
transfer mechanisms due to network restrictions between the clusters.

We've built upon an existing solution from a community member that had
been dormant for some time. It uses a pair of custom processors to
transfer FlowFile content and attributes over raw TCP connections.
You can find the solution under its name "nifi-flow-over-tcp" both on
GitHub and on Maven Central:
https://github.com/EndzeitBegins/nifi-flow-over-tcp
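Moving content and attributes together over a raw TCP connection can be illustrated with a simple length-prefixed frame. This is a hypothetical wire format (Frame and FrameCodec are illustrative names), not the format nifi-flow-over-tcp or NiFi's own FlowFile packaging actually uses; in-memory streams stand in for the socket.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// One FlowFile-like unit: attributes plus raw content.
final class Frame {
    final Map<String, String> attributes;
    final byte[] content;

    Frame(Map<String, String> attributes, byte[] content) {
        this.attributes = attributes;
        this.content = content;
    }
}

// Hypothetical framing: [attr count][len,key][len,value]...[content length][content]
final class FrameCodec {
    static void write(DataOutputStream out, Frame frame) throws IOException {
        out.writeInt(frame.attributes.size());
        for (Map.Entry<String, String> e : frame.attributes.entrySet()) {
            writeString(out, e.getKey());
            writeString(out, e.getValue());
        }
        out.writeLong(frame.content.length);
        out.write(frame.content);
        out.flush();
    }

    static Frame read(DataInputStream in) throws IOException {
        int count = in.readInt();
        Map<String, String> attributes = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String key = readString(in);
            attributes.put(key, readString(in));
        }
        long length = in.readLong();
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IOException("Unsupported content length: " + length);
        }
        byte[] content = new byte[(int) length];
        in.readFully(content); // blocks until the whole payload has arrived
        return new Frame(attributes, content);
    }

    private static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    private static String readString(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because every field is length-prefixed, arbitrary attribute values (including multi-line ones) survive the round trip, and several frames can be sent back to back on one connection.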


Maybe this can be helpful to you as well in the cases you mentioned
where you previously used the PostHTTP processor.


Best regards

Adam Taft wrote on Thu, 12 Jan 2023, 05:39:

> David,
>
> Thank you for the reasonable response to my questions. Much appreciated.
>
> I'm not a huge fan of the MergeContent -> InvokeHTTP -> {} -> ListenHTTP ->
> UnpackContent approach to provide the same functionality. But I do
> acknowledge that's the most direct replacement option without PostHTTP.
> It's adding extra processors to the chain for something that is
> effectively a transport issue. NiFi-to-NiFi using PostHTTP was a simple
> transport-oriented solution, and packing the data with MergeContent first
> isn't quite the same level of fidelity. You also miss the two-phase commit
> built into those extra bits. MergeContent is often a bit of a beast
> in-and-of-itself too.
>
> Flowfile attributes conveyed as HTTP headers definitely don't work for
> complex attribute values. But yes, I know that the functionality exists
> (having some history with that processor myself).
>
> Thanks again for the response.
>
> /Adam
>
>
>
>
> On Wed, Jan 11, 2023 at 9:27 PM Adam Taft  wrote:
>
> > Hi Matthew,
> >
> > > It's quite remarkable you're advocating against standard practice
> > > presumably for your own convenience.
> >
> > Wow, absolutely not stated nor implied in my message. And even borderline
> > offensive.
> >
> > What I asked was simply, why remove it, if it's not hurting anything. I
> > agree with your statement that there is a (very small) cost for maintaining
> > the component in the source tree. But PostHTTP is not in the same scope as
> > compared to a component that has a dependency on an abandoned, insecure, or
> > completely out of standards library (for example).
> >
> > PostHTTP has a reasonable use case (as I described) that is not directly
> > matched with other processors. The two-phase commit protocol sitting
> > between PostHTTP and ListenHTTP has demonstrated to bear good fruit over
> > many hardened years of use. I think it's a reasonable reply to my question
> > to just simply suggest that the interaction between PostHTTP and ListenHTTP
> > is just not supported by NiFi going forward. But please don't tell me my
> > question/concern is "out of convenience."
> >
> > There is lacking documentation as to the rationale behind the deprecation
> > of PostHTTP. I might be missing it; can you please send me the link to the
> > rationale? That's what this thread is trying to address. It sounds like,
> > from your answer, that the rationale is to reduce code footprint, which
> > isn't the strongest argument for its removal given its established
> > historical use. Seems like we'd want more than just reduced footprint for
> > such a heavily used processor, no?
> >
> > /Adam
> >
> >
> > On Wed, Jan 11, 2023 at 7:53 PM Matthew Hawkins 
> > wrote:
> >
> >> Hi Adam,
> >>
> >> PostHTTP was marked deprecated 3 years ago (aka six technology lifetimes).
> >> The successor technologies that replace its functionality are well
> >> documented and proven in production. The technical reason to remove it is
> >> that it is superfluous code that has a cost to maintain and zero benefit.
> >> Backwards compatibility is never guaranteed for components marked
> >> deprecated for such a long time in any software product, let alone
> >> NiFi specifically.
> >>
> >> Your organisation is free to continue using the version of NiFi it is on
> >> today and not take any further action. It is unhelpful to suggest every
> >> other organisation should be held back in progress because yours refuses
> >> to take the necessary flow maintenance action. One impetus for a major
> >> version upgrade is specifically to jettison deprecated components. It's
> >> quite remarkable you're advocating against standard practice presumably
> >> for your own convenience.
> >>
> >> Site-to-Site connectivity is conducted with either raw sockets or HTTP
> >> (which is HTTPS on secured NiFi), so I'm highly skeptical there is any
> >> performance degradation in InvokeHTTP or S2S compared with PostHTTP, given
> >> the former can take advantage of HTTP/2 and the latter not. It's easy to
> >> monitor NiFi and prove through metrics in any case. Sadly in enterprise
> >> 
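Adam refers to the two-phase commit between PostHTTP and ListenHTTP. The sketch below models a generic two-phase handoff in memory: the receiver only holds incoming data until the sender confirms, and discards it otherwise. It is not the actual PostHTTP/ListenHTTP protocol; HoldingReceiver, HandoffSender and the hold id are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Receiver side: phase 1 holds data, phase 2 releases it downstream.
final class HoldingReceiver {
    private final Map<String, List<String>> held = new HashMap<>();
    private final List<String> released = new ArrayList<>();

    // Phase 1: accept the batch but only hold it; hand back a hold id.
    String receive(List<String> batch) {
        String holdId = UUID.randomUUID().toString();
        held.put(holdId, new ArrayList<>(batch));
        return holdId;
    }

    // Phase 2: the sender confirms after committing on its side.
    boolean confirm(String holdId) {
        List<String> batch = held.remove(holdId);
        if (batch == null) {
            return false; // unknown or already expired hold
        }
        released.addAll(batch);
        return true;
    }

    // No confirmation: drop the held copy; the sender still owns the data.
    void expire(String holdId) {
        held.remove(holdId);
    }

    List<String> released() {
        return released;
    }

    int heldCount() {
        return held.size();
    }
}

final class HandoffSender {
    // localCommitSucceeds simulates whether the sender's own commit worked.
    static boolean send(HoldingReceiver receiver, List<String> outbound, boolean localCommitSucceeds) {
        String holdId = receiver.receive(outbound);
        if (!localCommitSucceeds) {
            receiver.expire(holdId);
            return false;
        }
        return receiver.confirm(holdId);
    }
}
```

In this model, a failed commit on the sending side never results in the receiver releasing the held copy, so the data is not duplicated downstream.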

Re: [VOTE] Adopt NiFi 2.0 Proposed Release Goals

2022-12-12 Thread Lucas Ottersbach
+1 (non-binding)

David Handermann wrote on Mon, 12 Dec 2022, 18:02:

> Team,
>
> Following positive feedback on NiFi 2.0 Proposed Release Goals [1] on the
> recent discussion thread [2], I am calling this vote to adopt the following
> as Release Goals for NiFi 2.0:
>
> 1. Remove Java 8 support and require Java 11
> 2. Remove deprecated components
> 3. Remove deprecated component properties
> 4. Remove components integrating with unmaintained services
> 5. Remove compatibility classes and methods
> 6. Remove flow.xml.gz in favor of flow.json.gz
> 7. Remove duplicative features
> 8. Upgrade internal Java API references
> 9. Reorganize standard components
> 10. Implement migration tools for upgrading flows
>
> A positive vote indicates agreement on these goals and the initiation of
> the following actions:
>
> 1. Rename NiFi 2.0 Proposed Release Goals to NiFi 2.0 Release Goals
> 2. Create version 1 branch in Git for subsequent support releases on the
> version 1 series
> 3. Update the current main branch in Git to version 2.0.0-SNAPSHOT
>
> The vote will be open for 72 hours and follow standard procedures for
> release votes.
>
> Please review the linked goals and discussions for background.
>
> [ ] +1 Adopt NiFi 2.0 Release Goals
> [ ] +0 No opinion
> [ ] -1 Do not adopt NiFi 2.0 Release Goals for the following reasons...
>
> [1] https://cwiki.apache.org/confluence/display/NIFI/NiFi+2.0+Proposed+Release+Goals
> [2] https://lists.apache.org/thread/xo77p9t3xg4k70356xrqbdg4m9sg7sf8
>


Re: Unit tests for multiple processors

2021-07-22 Thread Lucas Ottersbach
Hey Phil,

we've set up a "test framework" internally, which orchestrates an existing
NiFi instance via the REST API to test configured processors or process
groups.

It works like this: it copies a processor or process group (selected by its
name), adds a GenerateFlowFile processor upstream, and adds connections for
the downstream outputs, all via the REST API.
This way we can "inject" flow files and examine the resulting flow files
from the output.
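Assuming a NiFi instance reachable over HTTP, a REST-driven harness like the one described might start by building requests such as the one below. The endpoint path, the JSON body shape and the FlowTestRequests class are assumptions for illustration (authentication, revisions of existing components, and the copy and connect steps are omitted); check them against the REST API documentation for your NiFi version.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;

// Hypothetical helper for a REST-driven flow test harness. Endpoint path and
// JSON shape are assumptions to verify against the NiFi REST API docs.
final class FlowTestRequests {
    private final String baseUrl;

    FlowTestRequests(String baseUrl) {
        // Normalise so that ".../nifi-api/" and ".../nifi-api" behave alike.
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    // Builds (but does not send) a request that adds a GenerateFlowFile
    // processor to the given process group, e.g. upstream of a copied group.
    HttpRequest addGenerateFlowFile(String processGroupId, double x, double y) {
        String body = String.format(Locale.ROOT,
                "{\"revision\":{\"version\":0},\"component\":{"
                        + "\"type\":\"org.apache.nifi.processors.standard.GenerateFlowFile\","
                        + "\"position\":{\"x\":%.1f,\"y\":%.1f}}}",
                x, y);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/process-groups/" + processGroupId + "/processors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Sending it would then be a matter of HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and reading the new processor's id from the response.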

Unfortunately, we've not open-sourced our setup.
Hope this helps nonetheless.


Best regards

Lucas


Phil H wrote on Thu, 22 Jul 2021, 05:26:

> Hi there,
>
> I use the built-in unit tests in the Maven processor archetype for each of
> my processors, but I would like to set up some integration testing using
> multiple processors.
>
> Is this possible? If so, anyone care to sling me some example code? E.g., say
> I wanted to run a test connecting GenerateFlowFile’s output to PutFile’s
> input.
>
> Thanks!
> Phil
>


Jira contributor access

2020-04-09 Thread Lucas Ottersbach
Hello,

I would like to receive contributor access to Jira.
My username is "EndzeitBegins".

Best regards
Lucas Ottersbach