Re: UpdateAttribute Failure Relationship

2024-02-09 Thread Adam Taft
> >>>
> >>> 1) Default - status quo, exceptions cause it to yield
> >>> 2) Exception = moves forward to success w/ an error attribute, an
> >>> error log statement that triggers a bulletin, etc to let data managers
> >>> know what's happening.
> >>> 3) Exception = moves to a failure relationship that is otherwise
> >>> autoterminated
> >>>
> >>>> On Thu, Feb 8, 2024 at 7:12 PM Matt Burgess  wrote:
> >>>>
> >>>> Mike's option #2 seems solid but would take a lot of work and there
> >>>> will always be inputs we don't account for. I support that work but
> >>>> in code sometimes we just do a "catch(Throwable)" just so it doesn't
> >>>> blow up. What about a subjectless "try" or "trycatch" function you
> >>>> can wrap around your whole expression? If no exception is thrown, the
> >>>> evaluated value will be returned but if one is thrown, you can
> >>>> provide some alternate value that you can check downstream. As this
> >>>> is optional it would retain the current behavior unless you use it,
> >>>> and then it takes the place of all those ifElse(isXYZValid()) calls
> >>>> we'd need throughout the expression.
> >>>>
> >>>> Regards,
> >>>> Matt
> >>>>
> >>>>
> >>>> On Wed, Feb 7, 2024 at 8:11 PM Phillip Lord 
> >>>> wrote:
> >>>>
> >>>>> IMO... UpdateAttribute has been around since the beginning of time,
> >>>>> I can't see adding a failure relationship. At the same time I
> >>>>> understand the want for such exceptions to be handled more
> >>>>> gracefully rather than rolling back indefinitely.
> >>>>> I'd vote in favor of considering Moser's option #2... and being able
> >>>>> to implement an "if this then that" logic within your flow.
> >>>>>
> >>>>> Also just thinking... for every UA failure you have to consider a
> >>>>> good failure-management strategy, which MIGHT add a lot of noise to
> >>>>> the flow. Something that might otherwise easily be identified in a
> >>>>> downstream component and/or database/etc.
> >>>>>
> >>>>> My 2 cents **
> >>>>> Phil
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Wed, Feb 7, 2024 at 5:18 PM Adam Taft  wrote:
> >>>>>
> >>>>>> Or better, the failure relationship just doesn't even exist until
> >>>>>> the property "Has Failure Relationship" is set to True.  This
> >>>>>> involves updating UpdateAttribute to have dynamic relationships
> >>>>>> (the failure relationships appearing on true), which isn't hard to
> >>>>>> do in processor code.
> >>>>>>
> >>>>>> This has the advantage of being backwards compatible for existing
> >>>>>> users and allows the failure relationship to exist for new
> >>>>>> configurations. Obviously the processor would need an update to
> >>>>>> catch Expression Language exceptions and then route conditionally
> >>>>>> to failure.
> >>>>>>
> >>>>>> Just thinking out loud.
> >>>>>> /Adam
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Feb 7, 2024 at 1:48 PM u...@moosheimer.com <u...@moosheimer.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi Mike,
> >>>>>>>
> >>>>>>> How about the option of introducing a new property that decides
> >>>>>>> whether to route to the 'failure' relationship in the event of an
> >>>>>>> error?
> >>>>>>> If this property is set to false, then the 'failure' relationship
> >>>>>>> is automatically set to 'terminate' (since nothing is routed there
> >>>>>>> anyway).
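To make the "catch the Expression Language exception and route to failure"
idea concrete, here is a minimal, hypothetical Java sketch. This is not
UpdateAttribute's actual code; it assumes the standard
org.apache.nifi.processor API inside an onTrigger method, with REL_SUCCESS
and REL_FAILURE already declared, and the "updateattribute.error" attribute
name is invented for illustration:

    try {
        // evaluation is where EL functions like format() can blow up on
        // bad input; the exact exception type varies by function, so the
        // sketch catches broadly
        final String value = context.getProperty(descriptor)
                .evaluateAttributeExpressions(flowFile)
                .getValue();
        flowFile = session.putAttribute(flowFile, descriptor.getName(), value);
        session.transfer(flowFile, REL_SUCCESS);
    } catch (final RuntimeException e) {
        // rather than rolling back indefinitely, record the error detail
        // and route to an (opt-in) failure relationship
        flowFile = session.putAttribute(flowFile, "updateattribute.error",
                String.valueOf(e.getMessage()));
        session.transfer(flowFile, REL_FAILURE);
    }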

Re: UpdateAttribute Failure Relationship

2024-02-07 Thread Adam Taft
Or better, the failure relationship just doesn't even exist until the
property "Has Failure Relationship" is set to True.  This involves updating
UpdateAttribute to have dynamic relationships (the failure relationships
appearing on true), which isn't hard to do in processor code.

This has the advantage of being backwards compatible for existing users and
allows the failure relationship to exist for new configurations. Obviously
the processor would need an update to catch Expression Language exceptions
and then route conditionally to failure.

Just thinking out loud.
/Adam



On Wed, Feb 7, 2024 at 1:48 PM u...@moosheimer.com 
wrote:

> Hi Mike,
>
> How about the option of introducing a new property that decides whether to
> route to the 'failure' relationship in the event of an error?
> If this property is set to false, then the 'failure' relationship is
> automatically set to 'terminate' (since nothing is routed there anyway).
>
> Then everyone can decide whether and where they want to use this new
> feature or not.
> All other options would still be possible with such a solution.
>
> -- Uwe
>
> > On 07.02.2024 at 22:15, Michael Moser  wrote:
> >
> > Hi Dan,
> >
> > This has been discussed in the past, as you found with those two Jira
> > tickets.  Personally, I'm still not sure whether a new failure
> > relationship on UpdateAttribute in 2.0 is a good approach.  I have
> > heard from some dataflow managers who would not want to go through
> > their entire graph when upgrading to 2.0 and update every
> > UpdateAttribute configuration.
> >
> > I have heard some alternatives to a 'failure' relationship that I would
> > like to share as options.
> >
> > 1) Add a new property to UpdateAttribute that controls whether a
> > flowfile that causes an expression language exception either yields and
> > rolls back, or silently fails to update the attribute and sends the
> > flowfile to success.  I personally don't like this, because the use
> > case for "silent failure" really seems like a rarely needed edge case.
> >
> > 2) Identify all expression language methods that can throw an exception
> > and document that fact in the Expression Language Guide (some methods
> > already mention they can throw an "exception bulletin").  Then
> > implement new expression methods to check if an expression could fail,
> > and use that in UpdateAttribute advanced rules.  For example, if the
> > format() and formatInstant() methods can fail on a negative number, we
> > create a new method such as isValidMilliseconds().  This already exists
> > for some cases, such as isJson() which can do a quick check of some
> > value before calling jsonPathDelete() on it.
> >
> > I'm curious to hear more thoughts on this.
> >
> > -- Mike
> >
> >
> >
> >> On Wed, Jan 31, 2024 at 11:02 AM Dan S  wrote:
> >>
> >> My team is requesting a failure relationship for UpdateAttribute as
> >> seen in NIFI-5448  and NIFI-6344  as we are experiencing the same
> >> problem where a NiFi Expression Language expression is throwing an
> >> exception. In the PR for NIFI-5448 it was mentioned this feature would
> >> have to wait until NiFi 2.0.0. I wanted to know if there is any active
> >> work regarding this and whether eventually there will be a failure
> >> relationship added to UpdateAttribute?
> >>
>
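As a point of reference, NiFi processors can already expose relationships
dynamically (RouteOnAttribute does this today), so the opt-in failure
relationship Adam describes could look roughly like the sketch below. This
is a hypothetical illustration, not a patch; it assumes a subclass of
AbstractProcessor with REL_SUCCESS/REL_FAILURE defined elsewhere:

    import java.util.Set;
    import java.util.concurrent.atomic.AtomicReference;
    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.processor.Relationship;

    public static final PropertyDescriptor HAS_FAILURE_RELATIONSHIP =
            new PropertyDescriptor.Builder()
                    .name("Has Failure Relationship")
                    .allowableValues("true", "false")
                    .defaultValue("false")
                    .required(true)
                    .build();

    private final AtomicReference<Set<Relationship>> relationships =
            new AtomicReference<>(Set.of(REL_SUCCESS));

    @Override
    public void onPropertyModified(final PropertyDescriptor descriptor,
            final String oldValue, final String newValue) {
        if (HAS_FAILURE_RELATIONSHIP.equals(descriptor)) {
            // the failure relationship only appears when the property is
            // set to true, keeping existing flows backwards compatible
            relationships.set("true".equalsIgnoreCase(newValue)
                    ? Set.of(REL_SUCCESS, REL_FAILURE)
                    : Set.of(REL_SUCCESS));
        }
    }

    @Override
    public Set<Relationship> getRelationships() {
        return relationships.get();
    }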


Re: [discuss] Time for a NiFi 2.0 M1 release?

2023-09-26 Thread Adam Taft
I'm also hoping that both 1.x and 2.x lines can receive the PackageFlowFile
processor that Mike Moser recently proposed. That way, the M1 release and
the most recent 1.x release will have a simple (or logical) replacement for
PostHTTP.

In general, it would be nice to have 1.x lined up with 2.0-M1 so that the
transitional experience is as disruptive as it's going to be when 2.0-final
is released. That is, I want all the things that can break to break, once a
2.0 milestone is released. From that perspective, I agree with Pierre that
waiting for the flow.xml work to finalize makes the most sense, because
then users can start getting a feel for how it will affect them. Lots of
deployment scripts (think Ansible or equivalent) rely on the flow.xml.gz
file specifically.

The most disruptive parts of the 1.x to 2.x transition would ideally be
realized as early as possible. I understand and agree with the urgency to
get 2.0-M1 released, but I'm also concerned that it doesn't yet allow a
proper evaluation of all breaking changes.

/Adam

On Tue, Sep 26, 2023 at 9:18 AM Pierre Villard 
wrote:

> Hey Joe,
>
> Definitely a +1 to get a M1 release ASAP. I'd still recommend waiting on
> the flow.xml removal work to be merged. The reason being that users may
> give useful feedback when they'll try NiFi 2.0 with existing flows coming
> from NiFi 1.x and getting rid of all of the XML based stuff. There is also
> a PR coming soon for the frontend work of the templates removal. Hopefully
> both can be completed this week or next week.
>
> Pierre
>
> On Tue, Sep 26, 2023 at 17:35, Joe Witt  wrote:
>
> > Team,
> >
> > The NiFi 2.0 release has more than 700 resolved JIRAs on it [1] and
> > growing every day.
> >
> > The NiFi 2.0 deprecation plan is well underway and largely complete [2].
> >
> > We still need to remove a lot of now-deprecated code, tests which are
> > never run and largely don't work, eliminate the flow.xml which has a
> > JIRA/PR underway.  And more.  But we're getting close and we need to
> > start getting this in the hands of users.
> >
> > The docker image can now be built in 'nifi-docker/dockermaven' after a
> > full build from root with 'mvn install -Pdocker'.  And it comes up with
> > Ubuntu, Java 21, Python 3.9, and NiFi 2.0 ready to roll with Python
> > processors enabled.
> >
> > I propose we start closing down soon to make a NiFi 2.0 M1 release
> > happen even before we have all the things done.  We need to start
> > getting feedback and giving people a chance to work with it.
> >
> > Lastly, a huge thank you to the folks in the community that have been
> > helping push towards 2.x with code changes, removals, reviews, bug
> > reports, etc.  Super awesome to see.  NiFi 2.x is shaping up nicely to
> > be useful not only for our well established user base which spans the
> > globe and every industry but now we are also seeing a lot of
> > opportunity and fit for NiFi in these exciting AI use cases
> > particularly involving orchestrating the data flows with embeddings,
> > vector stores, and LLMs.  And the Python capabilities in NiFi 2.x make
> > NiFi far easier to use for the very important data engineer user base.
> >
> > [1] https://issues.apache.org/jira/projects/NIFI/versions/12339599
> > [2]
> > https://cwiki.apache.org/confluence/display/NIFI/Deprecated+Components+and+Features
> >
> > Thanks
> > Joe
> >
>


Re: new PackageFlowFile processor

2023-09-08 Thread Adam Taft
Flow File Packager v3. You can find the source here:

https://github.com/apache/nifi/blob/main/nifi-commons/nifi-flowfile-packager/src/main/java/org/apache/nifi/util/FlowFilePackagerV3.java

It's a serialization format that is used for writing a flowfile (content
and attributes) to a stream (network, file, etc.). It's a simple binary
format, that is effectively the attributes serialized as key/value pairs
followed by the content. There are byte size markers written into the start
of each field, so that deserializing can read the values into a byte array
(or equivalent).
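
Since the question comes up often, a minimal round-trip sketch of the
packager/unpackager pair described above. Method signatures are taken from
nifi-flowfile-packager as found in recent source; treat this as
illustrative rather than authoritative:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import org.apache.nifi.util.FlowFilePackagerV3;
    import org.apache.nifi.util.FlowFileUnpackagerV3;

    public class FlowFileV3RoundTrip {
        public static void main(String[] args) throws Exception {
            final byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
            final Map<String, String> attributes = Map.of("filename", "hello.txt");

            // package: attributes as length-prefixed key/value pairs,
            // followed by the content bytes
            final ByteArrayOutputStream packaged = new ByteArrayOutputStream();
            new FlowFilePackagerV3().packageFlowFile(
                    new ByteArrayInputStream(content), packaged,
                    attributes, content.length);

            // unpackage: the size markers let the reader recover both parts
            final ByteArrayOutputStream restoredContent = new ByteArrayOutputStream();
            final Map<String, String> restoredAttrs = new FlowFileUnpackagerV3()
                    .unpackageFlowFile(
                            new ByteArrayInputStream(packaged.toByteArray()),
                            restoredContent);

            System.out.println(restoredAttrs);                      // {filename=hello.txt}
            System.out.println(restoredContent.toString("UTF-8"));  // hello
        }
    }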

Flow File Packager v3 is primarily used in the MergeContent processor (for
bundling) and the UnpackContent processor (for extraction). But the
(deprecated) PostHTTP processor and the ListenHTTP processor have support
for this format, somewhat transparently, as well. This enables two NiFi
systems to send a serialized flowfile across the wire using HTTP.

You might see this format name as "FlowFile Stream v3" or
"flowfile-stream-v3" when looking at either MergeContent or UnpackContent.





On Fri, Sep 8, 2023 at 2:14 PM Russell Bateman 
wrote:

> Uh, sorry, "Version 3" refers to what exactly?
>
> On 9/8/23 12:48, David Handermann wrote:
> > I agree that this would be a useful general feature. I also agree with
> > Joe that format support should be limited to *Version 3* due to the
> > limitations of the earlier versions.
> >
> > This is definitely something that would be useful on the 1.x support
> > branch to provide a smooth upgrade path for NiFi 2.
> >
> > This general topic also came up on the dev channel on the Apache NiFi
> > Slack group:
> >
> > https://apachenifi.slack.com/archives/C0L9S92JY/p1692115270146369
> >
> > One key thing to note from that discussion is supporting
> > interoperability with services outside of NiFi. That may be too much
> > of a stretch for an initial implementation, but it is something I am
> > planning to evaluate as time allows.
> >
> > For now, something focused narrowly on FlowFile Version 3 encoding
> > seems like the best approach.
> >
> > I recommend referencing this discussion in a new Jira issue and
> > outlining the general design goals.
> >
> > Regards,
> > David Handermann
> >
> >
> > On Fri, Sep 8, 2023 at 1:11 PM Adam Taft  wrote:
> >> And also ... if we can land this in a 1.x release, this would help
> >> tremendously to those who are going to need a replacement for PostHTTP
> >> and don't want to "go dark" when they make the transition.
> >>
> >> That is, without this processor in 1.x, when a user upgrades from 1.x
> >> to 2.x, they will either have to have a MergeContent/InvokeHTTP
> >> solution in place already to replace PostHTTP, or they will have to
> >> take a (hopefully short) outage when they bring their canvas back up
> >> (removing PostHTTP and replacing with PackageFlowFile + InvokeHTTP).
> >>
> >> With this processor in 1.x, they can make that transition while
> >> PostHTTP is still available on their canvas. Wishful thinking that we
> >> can make the entire journey from 1.x to 2.x as smooth as possible, but
> >> this could potentially help some.
> >>
> >>
> >> On Fri, Sep 8, 2023 at 10:55 AM Adam Taft  wrote:
> >>
> >>> +1 on this as well. It's something I've kind of griped about before
> >>> (with the loss of PostHTTP).
> >>>
> >>> I don't think it would be horrible (as per Joe's concern) to offer a
> >>> N:1 "bundling" property. It would just have to be stupid simple. No
> >>> "groups", timeouts, correlation attributes, minimum entries, etc. It
> >>> should just basically call the ProcessSession#get(int maxResults)
> >>> where "maxResults" is a configurable property. Whatever number of
> >>> flowfiles returned in the list is what is "bundled" into FFv3 format
> >>> for output.
> >>>
> >>> /Adam
> >>>
> >>>
> >>> On Fri, Sep 8, 2023 at 7:19 AM Phillip Lord
> >>> wrote:
> >>>
> >>>> +1 from me.
> >>>> I’ve experimented with both methods.  The simplicity of a
> >>>> PackageFlowfile straight up 1:1 is convenient and straightforward.
> >>>> MergeContent on the other hand can be difficult to understand and
> >>>> tweak appropriately to gain desired results/throughput.
> >>>> On Sep 8, 2023 at 10:14 AM -0400, Joe Witt, wrote:
> >>

Re: new PackageFlowFile processor

2023-09-08 Thread Adam Taft
And also ... if we can land this in a 1.x release, this would help
tremendously to those who are going to need a replacement for PostHTTP and
don't want to "go dark" when they make the transition.

That is, without this processor in 1.x, when a user upgrades from 1.x to
2.x, they will either have to have a MergeContent/InvokeHTTP solution in
place already to replace PostHTTP, or they will have to take a (hopefully
short) outage when they bring their canvas back up (removing PostHTTP and
replacing with PackageFlowFile + InvokeHTTP).

With this processor in 1.x, they can make that transition while PostHTTP is
still available on their canvas. Wishful thinking that we can make the
entire journey from 1.x to 2.x as smooth as possible, but this could
potentially help some.


On Fri, Sep 8, 2023 at 10:55 AM Adam Taft  wrote:

> +1 on this as well. It's something I've kind of griped about before (with
> the loss of PostHTTP).
>
> I don't think it would be horrible (as per Joe's concern) to offer a N:1
> "bundling" property. It would just have to be stupid simple. No "groups",
> timeouts, correlation attributes, minimum entries, etc. It should just
> basically call the ProcessSession#get(int maxResults) where "maxResults" is
> a configurable property. Whatever number of flowfiles returned in the list
> is what is "bundled" into FFv3 format for output.
>
> /Adam
>
>
> On Fri, Sep 8, 2023 at 7:19 AM Phillip Lord 
> wrote:
>
>> +1 from me.
>> I’ve experimented with both methods.  The simplicity of a PackageFlowfile
>> straight up 1:1 is convenient and straightforward.
>> MergeContent on the other hand can be difficult to understand and tweak
>> appropriately to gain desired results/throughput.
>> On Sep 8, 2023 at 10:14 AM -0400, Joe Witt , wrote:
>> > Ok. Certainly simplifies it but likely makes it applicable to larger
>> > flowfiles only. The format is meant to allow appending and result in
>> > large sets of flowfiles for io efficiency and specifically for storage
>> > as the small files/tons of files thing can cause poor performance
>> > pretty quickly (10s of thousands of files in a single directory).
>> >
>> > But maybe that simplicity is fine and we just link to the MergeContent
>> > packaging option if users need more.
>> >
>> > On Fri, Sep 8, 2023 at 7:06 AM Michael Moser 
>> > wrote:
>> >
>> > > I was thinking 1 file in -> 1 flowfile-v3 file out. No merging of
>> > > multiple files at all. Probably change the mime.type attribute. It
>> > > might not even have any config properties at all if we only support
>> > > flowfile-v3 and not v1 or v2.
>> > >
>> > > -- Mike
>> > >
>> > >
>> > > On Fri, Sep 8, 2023 at 9:56 AM Joe Witt  wrote:
>> > >
>> > > > Mike
>> > > >
>> > > > In user terms this makes sense to me. Id only bother with v3 or
>> > > > whatever is latest. We want to dump the old code. And if there are
>> > > > seriously older versions v1,v2 then nifi 1.x can be used.
>> > > >
>> > > > The challenge is that you end up needing some of the same
>> > > > complexity in implementation and config of merge content i think.
>> > > > What did you have in mind for that?
>> > > >
>> > > > Thanks
>> > > >
>> > > > On Fri, Sep 8, 2023 at 6:53 AM Michael Moser 
>> > > > wrote:
>> > > >
>> > > > > Devs,
>> > > > >
>> > > > > I can't find if this was suggested before, so here goes. With the
>> > > > > demise of PostHTTP in NiFi 2.0, the recommended alternative is to
>> > > > > MergeContent 1 file into FlowFile-v3 format then InvokeHTTP. What
>> > > > > does the community think about supporting a new PackageFlowFile
>> > > > > processor that is simple to configure (compared to MergeContent!)
>> > > > > and simply packages flowfile attributes + content into a
>> > > > > FlowFile-v[1,2,3] format? This would also offer a simple way to
>> > > > > export flowfiles from NiFi that could later be re-ingested and
>> > > > > recovered using UnpackContent. I don't want to submit a PR for
>> > > > > such a processor without first asking the community whether this
>> > > > > would be acceptable.
>> > > > >
>> > > > > Thanks,
>> > > > > -- Mike
>> > > > >
>> > > >
>> > >
>>
>


Re: new PackageFlowFile processor

2023-09-08 Thread Adam Taft
+1 on this as well. It's something I've kind of griped about before (with
the loss of PostHTTP).

I don't think it would be horrible (as per Joe's concern) to offer a N:1
"bundling" property. It would just have to be stupid simple. No "groups",
timeouts, correlation attributes, minimum entries, etc. It should just
basically call the ProcessSession#get(int maxResults) where "maxResults" is
a configurable property. Whatever number of flowfiles returned in the list
is what is "bundled" into FFv3 format for output.

/Adam
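
To make the ProcessSession#get(int) idea above concrete, a rough onTrigger
sketch for a hypothetical bundling PackageFlowFile. The MAX_BUNDLE_SIZE
property and relationship names are invented for illustration, and this is
not code from any released processor (assumed imports: java.util.List,
org.apache.nifi.flowfile.FlowFile,
org.apache.nifi.flowfile.attributes.CoreAttributes,
org.apache.nifi.util.FlowFilePackagerV3):

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) {
        final int maxResults = context.getProperty(MAX_BUNDLE_SIZE).asInteger();
        final List<FlowFile> flowFiles = session.get(maxResults);
        if (flowFiles.isEmpty()) {
            return;
        }

        FlowFile bundle = session.create(flowFiles);
        bundle = session.write(bundle, out -> {
            final FlowFilePackagerV3 packager = new FlowFilePackagerV3();
            for (final FlowFile flowFile : flowFiles) {
                // the v3 format supports appending, so each flowfile's
                // attributes + content are written to the stream in turn
                session.read(flowFile, in -> packager.packageFlowFile(
                        in, out, flowFile.getAttributes(), flowFile.getSize()));
            }
        });
        bundle = session.putAttribute(bundle, CoreAttributes.MIME_TYPE.key(),
                "application/flowfile-v3");
        session.transfer(bundle, REL_SUCCESS);
        session.remove(flowFiles);
    }

Whatever session.get(maxResults) returns is what gets bundled, which keeps
the configuration surface tiny compared to MergeContent's bins, groups, and
correlation attributes.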


On Fri, Sep 8, 2023 at 7:19 AM Phillip Lord  wrote:

> +1 from me.
> I’ve experimented with both methods.  The simplicity of a PackageFlowfile
> straight up 1:1 is convenient and straightforward.
> MergeContent on the other hand can be difficult to understand and tweak
> appropriately to gain desired results/throughput.
> On Sep 8, 2023 at 10:14 AM -0400, Joe Witt , wrote:
> > Ok. Certainly simplifies it but likely makes it applicable to larger
> > flowfiles only. The format is meant to allow appending and result in
> > large sets of flowfiles for io efficiency and specifically for storage
> > as the small files/tons of files thing can cause poor performance
> > pretty quickly (10s of thousands of files in a single directory).
> >
> > But maybe that simplicity is fine and we just link to the MergeContent
> > packaging option if users need more.
> >
> > On Fri, Sep 8, 2023 at 7:06 AM Michael Moser  wrote:
> >
> > > I was thinking 1 file in -> 1 flowfile-v3 file out. No merging of
> > > multiple files at all. Probably change the mime.type attribute. It
> > > might not even have any config properties at all if we only support
> > > flowfile-v3 and not v1 or v2.
> > >
> > > -- Mike
> > >
> > >
> > > On Fri, Sep 8, 2023 at 9:56 AM Joe Witt  wrote:
> > >
> > > > Mike
> > > >
> > > > In user terms this makes sense to me. Id only bother with v3 or
> > > > whatever is latest. We want to dump the old code. And if there are
> > > > seriously older versions v1,v2 then nifi 1.x can be used.
> > > >
> > > > The challenge is that you end up needing some of the same
> > > > complexity in implementation and config of merge content i think.
> > > > What did you have in mind for that?
> > > >
> > > > Thanks
> > > >
> > > > On Fri, Sep 8, 2023 at 6:53 AM Michael Moser  wrote:
> > > >
> > > > > Devs,
> > > > >
> > > > > I can't find if this was suggested before, so here goes. With the
> > > > > demise of PostHTTP in NiFi 2.0, the recommended alternative is to
> > > > > MergeContent 1 file into FlowFile-v3 format then InvokeHTTP. What
> > > > > does the community think about supporting a new PackageFlowFile
> > > > > processor that is simple to configure (compared to MergeContent!)
> > > > > and simply packages flowfile attributes + content into a
> > > > > FlowFile-v[1,2,3] format? This would also offer a simple way to
> > > > > export flowfiles from NiFi that could later be re-ingested and
> > > > > recovered using UnpackContent. I don't want to submit a PR for
> > > > > such a processor without first asking the community whether this
> > > > > would be acceptable.
> > > > >
> > > > > Thanks,
> > > > > -- Mike
> > > > >
> > > >
> > >
>


Re: [discuss] nifi 2.0 and Java 21…

2023-09-06 Thread Adam Taft
Yes, please. +1 Exactly what Mark said. Virtual threads have potential to
be extremely impactful to applications like NiFi.

/Adam
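
For context, a minimal sketch of the Java 21 API being referenced. This is
unrelated to NiFi's actual scheduler; it just shows why blocking tasks
become cheap on virtual threads:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class VirtualThreadDemo {
        public static void main(String[] args) {
            // Java 21: each submitted task runs on its own virtual thread
            try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 10_000; i++) {
                    final int task = i;
                    executor.submit(() -> {
                        // blocking parks the virtual thread, not an OS thread,
                        // so ten thousand concurrent sleeps are inexpensive
                        Thread.sleep(100);
                        return task;
                    });
                }
            } // close() waits for submitted tasks to finish
        }
    }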

On Wed, Sep 6, 2023 at 7:26 AM Mark Payne  wrote:

> Thanks for bringing this up, Joe.
>
> I would definitely be a +1. I think the new virtual thread concept would
> have great impact on us.
> It would allow us to significantly simplify our scheduling logic, which
> would help with code maintainability
> but would also make configuration simpler. This is one of the most
> difficult things for users to configure,
> and I would very much welcome the ability to simplify this. It would
> likely also yield better off-heap memory
> utilization by reducing the number of native threads necessary.
>
> Thanks
> -Mark
>
>
> > On Sep 6, 2023, at 10:20 AM, Joe Witt  wrote:
> >
> > Team
> >
> > Thought it might be worth relighting this thread with Java 21 GA
> > imminent. Given the timing we should give consideration to having Java
> > 21 as the basis for nifi 2.x to buy maximum time with LTS alignment.
> > There are also a couple interesting language features we can likely
> > take advantage of.
> >
> > What do you think?
> >
> > Thanks
> > Joe
> >
> > On Wed, Jun 21, 2023 at 6:21 AM David Handermann <
> > exceptionfact...@apache.org> wrote:
> >
> >> Hi Dirk,
> >>
> >> Thanks for summarizing your findings in the referenced Jira issues. It
> >> sounds like subsequent discussion of Nashorn support may be better on
> >> a new thread.
> >>
> >> The Spring 6 and Jetty 11 upgrades are going to require significant
> >> work. One incremental step in that direction was making Java 17 the
> >> minimum version, and upgrading to Jetty 10 should also help move
> >> things forward.
> >>
> >> Although compiling NiFi modules with a reference to the standalone
> >> Nashorn library may introduce issues, there should be other options
> >> for referencing the library at runtime. That requires custom class
> >> loading, which some Processors support, so that seems like the general
> >> direction to go.
> >>
> >> If you have additional findings, feel free to start a new developer
> >> list thread and that may gather additional feedback.
> >>
> >> Regards,
> >> David Handermann
> >>
> >> On Wed, Jun 21, 2023 at 12:17 AM Dirk Arends 
> >> wrote:
> >>
> >>> Since initially raising concerns about the move to Java 17 losing
> >>> Nashorn, I have been investigating the suggestion to use Nashorn as a
> >>> standalone package as a potential, easier alternative to GraalVM. [1]
> >>>
> >>> While making some progress, a number of issues have been encountered
> >>> which I haven't been able to resolve as yet. More details are
> >>> included in relevant JIRA tickets, but summarising:
> >>>
> >>> - Building NiFi with a recent Nashorn dependency leads to errors
> >>> "Unsupported class file major version 61" [2]
> >>> - Building NiFi using Java 17 highlights issues with the current
> >>> Jetty version, which I believe would require an upgrade from 9.4.51
> >>> to 11.0.15 [3]
> >>> - Jetty 11 then requires an upgrade of the Spring Framework version 5
> >>> to 6. [4]
> >>>
> >>> The current steps to remove references to "Javascript" as a
> >>> preinstalled scripting language [5] are understandable, but it does
> >>> seem there is still quite a bit to do before Nashorn or another
> >>> external scripting engine could be used.
> >>>
> >>> [1] https://issues.apache.org/jira/browse/NIFI-11700: Java 17 Nashorn
> >>> standalone support
> >>> [2] https://issues.apache.org/jira/browse/NIFI-11701: Support
> >>> building with version 61 class files
> >>> [3] https://issues.apache.org/jira/browse/NIFI-11702: Upgrade Jetty
> >>> to version 11
> >>> [4] https://issues.apache.org/jira/browse/NIFI-11703: Upgrade Spring
> >>> Framework to version 6
> >>> [5] https://issues.apache.org/jira/browse/NIFI-11713: Remove
> >>> Deprecated ECMAScript Support
> >>>
> >>> Regards,
> >>> Dirk Arends
> >>>
> >>
>
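
For anyone experimenting with the standalone Nashorn route Dirk describes,
runtime lookup through javax.script is the usual pattern. A minimal sketch,
assuming the org.openjdk.nashorn:nashorn-core artifact is available on the
classpath at runtime:

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    public class NashornStandaloneDemo {
        public static void main(String[] args) throws ScriptException {
            // with nashorn-core on the classpath, the engine registers
            // itself under the name "nashorn" via the service loader
            final ScriptEngine engine =
                    new ScriptEngineManager().getEngineByName("nashorn");
            if (engine == null) {
                throw new IllegalStateException("Nashorn not found on classpath");
            }
            System.out.println(engine.eval("1 + 2"));  // prints 3
        }
    }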


Re: ValidateXml Processor - validatexml.invalid.error = Validation failed (NiFi 1.19.1)

2023-02-14 Thread Adam Taft
Hi Dan,

Sorry I just saw this reply. If you want to take it, that would be awesome.
I'm always over my head in ThingsToGetDone. Your assistance is appreciated!

Thanks,

/Adam


On Thu, Feb 9, 2023 at 8:56 AM Dan S  wrote:

> Adam,
>  Do you want to take this? If not I would be happy to take this.
>
> On Thu, Feb 9, 2023 at 2:04 AM Bilal Bektaş  .invalid>
> wrote:
>
> > Hi Adam and Dan,
> >
> > Thank you for your quick response.
> >
> > Ticket number: NIFI-11156
> >
> > Best wishes,
> >
> > --Bilal
> >
> >
> >
> > -Original Message-
> > From: Adam Taft 
> > Sent: 8 Şubat 2023 Çarşamba 23:48
> > To: dev@nifi.apache.org
> > Subject: Re: ValidateXml Processor - validatexml.invalid.error =
> > Validation failed (NiFi 1.19.1)
> >
> > Bilal,
> >
> > And I will be more than happy to help fix this. This is plaguing
> > myself as well. Please let me know what your ticket number is, and I
> > will help in whatever way is needed to get this fixed.
> >
> > /Adam
> >
> > On Wed, Feb 8, 2023 at 10:13 AM Dan S  wrote:
> >
> > > Bilal,
> > >  Please create a bug ticket for this.
> > >
> > > On Wed, Feb 8, 2023 at 4:51 AM Bilal Bektaş  .invalid>
> > > wrote:
> > >
> > > > Hi Dev Team,
> > > >
> > > > NiFi environment was upgraded from 1.15.1 to 1.19.1. The following
> > > > situation has occurred with the ValidateXml processor:
> > > >
> > > > ValidateXml processor works correctly on 1.15.1 and the
> > > > validatexml.invalid.error attribute gives detailed information
> > > > about the error.
> > > >
> > > > ValidateXml processor works on 1.19.1 but the
> > > > validatexml.invalid.error attribute does not give detailed
> > > > information about the error. The validatexml.invalid.error
> > > > attribute only contains the “Validation failed” text.
> > > >
> > > > A similar problem was discussed in the Cloudera community, where
> > > > Eduu said:
> > > >
> > > > Hi @ChuckE<
> > > > https://community.cloudera.com/t5/user/viewprofilepage/user-id/98065
> > > > > using NiFi 1.15.2 seems to add the error detail in the
> > > > "validatexml.invalid.error" attribute.
> > > > For example --> "cvc-minLength-valid: Value '' with length = '0' is
> > > > not facet-valid with respect to minLength '1' for type 'HotelCode"
> > > >
> > > > This seems to be an issue starting from NiFi 1.16.3, maybe a bug?
> > > > You can use NiFi 1.15.2 or custom code as @SAMSAL<
> > > > https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381
> > > > > suggested.
> > > >
> > > >
> > > > Ref:
> > > > https://community.cloudera.com/t5/Support-Questions/How-to-get-the-reason-for-invalid-XML/m-p/348767/highlight/true#M235443
> > > >
> > > > Is it possible to add this feature back in a new NiFi version, like
> > > > NiFi 1.15.1?
> > > >
> > > >
> > > > Thank you in advance,
> > > >
> > > > --Bilal
> > > >

Re: PostHTTP Deprecation Concerns

2023-02-12 Thread Adam Taft
Hi Lucas,

Cheers, and thanks for sharing.

One of the solutions discussed even recently on the user's mailing list was
effectively putting MergeContent in front of InvokeHTTP. MergeContent would
be configured with the "FlowFile Stream v3" protocol, which effectively
bundles up flowfiles similar to PostHTTP with the send-as-flowfile
attribute enabled. ListenHTTP will receive the bundled flowfiles and unwind
the packaging upon reception.

This is probably the smallest solution for those trying to replace PostHTTP
in use today. As I mentioned in my original replies, I don't think it's a
100% drop-in replacement, but probably the simplest solution available
going forward, especially when NiFi 2.0 is released.

I'm going to check out your code sometime, thanks again for responding to
an old thread and offering a solution.

/Adam
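
For the curious, the wire-level equivalent of that MergeContent ->
InvokeHTTP arrangement can be sketched outside NiFi too. The example below
is assumption-laden: the endpoint URL is a placeholder, and it relies on
ListenHTTP recognizing the application/flowfile-v3 content type at its
default /contentListener path:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import org.apache.nifi.util.FlowFilePackagerV3;

    public class PostFlowFileV3 {
        public static void main(String[] args) throws Exception {
            // package one flowfile (attributes + content) into v3 bytes
            final byte[] content = "payload".getBytes(StandardCharsets.UTF_8);
            final ByteArrayOutputStream packaged = new ByteArrayOutputStream();
            new FlowFilePackagerV3().packageFlowFile(
                    new ByteArrayInputStream(content), packaged,
                    Map.of("filename", "payload.txt"), content.length);

            // POST to a ListenHTTP endpoint; the content type signals that
            // the body is a packaged flowfile rather than raw content
            final HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://nifi.example.com:8443/contentListener"))
                    .header("Content-Type", "application/flowfile-v3")
                    .POST(HttpRequest.BodyPublishers.ofByteArray(packaged.toByteArray()))
                    .build();

            final HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
        }
    }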

On Sun, Feb 12, 2023 at 8:21 AM Lucas Ottersbach 
wrote:

> Hi Adam,
>
> I know this thread has been opened over a month ago, but we recently had to
> move FlowFiles, including both attributes and content, from one NiFi
> cluster to another and could not built upon the built-in Site-to-Site
> transfer mechanisms due to network restrictions between the clusters.
>
> We've built upon an existing solution from a community member which has
> been dormant for some time. It uses a pair of two custom processors to
> transfer FlowFile content and attributes using raw TCP connections.
> You can find the solution under its name "nifi-flow-over-tcp" both on
> GitHub and on Maven Central.
> githubDOTcom/EndzeitBegins/nifi-flow-over-tcp
>
>
> Maybe this can be helpful to you as well in the aforementioned cases you
> previously made use of the PostHTTP processor.
>
>
> Best regards
>
> Adam Taft  schrieb am Do., 12. Jan. 2023, 05:39:
>
> > David,
> >
> > Thank you for the reasonable response to my questions. Much appreciated.
> >
> > I'm not a huge fan of the MergeContent -> InvokeHTTP -> {} ->
> > ListenHTTP -> UnpackContent approach to provide the same
> > functionality. But I do acknowledge that's the most direct replacement
> > option without PostHTTP. It's adding extra processors to the chain for
> > something that is effectively a transport issue. NiFi-to-NiFi using
> > PostHTTP was a simple transport-oriented solution, and packing the
> > data with MergeContent first isn't quite the same level of fidelity.
> > You also miss the two-phase commit built into those extra bits.
> > MergeContent is often a bit of a beast in-and-of-itself too.
> >
> > Flowfile attributes conveyed as HTTP headers definitely don't work for
> > complex attribute values. But yes, I know that the functionality exists
> > (having some history with that processor myself).
> >
> > Thanks again for the response.
> >
> > /Adam
> >
> >
> >
> >
> > On Wed, Jan 11, 2023 at 9:27 PM Adam Taft  wrote:
> >
> > Hi Mathew,
> >
> > > It's quite remarkable you're advocating against standard practice
> > > presumably for your own convenience.
> >
> > Wow, absolutely not stated nor implied in my message. And even
> > borderline offensive.
> >
> > What I asked was simply, why remove it, if it's not hurting anything.
> > I agree with your statement that there is a (very small) cost for
> > maintaining the component in the source tree. But PostHTTP is not in
> > the same scope as compared to a component that has a dependency on an
> > abandoned, insecure, or completely out of standards library (for
> > example).
> >
> > PostHTTP has a reasonable use case (as I described) that is not
> > directly matched with other processors. The two-phase commit protocol
> > sitting between PostHTTP and ListenHTTP has demonstrated to bear good
> > fruit over many hardened years of use. I think it's a reasonable reply
> > to my question to just simply suggest that the interaction between
> > PostHTTP and ListenHTTP is just not supported by NiFi going forward.
> > But please don't tell me my question/concern is "out of convenience."
> >
> > There is lacking documentation as to the rationale behind the
> > deprecation of PostHTTP. I might be missing it, can you please send me
> > the link to the rationale? That's what this thread is trying to
> > address. It sounds like, from your answer, that the rationale is to
> > reduce code footprint, which isn't the strongest argument for its
> > removal given its established historical use.

Re: ValidateXml Processor - validatexml.invalid.error = Validation failed (NiFi 1.19.1)

2023-02-08 Thread Adam Taft
Bilal,

And I will be more than happy to help fix this. This is plaguing myself as
well. Please let me know what your ticket number is, and I will help in
whatever way is needed to get this fixed.

/Adam
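
For reference, the kind of detail being asked for comes straight off the
SAXParseException raised during schema validation. A standalone sketch
(illustrative only, not the ValidateXml processor's code) showing where
that message originates:

    import java.io.StringReader;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;
    import org.xml.sax.SAXParseException;

    public class XmlErrorDetailDemo {
        public static void main(String[] args) throws Exception {
            final SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // tiny inline schema: element 'a' must be an integer
            final Schema schema = factory.newSchema(new StreamSource(new StringReader(
                    "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                    + "<xs:element name='a' type='xs:int'/></xs:schema>")));
            final Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new StringReader("<a>oops</a>")));
            } catch (final SAXParseException e) {
                // this is the detail worth surfacing in
                // validatexml.invalid.error rather than "Validation failed",
                // e.g. cvc-datatype-valid.1.2.1: 'oops' is not a valid
                // value for 'integer'.
                System.out.println(e.getMessage());
            }
        }
    }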

On Wed, Feb 8, 2023 at 10:13 AM Dan S  wrote:

> Bilal,
>  Please create a bug ticket for this.
>
> On Wed, Feb 8, 2023 at 4:51 AM Bilal Bektaş  .invalid>
> wrote:
>
> > Hi Dev Team,
> >
> > NiFi environment was upgraded from 1.15.1 to 1.19.1. The following
> > situation has occurred with the ValidateXml processor:
> >
> > ValidateXml processor works correctly on 1.15.1 and the
> > validatexml.invalid.error attribute gives detailed information about
> > the error.
> >
> > ValidateXml processor works on 1.19.1 but the validatexml.invalid.error
> > attribute does not give detailed information about the error. The
> > validatexml.invalid.error attribute only contains the “Validation
> > failed” text.
> >
> > A similar problem was discussed in the Cloudera community, where Eduu
> > said:
> >
> > Hi @ChuckE<
> > https://community.cloudera.com/t5/user/viewprofilepage/user-id/98065>
> > using NiFi 1.15.2 seems to add the error detail in the
> > "validatexml.invalid.error" attribute.
> > For example --> "cvc-minLength-valid: Value '' with length = '0' is not
> > facet-valid with respect to minLength '1' for type 'HotelCode"
> >
> > This seems to be an issue starting from NiFi 1.16.3, maybe a bug?
> > You can use NiFi 1.15.2 or custom code as @SAMSAL<
> > https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381>
> > suggested.
> >
> >
> > Ref:
> > https://community.cloudera.com/t5/Support-Questions/How-to-get-the-reason-for-invalid-XML/m-p/348767/highlight/true#M235443
> >
> > Is it possible to add this feature back in a new NiFi version, like
> > NiFi 1.15.1?
> >
> >
> > Thank you in advance,
> >
> > --Bilal
> >


Re: PostHTTP Deprecation Concerns

2023-01-11 Thread Adam Taft
David,

Thank you for the reasonable response to my questions. Much appreciated.

I'm not a huge fan of the MergeContent -> InvokeHTTP -> {} -> ListenHTTP ->
UnpackContent approach to provide the same functionality. But I do
acknowledge that's the most direct replacement option without PostHTTP.
It's adding extra processors to the chain for something that is
effectively a transport issue. NiFi-to-NiFi using PostHTTP was a simple
transport-oriented solution, and packing the data with MergeContent first
isn't quite the same level of fidelity. You also miss the two-phase commit
built into those extra bits. MergeContent is often a bit of a beast
in-and-of-itself too.

Flowfile attributes conveyed as HTTP headers definitely don't work for
complex attribute values. But yes, I know that the functionality exists
(having some history with that processor myself).

Thanks again for the response.

/Adam




On Wed, Jan 11, 2023 at 9:27 PM Adam Taft  wrote:

> Hi Mathew,
>
> > It's quite remarkable you're advocating against standard practice
> > presumably for your own convenience.
>
> Wow, absolutely not stated nor implied in my message. And even borderline
> offensive.
>
> What I asked was simply, why remove it, if it's not hurting anything. I
> agree with your statement that there is a (very small) cost for maintaining
> the component in the source tree. But PostHTTP is not in the same scope as
> compared to a component that has a dependency on an abandoned, insecure, or
> completely out of standards library (for example).
>
> PostHTTP has a reasonable use case (as I described) that is not directly
> matched with other processors. The two-phase commit protocol sitting
> between PostHTTP and ListenHTTP has demonstrated to bear good fruit over
> many hardened years of use. I think it's a reasonable reply to my question
> to just simply suggest that the interaction between PostHTTP and ListenHTTP
> is just not supported by NiFi going forward. But please don't tell me my
> question/concern is "out of convenience."
>
> There is lacking documentation as to the rationale behind the deprecation
> of PostHTTP. I might be missing it, can you please send me the link to the
> rationale? That's what this thread is trying to address. It sounds like,
> from your answer, that the rationale is to reduce code footprint, which
> isn't the strongest argument for its removal given its established
> historical use. Seems like we'd want more than just reduced footprint for
> such a heavily used processor, no?
>
> /Adam
>
>
> On Wed, Jan 11, 2023 at 7:53 PM Matthew Hawkins 
> wrote:
>
>> Hi Adam,
>>
>> PostHTTP was marked deprecated 3 years ago (aka six technology lifetimes).
>> The successive technologies to replace it's functionality are well
>> documented and proven in production. The technical reason to remove it is
>> that it is superfluous code that has a cost to maintain and zero benefit.
>> Backwards compatibility is never guaranteed for components marked
>> deprecated for such a long length of time in any software product let
>> alone
>> nifi specifically.
>>
>> Your organisation is free to continue using the version of nifi it is on
>> today and not take any further action. It is unhelpful to suggest every
>> other organisation should be held back in progress because yours refuses
>> to
>> take the necessary flow maintenance action. One of the impetus for a major
>> version upgrade is to specifically jettison deprecated components. It's
>> quite remarkable you're advocating against standard practice presumably
>> for
>> your own convenience.
>>
>> Site to site connectivity is conducted with either raw sockets or http
>> (which is https on secured nifi) so I'm highly skeptical there is any
>> performance deprecation in InvokeHTTP or S2S over PostHTTP, given the
>> former can take advantage of http/2 and the latter not. It's easy to
>> monitor nifi and prove through metrics in any case. Sadly in enterprise
>> environments this is sometimes necessary to defeat the political layer
>> around change management.
>>
>> You can run records-based processing over either current method and it is
>> ridiculously fast. The bottleneck in my last engagement ended up being
>> network hardware limitations, not nifi. Having contributed in this domain,
>> I also recommend tossing CompressContent into the flow to minimise
>> bandwidth. On modern hardware the decompression is minimal time and you
>> can plug a *lot* more data through in less CPU and wall clock time. It's
>> easy to bench with DuplicateFlowfile on your production flow and metrics
>> analysis, just make sure your provenance db has sufficient space.

Re: PostHTTP Deprecation Concerns

2023-01-11 Thread Adam Taft
Hi Mathew,

> It's quite remarkable you're advocating against standard practice
> presumably for your own convenience.

Wow, absolutely not stated nor implied in my message. And even borderline
offensive.

What I asked was simply, why remove it, if it's not hurting anything. I
agree with your statement that there is a (very small) cost for maintaining
the component in the source tree. But PostHTTP is not in the same scope as
compared to a component that has a dependency on an abandoned, insecure, or
completely out of standards library (for example).

PostHTTP has a reasonable use case (as I described) that is not directly
matched with other processors. The two-phase commit protocol sitting
between PostHTTP and ListenHTTP has demonstrated to bear good fruit over
many hardened years of use. I think it's a reasonable reply to my question
to just simply suggest that the interaction between PostHTTP and ListenHTTP
is just not supported by NiFi going forward. But please don't tell me my
question/concern is "out of convenience."

There is lacking documentation as to the rationale behind the deprecation
of PostHTTP. I might be missing it, can you please send me the link to the
rationale? That's what this thread is trying to address. It sounds like,
from your answer, that the rationale is to reduce code footprint, which
isn't the strongest argument for its removal given its established
historical use. Seems like we'd want more than just reduced footprint for
such a heavily used processor, no?

/Adam


On Wed, Jan 11, 2023 at 7:53 PM Matthew Hawkins  wrote:

> Hi Adam,
>
> PostHTTP was marked deprecated 3 years ago (aka six technology lifetimes).
> The successive technologies to replace it's functionality are well
> documented and proven in production. The technical reason to remove it is
> that it is superfluous code that has a cost to maintain and zero benefit.
> Backwards compatibility is never guaranteed for components marked
> deprecated for such a long length of time in any software product let alone
> nifi specifically.
>
> Your organisation is free to continue using the version of nifi it is on
> today and not take any further action. It is unhelpful to suggest every
> other organisation should be held back in progress because yours refuses to
> take the necessary flow maintenance action. One of the impetus for a major
> version upgrade is to specifically jettison deprecated components. It's
> quite remarkable you're advocating against standard practice presumably for
> your own convenience.
>
> Site to site connectivity is conducted with either raw sockets or http
> (which is https on secured nifi) so I'm highly skeptical there is any
> performance deprecation in InvokeHTTP or S2S over PostHTTP, given the
> former can take advantage of http/2 and the latter not. It's easy to
> monitor nifi and prove through metrics in any case. Sadly in enterprise
> environments this is sometimes necessary to defeat the political layer
> around change management.
>
> You can run records-based processing over either current method and it is
> ridiculously fast. The bottleneck in my last engagement ended up being
> network hardware limitations, not nifi. Having contributed in this domain,
> I also recommend tossing CompressContent into the flow to minimise
> bandwidth. On modern hardware the decompression is minimal time and you can
> plug a *lot* more data through in less CPU and wall clock time. It's easy
> to bench with DuplicateFlowfile on your production flow and metrics
> analysis, just make sure your provenance db has sufficient space.
>
> Kind regards,
>
> On Thu, Jan 12, 2023, 10:09 Adam Taft  wrote:
>
> > Just wanted to note a concern on the deprecation (and presumed
> > removal) of the PostHTTP processor in the upcoming 2.0 release.
> >
> > While yes, for traditional client interactions with an external HTTP
> > service, utilizing InvokeHTTP for your POST operation is probably
> > sensible. The concern is that there are a number of NiFi-to-NiFi
> > transfers that leverage the "special sauce" that exists between
> > PostHTTP and ListenHTTP.
> >
> > What special sauce? Namely, the extra negotiation that enables an
> > automated serialization of NiFi flowfiles from one system to another.
> > InvokeHTTP is just a "raw" HTTP client and doesn't share any special
> > concern or support for NiFi-to-NiFi data transfer.
> >
> > Of course, if you remember the history, before there was any
> > site-to-site functionality built into processor groups, the primary
> > means of flowfile transport between NiFi systems was the PostHTTP /
> > ListenHTTP combo. It was an easy way to facilitate transfer between
> > two nifi systems.
> >
> > And from what I can tell, this "legacy" approach to NiFi data transfer
> > is still being used heavily in certain operational contexts.

PostHTTP Deprecation Concerns

2023-01-11 Thread Adam Taft
Just wanted to note a concern on the deprecation (and presumed removal) of
the PostHTTP processor in the upcoming 2.0 release.

While yes, for traditional client interactions with an external HTTP
service, utilizing InvokeHTTP for your POST operation is probably sensible.
The concern is that there are a number of NiFi-to-NiFi transfers that
leverage the "special sauce" that exists between PostHTTP and ListenHTTP.

What special sauce? Namely, the extra negotiation that enables an automated
serialization of NiFi flowfiles from one system to another. InvokeHTTP is
just a "raw" HTTP client and doesn't share any special concern or support
for NiFi-to-NiFi data transfer.

Of course, if you remember the history, before there was any site-to-site
functionality built into processor groups, the primary means of flowfile
transport between NiFi systems was the PostHTTP / ListenHTTP combo. It was
an easy way to facilitate transfer between two nifi systems.

And from what I can tell, this "legacy" approach to NiFi data transfer is
still being used heavily in certain operational contexts. Why? Because
often it's the case that the _only_ traffic allowed between network
boundaries is done via HTTPS. The site-to-site protocol provides its own
ports and protocol operations that don't necessarily comply with such a
network policy. And I believe there's still some lingering and/or
demonstrated concern over the performance characteristics of the
site-to-site protocol by dataflow managers. They have often reverted to
using PostHTTP / ListenHTTP instead.

While many of the other deprecated components seem logical, getting rid of
this one just seems like change-for-the-sake-of-change.

Is there any actual technical reason to deprecate and remove PostHTTP from
the standard nar? Is it causing a burden to the product itself? Or was the
decision just more like, "hey it's dumb not to use InvokeHTTP for all HTTP
client operations" and maybe not realize the alternative use case that
PostHTTP enables?

Thanks for any feedback.

/Adam


Re: [discuss] NiFi 1.20 and NiFi 2.0

2023-01-11 Thread Adam Taft
This is really insightful and spot on ...

Kevin wrote:
> Good migration tooling will take a while to develop and test, and the core
> contributors to that effort may not have sufficient variety of flows to
> evaluate when the migration tools are "done" for the majority of the
> community to have success upgrading to 2.x. A milestone release would
> allow us to get more feedback on migration over a longer period than the
> vote window of an RC candidate.

It's exactly this case, that an early 2.0 release might not have had time
to fully work its way through existing production deployments, that's
concerning. The pace and voting of an "RC" is much too short to get any
quality feedback from users in the field.

I think it's really smart to consider the "Milestone" release approach
here. We release 2.0.0-M1, 2.0.0-M2, ... waiting an adequate amount of time
for feedback. We can put these milestones on a calendar, as needed, so that
feedback is required some 'x' number of weeks/months after each milestone.

And to this end, I'd personally rather see us keep the 'main' branch
current with the 1.x line _until_ we're ready and are satisfied with the
end goals of the 2.0 release objectives. When the milestone releases have
been completed and there's a comfort level with the 2.x line, it's at the
point we'd isolate the 1.x line into its own branch and switch main over to
the 2.x line.

This is an attractive way of:
a) continuing business-as-usual with the 1.x line
b) making headway on the 2.x release milestones
c) giving adequate time for feedback against the 2.0 milestones coming from
the field

I don't mind the known-unknowns. But it's really the unknown-unknowns that
are going to drive a delay in the 2.0 release. I think it's smart to be
able to get some of the unknowns ironed out before we finalize the 2.0
release ceremony. The milestone approach really helps with that.

/Adam

On Wed, Jan 11, 2023 at 11:11 AM Kevin Doran  wrote:

> Sorry, Joe, I was not clear, and to be honest the two thoughts are somewhat
> unrelated in my mind too :)
>
> I agree that good migration tooling is key. Otherwise, we risk users
> staying on 1.x or creating a schism of 1.x and 2.x users.
>
> Good migration tooling will take a while to develop and test, and the core
> contributors to that effort may not have sufficient variety of flows to
> evaluate when the migration tools are "done" for the majority of the
> community to have success upgrading to 2.x. A milestone release would
> allow us to get more feedback on migration over a longer period than the
> vote window of an RC candidate.
>
> Perhaps we could continue to release from the 1.x line (including minor
> releases with some features) until we are ready to drop the "milestone"
> qualifier from 2.0.0, and only then put 1.x into maintenance-only status.
> It would be the same proposal to move main to target 2.0.0-M1, but relax
> restrictions for what can land on the 1.x branch and be open to a 1.21,
> 1.22, etc. if 2.0.0 work takes longer than anticipated. For example, maybe
> we would be open to landing new/backported processors on the 1.x branch,
> but not core framework features or API changes.
>
> This might not be necessary, but I think it is fair that saying "no new
> features on 1.x" and also "no new features in 2.0.0" puts the project in a
> rough place if 2.0.0 takes longer than a few months, so if we go that
> route, we need to commit to a quick release of 2.0.0 that most users can
> move to easily.
>
> Thanks,
> Kevin
>
> On Jan 11, 2023 at 12:32:46, Joe Witt  wrote:
>
> > Kevin,
> >
> > Yeah we can do whatever we want as far as 'releases' of 2.0 that are
> > prior to us officially considering it 2.0/stable.
> >
> > That said - the migration tooling will be key to provide as we need to
> > make the bridge as solid and stable as possible to help someone move
> > from 1.x to 2.x.  I don't know how related these two concepts
> > (milestone releases and 1.x to 2.x ease) really are.
> >
> > Thanks
> >
> > On Wed, Jan 11, 2023 at 10:27 AM Kevin Doran  wrote:
> >
> > [hit the wrong keyboard shortcut, here is the rest of my thoughts]
> >
> > On this point from David:
> >
> > > We may need to have a longer release candidate period, or more
> > > incremental fix releases for the initial 2.0.0 release train, but I
> > > do not expect delaying a 2.0.0 release for new features, as that is
> > > not part of the release goals.
> >
> > Would the community benefit from one or more milestone releases of 2.0,
> > to allow for a wider group to run / live on the proposed 2.0 prior to
> > releasing it as "stable"? I know we've never done a milestone release
> > in the past, and I'm not sure what ASF guidance is on the topic, but if
> > it could be beneficial we could look into that.
> >
> > Cheers,
> > Kevin
> >
> > On Jan 11, 2023 at 12:22:43, Kevin Doran  wrote:
> >
> > > I think this is a good, practical discussion.
> 

Re: [discuss] NiFi 1.20 and NiFi 2.0

2023-01-09 Thread Adam Taft
Hi David,

Thanks for the reply. I appreciate it, no further questions from me.

David wrote:
> For deployments that are not using deprecated components and
> features in 1.x...

That's going to be a hard one to sort out, as historically large NiFi
deployments are likely going to have at least some usage of deprecated
components in active production. And rolling out a "major" version change
will take quite a long time for these folks to get right.

If the 2.0 train moves ahead too quickly, and the 1.x releases stop and/or
are abandoned shortly afterwards, these types of "legacy" deployments are
going to suffer and will feel left behind. That's usually when communities
/ projects get forked. My timing/calendar questions are as much about
understanding the 1.x support lifecycle and keeping these slow movers alive
as anything. It would be bad if the rush to 2.0 comes at the expense of
existing deployments.

Again, thanks for the thorough reply.

/Adam


On Mon, Jan 9, 2023 at 6:52 PM David Handermann 
wrote:

> Adam,
>
> Thanks for the reply.
>
> To clarify my statement about traction on 2.0 release goals, I stated it
> that way because I would like to see some measurable progress on the 2.0
> release goals before attempting to put forward any kind of timeline for
> release. I didn't mean to imply an increase in instability in the main
> branch, and that is not what I had in mind. On the contrary, the main
> branch should continue to be considered stable and current, and we should
> continue to apply the same level of rigor for all commits to the main
> branch. Changing major versions should not alter that approach.
>
> As outlined in the 2.0 Release Goals [1] the majority of the changes
> involve removal of deprecated components and features, as opposed to adding
> new and unstable capabilities. From that angle, a 2.0 major release should
> be stable, and the only concern should be that we are surgical in the
> removal process. That strategy is a large part of the reason that moving
> the main branch to 2.0.0-SNAPSHOT should not be seen as extremely
> disruptive. For deployments that are not using deprecated components and
> features in 1.x, upgrading to 2.x should be similar to upgrading between
> minor release versions.
>
> I appreciate the concern about potential disruption, and concern about
> stability. These concerns are important to keep in mind as we move forward.
> As long as we follow this approach, we should be able to maintain the level
> of stability that the community rightly expects.
>
> Regards,
> David Handermann
>
> [1]
> https://cwiki.apache.org/confluence/display/NIFI/NiFi+2.0+Release+Goals
>
>
>
> On Mon, Jan 9, 2023 at 6:08 PM Adam Taft  wrote:
>
> > I think this sentence is capturing some of my question ...
> >
> > David wrote:
> > > I think it would be helpful to see some traction on the 2.0 release
> goals
> > > before attempting to sketch out a potential timeline.
> >
> > It feels like what you're saying is that the "main" git branch is going
> to
> > become an alpha or beta for the 2.0.0 release, and that the newly
> proposed
> > "1.x" branch will be the stable branch. Without any existing traction on
> > the 2.0 release goals (as you've stated), it would start to feel that the
> > main branch doesn't maintain a predictable stability. Folks would have to
> > be looking at the 1.x branch for stable releases for an undefined period
> of
> > time.
> >
> > This is contrary to most philosophies as to what the "main" branch should
> > imply. Typically, the "alpha/beta" work for a major upcoming revision
> would
> > occur in a separate off-main branch until there is at least some fidelity
> > with the release goals. And then switching main from the 1.x to 2.x code
> > base would ideally happen as late as possible in the 2.0.0 release
> > candidate timeframe.
> >
> > It's splitting hairs, of course. Branches are just branches. But I do
> think
> > it's smart to keep the main branch tracking what is considered the
> > currently stable release, not a future beta. I can foresee that there
> will
> > be many 2.0.0 release candidates and late-adopter reluctance to jump onto
> > the 2.0 release until a few cycles of stability have been demonstrated.
> I'd
> > rather feel like we can recommend a 2.0 release straight out of the gate
> > rather than waiting for it to stabilize.
> >
> > No big deal here. Just trying to anticipate what to communicate to people
> > once main switches over. It sounds like the communication will be,
> "ignore
> > the main branch, and focus on the 1.x branch, if you want to be
> > conservative."

Re: [discuss] NiFi 1.20 and NiFi 2.0

2023-01-09 Thread Adam Taft
I think this sentence is capturing some of my question ...

David wrote:
> I think it would be helpful to see some traction on the 2.0 release goals
> before attempting to sketch out a potential timeline.

It feels like what you're saying is that the "main" git branch is going to
become an alpha or beta for the 2.0.0 release, and that the newly proposed
"1.x" branch will be the stable branch. Without any existing traction on
the 2.0 release goals (as you've stated), it would start to feel that the
main branch doesn't maintain a predictable stability. Folks would have to
be looking at the 1.x branch for stable releases for an undefined period of
time.

This is contrary to most philosophies as to what the "main" branch should
imply. Typically, the "alpha/beta" work for a major upcoming revision would
occur in a separate off-main branch until there is at least some fidelity
with the release goals. And then switching main from the 1.x to 2.x code
base would ideally happen as late as possible in the 2.0.0 release
candidate timeframe.

It's splitting hairs, of course. Branches are just branches. But I do think
it's smart to keep the main branch tracking what is considered the
currently stable release, not a future beta. I can foresee that there will
be many 2.0.0 release candidates and late-adopter reluctance to jump onto
the 2.0 release until a few cycles of stability have been demonstrated. I'd
rather feel like we can recommend a 2.0 release straight out of the gate
rather than waiting for it to stabilize.

No big deal here. Just trying to anticipate what to communicate to people
once main switches over. It sounds like the communication will be, "ignore
the main branch, and focus on the 1.x branch, if you want to be
conservative."

/Adam




On Mon, Jan 9, 2023 at 3:24 PM David Handermann 
wrote:

> Joe,
>
> Thanks for keeping things moving forward in terms of a 1.20 release and 2.0
> branching plan. Releasing 1.20 and moving the main branch to 2.0.0-SNAPSHOT
> aligns with the approved goals and provides a natural breakpoint for
> continued development on both branches.
>
> Adam,
>
> Thanks for raising the questions about timeline, I'm sure others have
> similar questions. I think it is probably a little too early to propose
> general timelines, but on the other hand, I think the historical pace of
> releases should be a good indication of continued release cadence.
>
> The 2.0 Release Goals did not include a timeline for the major release, or
> subsequent minor releases, by design, but these are certainly questions we
> should answer.
>
> We know that we will need at least one or more 1.x releases to complete
> additional migration preparation work. With the scope of 2.0 Release Goals
> purposefully limited, I would not expect extensive delays. We may need to
> have a longer release candidate period, or more incremental fix releases
> for the initial 2.0.0 release train, but I do not expect delaying a 2.0.0
> release for new features, as that is not part of the release goals.
>
> I think it would be helpful to see some traction on the 2.0 release goals
> before attempting to sketch out a potential timeline.
>
> Regards,
> David Handermann
>
> On Mon, Jan 9, 2023 at 3:50 PM Adam Taft  wrote:
>
> > Joe / team,
> >
> > Question on this. I think it would be helpful to understand the desired
> > timelines for the first 2.0.0 release. I know it's not strictly
> > predictable, but having a sense of what the timing looks like is
> important
> > to help understand the implications of a "maintenance only" 1.x line. The
> > schedule would ideally come from the folks who are actively looking at /
> > contributing to the 2.0 release. They probably have the best gauge as to
> > "when" it might happen (under ideal conditions).
> >
> > One of the risks, of course, is if the 2.0 release stalls or delays.
> > Having an idea of how 1.x might evolve would help the users who are not
> > necessarily early-adopters or those that need longer support tails. If
> > 2.0 is delayed
> > and 1.x looks unmaintained, there's a potential chance for the project to
> > lose a bit of credibility. I know we don't anticipate this scenario, but
> if
> > we had a plan for it, that would be reassuring.
> >
> > Maybe this was already addressed, I apologize if so. But if not, can we
> > throw some darts on the calendar to help understand the ideal rollout of
> > 2.0 on a timeline? And are there any adjustments for the scenario
> described
> > above?
> >
> > Thanks in advance,
> >
> > /Adam
> >
> >
> > On Mon, Jan 9, 2023 at 1:53 PM Joe Witt  wrote:
> >
> > > Team,
> > >
> > > 

Re: [discuss] NiFi 1.20 and NiFi 2.0

2023-01-09 Thread Adam Taft
Joe / team,

Question on this. I think it would be helpful to understand the desired
timelines for the first 2.0.0 release. I know it's not strictly
predictable, but having a sense of what the timing looks like is important
to help understand the implications of a "maintenance only" 1.x line. The
schedule would ideally come from the folks who are actively looking at /
contributing to the 2.0 release. They probably have the best gauge as to
"when" it might happen (under ideal conditions).

One of the risks, of course, is if the 2.0 release stalls or delays. Having
an idea of how 1.x might evolve would help the users who are not necessarily
early-adopters or those that need longer support tails. If 2.0 is delayed
and 1.x looks unmaintained, there's a potential chance for the project to
lose a bit of credibility. I know we don't anticipate this scenario, but if
we had a plan for it, that would be reassuring.

Maybe this was already addressed, I apologize if so. But if not, can we
throw some darts on the calendar to help understand the ideal rollout of
2.0 on a timeline? And are there any adjustments for the scenario described
above?

Thanks in advance,

/Adam


On Mon, Jan 9, 2023 at 1:53 PM Joe Witt  wrote:

> Team,
>
> As David mentioned in [1] following a successful NiFi 2.0 release goal
> planning - we are now going to start moving the 'main' line to be the NiFi
> 2.0 line which will allow for the key work to take place.  We will also
> move niFi 1.x to its appropriate support line.
>
> It is also the case that we have nearly 100 JIRAs on NiFi 1.20 and we have
> work in there including security items so it is time to make a release.
> The intent then is to initiate 1.20 and immediate after that change 'main'
> to 2.0.
>
> Going forward then all work on the 1.x line should be focused on
> maintaining existing features, dependencies, and helping 1.x users migrate
> to the 2.x line.  Otherwise, new feature work will happen on 'main' as it
> normally does and will come out in the NiFi 2.x release line.
>
> Please flag key outstanding items as we narrow down the release candidate
> for NiFi 1.20.
>
> Thanks
> Joe
>
> [1] https://lists.apache.org/thread/qo4vvdw46235y7vy2crcd6l4m11wl7jz
>


Re: Cobol/EBCDIC Source

2023-01-06 Thread Adam Taft
Hi Frank,

NiFi does not currently have native support for Cobol formats, such as
copybook or EBCDIC. However, NiFi is built to be very extensible/pluggable,
and as such, support can be added for any type of data conversion that you
can imagine. For example, a new conversion processor from EBCDIC would
probably either go into a CSV format or directly into NiFi's Record
format[1]. From there, conversion to other forms and egress to systems like
Hive would be relatively easy to do.
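
For a sense of scale, a conversion processor along these lines can be quite
small. Below is a minimal, hypothetical sketch (the class name is made up,
and it assumes the JDK build includes the IBM Cp037 EBCDIC charset, which
standard OpenJDK distributions do). A real copybook-aware processor would
additionally need the record layout to split fields, unpack COMP-3 decimals,
and so on:

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.util.Collections;
    import java.util.Set;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;

    public class ConvertEbcdicToText extends AbstractProcessor {

        static final Relationship REL_SUCCESS = new Relationship.Builder()
                .name("success")
                .description("FlowFiles re-encoded as UTF-8 text")
                .build();

        // Cp037 is a single-byte charset, so chunk-by-chunk conversion is safe.
        private static final Charset EBCDIC = Charset.forName("Cp037");

        @Override
        public Set<Relationship> getRelationships() {
            return Collections.singleton(REL_SUCCESS);
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session)
                throws ProcessException {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }
            // Re-encode the content stream from EBCDIC to UTF-8.
            flowFile = session.write(flowFile, (in, out) -> {
                final byte[] buffer = new byte[8192];
                int len;
                while ((len = in.read(buffer)) != -1) {
                    final String text = new String(buffer, 0, len, EBCDIC);
                    out.write(text.getBytes(StandardCharsets.UTF_8));
                }
            });
            session.transfer(flowFile, REL_SUCCESS);
        }
    }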

I have a small hobby-level interest in Cobol, from my early days in tech. I
would be happy to help guide you, if you decide to create a custom NiFi
processor to do this conversion. Many other NiFi community members will
likewise be happy to guide you as well. It's possible even that it would be
considered for inclusion in the NiFi distribution directly, but likewise,
your custom NiFi extensions can live on strictly as an add-on component in
your control. You don't necessarily need it to be included into the main
NiFi distribution for you to leverage NiFi's extensibility.

Do you know of any Java-based libraries that already do the conversion that
you need? If so, that would be the starting place for the custom component.

/Adam

[1] https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi

On Fri, Jan 6, 2023 at 7:06 AM DiMartini, Frank L <
frank.dimart...@navient.com> wrote:

> Hello,
>
>
>
> Does or will Apache NiFi support COBOL copybook and EBCDIC file formats as
> a source? If not, is there a way to build a custom source to support this?
> We are looking for ETL tools that have this functionality and can load to
> an Apache Hive DB.
>
>
>
> Thanks,
>
>
>
>
>
>
> Frank DiMartini | Principal, Enterprise Data Warehouse
>
> Navient
> Pennsylvania Home Office
>
> 570.706.6085 (m)
> frank.dimart...@navient.com
> Navient.com
>


Re: setup TLS configuration

2023-01-04 Thread Adam Taft
Elvis,

I found this document which might help give you clues to convert between
IBM MQ's "kdb" format and the traditional Java "jks" format. In principle,
it looks like you will need to export your client certificates, etc. out
from your kdb store:

https://www.ibm.com/mysupport/s/question/0D50z62l4HICAY/how-do-i-configure-ssl-tls-between-java-client-and-mq-queue-manager?language=en_US

NiFi is not going to understand the kdb format. So you will ultimately need
to export your certs and CA from the kdb file you created. From there, you
will probably need to configure the JMS processors in NiFi to connect to MQ
server. These documents seem to have some hints:

https://community.cloudera.com/t5/Support-Questions/how-to-Setup-IBM-MQ-Configuration-for-Nifi/td-p/118155
https://www.senia.org/2018/05/10/integrating-apache-nifi-with-ibm-mq/
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-jms-processors-nar/1.19.1/org.apache.nifi.jms.cf.JMSConnectionFactoryProvider/additionalDetails.html
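
Once you have the CA certificate exported from the kdb (e.g. as a PEM/ARM
file), building a JKS truststore for NiFi can be done with keytool or, as a
minimal sketch using only plain JDK APIs (file names and password here are
placeholders):

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.security.KeyStore;
    import java.security.cert.Certificate;
    import java.security.cert.CertificateFactory;

    public class BuildTruststore {
        public static void main(String[] args) throws Exception {
            // Load the CA certificate that was exported from key.kdb.
            CertificateFactory cf = CertificateFactory.getInstance("X.509");
            Certificate mqCa;
            try (FileInputStream in = new FileInputStream("mq-ca.pem")) {
                mqCa = cf.generateCertificate(in);
            }
            // Create a new JKS truststore containing just that CA entry.
            KeyStore trustStore = KeyStore.getInstance("JKS");
            trustStore.load(null, null); // initialize an empty store
            trustStore.setCertificateEntry("mq-ca", mqCa);
            try (FileOutputStream out = new FileOutputStream("truststore.jks")) {
                trustStore.store(out, "changeit".toCharArray());
            }
        }
    }

The resulting truststore.jks is what you would then reference from the
SSLContextService used by the JMS processors.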

Hope that helps.

/Adam





On Wed, Dec 28, 2022 at 2:23 PM Rivera Molina, Leonorelvis
 wrote:

> Hello I am Elvis Rivera from Scotiabank, we have a connection with NIFI
> and it is very urgent to do the TLS set up between my MQ Server and NIFI
>
> I have a keystore called key.kdb, which is the keystore for the MQ
> manager, and I was checking your documentation and I do not see
> anything related to the TLS connection between MQ Server and NiFi
>
> Do you have any documentation or could you please provide any help as this
> is something very urgent since we have a vulnerability open with our
> Security department.
>
> Thank you
>
> Elvis Rivera Molina | Asesor Arquitectura Transformación GBM
> Scotiabank  | DGA Tecnologías de la Información
> Boulevard Manuel Avila Camacho No.1
> Colonia. Lomas de Chapultepec Piso1.
> lrive...@scotiabank.com.mx
>
>


Re: proposal to extend several component key properties to use Expression Lng instead of VarRegs only (based off https://issues.apache.org/jira/browse/NIFI-8214)

2022-10-20 Thread Adam Taft
Definitely appreciate having as much "control" as possible afforded to each
flowfile. The use cases described here are spot on and I've hit this myself
previously. Any endpoint definition would ideally be configurable from the
flowfile itself via expression language. It's easy enough to hard code a
static endpoint value (maybe the majority use case), but it also doesn't
hurt to enable the flexibility of reading configuration from flowfile
attributes. So on principle and generally as a design aesthetic, this is a
great idea.
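
Mechanically, the per-property change is small. A rough sketch of what it
looks like in processor code (the "Endpoint" property name is illustrative):

    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.expression.ExpressionLanguageScope;
    import org.apache.nifi.processor.util.StandardValidators;

    // Declaring a property that supports expression language evaluated
    // against flowfile attributes:
    static final PropertyDescriptor ENDPOINT = new PropertyDescriptor.Builder()
            .name("Endpoint")
            .required(true)
            .expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();

    // ...and then evaluating it per flowfile inside onTrigger():
    String endpoint = context.getProperty(ENDPOINT)
            .evaluateAttributeExpressions(flowFile)
            .getValue();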

The hard part is that the internals for many processors are anchored to a
specific endpoint. For example, imagine a messaging service that requires a
heavy-weight client to be constructed. Typically, a processor might only
manage a single connection object to the remote service. The lifecycle of
the processor creates/destroys the underlying client, and must initialize
without input values coming from flowfile attributes.

It's these cases that will be harder to refactor. In theory (using my
example), the processor could maintain a pool of connections to different
remote service locations, caching each based on the hostname of the remote
service (or whatever). It's of course possible to create these heavy
connection objects "on demand" based on the attributes of the flowfile
being processed, caching them inside the processor, expiring them after a
period of time, etc. But it adds to the burden that the processor must
maintain (in terms of lines of code and/or complexity of the processor) and
might have effects on resource allocation (memory allocated per client,
etc.).
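
To illustrate the shape of that "pool of connections" idea, here is a rough
sketch in plain Java (Client is a hypothetical stand-in for whatever
heavy-weight connection object the processor manages):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class ClientCache {
        // Hypothetical heavy-weight client; stands in for an MQ/JMS/etc.
        // connection that is expensive to build.
        interface Client { }

        private final Map<String, Client> clients = new ConcurrentHashMap<>();

        private Client connect(String endpoint) {
            // ...expensive connection setup against 'endpoint'...
            throw new UnsupportedOperationException("placeholder");
        }

        // Look up (or lazily create) the client for the endpoint resolved
        // from the flowfile's attributes. Real code would also evict idle
        // entries and cap the pool size -- exactly the added complexity
        // described above.
        Client clientFor(String endpoint) {
            return clients.computeIfAbsent(endpoint, this::connect);
        }
    }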

So that's really the tension here. The programming model of the processor,
in many cases, makes it "cleaner" to maintain a single connection facility,
which is why the configuration is a bit more stringent and why many
processors don't enable this more dynamic capability.

But definitely, any processors which can support a dynamic configuration
model, where flowfile attributes are used to make remote connections, those
should be the low hanging fruit to make changes to. And then past that,
probably any other processor will just need to be evaluated and considered
for a more sophisticated "on demand" approach for creating or maintaining
its internal clients or components.

On Thu, Oct 20, 2022 at 9:26 AM Kevin Doran  wrote:

> Hi Rogier,
>
> Thanks for your message. This is an interesting use case. In a way it
> inverts the typical use of NiFi, which is where the flow files are the data
> being moved and the flow logic is in the flow definition / processor
> config. Instead this puts the parameters of the job/workflow into the
> flowfile, which presumably gets enhanced as it moves through the flow so
> you end up with one object/document containing your workflow parameters and
> data/results. Is that accurate?
>
> If I understand correctly, this sounds like a workflow orchestration problem,
> which is similar to but has some subtle differences from data flow
> management. There are tools that try to solve workflow orchestration. Two
> that come to mind are Conductor [1] and Cadence [2]. NiFi can do this, and
> I see plenty of flows that use flowfile attributes to store some control
> signals or values needed for flow logic. But because its not the core use
> case, I think NiFi developers / extension authors don't think of it when
> building components, which is why they don't think of enabling expression
> language on certain properties.
>
> This isn't really a response to whether NiFi should / should not add
> broader expression language support to properties nor is it an opinion on
> wheterh NiFi should or should not try to serve the needs of workflow
> orchestration / job execution. Others on this list may have opinions on
> that. I'm just offering my perspective on why this isn't already the case.
> AFAIK, in many cases adding EL support to processor properties is a fairly
> straightforward effort; the challenge, as you point out, is applying it
> broadly to all our existing processors (and new processors as they get
> developed) rather than just one or two.
>
> [1] https://conductor.netflix.com/
> [2] https://cadenceworkflow.io/
>
> Cheers,
> Kevin
>
> On Oct 18, 2022 at 06:05:14, TIMMERMANS Rogier <
> rogier.timmerm...@contractor.voo.be> wrote:
>
> > Hello,
> >
> >
> >
> > Apologies in advance if this is the wrong list to send this type of query
> > to.
> >
> > After short discussion with Chris Sampson on
> > https://issues.apache.org/jira/browse/NIFI-8214 he proposed to send out
> > email to this list; I hope it finds you well.
> >
> >
> >
> > We (several of my colleagues) sometimes use a pattern where we build a
> > nifi flow that gets initiated by a short json configuration file; the
> > initial input file (or generated flowfile) contains simple configuration
> > data for the rest of the flow and sets up things like Remote paths,
> users,
> > testcase IDs, endpoints to hit, etc… as a file is easier to manage,
> > 

Re: Using nifi in separated networks

2021-08-02 Thread Adam Taft
Just spitballing a little here. If you set the configuration of the PutTCP
processor property "Connection per Flowfile" to 'true' and you leave the
"Outgoing Message Delimiter" as blank (none), then I don't think you have
the delimiter problem that you both are describing. I could be wrong though?

I would consider it a bug if you couldn't send a "raw" connection-oriented
object over PutTCP.  With that processor, the goal would be to: a) open a
socket, b) dump whatever binary you have prepared over it, c) close the
socket to signal completion of transfer. If PutTCP doesn't work this way
(byte-for-byte), it should probably be flagged as a bug (its original
intention was exactly this use case).

That being said, I still think custom FlowFile serialization might be
something that is outside of the concern of the transport. I personally
think serializing/deserializing is a different concern from transport.
Arguably, sometimes the semantics of the transport protocol requires you to
prepare the message itself in a protocol accommodating way (HTTP being an
obvious example of this, or packet ordering in Marc's UDP example). But a
new JSON flowfile serialization seems like it could be a separate
processor, not commingled into an existing one.

MergeContent / UnpackContent work in tandem and have a "FlowFile Stream v3"
format that can serialize/deserialize multiple flowfiles together into a
single byte stream. This allows transport over any protocol, including
file-based, socket-based, etc.
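
For anyone curious, the general shape of the length-prefixed framing Marc
describes below is roughly this (illustrative only -- this is neither
Marc's exact wire protocol nor the FlowFile Stream v3 format):

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    class FrameWriter {
        // Frame layout: [attr length][attr json][payload length][payload].
        static void writeFrame(DataOutputStream out, String attributesJson,
                               byte[] payload) throws IOException {
            byte[] attrs = attributesJson.getBytes(StandardCharsets.UTF_8);
            out.writeInt(attrs.length);    // receiver reads this first...
            out.write(attrs);              // ...then decodes the attributes
            out.writeLong(payload.length); // ...then the content length
            out.write(payload);            // ...then restores the content
            out.flush();
        }
    }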

Marc: Your mention of performance is, of course, appropriate for the scale
that you're talking about (Gbps). Maybe there's some performance
improvements that could be garnered from your work applicable to the
"standard" processors I mentioned. And I definitely didn't mean to imply
you were doing "anything wrong". Just legitimately curious as to your
thought process and design approach.

OK, I'll step off a little, because I might be probing too hard here. But I
was legitimately curious about the intention of the proposed processor as
it relates to the mentioned Diode device.

Thanks,

Adam


On Mon, Aug 2, 2021 at 4:15 PM Phil H  wrote:

> Hi Marc,
>
> Thanks for the additional info.  Just so you know you’re not the only
> one, I’ve also had to re-implement a ListenTCP alternative to get
> around the byte delimeter issue for binary and multiline text data.
>
> Phil
>
>
> On Tue, Aug 3, 2021 at 6:59 AM Marc  wrote:
> >
> > Hi Adam,
> >
> > more or less it is a ‚merge', puttcp, listentcp and unpack. I hope that
> I am not wrong but the nifi ListenTCP processor uses a delimiter (\n as
> default?). If you are transferring binary data the processor splits the
> flow into ‚pieces'. And the attributes are not transferred to the
> destination.
> >
> > But your idea describes what the processor is doing.
> >
> > 1. It converts the attributes to a json string
> > 2. It transfers the json string and the payload (there is a header that
> tells the destination how long the json header and how long the payload is)
> > 3. The Listener gets the flow and decodes the header (to get the size of
> the json header and the payload)
> > 4. It writes the payload to a flow
> > 5. It converts the json string and sets the attributes to the flow
> >
> > If you do not want to transfer attributes you can configure a different
> > decoder. In this case you can just netcat a binary file to NiFi.
> >
> > The UDP version is far more complex. There must be a counter to tell the
> destination what part of the flow file was received (even in a diode
> environment packets are not received in the right order!). And you must be
> fast, very fast. It is a multithreaded architecture because one thread
> cannot receive, decode, and write a gigabit per second. I used the
> disruptor library. Receive a packet in one thread, decode it in another
> thread. A third thread gets the packet and write the content in the right
> order to a flow.
> >
> > I am still learning (and I am not a professional software developer). If
> > I did something wrong or overlooked something please tell me.
> >
> > Marc
> >
> > > Am 02.08.2021 um 22:01 schrieb Adam Taft :
> > >
> > > Marc,
> > >
> > > How would this differ from a more generic use of the existing
> processors,
> > > PutTCP/ListenTCP and PutUDP/ListenUDP?  I'm not sure what value is
> being
> > > added above these existing processors, but I'm sure I'm missing
> something.
> > >
> > > There's already an ability to serialize flowfiles via MergeContent. And
> > > there's the deserialize side in UnpackContent. So a dataflow that looks
> > > like the following would seem a reasonable approach to the problem.

Re: Using nifi in separated networks

2021-08-02 Thread Adam Taft
Marc,

How would this differ from a more generic use of the existing processors,
PutTCP/ListenTCP and PutUDP/ListenUDP?  I'm not sure what value is being
added above these existing processors, but I'm sure I'm missing something.

There's already an ability to serialize flowfiles via MergeContent. And
there's the deserialize side in UnpackContent. So a dataflow that looks
like the following would seem a reasonable approach to the problem:

MergeContent -> PutTCP -> {diode} -> ListenTCP -> UnpackContent

I'm actually very interested in this topic, having a project that has a use
case for a "diode". So I'm legitimately asking here, not trying to derail
your work.

Thanks in advance,

Adam

On Sun, Aug 1, 2021 at 12:26 PM Marc  wrote:

> Greetings,
>
> there are companies and organizations that strictly separate their
> networks for security reasons. Such companies often use diodes to achieve
> this. But of course they still have to exchange data between the networks
> (eg. transfer data from ‚low‘ to ‚high‘). There are at least two kinds of
> diodes. Some hardware-based ones only use one fiber optic to send data (UDP
> based). Others use TCP, but prevent sending in the reverse direction.
>
> Nifi is an amazing tool that allows data to be transferred between two
> separate networks in a very flexible but also secure way. I have
> implemented two processors. The first one ‚merges‘ the attributes and the
> content of a flowfile and sends it to the destination. The second one
> listens on a TCP port, splits attributes and content and creates a new
> flowfile containing all attributes of the origin flow. You can send the
> flow without attributes as well. In this case you can easily netcat a
> binary file to Nifi.
>
> These two processors are useful if you do NOT have a bidirectional
> communication between two NiFi instances and therefore the site-2-site
> mechanism or http(s) cannot be used.
>
> We have been using these processors for a longer period of time (exactly
> the version for 1.13.2) and would like to share these processors with
> others. So the question to you all is: Is someone interested in these
> processors or is this use case too special?
>
> The current source code can be found on GitHub. (
> https://github.com/nerdfunk-net/diode/ <
> https://github.com/nerdfunk-net/diode/>)
>
> I have also implemented a UDP based version of the processor. Due to the
> nature of UDP, this is more complex and these processors are now being
> tested.
>
> Best regards
> Marc


Re: [DISCUSS] NiFi 2.0 Release Goals

2021-07-30 Thread Adam Taft
Thanks for the reply Joe.

If the "marketplace" concept can be scoped to something like "not in core
NiFi but can be downloaded here", then it could fit in the definition of
achievable in 2.0.  The concept of the marketplace being this magical
dynamic processor lookup service always felt a little too vaporware to me.

One thing that would help the NiFi build size in general is to really focus
on what needs to be in "core" and what can just be external nars that the
dataflow/systems manager can add to the 'extensions' directory.  We've
struggled with having really large builds, so maybe this is an opportunity
to address that in 2.0.

Radical brainstorm warning:
Imagine a completely "thin" NiFi distribution. e.g. with no processors or
other components at all. Then a configuration manager could source
additional nar functionality from a different distribution source. We could
still build processor bundles, but they would just be completely separate
from the framework. Separate git repository, separate versioning, separate
download, etc.

This would be the baby steps towards the marketplace concept. Let the
ecosystem of processors and other components thrive outside of the main
framework distribution. In this way, things like PostHTTP could live on
easily without burdening the framework. And groups of similar processor
functionality can exist in separately managed nars.

Thanks for entertaining the thought at least.

/Adam


On Fri, Jul 30, 2021 at 4:50 PM Joe Witt  wrote:

> Adam
>
> In cases of ‘what happened’ it just hasnt happened.  The ideas are still
> good.
>
> I do agree with the ‘why kill anything’ mentality.  We can simply park
> these somewhere.  That said there is an ongoing maint cost we do have to
> rationalize.  Builds are longer, deps to maintain, etc..
>
> Also we need to probably pump the brakes a bit on thoughts of removing or
> changing many things.  If we do we may well get to 2.0 but nobody could
> adopt it.  We have honestly a pretty massive deployment base and we cannot
> make it a pain for users. We got away with things at 0.x to 1.0 that we
> cannot get away with on 2.0
>
> Thanks
>
> On Fri, Jul 30, 2021 at 3:41 PM Adam Taft  wrote:
>
> > I'm not seeing the side thread that was going to discuss deprecation of
> > PostHTTP.  Has that thread started and I just don't see it?
> >
> > One (significant?) concern with PostHTTP is the smooth integration of
> > NiFi-to-NiFi communication that is very transparently enabled with the
> > ListenHTTP and PostHTTP processors. There's some special logic in there
> for
> > handling flowfiles that InvokeHTTP doesn't really (nor should really)
> have.
> >
> > I know of several (large) NiFi installations that rely on the PostHTTP /
> > ListenHTTP combination. It has enabled NiFi to NiFi communication for
> folks
> > reluctant or unable to enable site-to-site type configuration.
> >
> > Honestly, I don't know why we'd want to "deprecate" any processor, as
> > opposed to just moving it to a new location. If these processors can be
> > ported and maintained to whatever the 2.0 API looks like, there's
> possibly
> > little harm keeping them around.
> >
> > And by the way, what happened to the "marketplace" concept? Is this being
> > considered for 2.0 as well?  Because relocating the deprecated processors
> > to an external nar might be the best solution. Losing PostHTTP entirely I
> > think would be a mistake, but I'd conceptually support relocating it.
> >
> > Thanks,
> >
> > /Adam
> >
> > On Tue, Jul 27, 2021 at 2:11 PM Joe Witt  wrote:
> >
> > > Looks like we just need to knock out 5 JIRAs :) [1]
> > >
> > > I felt like we had a label folks were using at one point but quickly
> > > looking revealed nothing exciting.  After this confluence page
> > > stabilizes a bit we can probably knock out some JIRAs and such.
> > >
> > > [1] https://issues.apache.org/jira/projects/NIFI/versions/12339599
> > >
> > > On Tue, Jul 27, 2021 at 1:06 PM Otto Fowler 
> > > wrote:
> > > >
> > > >  I find myself wishing I had a list of all the jiras / issues that
> have
> > > > been put off for a 2.0 release because they required some change or
> > > another
> > > > :(
> > > >
> > > > From: Joe Witt  
> > > > Reply: dev@nifi.apache.org  <
> dev@nifi.apache.org>
> > > > Date: July 27, 2021 at 12:30:35
> > > > To: dev@nifi.apache.org  
> > > > Subject:  Re: [DISCUSS] NiFi 2.0 Release Goals
> > > >
> > > > A few thoughts:

Re: [DISCUSS] NiFi 2.0 Release Goals

2021-07-30 Thread Adam Taft
I'm not seeing the side thread that was going to discuss deprecation of
PostHTTP.  Has that thread started and I just don't see it?

One (significant?) concern with PostHTTP is the smooth integration of
NiFi-to-NiFi communication that is very transparently enabled with the
ListenHTTP and PostHTTP processors. There's some special logic in there for
handling flowfiles that InvokeHTTP doesn't really (nor should really) have.

I know of several (large) NiFi installations that rely on the PostHTTP /
ListenHTTP combination. It has enabled NiFi to NiFi communication for folks
reluctant or unable to enable site-to-site type configuration.

Honestly, I don't know why we'd want to "deprecate" any processor, as
opposed to just moving it to a new location. If these processors can be
ported and maintained to whatever the 2.0 API looks like, there's possibly
little harm keeping them around.

And by the way, what happened to the "marketplace" concept? Is this being
considered for 2.0 as well?  Because relocating the deprecated processors
to an external nar might be the best solution. Losing PostHTTP entirely I
think would be a mistake, but I'd conceptually support relocating it.

Thanks,

/Adam

On Tue, Jul 27, 2021 at 2:11 PM Joe Witt  wrote:

> Looks like we just need to knock out 5 JIRAs :) [1]
>
> I felt like we had a label folks were using at one point but quickly
> looking revealed nothing exciting.  After this confluence page
> stabilizes a bit we can probably knock out some JIRAs and such.
>
> [1] https://issues.apache.org/jira/projects/NIFI/versions/12339599
>
> On Tue, Jul 27, 2021 at 1:06 PM Otto Fowler 
> wrote:
> >
> >  I find myself wishing I had a list of all the jiras / issues that have
> > been put off for a 2.0 release because they required some change or
> another
> > :(
> >
> > From: Joe Witt  
> > Reply: dev@nifi.apache.org  
> > Date: July 27, 2021 at 12:30:35
> > To: dev@nifi.apache.org  
> > Subject:  Re: [DISCUSS] NiFi 2.0 Release Goals
> >
> > A few thoughts:
> >
> > 1. I would love to see deprecation notices show up in the UI in
> > various ways to help motivate users to move off things to more
> > supportable things. That is not a prerequisite for anything happening
> > however. Just a good feature/nice thing to do for users when someone
> > is able to tackle it.
> >
> > 2. The decision to deprecate something and to further remove it need
> > not mean there is a superior solution available. If that thing itself
> > isn't getting the love/attention it needs to be
> > maintained/supported/healthy going forward that alone is enough to
> > remove it. That might well be the case with PostHTTP [1] and for
> > comparison you can see how much effort has gone into InvokeHTTP [2].
> >
> > 3. When discussing a 2.0 release each thing we add as a 'must do' the
> > further away from reality such a release will become. We'll have to
> > get very specific about 'musts' vs 'wants'.
> >
> > [1]
> >
> https://github.com/apache/nifi/commits/11e9ff377333784974fa55f41483c4281d80da50/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PostHTTP.java
> > [2]
> >
> https://github.com/apache/nifi/commits/cc554a6b110dfa45767bcb13d834ea4265d6dfe6/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/InvokeHTTP.java
> >
> > On Tue, Jul 27, 2021 at 8:47 AM David Handermann
> >  wrote:
> > >
> > > Thanks Mark, providing a template or comparison statistics with Java
> > > versions and component configuration details would be very helpful. If
> it
> > > is possible to run tests using a public API or deployable service, that
> > > would also help confirm potential differences.
> > >
> > > Showing a deprecation notice in the UI could be helpful, perhaps as a
> > > configurable option. NIFI-8650 describes a general Flow Analysis
> > > capability, and it seems like that might be one possible way to surface
> > > deprecation warnings. For something more specific to component
> > deprecation,
> > > it seems important to find a balance between making it obvious and
> making
> > > it something that ends up getting ignored.
> > >
> > > Regards,
> > > David Handermann
> > >
> > > On Tue, Jul 27, 2021 at 10:28 AM Mark Bean 
> wrote:
> > >
> > > > I'll start a new thread for PostHTTP when I get a template and/or
> > detailed
> > > > stats.
> > > >
> > > > I know the deprecation is noted in the documentation. That's a
> > necessary
> > > > and minimum level of notification. I was suggesting it be more
> obvious
> > in
> > > > the UI. I think it would be beneficial to somehow be aware of the
> > > > deprecation status simply by looking at the flow (perhaps even on the
> > > > summary pages too), and without having to open the documentation for
> > every
> > > > processor to confirm whether or not a component is marked as
> > deprecated.
> > > >
> > > > Thanks,
> > > > Mark
> > > >
> > > >
> > > > On Tue, Jul 27, 2021 at 

Re: invokeHttp routing of exceptions like ConnectException and IOException to failure instead of retry

2019-11-01 Thread Adam Taft
Hi David,

*> "What is the reasoning for routing them to failure instead of retry?"*

Good question ... HTTP status codes give good hints as to what a client
should do for retry/no-retry operations.  Generally 400 error codes do not
get retried, 500 codes get retried, etc.  It doesn't, however, give any
indication what a client should do in case of not connecting or having host
lookup problems down at the TCP level.

The "failure" relationship in NiFi is somewhat a common "catch all"
relationship, with a significant number of processors having both a
"success" and "failure" relationship pair.  InvokeHTTP uses that precedent
to capture TCP oriented failures,  and additionally provides relationships
when the http protocol can provide more context.

In short, the "failure" relationship captures TCP related problems.  The
"retry" / "no-retry" relationships capture HTTP related problems.

There's really no ability to tell, at the TCP level, whether a host will
come back online in the future or not.  HTTP 5xx service codes give a
pretty good hint that the request can be retried again in the future, but
TCP ConnectException or UnknownHostException don't really give any
indication for that.

On your other comment, "Yield vs. Penalize" ...

The "yield" function in NiFi is a mechanism that is used in the context of
the Processor.  It's basically a way from a processor to evaluate whether
there is any "work to do" and signal the framework that it can relinquish
its resources.  If a FlowFile is queued above a Processor, somewhat by
definition, the Processor indeed has work to do and therefore shouldn't
yield.  The yield function is applied in the context of the Processor
itself.

Whereas, the "penalize" function in NiFi is oriented to a FlowFile itself.
The Processor might notice a problem with a FlowFile as it is working on
it.  The processor can then apply a penalty to the FlowFile, which is
effectively a signal back to the downstream queues.  A NiFi Queue that
handles a FlowFile which has been penalized will effectively "hide" that
FlowFile until the penalty duration has expired, regardless of where that
flowfile is being routed.

So as a developer creating a custom processor, deciding when to "yield" is
a function of determining if Processor has work to do.  Whereas deciding
when to "penalize" is a function of determining if there was a problem with
the FlowFile being processed.
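
In processor code, the distinction looks roughly like this. This is an
illustrative sketch only -- not InvokeHTTP's actual code -- and send() is a
placeholder for the real network call:

    import java.io.IOException;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;

    public class PenalizeVsYieldSketch extends AbstractProcessor {

        static final Relationship REL_SUCCESS =
                new Relationship.Builder().name("success").build();
        static final Relationship REL_FAILURE =
                new Relationship.Builder().name("failure").build();

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session) {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                // Processor-level decision: no work to do, so relinquish
                // resources back to the framework.
                context.yield();
                return;
            }
            try {
                send(flowFile);
                session.transfer(flowFile, REL_SUCCESS);
            } catch (IOException e) {
                // FlowFile-level decision: this flowfile had a problem.
                // Penalizing it makes the downstream queue hide it for the
                // configured penalty duration.
                flowFile = session.penalize(flowFile);
                session.transfer(flowFile, REL_FAILURE);
            }
        }

        private void send(FlowFile flowFile) throws IOException {
            // placeholder for the actual delivery attempt
        }
    }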

Now, InvokeHTTP is a complicated beast.  So it's having to make
determinations as to whether it should yield based on whether it's
considering itself a "source processor".  Because of its complexity, you
are really seeing multiple design patterns being played out inside the
code.  But fundamentally, the InvokeHTTP processor shouldn't be making a
decision to "yield" based on a FlowFile that had previously failed to
connect.

Because of the way InvokeHTTP is designed, it's not necessarily configured
to just connect with one host.  The URL parameter can be read in from
flowfile attributes (via expression language) allowing it to potentially
make requests to any number of hosts. So we can't universally predict when
to yield, penalize, retry or fail.

Maybe there's some room for improvement.  But I hope that gives some of the
background that you were asking for.





On Thu, Oct 31, 2019 at 12:48 PM David Caldwell 
wrote:

> Hi,
> While testing invokeHttp retry logic when the destination endpoint is
> offline, I learned that invokeHttp processor routes exceptions caused by
> the offline endpoint to the failure relationship instead of the retry
> relationship.
> That surprised me since those types of errors are exactly what I would
> normally like to retry.  What is the the reasoning for routing them to
> failure instead of retry?
> Furthermore, when I routed the failure relationship back into invokeHttp
> to retry, I then found that nifi cpu usage stays around 150-160% until the
> remote endpoint comes back online.
> Additional digging showed that invokeHttp penalizes retry, no_retry &
> failure scenarios, but yields only for retry and no_retry.  IOW, the
> failure scenario doesn't yield.
>
> I don't really have a problem with failure not yielding.  That's just what
> I suspect may be causing the excessive cpu utilization.  My real problem is
> that I think recoverable communications exceptions should be routed to
> retry instead of failure.  Not only would that avoid developer surprise,
> but it would include yield which I hope would prevent the high cpu
> utilization.
> Reading the topic
> https://nifi.apache.org/docs/nifi-docs/components/nifi-docs/html/developer-guide.html#penalization-vs-yielding,
> the following points reinforced my thinking:
>- Yield when processor won't be able to perform any useful function for
> some period of time
>
>- This tells framework, don't waste resources triggering the processor
> to run, because there's nothing it can do for a while
>- The topic actually uses a processor communicating with 

Re: Java 11 Compilation

2019-10-30 Thread Adam Taft
Right, I agree with your perspective.

Just note, however, that this stance will require the RM to create the 1.x
convenience binary with Java 8 only.  It will be incumbent on the RM to
ensure they build with Java 8, because if they accidentally build with Java
11, the binary distribution won't run on Java 8.

Adam


On Wed, Oct 30, 2019 at 11:56 AM Joe Witt  wrote:

> Adam
>
> Interesting.  Id say though that where we are now for nifi 1.x is perfect.
> That matrix you shared as currently working seems ideal.
>
> For nifi 2.x we cut bait on Java 8 and go with latest stable Java at that
> time (11, 13)
>
> thanks
>
> On Wed, Oct 30, 2019 at 12:51 PM Adam Taft  wrote:
>
> > While building 1.10.0-rc3, I wanted to experiment with the compilation
> and
> > runtime variants using Java 8 and Java 11.  The summary of this
> experiment
> > was:
> >
> > Comp: Java 8   Run: Java 8  =>  SUCCESS
> > Comp: Java 8   Run: Java 11 =>  SUCCESS
> > Comp: Java 11  Run: Java 8  =>  FAILURE
> > Comp: Java 11  Run: Java 11 =>  SUCCESS
> >
> > As introduced in JEP-247 [1], starting with Java 9, javac has the ability
> > to compile to an older Java platform.  This was not possible previously
> > without having multiple JDKs installed and specifying the
> '-bootclasspath'
> > option in javac for your target.
> >
> > The newly introduced "--release" option to Java 9+ javac allows you to
> > specify the target version using the documented API for that platform.
> > This means replacing the '-source' and '-target' parameters with the
> > '--release' option instead.
> >
> > In maven, that manifests itself as a configuration option to the
> > maven-compiler-plugin, which is the 'maven.compiler.release' property.
> [2]
> >
> > The discussion here would be consideration for using the "release" option
> > as opposed to our current setup which uses "source" and "target".  The
> > benefit would be that all the deployment scenarios (above) could result
> in
> > success.
> >
> > The downside though is that, without other changes, Java 11 would be
> > required to compile NiFi.  It would still target Java 8 as a supported
> > runtime (specifying "release=8"), but you'd have to build/compile with
> Java
> > 11.  I don't think this should be a problem, but it's worth discussion.
> >
> > Thanks,
> > Adam
> >
> > [1]  https://openjdk.java.net/jeps/247
> > [2]
> >
> >
> https://maven.apache.org/plugins/maven-compiler-plugin/compile-mojo.html#release
> >
>


Java 11 Compilation

2019-10-30 Thread Adam Taft
While building 1.10.0-rc3, I wanted to experiment with the compilation and
runtime variants using Java 8 and Java 11.  The summary of this experiment
was:

Comp: Java 8   Run: Java 8  =>  SUCCESS
Comp: Java 8   Run: Java 11 =>  SUCCESS
Comp: Java 11  Run: Java 8  =>  FAILURE
Comp: Java 11  Run: Java 11 =>  SUCCESS

As introduced in JEP-247 [1], starting with Java 9, javac has the ability
to compile to an older Java platform.  This was not possible previously
without having multiple JDKs installed and specifying the '-bootclasspath'
option in javac for your target.

The newly introduced "--release" option to Java 9+ javac allows you to
specify the target version using the documented API for that platform.
This means replacing the '-source' and '-target' parameters with the
'--release' option instead.
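
A concrete illustration of why '-source'/'-target' alone is not enough
(String.isBlank() only exists since Java 11):

    public class ReleaseFlagDemo {
        public static void main(String[] args) {
            // Compiled on JDK 11 with only -source 8 -target 8, this compiles
            // cleanly but throws NoSuchMethodError on a Java 8 runtime,
            // because javac linked against the JDK 11 class library.
            // Compiled with --release 8, it fails at compile time instead,
            // which is the behavior we want.
            System.out.println(" ".isBlank());
        }
    }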

In maven, that manifests itself as a configuration option to the
maven-compiler-plugin, which is the 'maven.compiler.release' property. [2]

The discussion here would be consideration for using the "release" option
as opposed to our current setup which uses "source" and "target".  The
benefit would be that all the deployment scenarios (above) could result in
success.

The downside though is that, without other changes, Java 11 would be
required to compile NiFi.  It would still target Java 8 as a supported
runtime (specifying "release=8"), but you'd have to build/compile with Java
11.  I don't think this should be a problem, but it's worth discussion.

Thanks,
Adam

[1]  https://openjdk.java.net/jeps/247
[2]
https://maven.apache.org/plugins/maven-compiler-plugin/compile-mojo.html#release


Re: [VOTE] Release Apache NiFi 1.10.0 (rc3)

2019-10-30 Thread Adam Taft
+1 (binding)

Signatures verified.
Hashes verified.
Tests pass, source builds cleanly.
I used both Java 11 & Java 8 to build.

I did run into a problem compiling with Java 11 and running with Java 8.  I
don't believe this was a goal of the Java 11 compatibility changes, so
nothing unexpected about this. But it's possibly worth discussion in
another thread (which I'll send out).  The convenience binary was compiled
with Java 8, so no problems with compatibility either way.

On Tue, Oct 29, 2019 at 11:32 AM Joe Witt  wrote:

> Hello,
>
> I am pleased to be calling this vote for the source release of Apache NiFi
> nifi-1.10.0.
>
> As they say 'third time's a charm'.
>
> The source zip, including signatures, digests, etc. can be found at:
> https://repository.apache.org/content/repositories/orgapachenifi-1151
>
> The source being voted upon and the convenience binaries can be found at:
> https://dist.apache.org/repos/dist/dev/nifi/nifi-1.10.0/
>
> The Git tag is nifi-1.10.0-RC3
> The Git commit ID is b217ae20ad6a04cac874b2b00d93b7f7514c0b88
>
> https://gitbox.apache.org/repos/asf?p=nifi.git;a=commit;h=b217ae20ad6a04cac874b2b00d93b7f7514c0b88
>
> Checksums of nifi-1.10.0-source-release.zip:
> SHA256: e9b0a14b3029acd69c6693781b6b6487c14dda12676db8b4a015bce23b1029c1
> SHA512:
>
> b07258cbc21d2e529a1aa3098449917e2d059e6b45ffcfcb6df094931cf16caa8970576555164d3f2290cfe064b5780ba1a8bf63dad04d20100ed559a1cfe133
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/joewitt.asc
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/release/nifi/KEYS
>
> 384 issues were closed/resolved for this release:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020=12344993
>
> Release note highlights can be found here:
>
> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.10.0
>
> The vote will be open for 72 hours.
> Please download the release candidate and evaluate the necessary items
> including checking hashes, signatures, build
> from source, and test. Then please vote:
>
> [ ] +1 Release this package as nifi-1.10.0
> [ ] +0 no opinion
> [ ] -1 Do not release this package because...
>


Re: PULL ProvenanceEvent

2019-10-28 Thread Adam Taft
> But a flowfile that was PULLed by the second nifi (from the first nifi)
will not necessarily have any provenance event generated by the first nifi.

Isn't this the fault of the first NiFi to fail to emit a SEND event in
response to the second NiFi's request?  In this scenario, shouldn't the
send/receive pair be:
NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
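
For reference, recording that pair is a one-liner on each side inside
onTrigger() (the transit URIs here are illustrative):

    // First NiFi, when handing the data over:
    session.getProvenanceReporter().send(flowFile, "https://nifi-2.example.com/contentListener");

    // Second NiFi, when taking the data in:
    session.getProvenanceReporter().receive(flowFile, "https://nifi-1.example.com/contentListener");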

What you describe is an odd use case for NiFi.  NiFi is usually not in the
business of acting as a file server daemon in order to "send" flowfiles to
other systems.  As you mention, HandleHttpResponse may be a lone wolf
example processor which generates a SEND event whose input originates from
a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
events because they are receiving bytes, not generating them.

Are there other processors in question? Something custom? Or is this
related to site-to-site transfers?

I still kind of question the motive of a provenance event pair that is
trying to establish "who called who first".  Honestly just trying to
understand the use case where a matching SEND/RECEIVE pair doesn't give you
what you need.

The only thing I could see would be a processor that asks for data, but
then doesn't receive it due to some error condition.  In this case, adding
some sort of ERROR event might be useful.  "I attempted to retrieve data
from ${uri}, but the transfer failed because of ${error condition}".  That
way, GetXYZ processors could report an error to provenance instead of as a
bulletin.

If the problem is related to a processor or the framework itself not
generating an event, can we just fix that function to emit SEND in the
scenario that you describe?  Changing the provenance model itself (beyond
possibly adding an ERROR event) feels like it would be the last scenario to
consider.

Thanks,
Adam

[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191




On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman 
wrote:

>  Adam,
> I believe there is a need for more detailed ProvenanceEvents.
> A use case would be a customer that is trying to track data passed between
> two nifi's and trying to match up SENDs and RECEIVEs
>
> So a flowfile that has a SEND event on the first nifi should have a
> RECEIVE event on the second nifi.
> But a flowfile that was PULLed by the second nifi (from the first nifi)
> will not necessarily have any provenance event generated by the first nifi.
>
> (I realize that FETCH is already a "reserved word" in the current
> ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> There is another Provenance Event, ACKNOWLEDGE, which would also fit
> occasionally to this model as well (an example would be HandleHttpResponse
> processor which could send this instead of SEND when responding to a HTTP
> request)
> This being said, you make an excellent point when you said
> "However even more important to realize,
> this change would affect many other downstream consumers of provenance data
> which aren't necessarily in the stock NiFi distribution."
> Thanks,
> Nissim
>
> On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
>  wrote:
>
>   Adam,
> "Yes" to your first question and the four processor examples you listed.
>
> I will need to get back to you regarding your other points.
>
> Thanks,
> Nissim
>
> On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> a...@adamtaft.com> wrote:
>
>  Nissim,
>
> Just to be clear, you are trying to distinguish between processors which
> are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
> data (ListenXYZ)?  Is that your basic vision?
>
> GetFile => PULL
> GetHTTP => PULL
> ListenHTTP => RECEIVE
> ListenTCP => RECEIVE
>
> Could you clarify what advantages this would have in terms of data
> provenance?  What would you use this new event type for specifically?  What
> are you missing now? Do you have a use case that needs this, or are you
> just generally trying to round out the provenance event types for sake of
> completeness?  I honestly don't know a use case where you care whether you
> polled for the data or listened for it.  The provenance model today just
> cares that you received the data, not so much how you received it.
>
> You're right that this proposal will affect many processors and the
> internal visualization tools, etc.  However even more important to realize,
> this change would affect many other downstream consumers of provenance data
> which aren't necessarily in the stock NiFi distribution.  For example, any
> third-party/custom ReportingTask that handles provenance data would need
> to be updated with this change.

Re: Release vote helper docker setups

2019-10-24 Thread Adam Taft
In general I like this idea. I'd like to even suggest a possibly broader
vision that aims towards a more stable build environment that would be the
"reference" environment for building NiFi.

I have been kicking around and looking at a Docker based build environment
for NiFi.  The idea is that you have a well defined build image that can
take any of the NiFi source releases and execute the entire maven
test/build cycle on it.

I initially thought this would be an easy task, just whip up a new
Dockerfile with the necessary build environment (I based mine on CentOS,
adding in the Java and Maven requirements).  Surprisingly though, this
turned out to be a difficult task.  I experienced all sorts of test errors,
somewhat randomly and seemingly uncorrelated.

So for now, I have put the idea to the side because I haven't been able to
isolate whether the problems are with the build environment (likely) vs.
the problems being with NiFi source itself.  But I would definitely be
interested in contributing and kicking this concept again, if there was
some interest in collaboration.

Thanks,
Adam


On Thu, Oct 24, 2019 at 7:25 AM Mike Thomsen  wrote:

> Any thoughts on building a few Docker Compose configurations to help people
> test out a new release at vote time? I have a few that could be contributed
> to the repository with a little cleanup.
>
> Mike
>


Re: PULL ProvenanceEvent

2019-10-10 Thread Adam Taft
Nissim,

Just to be clear, you are trying to distinguish between processors which
are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
data (ListenXYZ)?  Is that your basic vision?

GetFile => PULL
GetHTTP => PULL
ListenHTTP => RECEIVE
ListenTCP => RECEIVE

Could you clarify what advantages this would have in terms of data
provenance?  What would you use this new event type for specifically?  What
are you missing now? Do you have a use case that needs this, or are you
just generally trying to round out the provenance event types for sake of
completeness?  I honestly don't know a use case where you care whether you
polled for the data or listened for it.  The provenance model today just
cares that you received the data, not so much how you received it.

You're right that this proposal will affect many processors and the
internal visualization tools, etc.  However even more important to realize,
this change would affect many other downstream consumers of provenance data
which aren't necessarily in the stock NiFi distribution.  For example, any
third-party/custom ReportingTask that handles provenance data would need to
be updated with this change.  There's probably need for a strong vision to
help demonstrate the value for this vs. the cost of the cascading effects
related to this change.

Thanks,
Adam


On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman 
wrote:

> Hello Team,
>
> The ProvenanceEventType class does a good job capturing possible events,
> but the PULL event doesn't seem to fall nicely into any of the existing
> types.
>
> https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> RECEIVE is the closest, but RECEIVE is passive and doesn't capture the
> active action of a PULL
>
> And... maybe it would fall into FETCH, but FETCH is more focused on
> contents of an existing flow file being overwritten.
>
> What does the community think about a new PULL event type,
> or
>  using FETCH for PULL, and having what FETCH does now be a new event such
> as REUSE
>
> NOTE: a new PULL event would have a cascading effect of many processors
> that currently are emitting RECEIVE's being modified to be PULL
> (i.e. So GetFile would no longer be a RECEIVE, but rather a PULL), but
> would more accurately capture the event.
>
> Thanks,
> Nissim Shiman
>
>


Re: Maven Build Error - nifi-properties-loader sub-project test failures

2019-10-10 Thread Adam Taft
Yeah, I've been suspicious that maybe Maven 3.3.9 is too old to build
NiFi.  We say[1] that Maven 3.1.0+ is required, but that's a really old
version.  It's possible or even likely that some Maven plugins used in the
NiFi build are expecting services from newer Maven versions.

I don't have a suggestion here that doesn't involve someone's time. It
would be ideal if we could narrow down and update the required minimum
Java & Maven configuration, so as to update the Quickstart guide that Joe
originally linked[1].  Is this worthy of a JIRA ticket? I'll file one if so.
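
In the meantime, a pre-build sanity check costs nothing (the version
numbers below are simply the ones reported as working earlier in this
thread):

$> mvn -version     # expect Apache Maven 3.6.x
$> java -version    # expect 1.8.0_2xx
$> export MAVEN_OPTS="-Xms1g -Xmx3g"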

Adam

[1] https://nifi.apache.org/quickstart.html


On Thu, Oct 10, 2019 at 7:57 AM Aram Openden  wrote:

> Adam,
>
> Thanks so much for your excellent suggestion.
>
> Probably should have checked my versions 1st (honestly didn't even think of
> it yesterday).
>
> Upgrading my Java-JDK install version to *1.8.0_222* and Maven to *3.6.2*
> did indeed *fix my build issue*!
>
> Aram S. Openden
> aram.open...@gmail.com
>
>
>
> On Thu, Oct 10, 2019 at 1:10 AM Adam Taft  wrote:
>
> > Aram,
> >
> > Just to rule out the obvious ...  Can you update your Maven and Java
> > versions, which would include:
> > - Maven 3.6.2
> > - Java 1.8.0_222
> >
> > Also, are you including a MAVEN_OPTS environment to increase your JVM
> > memory in Maven?
> >
> > $> export MAVEN_OPTS="-Xms1g -Xmx3g"
> >
> > Thanks,
> > Adam
> >
> > On Wed, Oct 9, 2019 at 1:31 PM Aram Openden 
> > wrote:
> >
> > > Hoping someone on this dev mailing list can help with the following
> maven
> > > build failures issue. I am hoping to contribute a new suggested custom
> S3
> > > Processor that I have been working on.
> > >
> > > But, I need to be able to get the build to work locally before I even
> > start
> > > adding in my changes.
> > >
> > > I am trying to run the main NiFi project build with maven locally on
> the
> > > master branch *without having made any local code changes*, with the
> > latest
> > > updates (master is at commit 9a496fe9d NIFI-6751: - Fixing the
> identifier
> > > on the user table(in other words this is the latest code on
> master):
> > >
> > > $ git branch -v
> > > * master 9a496fe9d NIFI-6751: - Fixing the identifier on the user
> table.
> > In
> > > a previous task, this was changed to utilize the URI but that does not
> > work
> > > with other code interacting with this table.
> > >
> > > $ git status
> > > On branch master
> > > Your branch is up-to-date with 'origin/master'.
> > > nothing to commit, working tree clean
> > >
> > >
> > > My local maven env is as follows (running on Mac OS Mojave):
> > >
> > > Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> > > 2015-11-10T11:41:47-05:00)
> > > Maven home: /usr/local/Cellar/maven/3.3.9/libexec
> > > Java version: 1.8.0_111, vendor: Oracle Corporation
> > > Java home:
> > > /Library/Java/JavaVirtualMachines/jdk1.8.0_111.jdk/Contents/Home/jre
> > > Default locale: en_US, platform encoding: UTF-8
> > > OS name: "mac os x", version: "10.14.6", arch: "x86_64", family: "mac"
> > >
> > >
> > > Looking for any help you can provide on what I should be doing to get
> the
> > > maven build to pass locally.
> > > I am getting test failures that look like this:
> > >
> > > INFO]
> > >
> 
> > > [INFO] Building nifi-properties-loader 1.10.0-SNAPSHOT
> > > [INFO]
> > >
> 
> > > [INFO]
> > > [INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @
> > > nifi-properties-loader ---
> > > [INFO] Deleting
> > >
> > >
> >
> /Users/aramo.penden/workspaces/WFfH/data-gov-beta/nifi/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-properties-loader/target
> > > [INFO]
> > > [INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version)
> @
> > > nifi-properties-loader ---
> > > [INFO]
> > > [INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven) @
> > > nifi-properties-loader ---
> > > [INFO]
> > > [INFO] --- buildnumber-maven-plugin:1.4:create (default) @
> > > nifi-properties-loader ---
> > > [INFO]
> > > [INFO] --- maven-remote-resources-pl

Re: Maven Build Error - nifi-properties-loader sub-project test failures

2019-10-09 Thread Adam Taft
Joe,

Are you referring to git config options "core.longpaths" and "core.autocrlf"?
I wouldn't have thought these settings would be important for Mac users (as
Aram is).  Honestly, I'm just asking what the experience is here.

I can build from master on my Mac with the latest Java and Maven, without
those git settings.  So that's why I suggested the updates.  But I don't
have the experience to really know beyond just a guess.

Adam



On Wed, Oct 9, 2019 at 11:16 PM Joe Witt  wrote:

> Adam, Aram,
>
> I'd be suspicious of your git settings relative to what we suggest here:
> http://nifi.apache.org/quickstart.html
>
> It seems like it is reading material from files (test files) and they don't
> contain what is expected so I wonder about git settings.
>
> Thanks
> Joe
>
> On Thu, Oct 10, 2019 at 1:10 AM Adam Taft  wrote:
>
> > Aram,
> >
> > Just to rule out the obvious ...  Can you update your Maven and Java
> > versions, which would include:
> > - Maven 3.6.2
> > - Java 1.8.0_222
> >
> > Also, are you including a MAVEN_OPTS environment to increase your JVM
> > memory in Maven?
> >
> > $> export MAVEN_OPTS="-Xms1g -Xmx3g"
> >
> > Thanks,
> > Adam
> >
> > On Wed, Oct 9, 2019 at 1:31 PM Aram Openden 
> > wrote:
> >
> > > Hoping someone on this dev mailing list can help with the following
> maven
> > > build failures issue. I am hoping to contribute a new suggested custom
> S3
> > > Processor that I have been working on.
> > >
> > > But, I need to be able to get the build to work locally before I even
> > start
> > > adding in my changes.
> > >
> > > I am trying to run the main NiFi project build with maven locally on
> the
> > > master branch *without having made any local code changes*, with the
> > latest
> > > updates (master is at commit 9a496fe9d NIFI-6751: - Fixing the
> identifier
> > > on the user table(in other words this is the latest code on
> master):
> > >
> > > $ git branch -v
> > > * master 9a496fe9d NIFI-6751: - Fixing the identifier on the user
> table.
> > In
> > > a previous task, this was changed to utilize the URI but that does not
> > work
> > > with other code interacting with this table.
> > >
> > > $ git status
> > > On branch master
> > > Your branch is up-to-date with 'origin/master'.
> > > nothing to commit, working tree clean
> > >
> > >
> > > My local maven env is as follows (running on Mac OS Mojave):
> > >
> > > Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> > > 2015-11-10T11:41:47-05:00)
> > > Maven home: /usr/local/Cellar/maven/3.3.9/libexec
> > > Java version: 1.8.0_111, vendor: Oracle Corporation
> > > Java home:
> > > /Library/Java/JavaVirtualMachines/jdk1.8.0_111.jdk/Contents/Home/jre
> > > Default locale: en_US, platform encoding: UTF-8
> > > OS name: "mac os x", version: "10.14.6", arch: "x86_64", family: "mac"
> > >
> > >
> > > Looking for any help you can provide on what I should be doing to get
> the
> > > maven build to pass locally.
> > > I am getting test failures that look like this:
> > >
> > > INFO]
> > >
> 
> > > [INFO] Building nifi-properties-loader 1.10.0-SNAPSHOT
> > > [INFO]
> > >
> 
> > > [INFO]
> > > [INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @
> > > nifi-properties-loader ---
> > > [INFO] Deleting
> > >
> > >
> >
> /Users/aramo.penden/workspaces/WFfH/data-gov-beta/nifi/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-properties-loader/target
> > > [INFO]
> > > [INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version)
> @
> > > nifi-properties-loader ---
> > > [INFO]
> > > [INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven) @
> > > nifi-properties-loader ---
> > > [INFO]
> > > [INFO] --- buildnumber-maven-plugin:1.4:create (default) @
> > > nifi-properties-loader ---
> > > [INFO]
> > > [INFO] --- maven-remote-resources-plugin:1.5:process
> > > (process-resource-bundles) @ nifi-properties-loader ---
> > > [INFO]
> > > [INFO] --- maven-resources-plugin:3.1.0:resources (default-resources) @
> > > nifi-

Re: Maven Build Error - nifi-properties-loader sub-project test failures

2019-10-09 Thread Adam Taft
Aram,

Just to rule out the obvious ...  Can you update your Maven and Java
versions, which would include:
- Maven 3.6.2
- Java 1.8.0_222

Also, are you including a MAVEN_OPTS environment to increase your JVM
memory in Maven?

$> export MAVEN_OPTS="-Xms1g -Xmx3g"

Thanks,
Adam

On Wed, Oct 9, 2019 at 1:31 PM Aram Openden  wrote:

> Hoping someone on this dev mailing list can help with the following maven
> build failures issue. I am hoping to contribute a new suggested custom S3
> Processor that I have been working on.
>
> But, I need to be able to get the build to work locally before I even start
> adding in my changes.
>
> I am trying to run the main NiFi project build with maven locally on the
> master branch *without having made any local code changes*, with the latest
> updates (master is at commit 9a496fe9d NIFI-6751: - Fixing the identifier
> on the user table(in other words this is the latest code on master):
>
> $ git branch -v
> * master 9a496fe9d NIFI-6751: - Fixing the identifier on the user table. In
> a previous task, this was changed to utilize the URI but that does not work
> with other code interacting with this table.
>
> $ git status
> On branch master
> Your branch is up-to-date with 'origin/master'.
> nothing to commit, working tree clean
>
>
> My local maven env is as follows (running on Mac OS Mojave):
>
> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
> 2015-11-10T11:41:47-05:00)
> Maven home: /usr/local/Cellar/maven/3.3.9/libexec
> Java version: 1.8.0_111, vendor: Oracle Corporation
> Java home:
> /Library/Java/JavaVirtualMachines/jdk1.8.0_111.jdk/Contents/Home/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "mac os x", version: "10.14.6", arch: "x86_64", family: "mac"
>
>
> Looking for any help you can provide on what I should be doing to get the
> maven build to pass locally.
> I am getting test failures that look like this:
>
> INFO]
> 
> [INFO] Building nifi-properties-loader 1.10.0-SNAPSHOT
> [INFO]
> 
> [INFO]
> [INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @
> nifi-properties-loader ---
> [INFO] Deleting
>
> /Users/aramo.penden/workspaces/WFfH/data-gov-beta/nifi/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-properties-loader/target
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @
> nifi-properties-loader ---
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven) @
> nifi-properties-loader ---
> [INFO]
> [INFO] --- buildnumber-maven-plugin:1.4:create (default) @
> nifi-properties-loader ---
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.5:process
> (process-resource-bundles) @ nifi-properties-loader ---
> [INFO]
> [INFO] --- maven-resources-plugin:3.1.0:resources (default-resources) @
> nifi-properties-loader ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory
>
> /Users/aramo.penden/workspaces/WFfH/data-gov-beta/nifi/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-properties-loader/src/main/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @
> nifi-properties-loader ---
> [INFO] Changes detected - recompiling the module!
> [INFO] Compiling 9 source files to
>
> /Users/aramo.penden/workspaces/WFfH/data-gov-beta/nifi/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-properties-loader/target/classes
> [INFO]
> [INFO] --- build-helper-maven-plugin:1.5:add-test-source (add-test-source)
> @ nifi-properties-loader ---
> [INFO] Test Source directory:
>
> /Users/aramo.penden/workspaces/WFfH/data-gov-beta/nifi/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-properties-loader/src/test/groovy
> added.
> [INFO]
> [INFO] --- maven-resources-plugin:3.1.0:testResources
> (default-testResources) @ nifi-properties-loader ---
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] Copying 24 resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @
> nifi-properties-loader ---
> [INFO] Changes detected - recompiling the module!
> [INFO]
> [INFO] --- maven-compiler-plugin:3.8.1:testCompile (groovy-tests) @
> nifi-properties-loader ---
> [INFO] Changes detected - recompiling the module!
> [INFO] Using Groovy-Eclipse compiler to compile both Java and Groovy files
> [INFO] Compiling in a forked process using
>
> /Users/aramo.penden/.m2/repository/org/codehaus/groovy/groovy-eclipse-batch/2.5.4-01/groovy-eclipse-batch-2.5.4-01.jar
> [INFO]
> [INFO] --- maven-surefire-plugin:2.22.0:test (default-test) @
> nifi-properties-loader ---
> [INFO]
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running 

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Adam Taft
Andy - fair points. Note that by definition, the process you describe is
harder (it requires more maneuvers).  Maybe that's warranted/justified for
the integrity you are after, but the total sum of work is most definitely
greater.

Your registry example is really good.  In your example, you are proposing a
change to the framework and commons repositories before a change to the
registry can be finalized.  You'd need the changes to framework and commons
to "land" and become released before the final change to the registry was
committed.  You'd end up with a small release queued up for the framework
(whose release cycle is mostly infrequent) and you wouldn't be able to
finish the work on the registry changes until that new function was
releasable.  The ability to mark that JIRA ticket as "closed" is delayed
because you are waiting for releases from dependent components.

Of course, you can build/test against -SNAPSHOT versions in each of those
repositories (which is what Bryan was getting to).  But the registry
feature itself can't be totally finalized and is waiting on the release
cycle of the slowest of the components.  There are definitely tradeoffs
with this direction.


On Fri, Jul 12, 2019 at 12:42 PM Andy LoPresto  wrote:

> I think by definition, a contribution _must_ fit into a single repository.
> This will force developers to carefully consider the boundaries between
> modules and build clean abstractions. If you are a new contributor, I would
> be surprised if you are making a single (logical) contribution that would
> span multiple repositories on the first go. I think enforcing clear
> divisions is good for both new and experienced contributors. I also think a
> change that requires contributions to multiple repositories should be
> subdivided into atomic tasks.
>
> For example, if someone wants to contribute a new feature to nifi-registry
> which also requires changes to nifi-commons for the security piece and adds
> new behavior to the nifi-framework component to consume new changes from
> Registry, in my mind those are actually 3 atomic changes which, while
> related and interdependent, can all be contributed as standalone code to
> their respective repositories in an ordered fashion. I would prefer this
> over one large commit to a single repository which influences behavior in
> all three modules and requires one or more reviewers with comprehensive
> knowledge over all aspects of the project.
>
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> > On Jul 12, 2019, at 10:49 AM, Adam Taft  wrote:
> >
> > Bryan,
> >
> > I think both of your points are actually closely related, and they
> somewhat
> > speak to my thoughts/concerns about splitting the repository.
> >
> > I would argue that one PR that affects multiple modules in a single
> > repository is _easier_ to review than multiple PRs that affect single
> > modules.  In the split repository model, if a change affects several
> > repositories, individual PRs would be issued against each repository.  A
> > reviewer would not as easily see the context of the changes and may even
> > consider them out of order.
> >
> > In the single repository model, a PR is atomic. There is no race
> condition,
> > ordering or loss of context across multiple repositories.
> >
> > This is the concern I was making for new contributors.  If your
> > contribution doesn't fit neatly into a single repository, then it's quite
> > the tough process to communicate and deal with changes. It will
> discourage
> > new folks from being involved, because the contribution barrier is
> raised.
> >
> > It's ideal that changesets are atomic, but you definitely lose this
> > property in a multi-repo scenario.  Imagine rolling back a change, for
> > example, that spans multiple repositories.
> >
> > Adam
> >
> > On Fri, Jul 12, 2019 at 11:27 AM Bryan Bende  wrote:
> >
> >> Two other points to throw out there...
> >>
> >> 1) I think something to consider is how the management of pull
> >> requests would be impacted, since that is the main form of
> >> contribution.
> >>
> >> Separate repos forces pull requests to stay scoped to a given module,
> >> making for more straight forward reviews. It also makes it easier to
> >> look at a repo and see what work/contributions are still open,
> >> although I suppose all the PRs in the nifi repo could be labeled by
> >> module and then filtered, but it seems a little more tedious. Just
> >> something to think about.
> >>

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Adam Taft
> > definitely something NiFi needs, and I think it will be simplest to
> > get there with a multi-repo approach.
> >
> > That said, I agree that the *biggest* win comes from splitting
> > projects, and that splitting repos is a smaller step. I don't feel
> > strongly about it and could live with a single repo with multiple
> > projects (though, for what it's worth, the NiFi umbrella already has
> > several repositories and I personally don't feel it has been
> > burdensome).
> >
> > And I agree - let's not start splitting JIRA projects. Let's use
> > components or labels or something to differentiate issues under the
> > existing NIFI Jira project.
> >
> >
> > Edward,
> >
> > Thanks. I totally agree and I know others who feel the same way.
> > Better defined boundaries and loosely coupled modules is 100% a
> > long-term goal. I think this project restructuring won't solve the
> > problem completely (in fact, to your point, it may uncover some
> > unfortunate tight-coupling that needs to be reworked on the current
> > master before the split can happen), but I do think it will encourage
> > developers to more faithfully build to APIs and avoid leaky
> > abstractions as there will be more hard division points in the code
> > base. Some of those issues might be able to be addressed immediately.
> > Others might have to wait for a major version change.
> >
> > Thanks,
> > Kevin
> >
> > On Fri, Jul 12, 2019 at 1:04 PM Adam Taft  wrote:
> > >
> > > To be honest and to your point Joe, the thing that optimizes the RM
> duties
> > > should probably be preferred in all of this.  There is so much
> overhead for
> > > the release manager, that lubricating the RM process probably trumps a
> lot
> > > of my concerns.  I think there's real concern for making the project
> harder
> > > for new contributors. But likewise, that concern should be balanced
> with
> > > making the project harder for longtime contributors who have pulled the
> > > cart the most.
> > >
> > > I was just at least hoping for a discussion on the concept.  Thanks as
> > > always for your leadership and contributions to the nifi community.
> > >
> > > On Fri, Jul 12, 2019 at 10:48 AM Joe Witt  wrote:
> > >
> > > > Ah I agree the JIRA thing would be too heavy handed.  A single JIRA
> with
> > > > well defined components tied to 'repos' is good.
> > > >
> > > > As far as separate code repos we're talking about different
> releasable
> > > > artifacts for which we as a PMC are responsible for the
> meaning/etc..  As a
> > > > many time RM I definitely dislike the mono repo construct as I
> understand
> > > > it to function.  I prefer repos per source release artifact where all
> > > > source in that artifact is a function of the release. I am ok with
> > > > different convenience binaries resulting from a single source release
> > > > artifact though.
> > > >
> > > > Thanks
> > > >
> > > > On Fri, Jul 12, 2019 at 12:26 PM Adam Taft 
> wrote:
> > > >
> > > > > I think the concerns around user management are valid, are they
> not?
> > > > > Overhead in JIRA goes up (assigning rights to users in JIRA is
> > > > > multiplied).  Risk to new contributors is high, because each
> isolated
> > > > > repository has its own life and code contribution styles.  Maybe
> the
> > > > actual
> > > > > apache infra involvement is low, but the negative effects of
> community
> > > > and
> > > > > source code bifurcation goes up.
> > > > >
> > > > > Tagging in mono-repos is done by prefixing the name of the
> component in
> > > > the
> > > > > tag name.  Your release sources are still generated from the
> component
> > > > > folder (not from the root).
> > > > >
> > > > > Modularization (as being proposed) is a good thing, but can be
> done in a
> > > > > single repository. It's not a requirement to split up the git
> project to
> > > > > get the benefits of modularization.  That's the point I'm hoping
> is seen
> > > > in
> > > > > this.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jul 12, 2019 at 10:08 AM Joe Witt 
> wrote:
> > > > >
> > > > > > to clarify user management for 

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Adam Taft
To be honest and to your point Joe, the thing that optimizes the RM duties
should probably be preferred in all of this.  There is so much overhead for
the release manager, that lubricating the RM process probably trumps a lot
of my concerns.  I think there's real concern for making the project harder
for new contributors. But likewise, that concern should be balanced with
making the project harder for longtime contributors who have pulled the
cart the most.

I was just at least hoping for a discussion on the concept.  Thanks as
always for your leadership and contributions to the nifi community.

On Fri, Jul 12, 2019 at 10:48 AM Joe Witt  wrote:

> Ah I agree the JIRA thing would be too heavy handed.  A single JIRA with
> well defined components tied to 'repos' is good.
>
> As far as separate code repos we're talking about different releasable
> artifacts for which we as a PMC are responsible for the meaning/etc..  As a
> many time RM I definitely dislike the mono repo construct as I understand
> it to function.  I prefer repos per source release artifact where all
> source in that artifact is a function of the release. I am ok with
> different convenience binaries resulting from a single source release
> artifact though.
>
> Thanks
>
> On Fri, Jul 12, 2019 at 12:26 PM Adam Taft  wrote:
>
> > I think the concerns around user management are valid, are they not?
> > Overhead in JIRA goes up (assigning rights to users in JIRA is
> > multiplied).  Risk to new contributors is high, because each isolated
> > repository has its own life and code contribution styles.  Maybe the
> actual
> > apache infra involvement is low, but the negative effects of community
> and
> > source code bifurcation goes up.
> >
> > Tagging in mono-repos is done by prefixing the name of the component in
> the
> > tag name.  Your release sources are still generated from the component
> > folder (not from the root).
> >
> > Modularization (as being proposed) is a good thing, but can be done in a
> > single repository. It's not a requirement to split up the git project to
> > get the benefits of modularization.  That's the point I'm hoping is seen
> in
> > this.
> >
> >
> >
> > On Fri, Jul 12, 2019 at 10:08 AM Joe Witt  wrote:
> >
> > > to clarify user management for infra is not a prob.  it is an ldap
> group.
> > >
> > > repo creation is self service as well and group access is tied to that.
> > >
> > > release artifact is the source we produce.  this is typically
> correlated
> > to
> > > a tag of the repo.  if we have all source in one repo it isn't clear to
> me
> > > how we can maintain that.
> > >
> > > in any event im not making a statement of whether to do many repos or
> > not.
> > > just correcting some potentially misleading claims.
> > >
> > > thanks
> > >
> > > On Fri, Jul 12, 2019, 12:01 PM Adam Taft  wrote:
> > >
> > > > Just as a point of discussion, I'm not entirely sure that splitting
> > into
> > > > multiple physical git repositories is actually adding any value.  I
> > think
> > > > it's worth consideration that all the (good) changes being proposed
> are
> > > > done under a single mono-repository model.
> > > >
> > > > If we split into multiple repositories, you have substantially
> > increased
> > > > the infra surface area. User account management overhead goes up.
> > Support
> > > > from the infra team goes up. JIRA issue management goes up,
> > > > misfiled/miscategorized issues become common. It becomes harder for
> > > > community members to interact and engage with the project, steeper
> > > learning
> > > > curve for new contributors. There are more "side channel"
> conversations
> > > and
> > > > less transparency into the project as a whole. Git history is much
> > harder
> > > > (or impossible) to follow across the entire project. Tracking down
> bugs
> > > and
> > > > performing git blame or git bisect becomes hard.
> > > >
> > > > There's nothing really stopping all of these changes from occurring
> in
> > > the
> > > > existing repo, we don't have to have a maven pom.xml in the root of
> the
> > > > project repository. It's much easier for contributors to just clone a
> > > > single repository, read the README at the root, and get oriented to
> the
> > > > project layout.  Output artifacts can still be versioned differently
> >

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Adam Taft
I think the concerns around user management are valid, are they not?
Overhead in JIRA goes up (assigning rights to users in JIRA is
multiplied).  Risk to new contributors is high, because each isolated
repository has its own life and code contribution styles.  Maybe the actual
apache infra involvement is low, but the negative effects of community and
source code bifurcation goes up.

Tagging in mono-repos is done by prefixing the name of the component in the
tag name.  Your release sources are still generated from the component
folder (not from the root).
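
For example, a hypothetical registry release cut from a mono-repo might
look like this (the tag and folder names are illustrative):

$> git tag -s nifi-registry-0.5.0 -m "Apache NiFi Registry 0.5.0"
$> git archive --format=zip --prefix=nifi-registry-0.5.0/ \
     -o nifi-registry-0.5.0-source-release.zip \
     nifi-registry-0.5.0:nifi-registry

The resulting zip contains only the registry folder's tree at that tag,
which is the source artifact a release vote would cover.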

Modularization (as being proposed) is a good thing, but can be done in a
single repository. It's not a requirement to split up the git project to
get the benefits of modularization.  That's the point I'm hoping is seen in
this.



On Fri, Jul 12, 2019 at 10:08 AM Joe Witt  wrote:

> to clarify user management for infra is not a prob.  it is an ldap group.
>
> repo creation is self service as well and group access is tied to that.
>
> release artifact is the source we produce.  this is typically correlated to
> a tag of the repo.  if we have all source in one repo it isn't clear to me
> how we can maintain that.
>
> in any event im not making a statement of whether to do many repos or not.
> just correcting some potentially misleading claims.
>
> thanks
>
> On Fri, Jul 12, 2019, 12:01 PM Adam Taft  wrote:
>
> > Just as a point of discussion, I'm not entirely sure that splitting into
> > multiple physical git repositories is actually adding any value.  I think
> > it's worth consideration that all the (good) changes being proposed are
> > done under a single mono-repository model.
> >
> > If we split into multiple repositories, you have substantially increased
> > the infra surface area. User account management overhead goes up. Support
> > from the infra team goes up. JIRA issue management goes up,
> > misfiled/miscategorized issues become common. It becomes harder for
> > community members to interact and engage with the project, steeper
> learning
> > curve for new contributors. There are more "side channel" conversations
> and
> > less transparency into the project as a whole. Git history is much harder
> > (or impossible) to follow across the entire project. Tracking down bugs
> and
> > performing git blame or git bisect becomes hard.
> >
> > There's nothing really stopping all of these changes from occurring in
> the
> > existing repo, we don't have to have a maven pom.xml in the root of the
> > project repository. It's much easier for contributors to just clone a
> > single repository, read the README at the root, and get oriented to the
> > project layout.  Output artifacts can still be versioned differently (api
> > can have a different version from extensions).  "Splitting out" modules
> can
> > still happen in the mono-repository.  Jenkins and friends can be taught
> the
> > project layout.
> >
> > tl;dr - The changes being proposed can be done in a single repository.
> > Splitting into multiple repositories is adding overhead on multiple
> levels,
> > which might be a sneaky form of muda. [1]
> >
> > Thanks for reading,
> > Adam
> >
> > [1] https://dzone.com/articles/seven-wastes-software
> >
> >
> > On Thu, Jul 11, 2019 at 11:01 AM Otto Fowler 
> > wrote:
> >
> > > I agree that this looks great. I think Mike’s idea is worth considering
> > as
> > > well. I would hope that, as part of this effort, some thought will be
> > > given to enhancing the developer documentation around the modules as
> > > well.
> > >
> > >
> > >
> > >
> > > On July 10, 2019 at 18:15:21, Mike Thomsen (mikerthom...@gmail.com)
> > wrote:
> > >
> > > I agree. It's very well thought out. One change to consider is
> splitting
> > > the extensions further into two separate repos. One that would serve
> as a
> > > standard library of sorts for other component developers and another
> that
> > > would include everything else. Things like the Record API would go into
> > the
> > > former so that we could have a more conservative release schedule going
> > > forward with those components.
> > >
> > > On Wed, Jul 10, 2019 at 4:17 PM Andy LoPresto 
> > > wrote:
> > >
> > > > Thanks Kevin, this looks really promising.
> > > >
> > > > Updating the link here as I think the page may have moved:
> > > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/NIFI/

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Adam Taft
Just as a point of discussion, I'm not entirely sure that splitting into
multiple physical git repositories is actually adding any value.  I think
it's worth consideration that all the (good) changes being proposed are
done under a single mono-repository model.

If we split into multiple repositories, you have substantially increased
the infra surface area. User account management overhead goes up. Support
from the infra team goes up. JIRA issue management goes up,
misfiled/miscategorized issues become common. It becomes harder for
community members to interact and engage with the project, steeper learning
curve for new contributors. There are more "side channel" conversations and
less transparency into the project as a whole. Git history is much harder
(or impossible) to follow across the entire project. Tracking down bugs and
performing git blame or git bisect becomes hard.

There's nothing really stopping all of these changes from occurring in the
existing repo, we don't have to have a maven pom.xml in the root of the
project repository. It's much easier for contributors to just clone a
single repository, read the README at the root, and get oriented to the
project layout.  Output artifacts can still be versioned differently (api
can have a different version from extensions).  "Splitting out" modules can
still happen in the mono-repository.  Jenkins and friends can be taught the
project layout.

tl;dr - The changes being proposed can be done in a single repository.
Splitting into multiple repositories is adding overhead on multiple levels,
which might be a sneaky form of muda. [1]

Thanks for reading,
Adam

[1] https://dzone.com/articles/seven-wastes-software


On Thu, Jul 11, 2019 at 11:01 AM Otto Fowler 
wrote:

> I agree that this looks great. I think Mike’s idea is worth considering as
> well. I would hope that, as part of this effort, some thought will be given
> to enhancing the developer documentation around the modules as well.
>
>
>
>
> On July 10, 2019 at 18:15:21, Mike Thomsen (mikerthom...@gmail.com) wrote:
>
> I agree. It's very well thought out. One change to consider is splitting
> the extensions further into two separate repos. One that would serve as a
> standard library of sorts for other component developers and another that
> would include everything else. Things like the Record API would go into the
> former so that we could have a more conservative release schedule going
> forward with those components.
>
> On Wed, Jul 10, 2019 at 4:17 PM Andy LoPresto 
> wrote:
>
> > Thanks Kevin, this looks really promising.
> >
> > Updating the link here as I think the page may have moved:
> >
>
> https://cwiki.apache.org/confluence/display/NIFI/NiFi+Project+and+Repository+Restructuring
> > <
> >
>
> https://cwiki.apache.org/confluence/display/NIFI/NiFi+Project+and+Repository+Restructuring
> > >
> >
> > Andy LoPresto
> > alopre...@apache.org
> > alopresto.apa...@gmail.com
> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> >
> > > On Jul 10, 2019, at 12:08 PM, Kevin Doran  wrote:
> > >
> > > Hi NiFi Dev Community,
> > >
> > > Jeff Storck, Bryan Bende, and I have been collaborating back and forth
> > > on a proposal for how to restructure the NiFi source code into smaller
> > > Maven projects and repositories based on the discussion that took
> > > place awhile back on this thread. I'm reviving this older thread in
> > > order to share that proposal with the community and generate further
> > > discussion about solidifying a destination and a plan for how to
> > > get there.
> > >
> > > Specifically, the proposal we've started working on has three parts:
> > >
> > > 1. Goals (more or less a summary of the earlier discussion that took
> > > place on this thread)
> > > 2. Proposed end state of the new Maven project and repository structure
> > > 3. Proposed approach for how to get from where we are today to the
> > > desired end state
> > >
> > > The proposal is on the Apache NiFi Wiki [1], so that we can all
> > > collaborate on it or leave comments there.
> > >
> > > [1]
> >
>
> https://cwiki.apache.org/confluence/display/NIFIREG/NiFi+Project+and+Repository+Restructuring
> > >
> > > Thanks,
> > > Kevin, Jeff, and Bryan
> > >
> > > On Thu, May 30, 2019 at 1:31 PM Kevin Doran  wrote:
> > >>
> > >> I am also in favor of splitting the nifi maven project up into smaller
> > >> projects with independent release cycles in order to decouple
> > >> development at well defined boundaries/interfaces and also to
> > >> facilitate code reuse.
> > >>
> > >> In anticipation of eventually working towards a NiFi 2.0 that
> > >> introduces bigger changes for developers and users, I've started work
> > >> on a nifi-commons project in which I've extracted out some of the code
> > >> that originally got ported from NiFi -> NiFi Registry, and now exists
> > >> as similar code in both projects, into a standalone modular library.
> > >> That preliminary work is here on my 

Re: [DISCUSS] Apache NiFi distribution has grown too large

2018-01-19 Thread Adam Taft
I'd also vote for an OSGi backend (in the long term).  It's something that
has been on my mind (and mentioned) for years now.

The Nar classloader ecosystem is trying to implement features of OSGi (and
doing it somewhat poorly at that, if we're being honest).  Not saying that OSGi
is the right solution, but it's at least worth a discussion.  It would be a
non-trivial reboot of the entire framework though, which is the biggest
downside.

OSGi is very mature and has solved a lot of the problems that the Nar
system attempts to solve.  It's at least worth a serious consideration for
NiFi 2.x.

Adam


On Wed, Jan 17, 2018 at 1:57 PM, Brett Ryan  wrote:

>
>
> > On 18 Jan 2018, at 03:07, Matt Burgess  wrote:
> >
> > BTW, talking about mixin inheritance, shared dependencies, improved
> > classloading, and module repositories, I feel like OSGi is the
> > elephant in the room. I can see perfectly good reasons NOT to move to
> > an OSGi-backed architecture, but it does feel like we'd end up
> > implementing many of the same features and capabilities. Perhaps a
> > topic for a separate DISCUSS thread?
>
> I did ask the same question though I wondered why you guys aren’t using
> Felix specifically.


Re: About NIFI-3620: Multipart support in invokeHTTP.java

2017-12-12 Thread Adam Taft
Multipart is just a set of related content types (multipart/form-data,
multipart/mixed).  InvokeHTTP doesn't care too much about content types, it
just sends bytes verbatim from the flowfile payload.

What should be considered is for an upstream processor to create the
multipart payload in the flowfile.  Then InvokeHTTP can be used to deliver
those bytes.  This would likely be the easiest thing to do, in order to
keep changes to InvokeHTTP to a minimum.  And this could potentially be
used in PostHTTP, and potentially other transports too (smtp maybe).

A multipart message is really a lot like the output from MergeContent.
It's a "tar" format, of sorts.  I wouldn't muddy MergeContent either, but
the concept is closer aligned with packing multiple flowfiles together.

Anyway, I think a separate processor would be ideal, since multipart is
really a matter of content, not transport.

Thanks,

Adam


On Tue, Dec 12, 2017 at 3:45 AM, Damiano Giampaoli 
wrote:

> Ciao Pierre,
>
> I can spend some hours tomorrow looking deep in the invokeHTTP
> implementation and figure out a way to properly support Multipart messages,
> then write back in the ML asking for feedback before implementing it.
> @Andre in case you have already some code or ideas I'll be glad to follow
> it!
>
>
> Best regards
> Damiano
>
> -Original Message-
> From: Pierre Villard [mailto:pierre.villard...@gmail.com]
> Sent: Tuesday, December 12, 2017 11:25 AM
> To: dev 
> Subject: Re: About NIFI-3620: Multipart support in invokeHTTP.java
>
> Hey Damiano,
>
> Andre will correct me in case I'm wrong but he is not working on
> https://issues.apache.org/jira/browse/NIFI-3620 at the moment. If you
> want to give it a try, that would be more than appreciated.
>
> Pierre
>
> 2017-12-11 18:01 GMT+01:00 Joe Witt :
>
> > Hello
> >
> > Sounds great.  I think i might have dropped the ball on that review by
> > commenting on it and then it made others who might be able to help
> > avoid it.  Just commented on the PR again but we're of course happy to
> > work with you to improve as needed.
> >
> > Thanks
> > Joe
> >
> > On Mon, Dec 11, 2017 at 11:10 AM, Damiano Giampaoli
> >  wrote:
> > > Hi list,
> > >
> > >
> > > We are planning to move from our in-house-built workflow engine to
> > NiFi... needless to say, we love this project.
> > >
> > > We created a PoC of some our production workflows using NiFi but in
> > order to move entirely to NiFi we need a full multipart support in
> > order to be able to invoke some of our internal microservices.
> > >
> > >
> > > I saw there is already a pending PR for the ListenHTTP multipart
> > > support<
> > https://github.com/apache/nifi/pull/1795> and an issue already opened
> > about the InvokeHTTP
> > which is marked as in progress.
> > >
> > >
> > > We are willing to contribute to the developments starting from
> > > adding
> > the multipart support but before we would like to ensure that there
> > are no other developers who are already working on this, in case we
> > can provide support in testing, complete or bugfix a work not ready
> > yet to be merged in the master branch.
> > >
> > >
> > >
> > > We are looking forward to hearing from you!
> > >
> > >
> > > Best regards,
> > >
> > > Damiano
> > >
> > >
> > >
> > > SearchInk
> > >
> > >
> > >
> > > Damiano Giampaoli
> > >
> > > Software Engineer
> > >
> > >
> > >
> > > mobile:  +49 1719956912
> > >
> > > email:  
> > > 
> > dami...@searchink.com
> > >
> > >
> > >
> > > #execcircle17 event: Connecting
> > Innovators & Insurers
> > >
> > > Join our team: searchink.com/careers
> > >
> > >
> > >
> > > Koppenplatz 10, D-10115 Berlin
> > > +49 30 220 560 730
> > >
> > > www.searchink.com
> > >
> > >
> > >
> > > HRB 171236 B, Amtsgericht Charlottenburg, Berlin |  UID: DE302404693
> > >
> > > This e-mail and any attached files are confidential and may be
> > > legally
> > privileged. If you are not the addressee, any disclosure,
> > reproduction, copying, distribution, or other dissemination or use of
> > this communication is strictly prohibited. If you have received this
> > transmission in error please notify the sender immediately and then
> delete this mail.
> > >
> > > E-mail transmission cannot be guaranteed to be secure or error free
> > > as
> > information could be intercepted, corrupted, lost, destroyed, arrive
> > late or incomplete, or contain viruses. The sender therefore does not
> > accept liability for any errors or omissions in the contents of this
> > message which arise as a result of e-mail transmission or changes to
> > transmitted date not specifically approved by the sender.
> > >
> > > If this e-mail or attached files contain 

Re: Dockerfile and Docker Hub Management

2017-09-21 Thread Adam Taft
Aldrin,

+1 to separate repository (bullet #2).  The basic premise that Docker
releases should happen separately from the main distribution is spot on. I
think a separate repository would help keep this separation.

I tend to believe that the future of NiFi distributions will be via
containerization. Making the Docker components a somewhat standalone
initiative will help drive changes and innovation in this area.  I'd like
to help see Docker become a first-class citizen for distributing, running
and upgrading NiFi.
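
To make that concrete: with a decoupled process, an image-only fix (like
the oversized chown layer mentioned below) could ship as a new image
revision against the same NiFi release, something like (the revision tag
scheme is purely illustrative):

$> docker build -t apache/nifi:1.4.0-r2 .
$> docker push apache/nifi:1.4.0-r2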

Thanks,

Adam



On Thu, Sep 21, 2017 at 9:11 AM, Aldrin Piri  wrote:

> Hey folks,
>
> ** This message turned out to be more detailed than anticipated.  In
> summary, I propose consolidating Docker/container work with a separate
> release process outside of the repository they are packaging.  Full
> thoughts and background follow.  Any input would be appreciated!
>
> ---
>
> I've been working through providing some additional Docker capabilities for
> the project and wanted to share some thoughts as well as possible plans to
> help us be a bit more nimble and responsive to curating Dockerfiles and
> their respective images on DockerHub.
>
> As a bit of context, we currently have the core NiFi project captured in
> two Dockerfiles, one that is used in conjunction with a Maven plugin for
> creating an image during the NiFi build (dockermaven), and another that is
> used for building tagged releases on Docker Hub (dockerhub).  Both of these
> artifacts, currently, reside in a nifi-docker project and are activated via
> Maven profile, (-P docker).
>
> We've seen at times that this is a very coupled process and limits our
> flexibility.  For instance, we had an ill-placed 'chown' which caused a
> duplicating layer and causes our image to be doubly large.  While this has
> been remedied, given current release processes, this is included with the
> core nifi release and we have been unable to rectify that issue.
>
> Another issue is a very specific sequence of actions that needs to happen
> with the current release for artifacts to be triggered correctly in Docker
> Hub.  This can be seen in Section 6 of the release guide [1].  While there
> are ways to rectify this if the timing isn't quite right and/or an error is
> made, it can impose an additional burden on the INFRA team to facilitate
> these requests as there currently is no capability for PMCs to manage their
> Docker repositories directly.
>
> Ultimately, I think we should consider a separate release process for NiFi
> Docker, and any associated efforts that may relate to those files.  In this
> context, NiFi is encompassing of all projects/efforts in the project.
> Additional efforts could comprise of examples of configuring NiFi to be
> secured or clustered, receive data from MiNiFi instances, or using Docker
> Compose or other orchestration frameworks. I have also noticed a number of
> different areas across our work that are using Docker for integration
> testing purposes.  With some planning and coordination, we could likely
> consolidate many of these core resources/templates to allow us to reuse
> them across efforts.
>
> I believe there are two approaches from an organizational standpoint that
> let us execute on the separate release process effectively:
>
> 1.) Condense all Docker artifacts into the current NiFi repository [2].  We
> update our release for NiFi to exclude the Docker subtree to carry out our
> normal release flow and provide the build/tooling for the Docker subtree to
> be released on its own.
>
> 2.)  Establish a new git repository to handle Docker and any other
> containerization efforts and migrate all existing resources into a file
> structure that makes sense.
>
> My inclination is toward (2).
>
> Regardless of path chosen above, this frees us to handle updates and
> improvements to container efforts when needed.  Any time we wanted to
> release updates to Docker images, we could perform a separate release on
> either the subtree of (1) or the repository of (2) and reference the
> associated latest artifacts of NiFi.
>
> If you've made it this far, thanks for working through the wall of text and
> would appreciate any thoughts or comments.
>
> [1] http://nifi.apache.org/release-guide.html
> [2] https://git-wip-us.apache.org/repos/asf?p=nifi.git
>


Re: [EXT] Re: [DISCUSS} Closing in on a NiFi 1.4.0 release?

2017-09-20 Thread Adam Taft
Here's another good link to try, maybe a little easier to read:

https://issues.apache.org/jira/projects/NIFI/versions/12340589



On Wed, Sep 20, 2017 at 8:01 AM, Brandon DeVries  wrote:

> Mayank,
>
> Try this:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20NIFI%20AND%20fixVersion%20%3D%201.4.0%20ORDER%20BY%20status%20DESC
>
> Brandon
>
>
> On Wed, Sep 20, 2017 at 9:57 AM mayank rathi 
> wrote:
>
> > Hello All,
> >
> > How can we find out the list of fixes that will go into the 1.4.0 release?
> >
> > Thanks!!
> >
> > On Wed, Sep 20, 2017 at 9:53 AM, Brandon DeVries  wrote:
> >
> > > All,
> > >
> > > I think we should plan on calling for a vote on Friday.  That gives two
> > > days to wrap up any outstanding tickets that anyone feels really belong
> > in
> > > 1.4.  At that point the remaining tickets can be shifted to a future
> > > release.
> > >
> > > If there are tickets that are not getting the attention they need to
> make
> > > it into the release, let the list know.
> > >
> > > Any objections?
> > >
> > > Brandon
> > >
> > > On Wed, Sep 20, 2017 at 12:32 AM Koji Kawamura  >
> > > wrote:
> > >
> > > > Hi Paul,
> > > >
> > > > I was able to reproduce the GenerateTableFetch processor issue
> > > > reported by NIFI-4395.
> > > > Please go ahead and provide a PR, I can review it.
> > > >
> > > > Thanks,
> > > > Koji
> > > >
> > > > On Wed, Sep 20, 2017 at 1:10 PM, Paul Gibeault (pagibeault)
> > > >  wrote:
> > > > > We have submitted this JIRA ticket:
> > > > > https://issues.apache.org/jira/browse/NIFI-4395
> > > > >
> > > > > This issue causes GenerateTableFetch processor to malfunction
> after a
> > > > server restart.
> > > > >
> > > > > We are very interested in getting this released in 1.4.0 and are
> > > willing
> > > > to provide the PR if there is still time.
> > > > >
> > > > > Thanks,
> > > > > Paul Gibeault
> > > > >
> > > > >
> > > > > -Original Message-
> > > > > From: Michael Hogue [mailto:michael.p.hogu...@gmail.com]
> > > > > Sent: Tuesday, September 19, 2017 9:36 AM
> > > > > To: dev@nifi.apache.org
> > > > > Subject: [EXT] Re: [DISCUSS} Closing in on a NiFi 1.4.0 release?
> > > > >
> > > > > All,
> > > > >
> > > > >There are a couple of issues with open PRs that i think would be
> > > > desirable to get into 1.4.0:
> > > > >
> > > > >   - https://github.com/apache/nifi/pull/2163 - trivial one-liner
> in
> > > > ListenGRPC
> > > > >   - https://github.com/apache/nifi/pull/1985 - support TLS
> algorithm
> > > > selection via SSLContextService in HandleHTTPRequest
> > > > >
> > > > > Thanks,
> > > > > Mike
> > > > >
> > > > > On Tue, Sep 19, 2017 at 10:46 AM Mark Bean 
> > > > wrote:
> > > > >
> > > > >> I agree with including only those which can be completed quickly
> in
> > > > 1.4.0.
> > > > >> We are anxious for the next release to begin exercising some of
> the
> > > > >> new features. IMO it's time to get 1.4.0 out the door.
> > > > >>
> > > > >> Thanks,
> > > > >> Mark
> > > > >>
> > > > >> On Mon, Sep 18, 2017 at 6:59 PM, Jeff  wrote:
> > > > >>
> > > > >> > Still good.  Was looking through tickets yesterday and today and
> > > > >> > while review progress has been made on some PRs, it might be
> best
> > to
> > > > >> > move JIRAs tagged for 1.4.0 that have PRs and aren't on the cusp
> > of
> > > > >> > being committed
> > > > >> to
> > > > >> > post 1.4.0.  Thoughts?
> > > > >> >
> > > > >> > On Mon, Sep 18, 2017 at 2:50 PM Joe Witt 
> > > wrote:
> > > > >> >
> > > > >> > > Definitely agree with Brandon that due for a 1.4.0 and it has
> > some
> > > > >> > > really nice things in it
> > > > >> > >
> > > > >> > > Jeff Storck volunteered to RM.  Jeff you still good?
> Anything I
> > > > >> > > can
> > > > >> help
> > > > >> > > with?
> > > > >> > >
> > > > >> > >
> > > > >> > > Thanks
> > > > >> > > Joe
> > > > >> > >
> > > > >> > > On Mon, Sep 18, 2017 at 2:28 PM, Brandon DeVries  >
> > > > wrote:
> > > > >> > > > There are significant changes in 1.4.0 that I am actively
> > > > >> > > > waiting
> > > > >> on...
> > > > >> > > >
> > > > >> > > > On Mon, Sep 18, 2017 at 2:25 PM Russell Bateman <
> > > > >> r...@windofkeltia.com
> > > > >> > >
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > >> I don't know. Are we due for a release? Is time-since the
> > > > >> significant
> > > > >> > > >> factor in a release cycle or is growing features part of
> it?
> > > > >> > > >>
> > > > >> > > >> 1.3.0 subsists with no bump of the third digit. This is an
> > > > >> > > >> oddly
> > > > >> > stable
> > > > >> > > >> .0 product (though the third digit had somewhat different
> > > > >> > > >> semantics
> > > > >> in
> > > > >> > > >> NiFi 0.x). No bug fixes to 1.3.0 in its roughly 6-month
> > > history?
> > > > >> > That's
> > > > >> > > >> an achievement.
> > > > >> > > >>
> > > > 

Re: Does PostHTTP support Multipart/form-data ?

2017-06-27 Thread Adam Taft
The multipart/form-data body would have to be preemptively created and
stored in your flowfile payload.  InvokeHTTP could then be used to POST the
message body to the remote server (after having set the appropriate
content-type).

i.e., you have to manually construct the multipart form in the flowfile
payload itself; the NiFi processors aren't going to do that for you the way
curl does.

Does that make sense?
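
If it helps, here's a rough sketch of building that payload by hand (the
boundary string and filename are illustrative); the output becomes your
flowfile content, and InvokeHTTP's Content-Type must carry the same
boundary:

$> BOUNDARY=XYZPartBoundary
$> {
     printf -- '--%s\r\n' "$BOUNDARY"
     printf 'Content-Disposition: form-data; name="file"; filename="19010230.bin"\r\n'
     printf 'Content-Type: application/octet-stream\r\n\r\n'
     cat 19010230.bin
     printf -- '\r\n--%s--\r\n' "$BOUNDARY"
   } > multipart-body.bin

Then set InvokeHTTP's Content-Type to:
multipart/form-data; boundary=XYZPartBoundary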

On Mon, Jun 26, 2017 at 1:44 PM, icreatedanaccount <
pierluc.boudr...@gmail.com> wrote:

> I know they work great for sending parameters in the request headers, but
> does the PostHTTP and InvokeHTTP processors support multipart/form-data as
> a
> content Content-Type ?
>
> Imagine something like this in CURL :
> curl -v -X POST -F "file=@/19010230.bin"
>
> Best,
> Luc
>
>
>
> --
> View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Does-PostHTTP-support-Multipart-form-data-tp16259.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>


Re: How to ingest files into HDFS via Apache NiFi from non-hadoop environment

2017-06-27 Thread Adam Taft
This is a bit outside of the box, but I have actually implemented this
solution previously.

My scenario was very similar.  NIFI was installed outside of the firewalled
HDFS cluster.  The only external access to the HDFS cluster was through SSH.

Therefore, my solution was to use SSH to call a remote command on the HDFS
node.  This was enabled using the ExecuteStreamCommand processor.  I used
the hadoop fs command line tools, piping in the contents of the flowfile.

The basic command (assuming put) would look something like this:

$>  cat file.ext | hadoop fs -put - /hdfs/path/file.ext

This would read from standard input and store the stream into file.ext.
Next you add the SSH execution to call the above.

$>  cat file.ext | ssh user@remote 'hadoop fs -put - /hdfs/path/file.ext'

Now we can try to put the above into the ExecuteStreamCommand processor.
We will extract the filename from the flowfile attribute.  I like using
bash to execute my script:

ExecuteStreamCommand
Command Path:      /bin/bash
Command Arguments: -c; "ssh user@remote 'hadoop fs -put - /hdfs/path/${filename}'"
(* unsure of the quotes here)

Not sure if the above helps, since it sounds like you're going for
something more than 'get' and 'put'.  But the above is an easy mechanism to
interact with an HDFS cluster if the NIFI node is not running on the
cluster.
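
For what it's worth, the 'get' direction is the same trick in reverse (a
sketch; with ExecuteStreamCommand you'd capture stdout as the new flowfile
content):

$> ssh user@remote 'hadoop fs -cat /hdfs/path/file.ext' > file.ext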



On Fri, Jun 23, 2017 at 2:53 PM, Mothi86  wrote:

> Okay, thanks. So that clarifies that NiFi will not work in terms of
> integrating from a local machine / non-hadoop environment into a hadoop
> environment. It either has to be on an edge node, or on a node built up
> with restrictions similar to an edge or management node.
>
> Is this HDF recommended solution ?
>
> Will spinning up a VM work? Can you suggest VM requirements for Apache
> NiFi?
>
>
>
>
>
> --
> View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/How-to-ingest-files-into-HDFS-via-Apache-NiFi-from-non-hadoop-environment-tp16247p16252.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>


Re: [VOTE] Release Apache NiFi 1.3.0

2017-06-05 Thread Adam Taft
Got past the first error, thanks Joey.  Indeed, the non-root user was the
fix.

Unfortunately, now I'm getting another test failure.  Can anyone confirm
this one?

Running org.apache.nifi.controller.StandardFlowSynchronizerSpec
Tests run: 4, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 0.677 sec
<<< FAILURE! - in org.apache.nifi.controller.StandardFlowSynchronizerSpec
scaling of /conf/scale-positions-flow-0.7.0.xml with encoding version
"null"(org.apache.nifi.controller.StandardFlowSynchronizerSpec)  Time
elapsed: 0.485 sec  <<< ERROR!
java.lang.NullPointerException: null
at
org.apache.nifi.controller.StandardFlowSynchronizer.readFlowFromDisk(StandardFlowSynchronizer.java:558)
at
org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:176)
at org.apache.nifi.controller.StandardFlowSynchronizerSpec.scaling of
#filename with encoding version
"#flowEncodingVersion"(StandardFlowSynchronizerSpec.groovy:83)

scaling of /conf/scale-positions-flow-0.7.0.xml with encoding version
"0.7"(org.apache.nifi.controller.StandardFlowSynchronizerSpec)  Time
elapsed: 0.047 sec  <<< ERROR!
java.lang.NullPointerException: null
at
org.apache.nifi.controller.StandardFlowSynchronizer.readFlowFromDisk(StandardFlowSynchronizer.java:558)
at
org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:176)
at org.apache.nifi.controller.StandardFlowSynchronizerSpec.scaling of
#filename with encoding version
"#flowEncodingVersion"(StandardFlowSynchronizerSpec.groovy:83)

scaling of /conf/scale-positions-flow-0.7.0.xml with encoding version
"1.0"(org.apache.nifi.controller.StandardFlowSynchronizerSpec)  Time
elapsed: 0.036 sec  <<< ERROR!
java.lang.NullPointerException: null
at
org.apache.nifi.controller.StandardFlowSynchronizer.readFlowFromDisk(StandardFlowSynchronizer.java:558)
at
org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:176)
at org.apache.nifi.controller.StandardFlowSynchronizerSpec.scaling of
#filename with encoding version
"#flowEncodingVersion"(StandardFlowSynchronizerSpec.groovy:83)

scaling of /conf/scale-positions-flow-0.7.0.xml with encoding version
"99.0"(org.apache.nifi.controller.StandardFlowSynchronizerSpec)  Time
elapsed: 0.037 sec  <<< ERROR!
java.lang.NullPointerException: null
at
org.apache.nifi.controller.StandardFlowSynchronizer.readFlowFromDisk(StandardFlowSynchronizer.java:558)
at
org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:176)
at org.apache.nifi.controller.StandardFlowSynchronizerSpec.scaling of
#filename with encoding version
"#flowEncodingVersion"(StandardFlowSynchronizerSpec.groovy:83)

Results :
Tests in error:
  StandardFlowSynchronizerSpec.scaling of #filename with encoding version
"#flowEncodingVersion":83 ? NullPointer
  StandardFlowSynchronizerSpec.scaling of #filename with encoding version
"#flowEncodingVersion":83 ? NullPointer
  StandardFlowSynchronizerSpec.scaling of #filename with encoding version
"#flowEncodingVersion":83 ? NullPointer
  StandardFlowSynchronizerSpec.scaling of #filename with encoding version
"#flowEncodingVersion":83 ? NullPointer






On Mon, Jun 5, 2017 at 4:18 PM, Adam Taft <a...@adamtaft.com> wrote:

> OK, will check out root/non-root.  Thanks for the heads up on that.  Give
> me a minute, will check with a non-root user.
>
> Adam
>
>
> On Mon, Jun 5, 2017 at 4:12 PM, Joey Frazee <joey.fra...@icloud.com>
> wrote:
>
>> Adam, this can happen if you're building as root (E.g., if you're being
>> lazy like me and just using a disposable Docker container).
>>
>> NIFI-3836 is open for it.
>>
>> If this is what it is, just build as a non root user.
>>
>> > On Jun 5, 2017, at 5:25 PM, Adam Taft <a...@adamtaft.com> wrote:
>> >
>> > I'm getting a test failure for this RC.  Here is the maven snippet.
>> >
>> > ---
>> > T E S T S
>> > ---
>> > Running org.apache.nifi.provenance.CryptoUtilsTest
>> > Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.761
>> sec
>> > <<< FAILURE! - in org.apache.nifi.provenance.CryptoUtilsTest
>> > testShouldNotValidateUnreadableFileBasedKeyProvider(org.apac
>> he.nifi.provenance.CryptoUtilsTest)
>> > Time elapsed: 0.052 sec  <<< FAILURE!
>> > org.codehaus.groovy.runtime.powerassert.PowerAssertionError: assert
>> > !unreadableKeyProviderIsValid
>> >   ||
>> >   |true
>> >   false
>> >at

Re: [VOTE] Release Apache NiFi 0.7.4

2017-06-05 Thread Adam Taft
+1 (binding)

Verified gpg signature.
Verified all hashes on source zipfile.
Performed mvn clean install -Pcontrib-checks
Builds cleanly, all tests pass in docker container centos:latest w/
openjdk-1.8.0 and maven 3.5.0
LICENSE, NOTICE, README look good.
Binary runs as expected with a simple dataflow.

Cheers,

Adam


Re: [VOTE] Release Apache NiFi 1.3.0

2017-06-05 Thread Adam Taft
OK, will check out root/non-root.  Thanks for the heads up on that.  Give
me a minute, will check with a non-root user.

Adam


On Mon, Jun 5, 2017 at 4:12 PM, Joey Frazee <joey.fra...@icloud.com> wrote:

> Adam, this can happen if you're building as root (E.g., if you're being
> lazy like me and just using a disposable Docker container).
>
> NIFI-3836 is open for it.
>
> If this is what it is, just build as a non root user.
>
> > On Jun 5, 2017, at 5:25 PM, Adam Taft <a...@adamtaft.com> wrote:
> >
> > I'm getting a test failure for this RC.  Here is the maven snippet.
> >
> > ---
> > T E S T S
> > ---
> > Running org.apache.nifi.provenance.CryptoUtilsTest
> > Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.761
> sec
> > <<< FAILURE! - in org.apache.nifi.provenance.CryptoUtilsTest
> > testShouldNotValidateUnreadableFileBasedKeyProvider(org.
> apache.nifi.provenance.CryptoUtilsTest)
> > Time elapsed: 0.052 sec  <<< FAILURE!
> > org.codehaus.groovy.runtime.powerassert.PowerAssertionError: assert
> > !unreadableKeyProviderIsValid
> >   ||
> >   |true
> >   false
> >at
> > org.codehaus.groovy.runtime.InvokerHelper.assertFailed(
> InvokerHelper.java:402)
> >at
> > org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(
> ScriptBytecodeAdapter.java:650)
> >at
> > org.apache.nifi.provenance.CryptoUtilsTest.
> testShouldNotValidateUnreadableFileBasedKeyProvider(
> CryptoUtilsTest.groovy:214)
> > Results :
> > Failed tests:
> >  CryptoUtilsTest.testShouldNotValidateUnreadableFileBasedKeyProvider:214
> > assert !unreadableKeyProviderIsValid
> >   ||
> >   |true
> >   false
> >
> > I'm running from a clean docker container from centos:latest.  I
> installed
> > openjdk-1.8.0-devel and maven 3.5.0 into the container.  The openjdk
> comes
> > with the crypto extensions, so I don't think this is the problem.
> >
> > Any thoughts on the above?
> >
> > By the way, the signatures and hashes look good.  However I don't see a
> > 1.3.0-RC1 tag, as per your email.
> >
> > Thanks,
> >
> > Adam
> >
> > p.s. here's more environment info:
> >
> > [root@0e3de1bf9bfc nifi-1.3.0]# mvn --version
> > Apache Maven 3.5.0 (ff8f5e7444045639af65f6095c62210b5713f426;
> > 2017-04-03T19:39:06Z)
> > Maven home: /opt/apache-maven-3.5.0
> > Java version: 1.8.0_131, vendor: Oracle Corporation
> > Java home: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.
> x86_64/jre
> > Default locale: en_US, platform encoding: ANSI_X3.4-1968
> > OS name: "linux", version: "4.9.27-moby", arch: "amd64", family: "unix"
> > [root@0e3de1bf9bfc nifi-1.3.0]# uname -a
> > Linux 0e3de1bf9bfc 4.9.27-moby #1 SMP Thu May 11 04:01:18 UTC 2017 x86_64
> > x86_64 x86_64 GNU/Linux
> > [root@0e3de1bf9bfc nifi-1.3.0]# cat /etc/redhat-release
> > CentOS Linux release 7.3.1611 (Core)
> >
> >
> >
> >
> >> On Mon, Jun 5, 2017 at 11:54 AM, Matt Gilman <mcgil...@apache.org>
> wrote:
> >>
> >> Hello,
> >>
> >>
> >> I am pleased to be calling this vote for the source release of Apache
> NiFi
> >> nifi-1.3.0.
> >>
> >>
> >> The source zip, including signatures, digests, etc. can be found at:
> >>
> >> https://repository.apache.org/content/repositories/orgapachenifi-1108
> >>
> >>
> >> The Git tag is nifi-1.3.0-RC1
> >>
> >> The Git commit ID is ddb73612bd1512d8b2151b81f9aa40811bca2aaa
> >>
> >> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=
> >> ddb73612bd1512d8b2151b81f9aa40811bca2aaa
> >>
> >>
> >> Checksums of nifi-1.3.0-source-release.zip:
> >>
> >> MD5: 8b115682ac392342b9edff3bf0658ecb
> >>
> >> SHA1: f11cdebbabdc0d8f1f0dd4c5b880ded39d17f234
> >>
> >> SHA256: 9ba5565729d98c472c31a1fdbc44e9dc1eee87a2cf5184e8428743f75314
> 5b7f
> >>
> >>
> >> Release artifacts are signed with the following key:
> >>
> >> https://people.apache.org/keys/committer/mcgilman.asc
> >>
> >>
> >> KEYS file available here:
> >>
> >> https://dist.apache.org/repos/dist/release/nifi/KEYS
> >>
> >>
> >> 110 issues were closed/resolved for this release:
> >>
> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> >> projectId=12316020&version=12340498
> >>
> >>
> >> Release note highlights can be found here:
> >>
> >> https://cwiki.apache.org/confluence/display/NIFI/
> >> Release+Notes#ReleaseNotes-Version1.3.0
> >>
> >>
> >> The vote will be open for 72 hours.
> >>
> >> Please download the release candidate and evaluate the necessary items
> >> including checking hashes, signatures, build from source, and test.  The
> >> please vote:
> >>
> >>
> >> [ ] +1 Release this package as nifi-1.3.0
> >>
> >> [ ] +0 no opinion
> >>
> >> [ ] -1 Do not release this package because...
> >>
>


Re: [VOTE] Release Apache NiFi 1.3.0

2017-06-05 Thread Adam Taft
I'm getting a test failure for this RC.  Here is the maven snippet.

---
 T E S T S
---
Running org.apache.nifi.provenance.CryptoUtilsTest
Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.761 sec
<<< FAILURE! - in org.apache.nifi.provenance.CryptoUtilsTest
testShouldNotValidateUnreadableFileBasedKeyProvider(org.apache.nifi.provenance.CryptoUtilsTest)
Time elapsed: 0.052 sec  <<< FAILURE!
org.codehaus.groovy.runtime.powerassert.PowerAssertionError: assert
!unreadableKeyProviderIsValid
   ||
   |true
   false
at
org.codehaus.groovy.runtime.InvokerHelper.assertFailed(InvokerHelper.java:402)
at
org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(ScriptBytecodeAdapter.java:650)
at
org.apache.nifi.provenance.CryptoUtilsTest.testShouldNotValidateUnreadableFileBasedKeyProvider(CryptoUtilsTest.groovy:214)
Results :
Failed tests:
  CryptoUtilsTest.testShouldNotValidateUnreadableFileBasedKeyProvider:214
assert !unreadableKeyProviderIsValid
   ||
   |true
   false

I'm running from a clean docker container from centos:latest.  I installed
openjdk-1.8.0-devel and maven 3.5.0 into the container.  The openjdk comes
with the crypto extensions, so I don't think this is the problem.

Any thoughts on the above?

By the way, the signatures and hashes look good.  However I don't see a
1.3.0-RC1 tag, as per your email.

Thanks,

Adam

p.s. here's more environment info:

[root@0e3de1bf9bfc nifi-1.3.0]# mvn --version
Apache Maven 3.5.0 (ff8f5e7444045639af65f6095c62210b5713f426;
2017-04-03T19:39:06Z)
Maven home: /opt/apache-maven-3.5.0
Java version: 1.8.0_131, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64/jre
Default locale: en_US, platform encoding: ANSI_X3.4-1968
OS name: "linux", version: "4.9.27-moby", arch: "amd64", family: "unix"
[root@0e3de1bf9bfc nifi-1.3.0]# uname -a
Linux 0e3de1bf9bfc 4.9.27-moby #1 SMP Thu May 11 04:01:18 UTC 2017 x86_64
x86_64 x86_64 GNU/Linux
[root@0e3de1bf9bfc nifi-1.3.0]# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)




On Mon, Jun 5, 2017 at 11:54 AM, Matt Gilman  wrote:

> Hello,
>
>
> I am pleased to be calling this vote for the source release of Apache NiFi
> nifi-1.3.0.
>
>
> The source zip, including signatures, digests, etc. can be found at:
>
> https://repository.apache.org/content/repositories/orgapachenifi-1108
>
>
> The Git tag is nifi-1.3.0-RC1
>
> The Git commit ID is ddb73612bd1512d8b2151b81f9aa40811bca2aaa
>
> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=
> ddb73612bd1512d8b2151b81f9aa40811bca2aaa
>
>
> Checksums of nifi-1.3.0-source-release.zip:
>
> MD5: 8b115682ac392342b9edff3bf0658ecb
>
> SHA1: f11cdebbabdc0d8f1f0dd4c5b880ded39d17f234
>
> SHA256: 9ba5565729d98c472c31a1fdbc44e9dc1eee87a2cf5184e8428743f753145b7f
>
>
> Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/mcgilman.asc
>
>
> KEYS file available here:
>
> https://dist.apache.org/repos/dist/release/nifi/KEYS
>
>
> 110 issues were closed/resolved for this release:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12316020&version=12340498
>
>
> Release note highlights can be found here:
>
> https://cwiki.apache.org/confluence/display/NIFI/
> Release+Notes#ReleaseNotes-Version1.3.0
>
>
> The vote will be open for 72 hours.
>
> Please download the release candidate and evaluate the necessary items
> including checking hashes, signatures, build from source, and test.  The
> please vote:
>
>
> [ ] +1 Release this package as nifi-1.3.0
>
> [ ] +0 no opinion
>
> [ ] -1 Do not release this package because...
>


Re: [GitHub] nifi issue #272: NIFI-1620 Allow empty Content-Type in InvokeHTTP processor

2016-06-15 Thread Adam Taft
I added a comment to the JIRA ticket associated with this pull request.  I
think there should be discussion / buy-in from others on the aesthetics of
introducing a new processor property for this edge case.  Instead, I think
the goals of this request could be fulfilled without strictly introducing a
new property, which would likely be an improved approach.

https://issues.apache.org/jira/browse/NIFI-1620

Maybe we should postpone this ticket resolution and not merge in 0.7.x
until more discussion has occurred?  I wouldn't want to merge this change
without at least a few nods agreeing to the proposed property.


On Wed, Jun 15, 2016 at 5:05 PM, JPercivall  wrote:

> Github user JPercivall commented on the issue:
>
> https://github.com/apache/nifi/pull/272
>
> @taftster I'll let you finish it up tonight if you have time since
> you've already had eyes on it. If you're not able to, I'll take a look
> tomorrow.
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
> with INFRA.
> ---
>


Re: Dynamic URLs using InvokeHttp from an array

2016-04-03 Thread Adam Taft
You are probably missing the necessary change to the following file:

META-INF/services/org.apache.nifi.processor.Processor

If you haven't modified this file to include your processor, this would be
the problem.
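
For reference, that services file is just a plain-text list of fully
qualified processor class names, one per line.  A minimal sketch (the
package and class name below are hypothetical placeholders for your own
processor):

    # META-INF/services/org.apache.nifi.processor.Processor
    com.example.processors.InvokeHttpLooped

Without that entry, the NAR builds fine, but the framework's ServiceLoader
lookup never discovers the processor, so it won't appear in the UI.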

Adam


On Fri, Apr 1, 2016 at 2:53 PM, kkang  wrote:

> I looked into the links, but they didn't quite give me what I was looking
> for.  Instead, I thought I would try a different route.  I "borrowed" the
> code from InvokeHttp and created another processor called InvokeHttpLooped
> (I know...not very original).
>
> I modified it so that I could put place holders in the URL (example:
> "http://somedns.com/${0}?query=${1}"...I tried following the same type of
> pattern already established).  I then added another property
> (ContextDataList) that should contain 1 or more rows with comma delimited
> values.  In a loop over the number of lines in ContextDataList, I create a
> separate FlowFile, fill in the placeholders matching the sequence in the
> comma separated values, and finally do what InvokeHttp did for each
> iteration of the loop for the number of rows.
>
> Since this was the first time using Maven, I finally succeeded in getting a
> build done; however, when I copied in the NAR file, the InvokeHttpLooped
> did
> not show up in the UI.
>
> I probably missed some steps...any suggestions?
>
>
>
>


Re: Dynamic URLs using InvokeHttp from an array

2016-03-31 Thread Adam Taft
Yeah, these solutions won't work for thousands of iterations.  Andy's
suggestion for using ExecuteScript starts to sound very compelling,
especially if you are algorithmically generating your term values.

Another thought for you.  Uwe Geercken was experimenting with a processor
which could read in a CSV file and output a flowfile attribute for every
cell in the CSV data.  Something like this might work for you.

Basically you'd have a single column CSV file with all your terms.  For
every line in the file, a new flowfile would be produced.  Each "column"
from each line would be stored as a flowfile attribute.  You'd end up with
a new flowfile for every term, with a flowfile attribute containing that
term.

Here's a link to his work:

https://github.com/uwegeercken/nifi_processors

Here's an archive from the mailing list discussion:

http://mail-archives.apache.org/mod_mbox/nifi-dev/201603.mbox/%3Ctrinity-4e63574c-9f19-459f-b048-ca40667e964c-1458542998682@3capp-webde-bs02%3E

Something like this might be worth considering as well.

On Thu, Mar 31, 2016 at 9:10 AM, kkang  wrote:

> Thanks, but unfortunately I have thousands of iterations that must occur so
> this would probably be too tedious; however, it is a technique that may
> come
> in handy with smaller looped scenarios.  I am still looking at the
> solutions
> that Andy sent earlier.
>
>
>
>


Re: Dynamic URLs using InvokeHttp from an array

2016-03-31 Thread Adam Taft
OK, one more "out the box" idea to consider.

UpdateAttribute also has a mode which "clones" the flowfile if multiple
rules are matched.  Here's the specific quote from the UpdateAttribute
documentation:

"If the FlowFile policy is set to "use clone", and multiple rules match,
then a copy of the incoming FlowFile is created, such that the number of
outgoing FlowFiles is equal to the number of rules that match. In other
words, if two rules (A and B) both match, then there will be two outgoing
FlowFiles, one for Rule A and one for Rule B. This can be useful in
situations where you want to add an attribute to use as a flag for routing
later. In this example, there will be two copies of the file available, one
to route for the A path, and one to route for the B path"

If you used the Advanced UI, you might be able to create rules which always
match, but alter the value of the $foo parameter to your liking.  If the
"use clone" option was set, it would create a new flowfile for every rule
matched.  Thus if your array had 10 values, you'd have 10 rules, each one
would set $foo to a different value.  Out from UpdateAttribute, you'd end
up with 10 flowfiles that could be sent to InvokeHTTP.

That might be a fun way to solve this.  :)


On Thu, Mar 31, 2016 at 9:55 AM, Adam Taft <a...@adamtaft.com> wrote:

> One (possibly bad) idea would be to try and loop your flow around the
> UpdateAttribute processor using RouteOnAttribute.  UpdateAttribute has an
> "advanced" mode which would let you do logic something like:
>
> if $foo == "" then set $foo = "step 1";
> if $foo == "step 1" then set $foo = "step 2";
> if $foo == "step 2" then set $foo = "step 3";
> ...
> if $foo == "step n" then set $foo = "finished";
>
> The next part would be RouteOnAttribute, which would read the value of
> $foo and if set to "finished" break the loop.  Otherwise it would pass to
> InvokeHTTP and then back to UpdateAttribute.  The setup for this would be
> tedious, but I think it would technically work.
>
> Just putting this out there for brainstorming purposes.
>
>
>
>
> On Wed, Mar 30, 2016 at 6:25 PM, kkang <ki.k...@ds-iq.com> wrote:
>
>> I have been able to figure out how to GenerateFlowFile -> UpdateAttribute
>> ->
>> InvokeHttp to dynamically send a URL (example:
>> https://somedomain.com?parameterx=${foo}); however, I need to do this N
>> number of times and replace ${foo} with a known set of values.  Is there a
>> way to call InvokeHttp multiple times and use the next value for ${foo}
>> automatically?
>>
>>
>>
>>
>
>


Re: Dynamic URLs using InvokeHttp from an array

2016-03-31 Thread Adam Taft
One (possibly bad) idea would be to try and loop your flow around the
UpdateAttribute processor using RouteOnAttribute.  UpdateAttribute has an
"advanced" mode which would let you do logic something like:

if $foo == "" then set $foo = "step 1";
if $foo == "step 1" then set $foo = "step 2";
if $foo == "step 2" then set $foo = "step 3";
...
if $foo == "step n" then set $foo = "finished";

The next part would be RouteOnAttribute, which would read the value of $foo
and if set to "finished" break the loop.  Otherwise it would pass to
InvokeHTTP and then back to UpdateAttribute.  The setup for this would be
tedious, but I think it would technically work.
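
As a rough sketch of the RouteOnAttribute half (the property name and
values here are illustrative, not a tested configuration), the "break"
condition could be a dynamic property written in the NiFi Expression
Language:

    finished = ${foo:equals('finished')}

FlowFiles matching the "finished" relationship exit the loop; everything
else goes to the unmatched relationship and routes back through
InvokeHTTP and UpdateAttribute.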

Just putting this out there for brainstorming purposes.




On Wed, Mar 30, 2016 at 6:25 PM, kkang  wrote:

> I have been able to figure out how to GenerateFlowFile -> UpdateAttribute
> ->
> InvokeHttp to dynamically send a URL (example:
> https://somedomain.com?parameterx=${foo}); however, I need to do this N
> number of times and replace ${foo} with a known set of values.  Is there a
> way to call InvokeHttp multiple times and use the next value for ${foo}
> automatically?
>
>
>
>


Re: Re: Processor: User friendly vs system friendly design

2016-03-19 Thread Adam Taft
Uwe,

I'll take a look at your code sometime soon.  However, just to point you in
the direction, I'd suggest extracting your single line CSV data into
flowfile attributes named as you've demonstrated.  i.e.  create a processor
which reads each CSV column as a flowfile attribute, using a configured
naming convention.

For example, using "column" as your prefix with your example input, you'd
end up with a single flowfile with attributes like:

column0 = Peterson
column1 = Jenny
column2 = New York
column3 = USA

Flowfile attributes are effectively a Map<String,String>.  So in your
Velocity processor, you would pass the Map of flowfile attributes to the
template engine and record the results to the flowfile content.
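
The heart of that merge step might look roughly like this (a hand-written
sketch against the Velocity 1.x API, untested; "attributes" and
"templateText" are placeholders for the flowfile attributes map and the
loaded template):

    import org.apache.velocity.VelocityContext;
    import org.apache.velocity.app.VelocityEngine;
    import java.io.StringWriter;
    import java.util.Map;

    VelocityEngine engine = new VelocityEngine();
    engine.init();

    // expose every flowfile attribute (column0, column1, ...) to the template
    VelocityContext context = new VelocityContext();
    for (Map.Entry<String, String> entry : attributes.entrySet()) {
        context.put(entry.getKey(), entry.getValue());
    }

    // merge the template with the attribute values
    StringWriter merged = new StringWriter();
    engine.evaluate(context, merged, "template", templateText);
    // merged.toString() becomes the new flowfile content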

Using SplitText seems correct up front (though like you said, you lose the
CSV header line).  You'd need two additional processors, from my
perspective:

(input) -> SplitText -> ExtractCSVColumns -> ApplyVelocityTemplate ->
(output)

It's the "​split row into fields and merge with template" that we would
want to separate into two processors instead of one.

You're very much on the right track, I believe.  If the above doesn't help,
I'll try and jump in on a code example when I can.

Adam


On Fri, Mar 18, 2016 at 5:04 PM, Uwe Geercken <uwe.geerc...@web.de> wrote:

> Adam,
>
> I don't see an obvious way for your suggestion of "Read columns from a
> single CSV line into flowfile attributes." - I would need your advice on
> how I can achieve it.
>
> Thinking about it in more detail, I have the following issues:
> - the incoming flowfile may have many columns, so adding the columns
> manually as attributes with UpdateAttribute is not feasible
> - I have setup a flow where I use SplitText to divide the flowfile into
> multiple flowfiles, so there won't be a header row I can use to get the
> column names. So I think I can only use abstract column names plus a
> running number. e.g. column0, column1, etc.
>
> So for the moment I have coded the processor as described below. At the
> moment I am still "thinking in CSV" but I will check it with other formats
> later. The user can configure the following settings: path where the template is
> stored, name of the template file, the label for the columns (I call it
> prefix) and the separator based on which the split of the row is done.
>
> Example Flowfile content (user has chosen "comma" as separator):
>
> Peterson, Jenny, New York, USA
>
> Example template (user has chosen "column" as the prefix):
>
> {
> "name": "$column0",
> "first": "$column1",
> "city": "$column2",
> "country": "$column3"
> }
>
> Example flow:
>
> GetFile: Get CSV File >> SplitText : split into multiple flowfiles, one
> per row >> TemplateProcessor: split row into fields and merge with
> template >> MergeContent: merge
> flowfiles into one >> PutFile: put the file to the filesystem
>
> Example result:
>
> {
> "name": "Peterson",
> "first": "Jenny",
> "city": "New York",
> "country": "USA"
>  }
>
> I will test the processor now for larger files, empty files and other
> exceptions. If you are interested the code is here:
>
> https://github.com/uwegeercken/nifi_processors
>
> Greetings,
>
> Uwe
>
>
>
> > Gesendet: Freitag, 18. März 2016 um 18:58 Uhr
> > Von: "Adam Taft" <a...@adamtaft.com>
> > An: dev@nifi.apache.org
> > Betreff: Re: Processor: User friendly vs system friendly design
> >
> > Uwe,
> >
> > The Developer Guide[1] and Contributor Guide[2] are pretty solid.  The
> > Developer Guide has a section dealing with reading & writing flowfile
> > attributes.  Please check these out, and then if you have any specific
> > questions, please feel free to reply.
> >
> > For inclusion in NIFI directly, you'd want to create a NIFI Jira ticket
> > mentioning the new feature, and then fork the NIFI project in Github and
> > send a Pull Request referencing the ticket.  However, if you just want
> some
> > feedback on suitability and consideration for inclusion, using your own
> > personal Github project and sending a link would be fine.
> >
> > Having a template conversion processor would be a nice addition.  Making
> it
> > generic to support Velocity, FreeMarker, and others might be really nice.
> > Extra bonus points for Markdown or Asciidoc transforms as well (but these
> > might be too separate of a use case).

Re: Processor: User friendly vs system friendly design

2016-03-19 Thread Adam Taft
I'm probably on the far end of favoring composability and processor reuse.
In this case, I would even go one step further and suggest that you're
talking about three separate operations:

1.  Split a multi-line CSV input file into individual single line flowfiles.
2.  Read columns from a single CSV line into flowfile attributes.
3.  Pass flowfile attributes into the Velocity transform processor.

The point here, have you considered driving your Velocity template
transform using flowfile attributes as opposed to CSV?  Flowfile attributes
are NIFI's lowest common data representation, many many processors create
attributes which would enable your Velocity processor to be used by more
than just CSV input.

Adam



On Fri, Mar 18, 2016 at 11:06 AM, Uwe Geercken  wrote:

>
> Hello,
>
> my first mailing here. I am a Java developer, using Apache Velocity,
> Drill, Tomcat, Ant, Pentaho ETL, MongoDb, Mysql and more and I am very much
> a data guy.
>
> I have used Nifi for a while now and yesterday started coding my first
> processor. I basically do it to widen my knowledge and learn something new.
>
> I started with the idea of combining Apache Velocity - a template engine -
> with Nifi. So in comes a CSV file, it gets merged with a template
> containing formatting information and some placeholders (and some limited
> logic maybe) and out comes a new set of data, formatted differently. So it
> separates the processing logic from the formatting. One could create HTML,
> XML, Json or other text based formats from it. Easy to use and very
> efficient.
>
> Now my question is: Should I rather implement the logic so that I
> process a whole CSV file - which usually has multiple lines? That would be
> good for the user as he or she has to deal with only one processor doing
> the work. But the logic would be more specialized.
>
> The other way around, I could code the processor to handle one row of the
> CSV file and the user will have to come up with a flow that divides the CSV
> file into multiple flowfiles before my processor can be used. That is not
> so specialized but it requires more preparation work from the user.
>
> I tend to go the second way. Also because there is already a processor
> that will split a file into multiple flowfiles. But I wanted to hear your
> opinion of what is the best way to go. Do you have a recommendation for me?
> (Maybe the answer is to do both?!)
>
> Thanks for sharing your thoughts.
>
> Uwe
>


Re: InvokeHTTP body

2016-03-13 Thread Adam Taft
I think it makes total sense that POST/PUT requests read from the flowfile
content.  Therefore, the problem should be fixed further up in the flow
design.  For example, try these solutions:

GenerateFlowFile -> ReplaceText -> InvokeHTTP   (or)
GetFile -> InvokeHTTP
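
In the first flow, ReplaceText is what turns the GenerateFlowFile output
into a fixed body.  Roughly (the property names here are from memory and
may vary by version, so treat this as a sketch rather than an exact
configuration):

    GenerateFlowFile (File Size: 0B)
      -> ReplaceText (Replacement Strategy: Always Replace,
                      Replacement Value: the static request body)
      -> InvokeHTTP (HTTP Method: POST)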

The problem you're describing has more to do with generating static
flowfile content, which is a separate concern from how to transfer flowfile
content over the wire via http.

If the above solutions don't work for you, perhaps a modification of
GenerateFlowFile could be made which uses static content instead of random
content?

Hope this helps.

Adam


On Fri, Mar 11, 2016 at 6:56 AM, Pierre Villard  wrote:

> Hi,
>
> Would it make sense to add a property "body" allowing the user to manually
> set the body of the request for PUT/POST requests?
>
> At the moment, the body of the request seems to be only set with the
> content of incoming flow files. But it is possible to use this processor
> without incoming relationship. It would be useful to be able to set the
> body manually.
>
> The behaviour would be: if there is an incoming relationship, the incoming
> flow file content is used whatever the property "body" is, and if there is
> no incoming relationship, the request body is based on the property value.
>
> What do you think?
>
> Pierre
>


Re: [DISCUSS] git branching model

2016-02-15 Thread Adam Taft
One of the harder things with gitflow is using it in combination with
maven.  It's ideal that the tags and releases are tracking closely with the
maven pom.xml version.  gitflow, on its own, doesn't keep the pom version
updated with the git release names.

Because of the general importance of keeping releases and tags synchronized
with the pom version, I think whatever we do, it needs to be approached
with tools that are available through maven rather than from git.  The
git-flow plugin (referenced by Thad) doesn't directly help deal with this
synchronization, since it's a git tool, not a maven tool.

I've been using, with reasonable success, the jgitflow [1] plugin, which
does a reasonable job of following the gitflow model for a maven project.
I don't recommend this plugin for NIFI, because it insists that the master
branch is strictly used for published release tags (as per the strict
gitflow workflow).  I just mention this, in reference to how some plugins
are tackling the gitflow and maven synchronization issue.

[1] http://jgitflow.bitbucket.org/


On Sun, Feb 14, 2016 at 10:48 PM, Thad Guidry  wrote:

> You're on the right track with Git-flow.  Your master becomes primary
> development of the next release (with feature branches off of it), while you
> continue to have release branches that can have hotfix branches off of
> them.  (Don't use master as your release branch! - bad practice!)
>
> Here is the Git-flow cheat sheet to make it easy for everyone to
> understand... just scroll it down to gain the understanding. It's really
> that easy.
>
> http://danielkummer.github.io/git-flow-cheatsheet/
>
> Most large projects have moved into using git-flow ... and tools like
> Eclipse Mars, IntelliJ, Sourcetree, etc. have Git-flow either built in or
> available as a plugin now.  If you want to live on the command line, then that
> is handled easily by the instructions in the above link.
>
> Thad
> +ThadGuidry 
>


Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Adam Taft
If we're willing to have a LoopFlowFile processor, why not consider a
PenalizeFlowFile processor too?  Just throwing it out for discussions sake,
but penalization could ultimately be realized in multiple ways:

a) by both the processor developer (and DFM via penalty duration), as it is
done today;
b) by the DFM as part of the Connection Settings, per Mark's proposal;
c) by the DFM as part of a (alternative) standard processor, with various
(and future) penalization options configured as Processor Properties
d) all the above

The line is blurry between what functionality can/should go into the
connection or queue vs. which functionality can/should go into a
processor.  If we're willing to say that LoopFlowFile should be defined as
a processor, I don't see much difference between "loop for 10 minutes" vs.
"penalize for 10 minutes" (beyond the obvious).

As a general statement, I think it's good to minimize the various ways in
which processors, queues and relationships are managed.  Today we have
configuration options for:

- queues (expiration, prioritizers, back pressure)
- settings tab of every processor (scheduling strategy, penalty duration,
run schedule)
- specific settings to the processor itself (processor properties).

Flowfile expiration is handled on the queue, while penalization is
configured on the processor.  At the end of the day, whatever reduces the
number places a DFM has to touch is a good thing.

Perhaps as a radical proposal, why don't we add some @Experimental
processors which do things like PenalizeFlowFile, LoopFlowfile,
DelayFlowFile, PrioritizeFlow, and see what the experience is like using
these vs. using the existing functions.  If the community thinks there's
too much overlap, we can remove these from the 1.0 release.  But at least
we'll get some A/B testing by having these queue management services
realized as processors vs. built into other object types.  Maybe put these
into a nifi-queue-management.nar extension for people to play with?
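
For a flavor of how small the core of a PenalizeFlowFile processor would
be (an untested sketch against the standard processor API; the SUCCESS
relationship is a placeholder):

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // session.penalize() marks the flowfile so downstream processors
        // skip it until the configured penalty duration expires
        flowFile = session.penalize(flowFile);
        session.transfer(flowFile, SUCCESS);
    }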

Just food for thought.

Adam



On Thu, Jan 28, 2016 at 2:36 PM, Mark Payne  wrote:

> I think for the particular pattern, I would like to see a LoopFlowFile
> processor (or something with a better name perhaps :) )
> that would allow the user to just set a threshold for how many times to
> try or how long to keep trying or both and then
> send to either a 'threshold exceeded' or 'below threshold' relationship.
> I.e., set a threshold of 3 times or 10 minutes and
> then route to one or the other. It would make that pattern a lot easier by
> just using a single easy-to-understand Processor.
>
>
>
> > On Jan 28, 2016, at 2:31 PM, Ricky Saltzer  wrote:
> >
> > That's a good point, Mark. I also agree that it's better to give the user
> > control whenever possible. I imagine the RouteOnAttribute pattern to
> > eventually "give up" on a FlowFile will be a common pattern, and so so we
> > should account for that, rather than forcing the user into knowing this
> > pattern.
> >
> > On Thu, Jan 28, 2016 at 2:11 PM, Mark Payne 
> wrote:
> >
> >>
> >> The retry idea concerns me a bit. If we were to have a method like:
> >>
> >> penalizeOrTransfer(FlowFile flowFile, int numberOfTries, Relationship
> >> relationship)
> >>
> >> I think that leaves out some info - even if a FlowFile is
> >> penalized, it must be penalized and sent somewhere. So there would have
> to
> >> be
> >> a relationship to send it to if penalized and another to send it to if
> not
> >> penalizing.
> >> This also I think puts more onus on the developer to understand how it
> >> would be
> >> used - I believe the user should be making decisions about how many
> times
> >> to
> >> penalize, not the developer.
> >>
> >>> On Jan 28, 2016, at 2:03 PM, Bryan Bende  wrote:
> >>>
> >>> Regarding throwing an exception... I believe if you are extending
> >>> AbstractProcessor and an exception is thrown out of onTrigger() then
> the
> >>> session is rolled back and any flow files that were accessed are
> >> penalized,
> >>> which results in leaving them in the incoming connection to the
> processor
> >>> and not being retried until the penalty duration passes. This seems
> >> similar
> >>> to what Michael described, although it is not stopping the processor
> from
> >>> processing other incoming  flow files.
> >>>
> >>> Ricky's retry idea sounds interesting... I think a lot of people handle
> >>> this today by creating a retry loop using UpdateAttribute and
> >>> RouteOnAttribute [1].
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2
> >>>
> >>>
> >>> On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer 
> >> wrote:
> >>>
>  Is there currently a way to know how many times a FlowFile has been
>  penalized? Do we have use cases where we want to penalize a FlowFile *n*
>  number of times ...

Re: NiFi 0.4.1 InvokeHttp processor POST error issue

2016-01-15 Thread Adam Taft
Joe,

Just as a quick observation, this statement isn't completely accurate:

> "... and can stream the contents instead of loading into memory"

The original InvokeHTTP code (pre okhttp) explicitly set the content-length
header, because it was known (the flowfile payload content length is always
known).  This does not, however, imply that the entire contents were loaded
into memory.  The previous InvokeHTTP used the
#setFixedLengthStreamingMode(long), which is described as:

"This method is used to enable streaming of a HTTP request body without
internal buffering, when the content length is known in advance." [1]

HttpURLConnection doesn't need to buffer if the length is known in
advance.  It's only when it doesn't know the length that it either needs to
buffer to determine it or use chunked encoding.

I think it's important to be able to support non-chunked encoded POST
requests.  There are many "legacy" (or even "broken") web services that
don't work with chunked encoding, obviously like in this case.

Unfortunately, I don't recall that okhttp has similar direct support for
"fixed length streaming".  It's probable that a custom implementation of
okhttp.RequestBody would need to be created to support this. [2]

[1]
https://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html#setFixedLengthStreamingMode-long-

[2] http://square.github.io/okhttp/3.x/okhttp/okhttp3/RequestBody.html
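
For reference, such a custom RequestBody might look roughly like the
following (an untested sketch against the okhttp3 API; "contentStream" and
"knownLength" stand in for the flowfile's input stream and size):

    import java.io.IOException;
    import java.io.InputStream;
    import okhttp3.MediaType;
    import okhttp3.RequestBody;
    import okio.BufferedSink;
    import okio.Okio;
    import okio.Source;

    RequestBody body = new RequestBody() {
        @Override
        public MediaType contentType() {
            return MediaType.parse("application/octet-stream");
        }

        @Override
        public long contentLength() {
            // a non-negative length lets the client send a Content-Length
            // header instead of falling back to chunked transfer encoding
            return knownLength;
        }

        @Override
        public void writeTo(BufferedSink sink) throws IOException {
            try (Source source = Okio.source(contentStream)) {
                sink.writeAll(source);
            }
        }
    };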

On Thu, Jan 14, 2016 at 10:29 PM, Joe Percivall <
joeperciv...@yahoo.com.invalid> wrote:

> Hello Evan,
>
> Glad to hear you're enjoying NiFi!
>
> I was able to replicate your results so I dug in a bit and noticed in
> Wireshark that the "Transfer-Encoding" header for InvokeHttp was set to
> "chunked". When I tried using the same flag for curl it failed so I'm
> relatively confident that is the problem. Currently InvokeHttp requires
> using chunked encoding for POST (primarily because you don't need to know
> the content-length and can stream the contents instead of loading into
> memory).
>
> PostHttp does have a "Use Chunked Encoding" option which would solve your
> problem except that it doesn't work properly. PostHttp is using the
> "EntityTemplate" which streams the content so the content length will never
> be implemented and thus it will always use the chunked encoding. I created a
> ticket for it [1].
>
>
> Also as a note, when creating a template you have to either explicitly
> select the connections or not select anything and create a template for the
> whole canvas (your template didn't have any connections).
>
> [1] https://issues.apache.org/jira/browse/NIFI-1396
>
> Cheers,
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joeperciv...@yahoo.com
>
>
>
> On Thursday, January 14, 2016 8:07 PM, "yuchen@thomsonreuters.com" <
> yuchen@thomsonreuters.com> wrote:
>
>
>
>
> Hi Guys,
>
> Not sure if it is the correct way to raise an issue by sending this email; if
> not, let me know where to post the issue, thanks.
>
> We are using the NiFi InvokeHttp processor to do a POST to a webpage.
> URL:
> http://www.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx
> Request header: Content-Type: application/x-www-form-urlencoded
> POST Data:
> txt_stock_code=24984_DateOfReleaseFrom_y=2016_DateOfReleaseFrom_m=01_DateOfReleaseFrom_d=04_DateOfReleaseTo_y=2016_DateOfReleaseTo_m=01_DateOfReleaseTo_d=11=False_tier_1=-2_tier_2_group=-2_tier_2=-2
>
> To make sure the request header and request body are correct, we use
> Fiddler to compose the post request.
> And the response show the request header and post data are correct.
>
>
> Attached file is the template we are using, it is working fine on version
> 0.3.0
>
> But not on the latest version 0.4.1
>
> So we suppose it is potential defect of the InvokeHttp processor in this
> version.
> We checked the source code and tried to locate the issue, and found it is
> using com.squareup.okhttp.Request to do the request, so we did not go any
> further to dig into the issue…
> Currently we are using Curl to do the POST as a workaround.
>
> Let me know your comments, thanks.
>
> Finally, NiFi is a great tool!!! You guys are awesome!!!
>
> Best Regards,
> Evan from Thomson Reuters
>


Re: remote command execution via SSH?

2015-11-24 Thread Adam Taft
Sumo,

On Tue, Nov 24, 2015 at 10:27 PM, Sumanth Chinthagunta 
wrote:

> I think you guys may have configured password less login for  SSH (keys?)
>

Correct.  I'm using SSH key exchange for authentication.  It's usually
done password-less, true, but it doesn't necessarily have to be (if using
ssh-agent).



> In my case the edge node is managed by a different team and they don't
> allow me to add my SSH key.
>

Yikes.  Someone should teach them the benefits of ssh keys!  :)



> I am thinking we need an ExecuteRemoteCommand processor (based on
> https://github.com/int128/groovy-ssh) that will take care of key- or
> password-based SSH login.
>

+1 - this would be a pretty nice contribution.  Recommend building the
processor and then posting here for review. I'm sure this would be a useful
processor for many people.


> ExecuteRemoteCommand should have configurable attributes and return command
> output as flowfile
>
> host : Hostname or IP address.
> port : Port. Defaults to 22.
> user : User name.
> password: A password for password authentication.
> identity : A private key file for public-key authentication.
> execute - Execute a command.
> executeBackground - Execute a command in background.
> executeSudo - Execute a command with sudo support.
> shell - Execute a shell.
>
>
As we do for SSL contexts, it might make sense to bury some of these
properties in an SSH key controller service.  I'm thinking username,
password, identity might make sense to have configured externally as a
service so they could be reused by multiple processors.  Unsure though,
there might not be enough re-usability to really get the benefit.

Also, I'm thinking that the "background", "sudo" and "shell" options should
possibly be a multi-valued option of the processor, not separate
properties, and definitely not separate "commands."  i.e. I'd probably
recommend property configuration similar to ExecuteCommand, with options
for specifying the background, sudo, shell preference.
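
A rough sketch of what that might look like (untested, written against the
standard NiFi processor API; the names and allowable values are
illustrative only):

    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.processor.util.StandardValidators;

    static final PropertyDescriptor EXECUTION_MODE = new PropertyDescriptor.Builder()
            .name("Execution Mode")
            .description("How the command is executed on the remote host")
            .allowableValues("execute", "background", "sudo", "shell")
            .defaultValue("execute")
            .required(true)
            .build();

    static final PropertyDescriptor HOSTNAME = new PropertyDescriptor.Builder()
            .name("Hostname")
            .description("Hostname or IP address of the remote SSH server")
            .required(true)
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();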

Good idea, I hope this works out.

Adam


Re: Keep Files

2015-11-16 Thread Adam Taft
Oooh, neat idea Salvatore.  +1 to creativity.  Really interesting.

Adam

On Mon, Nov 16, 2015 at 6:25 AM, Salvatore Papa <salvatore.p...@gmail.com>
wrote:

> If you're on a linux system, an alternative I've used in the past is to
> create another directory, full of symlinks pointing to the original
> directory.
>
> As an example, assuming you have a directory: /data/input_files/ full of
> files, create a directory /data/input_links/, and from that new directory,
> do: "ln -s ../input_files/* ./"
>
> Now in NiFi, use the original GetFile processor, configured with
> /data/input_links/, and set Keep Source File to False. When the GetFile
> processor picks up the file, it'll read the contents and create a flowfile
> by following the symlink, delete the symlink, and the original file will
> remain in /data/input_files.
>
> On Mon, Nov 16, 2015 at 12:00 PM, Adam Taft <a...@adamtaft.com> wrote:
>
> > Also, as a potential work-around, it's possible to use GetFile with
> > "delete" mode and then somewhere in your flow, use PutFile to place the
> > file back down into a "complete" directory.  i.e. something like:
> >
> > /path/incoming  <- use GetFile to pick up files here
> > /path/complete  <- use PutFile to place files here after processing
> >
> > As a variation of the above, if you need the files consistently in the
> same
> > directory, you could configure GetFile to only pick up certain file
> > patterns.  In this way, you could rename a file after it has been
> > processed:
> >
> > /path/incoming  <- use GetFile to pick up files named $filename.new
> > /path/incoming  <- rename file (using UpdateAttribute) to
> > $filename.complete and use PutFile to place files here after rename
> >
> > Hope that gives you some possible alternatives.
> >
> > Adam
> >
> >
> >
> > On Sat, Nov 14, 2015 at 10:49 AM, Mark Petronic <markpetro...@gmail.com>
> > wrote:
> >
> > > Keep, yes. There is a parameter to configure that. Read once, no. But
> > there
> > > is a set of processors in the works to address that. ListFile and
> > > FetchFile. ListFile will return the list of files that have changed
> since
> > > the last time the files were read - it is stateful. FetchFile can then
> > take
> > > a list and fetch them, and I would assume it would have a parameter for
> > > keep=<yes|no> like GetFile. Not sure of the status of the changes -
> have
> > > not checked recently but see:
> > > https://issues.apache.org/jira/browse/NIFI-631
> > >
> > > Mark
> > >
> > > On Fri, Nov 13, 2015 at 8:55 AM, plj <p...@mitre.org> wrote:
> > >
> > > > Is there a way for GetFile to not delete a file but only read it
> > once?  I
> > > > have a directory with files in it.  I only want the new files that
> are
> > > > added
> > > > to the directory to be processed.  It seems that if I set GetFile to not delete
> > the
> > > > files, the same files get read over and over.
> > > >
> > > >
> > > > thoughts?
> > > >
> > > >
> > > >
> > > >
> > >
> >
>


Re: Keep Files

2015-11-15 Thread Adam Taft
Also, as a potential work-around, it's possible to use GetFile with
"delete" mode and then somewhere in your flow, use PutFile to place the
file back down into a "complete" directory.  i.e. something like:

/path/incoming  <- use GetFile to pick up files here
/path/complete  <- use PutFile to place files here after processing

As a variation of the above, if you need the files consistently in the same
directory, you could configure GetFile to only pick up certain file
patterns.  In this way, you could rename a file after it has been processed:

/path/incoming  <- use GetFile to pick up files named $filename.new
/path/incoming  <- rename file (using UpdateAttribute) to
$filename.complete and use PutFile to place files here after rename

Hope that gives you some possible alternatives.

Adam



On Sat, Nov 14, 2015 at 10:49 AM, Mark Petronic 
wrote:

> Keep, yes. There is a parameter to configure that. Read once, no. But there
> is a set of processors in the works to address that. ListFile and
> FetchFile. ListFile will return the list of files that have changed since
> the last time the files were read - it is stateful. FetchFile can then take
> a list and fetch them, and I would assume it would have a parameter for
> keep= like GetFile. Not sure of the status of the changes - have
> not checked recently but see:
> https://issues.apache.org/jira/browse/NIFI-631
>
> Mark
>
> On Fri, Nov 13, 2015 at 8:55 AM, plj  wrote:
>
> > Is there a way for GetFile to not delete a file but only read it once?  I
> > have a directory with files in it.  I only want the new files that are
> > added
> > to the directory to be processed.  It seems that if I set GetFile to not delete the
> > files, the same files get read over and over.
> >
> >
> > thoughts?
> >
> >
> >
> >
>


Re: ExecuteStreamCommand tests

2015-11-12 Thread Adam Taft
git revert is your friend.

https://git-scm.com/docs/git-revert

It's not "rollback" -- it's another new commit with the changes reinstated.

On Thu, Nov 12, 2015 at 5:45 PM, Joe Witt  wrote:

> ok - will undo the commit.  I get to learn a new git trick?  Or just
> add them back?  I must admit I'm not sure how best to do that.
>
> On Thu, Nov 12, 2015 at 5:39 PM, Brandon DeVries  wrote:
> > I would undo the removal for now, and make a point of doing the test
> > properly. I don't like the idea of removing the test and saying we'll add
> > new ones eventually (those sorts of things tend to not happen...).
> >
> > Brandon
> > On Thu, Nov 12, 2015 at 5:36 PM Tony Kurc  wrote:
> >
> >> Shipping built jars that tests depend on is icky. Not shipping the
> source
> >> to those tests is ickier.
> >> On Nov 12, 2015 5:34 PM, "Joe Witt"  wrote:
> >>
> >> > i think we should kill those tests which depend on the build of those
> >> > jars personally.  But if the view is to undo the removal of those
> >> > three classes i can do that.
> >> >
> >> > Thanks
> >> > Joe
> >> >
> >> > On Thu, Nov 12, 2015 at 5:32 PM, Tony Kurc  wrote:
> >> > > Do you plan to undo the removal?
> >> > > On Nov 12, 2015 4:46 PM, "Joe Witt"  wrote:
> >> > >
> >> > >> well that explains these goofball classes I deleted the other day
> >> > >>
> >> > >> https://issues.apache.org/jira/browse/NIFI-1134
> >> > >>
> >> > >> These classes were used to make those Jars.  Those jars are used to
> >> > >> test execute command.  We've now removed the source that was
> floating
> >> > >> randomly.  We need the build to automatically create whatever we
> >> > >> execute against if we're going to do this.  Those tests should be
> >> > >> replaced by something else.
> >> > >>
> >> > >>
> >> > >> On Thu, Nov 12, 2015 at 3:02 PM, Joe Percivall
> >> > >>  wrote:
> >> > >> > Tony,
> >> > >> >
> >> > >> > I did a bit of digging through the history and the jars were a
> part
> >> of
> >> > >> the initial code import so unless if Joe or someone else knows
> where
> >> > they
> >> > >> came from then we may be out of luck.
> >> > >> >
> >> > >> > Joe
> >> > >> >
> >> > >> > - - - - - -
> >> > >> > Joseph Percivall
> >> > >> > linkedin.com/in/Percivall
> >> > >> > e: joeperciv...@yahoo.com
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > On Wednesday, November 11, 2015 6:23 PM, Tony Kurc <
> >> trk...@gmail.com>
> >> > >> wrote:
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > All, I was code reviewing and something occurred to me. This
> raised
> >> my
> >> > >> > eyebrow:
> >> > >> >
> >> > >>
> >> >
> >>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestExecuteStreamCommand.java#L63
> >> > >> >
> >> > >> > If I'm reading it right, the test "runs" a jar that we've got in
> our
> >> > >> source
> >> > >> > tree
> >> > >> >
> >> > >> > What code made those jars in src/test/resources?
> >> > >>
> >> >
> >>
>


Re: Incorporation of other Maven repositories

2015-11-06 Thread Adam Taft
I'm concerned that not all networks will be able to connect with and use
the JCenter repository.  If it's not in Maven Central, we should likely
avoid the dependency and instead find alternative approaches.

Adam



On Fri, Nov 6, 2015 at 11:31 AM, Joe Witt  wrote:

> joe explained to me he meant to update the nifi pom.xml with this
> repository.  Today we use whatever the apache pom (which we extend
> from) uses, which for releases is nothing, which means it is whatever
> maven defaults to (presumably maven central).  So we see that spark
> does this explicit addition of repositories on their pom for both
> primary artifacts and plugins.
>
> My concern with this is that our requirement as a community is to
> provide repeatable builds.  We looked into what Hbase and Spark do and
> in fact both of them extend their poms to depend on other repos as
> well so there is precedent.
>
> In light of finding other apache projects that use extra repositories,
> and the fact that Jcenter Bintray, while being a commercially focused
> repo, offers free support for OSS artifacts, I think the risk
> is low.  I am ok with this.
>
> Anyone have a different view?
>
> Thanks
> Joe
>
> On Fri, Nov 6, 2015 at 11:04 AM, Joe Witt  wrote:
> > Joe
> >
> > Sorry i didn't catch this thread sooner.  I am not supportive of
> > adding a required repo if it means we need to tell folks to update
> > their maven settings.  While it sounds trivial it really isn't.  We
> > should seek to understand better what other projects do for such
> > things.  Definitely no fast movement on this one please.
> >
> > Thanks
> > Joe
> >
> > On Fri, Nov 6, 2015 at 10:18 AM, Joe Percivall
> >  wrote:
> >> As no issues were brought up, I'm going to assume that everyone is ok
> with adding Bintray JCenter as a repo. I plan on using it in a patch for
> 0.4.0 in which I'm refactoring InvokeHttp. The patch is dependent on a lib
> to add digest authentication that is only hosted there.
> >>
> >> Thanks,
> >> Joe
> >> - - - - - -
> >> Joseph Percivall
> >> linkedin.com/in/Percivall
> >> e: joeperciv...@yahoo.com
> >>
> >>
> >>
> >>
> >> On Tuesday, November 3, 2015 4:52 PM, Matthew Burgess <
> mattyb...@gmail.com> wrote:
> >> Bintray JCenter (https://bintray.com/bintray/jcenter/) is also
> moderated and
> >> claims to be "the repository with the biggest collection of Maven
> artifacts
> >> in the world". I think Bintray itself proxies out to Maven Central, but
> it
> >> appears that for JCenter you choose to sync your artifacts with Maven
> >> Central: http://blog.bintray.com/tag/maven-central/
> >>
> >> I imagine trust is still a per-organization or per-artifact issue, but
> >> Bintray claims to be even safer and more trustworthy than Maven Central
> >> (source:
> >> http://blog.bintray.com/2014/08/04/feel-secure-with-ssl-think-again/).
> For
> >> my (current) work and home projects, I still resolve from Maven
> Central, but
> >> I have been publishing my own artifacts to Bintray.
> >>
> >> Regards,
> >> Matt
> >>
> >> From:  Aldrin Piri 
> >> Reply-To:  
> >> Date:  Tuesday, November 3, 2015 at 12:34 PM
> >> To:  
> >> Subject:  Incorporation of other Maven repositories
> >>
> >>
> >> I am writing to see what the general guidance and posture is on
> >> incorporating additional repositories into the build process.
> >>
> >> Obviously, Maven Central provides a very known quantity.  Are there
> other
> >> repositories that are viewed with the same level of trust?  If so, is
> there
> >> a listing? If not, do we vet new sources as they bring libraries that
> aid
> >> our project and how is this accomplished?
> >>
> >> Incorporating other repos brings up additional areas of concern,
> >> specifically availability but also some additional security
> considerations
> >> to the binaries that are being retrieved.
> >>
> >> Any thoughts on this front would be much appreciated.
>


Re: Incorporation of other Maven repositories

2015-11-06 Thread Adam Taft
I'm OK with this if trkurc is OK with this.  He's far wiser than I on most
everything.  ;)



On Fri, Nov 6, 2015 at 1:11 PM, Tony Kurc <trk...@gmail.com> wrote:

> As we're providing source code, the repositories section in the pom is
> more a "convenient pointer" than a "thou shalt use". Building using a
> different repository of your choosing is as simple as adding a mirror in
> your maven settings.
>
> Because of this, I'm not even close to having an objection.
>
> On Fri, Nov 6, 2015 at 1:03 PM, Joe Witt <joe.w...@gmail.com> wrote:
>
> > As an additional data point Hadoop does this as well.  So Hadoop,
> > Spark, and HBase easily three of the most widely built open source
> > projects around do this.
> >
> > Thanks
> > Joe
> >
> > On Fri, Nov 6, 2015 at 1:01 PM, Joe Witt <joe.w...@gmail.com> wrote:
> > > What are some examples of networks which can access maven central but
> > > cannot access JCenter?
> > >
> > > Thanks
> > > Joe
> > >
> > > On Fri, Nov 6, 2015 at 12:10 PM, Adam Taft <a...@adamtaft.com> wrote:
> > >> I'm concerned that not all networks will be able to connect with and
> use
> > >> the JCenter repository.  If it's not in Maven Central, we should
> likely
> > >> avoid the dependency and instead find alternative approaches.
> > >>
> > >> Adam
> > >>
> > >>
> > >>
> > >> On Fri, Nov 6, 2015 at 11:31 AM, Joe Witt <joe.w...@gmail.com> wrote:
> > >>
> > >>> joe explained to me he meant to update the nifi pom.xml with this
> > >>> repository.  Today we use whatever the apache pom (which we extend
> > >>> from uses) which for releases is nothing which means it is whatever
> > >>> maven defaults to (presumably maven central).  So we see that spark
> > >>> does this explicit addition of repositories on their pom for both
> > >>> primary artifacts and plugins.
> > >>>
> > >>> My concern with this is that our requirement as a community is to
> > >>> provide repeatable builds.  We looked into what Hbase and Spark do
> and
> > >>> in fact both of them extend their poms to depend on other repos as
> > >>> well so there is precedent.
> > >>>
> > >>> In light of finding other apache projects that use extra repositories,
> > >>> and the fact that Bintray's JCenter, while being a commercially focused
> > >>> repo, is offering free support for OSS artifacts, I think the risk
> > >>> is low.  I am ok with this.
> > >>>
> > >>> Anyone have a different view?
> > >>>
> > >>> Thanks
> > >>> Joe
> > >>>
> > >>> > On Fri, Nov 6, 2015 at 11:04 AM, Joe Witt <joe.w...@gmail.com> wrote:
> > >>> > Joe
> > >>> >
> > >>> > Sorry i didn't catch this thread sooner.  I am not supportive of
> > >>> > adding a required repo if it means we need to tell folks to update
> > >>> > their maven settings.  While it sounds trivial it really isn't.  We
> > >>> > should seek to understand better what other projects do for such
> > >>> > things.  Definitely no fast movement on this one please.
> > >>> >
> > >>> > Thanks
> > >>> > Joe
> > >>> >
> > >>> > On Fri, Nov 6, 2015 at 10:18 AM, Joe Percivall
> > >>> > <joeperciv...@yahoo.com.invalid> wrote:
> > >>> >> As no issues were brought up, I'm going to assume that everyone is
> > >>> >> ok with adding Bintray JCenter as a repo. I plan on using it in a
> > >>> >> patch for 0.4.0 in which I'm refactoring InvokeHttp. The patch is
> > >>> >> dependent on a lib to add digest authentication that is only hosted
> > >>> >> there.
> > >>> >>
> > >>> >> Thanks,
> > >>> >> Joe
> > >>> >> - - - - - -
> > >>> >> Joseph Percivall
> > >>> >> linkedin.com/in/Percivall
> > >>> >> e: joeperciv...@yahoo.com
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> On Tuesday, November 3, 2015 4:52 PM, Matthew Burgess <
> > >>> mattyb...@gmail.com> wrote:
> > 

Re: LogAttribute - Sending that output to a custom logger?

2015-11-02 Thread Adam Taft
This thread has forked into two different conversations:  1. improvements
to LogAttribute processor; 2. improvements to processor documentation.

1)  re: improvements to LogAttribute - we already have NIFI-67 [1] that
suggests a number of improvements to LogAttribute.  One of these is the use
of a custom name for the logger so that logback rules can be written
against that name.
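
To illustrate, if the processor exposed a configurable logger name (the name
"custom.log.attribute" below is hypothetical), a logback.xml rule could route
just that output to its own file, roughly:

  <appender name="ATTR_FILE" class="ch.qos.logback.core.FileAppender">
    <file>logs/attributes.log</file>
    <encoder><pattern>%date %msg%n</pattern></encoder>
  </appender>

  <!-- additivity="false" keeps these events out of the main app log -->
  <logger name="custom.log.attribute" level="INFO" additivity="false">
    <appender-ref ref="ATTR_FILE"/>
  </logger>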

While the provenance engine is great for many scenarios, in my opinion, it
doesn't replace the need for true text-based logging.  The tooling for log
processing is very mature and there's no ability to "grep" a provenance
repository, migrate or offload provenance logs into deep storage, store log
events into a database, or do any other cool syslogd or logback type
things.  Being able to capture and log a flowfile at the exact right place
in the data flow and process it using the command line is an extremely
valuable tool in the toolkit.

For a long time, I've wanted to work on at least some of the things
mentioned in NIFI-67 and will hopefully get to do so, time permitting.  Having
a custom "name" for the LogAttribute processor seems like a no-brainer.
Contributions for this should definitely be welcome!

2) improvements to processor documentation - I agree, even as a somewhat
seasoned NIFI user, I still have a hard time reading and understanding the
processor documentation.  I often do exactly what Mark P. suggests and
instead go directly to the source.  Any contribution towards better
processor documentation is greatly appreciated!
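
As a concrete illustration for the EvaluateJSONPath case Mark raises below: a
one-line snippet in the docs showing the Jayway JsonPath library the
processor is built on would go a long way.  Something like this (the JSON and
path here are made up; 'json' is assumed to be the flowfile content as a
String):

  // given {"doc":{"id":"abc-123"}}, this path extracts "abc-123"
  String id = com.jayway.jsonpath.JsonPath.read(json, "$.doc.id");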

[1] https://issues.apache.org/jira/browse/NIFI-67


On Mon, Nov 2, 2015 at 1:54 PM, Aldrin Piri  wrote:

> We greatly appreciate contributions.  Your prescribed approach sounds great
> and if you are willing to give us a few cycles pointing out, and optionally
> correcting, the items that are in need of improvement, we will certainly
> incorporate.
>
> Thanks!
>
> On Mon, Nov 2, 2015 at 1:28 PM, Mark Petronic 
> wrote:
>
> > I'm sort of in the camp of "don't come with a complaint if you don't come
> > with a solution" and hesitated to even raise the documentation comment
> > without just fixing it myself. How about this, I just do some updates on
> > some processor docs myself and use that as my first contribution to work
> > through the process of committing to this project?
> >
> > But, to give you one quick example, EvaluateJSONPath (which, btw has
> > pretty good docs otherwise) does not mention HOW to extract the JSON you
> > are interested in. I had to look at the code to figure out it used this:
> > https://github.com/jayway/JsonPath. Ok, that was not hard, I admit, but,
> > as a user, should I need to look at the code for such information? I
> > submit, no. Me personally, I like to dig into the code. So, this is more
> > a comment on "overall goodness" for the general new user experience.
> >
> > I agree with your assessment of 'new user vibe' as I am starting to not
> > notice it as much. lol
> >
> > On Mon, Nov 2, 2015 at 10:15 AM, Joe Witt  wrote:
> >
> > > Mark
> > >
> > > All fair points.  Can you please point out which processor docs
> > > specifically should be better.  Let's fix em.. you will quickly lose that
> > > new user vibe and not notice what needs to improve as much.  We need to
> > > make the new user experience awesome.
> > >
> > > Thanks
> > > Joe
> > > On Nov 2, 2015 10:08 AM, "Mark Petronic"  wrote:
> > >
> > > > My primary use is for understanding NiFi. I like to direct various
> > > > processors' output into both their logical next processor stage as
> > > > well as into a log attribute processor. Then I tail the NiFi app log
> > > > file and watch what happens - in real time. I do not intend to use
> > > > this for long term log retention. I agree that provenance is the
> > > > right choice for that. So, the only reason I wanted to allow
> > > > configuration of a custom logger was simply to isolate all the
> > > > attribute-rich logging from the normal logging because I was
> > > > primarily interested in the attribute flows as a way to (a) better
> > > > understand what a processor emits because, frankly, the documentation
> > > > of some of the processors is very sparse. So, I learn imperatively,
> > > > so to speak. I say that as a new user. I feel I should be able to get
> > > > a pretty good understanding of a processor by reading the usage. But
> > > > I am finding that the documentation, in some cases, is more like what
> > > > I like to refer to as, "note to self" documentation. Great if you are
> > > > the guy who wrote the processor with those "insights" - not so great
> > > > if you are not the developer. So, then I need to dig up the code.
> > > > That should not be needed as the first step of understanding a
> > > > processor as a new user. There are some well documented processors
> > > > but not all 

Re: LogAttribute - Sending that output to a custom logger?

2015-11-02 Thread Adam Taft
David,

This sounds like a slightly different use case than the NiFi standard
LogAttribute processor.  It sounds like your processor is more of a generic
attribute converter and file writer.  The LogAttribute processor is
designed to interact with the underlying NiFi logging subsystem, not
necessarily just to write files.

That being said, your processor may be a useful contribution to Apache
NiFi.  Specifically, the value-add of your processor might be in the
key-value format you've defined to output the flowfile attributes.  It
might be interesting to see this expressed as an attribute-to-payload
converter, chained together with potentially other processors like PutFile
in the dataflow.
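
A minimal sketch of what the core of such a converter could look like (this
is a guess at your implementation, not your actual code, using the standard
NiFi session API; imports from org.apache.nifi.processor.io, java.io,
java.nio.charset, and java.util assumed):

  // inside onTrigger(), with 'session' and 'flowFile' in scope
  final Map<String, String> attributes = flowFile.getAttributes();
  flowFile = session.write(flowFile, new OutputStreamCallback() {
      @Override
      public void process(final OutputStream out) throws IOException {
          for (final Map.Entry<String, String> entry : attributes.entrySet()) {
              // one key=value pair per line
              final String line = entry.getKey() + "=" + entry.getValue() + "\n";
              out.write(line.getBytes(StandardCharsets.UTF_8));
          }
      }
  });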

If you want to contribute your processor, I would recommend making it
available on GitHub (or similar) for review by the Apache NiFi community.
Just post a link to your contribution here or even issue a pull request for
your processor.  It would at least be evaluated and considered for
inclusion.

Hope this helps.

Adam


On Mon, Nov 2, 2015 at 5:39 PM, davidrsm...@btinternet.com <
davidrsm...@btinternet.com> wrote:

> Hi
>
> Where I work we have created an attribute logger of our own. It is a
> fairly simple affair which uses a regex to determine which attributes to
> log, and writes them as key-value pairs to a file, whose location is
> determined by a user property. I'm happy to put this out there if anyone is
> interested.
>
> Sent from my HTC
>
>
> ----- Reply message -----
> From: "Adam Taft" <a...@adamtaft.com>
> Date: Mon, Nov 2, 2015 19:23
> Subject: LogAttribute - Sending that output to a custom logger?
> To: <dev@nifi.apache.org>
>
> This thread has forked into two different conversations:  1. improvements
> to LogAttribute processor; 2. improvements to processor documentation.
>
> 1)  re: improvements to LogAttribute - we already have NIFI-67 [1] that
> suggests a number of improvements to LogAttribute.  One of these is the use
> of a custom name for the logger so that logback rules can be written
> against that name.
>
> While the provenance engine is great for many scenarios, in my opinion, it
> doesn't replace the need for true text-based logging.  The tooling for log
> processing is very mature and there's no ability to "grep" a provenance
> repository, migrate or offload provenance logs into deep storage, store log
> events into a database, or do any other cool syslogd or logback type
> things.  Being able to capture and log a flowfile at the exact right place
> in the data flow and process it using the command line is an extremely
> valuable tool in the toolkit.
>
> For a long time, I've wanted to work on at least some of the things
> mentioned in NIFI-67 and will hopefully get to do so, time permitting.  Having
> a custom "name" for the LogAttribute processor seems like a no-brainer.
> Contributions for this should definitely be welcome!
>
> 2) improvements to processor documentation - I agree, even as a somewhat
> seasoned NIFI user, I still have a hard time reading and understanding the
> processor documentation.  I often do exactly what Mark P. suggests and
> instead go directly to the source.  Any contribution towards better
> processor documentation is greatly appreciated!
>
> [1] https://issues.apache.org/jira/browse/NIFI-67
>
>
> On Mon, Nov 2, 2015 at 1:54 PM, Aldrin Piri <aldrinp...@gmail.com> wrote:
>
> > We greatly appreciate contributions.  Your prescribed approach sounds
> > great and if you are willing to give us a few cycles pointing out, and
> > optionally correcting, the items that are in need of improvement, we
> > will certainly incorporate.
> >
> > Thanks!
> >
> > On Mon, Nov 2, 2015 at 1:28 PM, Mark Petronic <markpetro...@gmail.com>
> > wrote:
> >
> > > I'm sort of in the camp of "don't come with a complaint if you don't
> > > come with a solution" and hesitated to even raise the documentation
> > > comment without just fixing it myself. How about this, I just do some
> > > updates on some processor docs myself and use that as my first
> > > contribution to work through the process of committing to this project?
> > >
> > > But, to give you one quick example, EvaluateJSONPath (which, btw has
> > > pretty good docs otherwise) does not mention HOW to extract the JSON
> > > you are interested in. I had to look at the code to figure out it used
> > > this: https://github.com/jayway/JsonPath. Ok, that was not hard, I
> > > admit, but, as a user, should I need to look at the code for such
> > > information? I submit, no. Me personally, I like to dig into the code.
> > > So, this is more a comment on "overall goodness" for the general new
> > > user experience.
> > >
> > > I agree with your assessment of 'new user vibe' as I am starting to not
> > > notice it as much. lol
> > >
> > > On Mon, Nov 2, 2015 at 10:15 AM, Joe Witt <joe.w...@gmail.com> wrote:
>
>
>


Re: Source code for Version 0.3.0

2015-10-02 Thread Adam Taft
Just bumping this conversation.  Did we end up addressing this?  Are we
going for a signed release tag?  If so, does it make sense for the 0.3.0
tag to be signed by the release manager (I believe Matt Gilman)?  Or maybe just an
unsigned tag?
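
If we go the signed route, it could be as simple as one extra step after the
vote passes, along these lines (tag names illustrative, pointing the new tag
at the RC tag's commit):

  git tag -s nifi-0.3.0 nifi-0.3.0-RC1^{} -m "Apache NiFi 0.3.0"
  git push origin nifi-0.3.0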

Thanks,

Adam


On Mon, Sep 21, 2015 at 2:28 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Looks fairly straightforward to sign a release [1].
>
> What is the workflow you'd suggest?  Can we keep our current process
> and once the vote is done just add a step to make a new identical (but
> signed) tag with a name that doesn't include '-RC#'?
>
> I'm good with that.  I understand why the RC# throws folks off so
> happy to sort this out.
>
> [1] http://gitready.com/advanced/2014/11/02/gpg-sign-releases.html
>
> On Mon, Sep 21, 2015 at 12:42 PM, Ryan Blue <b...@cloudera.com> wrote:
> > +1 for a nifi-0.3.0 release tag. Signed is even better, but I don't think
> > I'd mind if it weren't signed.
> >
> > rb
> >
> >
> > On 09/21/2015 06:35 AM, Sean Busbey wrote:
> >>
> >> The pattern I've liked the most on other projects is to create a
> >> proper release tag, signed by the RM on passage of the release vote. I
> >> don't recall off-hand what the phrasing was in the VOTE thread (if
> >> any).
> >>
> >> On Mon, Sep 21, 2015 at 8:13 AM, Adam Taft <a...@adamtaft.com> wrote:
> >>>
> >>> What are the thoughts on creating a proper 0.3.0 tag, as would be
> >>> traditional for a final release?  It is arguably a little confusing to
> >>> only have the RC tags, when looking for the final release.  I found
> >>> this head scratching for 0.2.0 as well.
> >>>
> >>> Adam
> >
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Cloudera, Inc.
>


Re: [VOTE] Release Apache NiFi nifi-nar-maven-plugin 1.1.0

2015-08-20 Thread Adam Taft
+1 - Validated source signature, hashes, licensing.

Nar plugin compiles and builds nifi-0.3.0-SNAPSHOT without problem on Mac
10.10.4 with JDK 8u60, maven 3.3.3.

Running -Dmode=tree against the standard-nar results in:

[INFO] --- nifi-nar-maven-plugin:1.1.0:provided-nar-dependencies (default-cli) @ nifi-standard-nar ---
[INFO] --- Provided NAR Dependencies ---
+- org.apache.nifi:nifi-standard-services-api-nar:nar:0.3.0-SNAPSHOT
   +- org.apache.nifi:nifi-ssl-context-service-api:jar:0.3.0-SNAPSHOT:compile
   +- org.apache.nifi:nifi-distributed-cache-client-service-api:jar:0.3.0-SNAPSHOT:compile
   +- org.apache.nifi:nifi-load-distribution-service-api:jar:0.3.0-SNAPSHOT:compile
   +- org.apache.nifi:nifi-http-context-map-api:jar:0.3.0-SNAPSHOT:compile
   +- org.apache.nifi:nifi-dbcp-service-api:jar:0.3.0-SNAPSHOT:compile

Cheers,

Adam



On Thu, Aug 20, 2015 at 10:35 PM, Aldrin Piri aldrinp...@gmail.com wrote:

 +1, binding - Release this package as nifi-nar-maven-plugin-1.1.0

 Followed the accompanying guide Matt provided for evaluating releasability.

 Code matches up with specified commit

 On Thu, Aug 20, 2015 at 8:30 AM, Bryan Bende bbe...@gmail.com wrote:

  +1 Release this package as nifi-nar-maven-plugin-1.1.0
 
  Verified all steps in Matt's helper email, functions as expected.
 
 
 
  On Wed, Aug 19, 2015 at 11:25 PM, Matt Gilman matt.c.gil...@gmail.com
  wrote:
 
   +1 (binding) Release this package as nifi-nar-maven-plugin-1.1.0
  
   On Wed, Aug 19, 2015 at 11:21 PM, Joe Witt joe.w...@gmail.com wrote:
  
+1 (binding) Release this package as nifi-nar-maven-plugin-1.1.0
   
Verified sigs, hashes, builds clean w/contrib-check.  Functions as
expected.
   
Minor:
- The README.md contains two references to our old incubator
addresses.  This should be resolved in a future release.
   
    On Wed, Aug 19, 2015 at 10:57 PM, Matt Gilman matt.c.gil...@gmail.com wrote:
 Hello

 I am pleased to be calling this vote for the source release of Apache
 NiFi nifi-nar-maven-plugin-1.1.0.

 The source zip, including signatures, digests, etc. can be found
 at:

  https://repository.apache.org/content/repositories/orgapachenifi-1059

 The Git tag is nifi-nar-maven-plugin-1.1.0-RC1
 The Git commit ID is 80841130461e8346c0bd643b4097b36bf005b3a2
 https://git-wip-us.apache.org/repos/asf?p=nifi-maven.git;a=commit;h=80841130461e8346c0bd643b4097b36bf005b3a2

 Checksums of nifi-nar-maven-plugin-1.1.0-source-release.zip:
 MD5: 83c70a2a1372d77b3c9e6bb5828db70f
 SHA1: 90e5667a465a092ffeea36f0621706fc55b4

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/mcgilman.asc

 KEYS file available here:
 https://dist.apache.org/repos/dist/release/nifi/KEYS

 1 issue was closed/resolved for this release:
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=1201

 Release note highlights can be found here:
 https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-NiFiNARMavenPluginVersion1.1.0

 The vote will be open for 72 hours.
 Please download the release candidate and evaluate the necessary items,
 including checking hashes, signatures, build from source, and test.  Then
 please vote:

 [ ] +1 Release this package as nifi-nar-maven-plugin-1.1.0
 [ ] +0 no opinion
 [ ] -1 Do not release this package because...



Re: [DISCUSS] Removal of the 'master' vs 'develop' distinction

2015-08-13 Thread Adam Taft
It's really a principle and style preference.  Each of the git workflows
has pros/cons, but they are each viable.  There's nothing that says that
gitflow is superior to other workflows.

Gitflow has the unique advantage that, by default, master has exactly the
finished-product tags on it, and the latest release is always at master's
head.  If you clone and check out master, you can safely assume
you're getting the most stable release, which is what most non-contributors
want when they download source code.

If the community doesn't value this principle and master can just be a
free-for-all, that's OK too.  It's going to be tougher to apply hotfixes to
existing stable releases, in my opinion, which might create more cries for
help when a bug is introduced during a release.  Removing the gitflow
methodology makes for a bit more of a wild-west, forward-only approach.
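
For example, a hotfix under gitflow is roughly the following sequence (tag
and version names here are hypothetical):

  git checkout -b hotfix/0.3.1 nifi-0.3.0   # branch from the release tag on master
  # ...commit the fix...
  git checkout master
  git merge --no-ff hotfix/0.3.1
  git tag -s nifi-0.3.1                     # master's head is again the latest release
  git checkout develop
  git merge --no-ff hotfix/0.3.1            # carry the fix forward to develop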

Using good tooling, again like my reference to jgitflow, would make the RM
process much easier.  If proper tooling exists, the RM process shouldn't be
an obstacle.  If the right tooling does not exist, that's a different
story, of course.

It might be good to have a survey of other Apache and open source project
development workflows.  I was under the assumption that the forking
workflow is becoming the most common for open source contributions (with
Github's rise to dominance), with gitflow a close second, but that's
just my guess, not based on research.

I personally have no vote or stake on this issue.  I'm just chiming in some
thoughts.



On Thu, Aug 13, 2015 at 4:55 PM, Joe Witt joe.w...@gmail.com wrote:

 So sounds like we can set the default to develop whenever it is
 cloned.  That is a good start.  We still have to articulate that we
 have 'master' and 'develop' and help folks understand why.

 So on that second part, let's help ourselves understand 'why' for our
 own community.  For me that is what I'm pushing back on.  Why is that
 helpful for *this* apache nifi community?  Having done the release
 management gig a couple times now I am not seeing the value add for
 *this* project.  There too we must be clear about how these models can
 be applied to generating valuable apache releases.

 I am open minded to this having value.  That is why i was supportive
 of the idea back in Nov/Dec.  But over the past 8 months or so I've
 only seen it as an 'extra step' for an already difficult RM task and
 as something that creates confusion.

 So for me, this is an easy discussion if we can clearly articulate
 value of the master/develop distinction.

 Thanks
 Joe




 On Thu, Aug 13, 2015 at 4:44 PM, Adam Taft a...@adamtaft.com wrote:
  The default branch is not a feature of GitHub, GitLab, etc.  It's a
  feature of git itself.  On the 'bare' repository, issue this command:
 
  git symbolic-ref HEAD refs/heads/<mybranch>
 
  Effectively, this is what GitHub is doing.  It should be possible to do
  with the Apache git host as well.
 
 
 
  On Thu, Aug 13, 2015 at 4:28 PM, Dan Bress dbr...@onyxconsults.com wrote:
 
  Ah, I didn't realize that was a github-only thing [1]. I take back my
  earlier comment and can now see how this is confusing.
 
  [1]
 
 http://mail-archives.apache.org/mod_mbox/nifi-dev/201501.mbox/%3CCALhtWke141nTsCdA4tHnZXOJ1UGhtZurLwvDsjBxH_G=86n...@mail.gmail.com%3E
 
  Dan Bress
  Software Engineer
  ONYX Consulting Services
 
  
  From: Joe Witt joe.w...@gmail.com
  Sent: Thursday, August 13, 2015 4:22 PM
  To: dev@nifi.apache.org
  Subject: Re: [DISCUSS] Removal of the 'master' vs 'develop' distinction
 
  Nope.  That is just what is shown in github as the default.
  On Aug 13, 2015 4:15 PM, Dan Bress dbr...@onyxconsults.com wrote:
 
   +0.  Our default branch is set to 'develop', so when you clone
   apache-nifi from git, you are automatically looking at the 'develop'
   branch, right?  To me, this is a straightforward indicator of where I
   should be working.
  
   I thought we set this up a little while ago to avoid the confusion?
  
   Dan Bress
   Software Engineer
   ONYX Consulting Services
  
   
   From: Ryan Blue b...@cloudera.com
   Sent: Thursday, August 13, 2015 4:04 PM
   To: dev@nifi.apache.org
   Subject: Re: [DISCUSS] Removal of the 'master' vs 'develop'
 distinction
  
   +1 to removing the distinction. Master is the default branch in a lot
   of projects and I would argue that is the common expectation. It sounds
   like we can do gitflow without a separate develop branch (or at least
   it isn't too painful) so doing what new people tend to expect is a good
   thing.
  
   rb
  
   On 08/13/2015 12:55 PM, Mark Payne wrote:
I think the issue here is less about gitflow being hard and more about
it being confusing. We have had numerous people write to the dev list
about why the thing that they checked out doesn't have what they expect.
Even being very experienced with NiFi, I've cloned

Re: Route Original Flow File Base on InvokeHTTP Response

2015-08-04 Thread Adam Taft
One option I think we kicked around at some point was to capture the
response body as a flowfile attribute in the original flowfile.  For
reasonably sized response bodies, this would work OK.  It would be a nice
way to handle your situation, because then the response becomes an
attribute of the request.

This would obviously take a code change, but adding a property to the
effect of "Capture response body as flowfile attribute" might be a nice
feature.
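
A rough sketch of the idea, using the standard session API (the property and
attribute names below are hypothetical, not existing InvokeHTTP
configuration):

  // after a successful exchange, with 'context', 'session', the original
  // 'requestFlowFile' and the HTTP 'responseBody' String in scope
  if (context.getProperty(CAPTURE_RESPONSE_BODY).asBoolean()) {
      // attach the (reasonably small) response body to the original flowfile
      requestFlowFile = session.putAttribute(requestFlowFile,
              "invokehttp.response.body", responseBody);
  }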


On Tue, Aug 4, 2015 at 11:57 AM, steveM stephen.c.metc...@lmco.com wrote:

 My use case is I pull the doc id from the file, call a web service with
 that id. The service responds with json that I would then parse to determine
 where to route the document next. Sometimes the document might be new,
 sometimes an update is allowed, sometimes duplicates need to be put
 somewhere else.
 I was hoping I was missing something that allowed you to handle a response
 and just add an attribute to the original file (or something similar to
 handle this case).



 --
 View this message in context:
 http://apache-nifi-incubating-developer-list.39713.n7.nabble.com/Route-Original-Flow-File-Base-on-InvokeHTTP-Response-tp2317p2343.html
 Sent from the Apache NiFi (incubating) Developer List mailing list archive
 at Nabble.com.



Re: nifi error

2015-07-28 Thread Adam Taft
One possible option to help on this might be to commit a .gitattributes
file, which would basically name the problematic test files and mark them
to not modify their line endings.

http://git-scm.com/docs/gitattributes

I think the format would look something like:

/path/to/problematic/test/file -text

where the '-text' option would tell git to treat the file more like binary
and not convert line endings.
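
If it helps, the effect can be verified per-file with git's check-attr
command (the path here is hypothetical):

  git check-attr text -- src/test/resources/problematic-file.txt
  # expected output: src/test/resources/problematic-file.txt: text: unset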



On Tue, Jul 28, 2015 at 11:15 AM, Mark Payne marka...@hotmail.com wrote:

 Excellent! Feel free to reach out if you run into any other issues.

 Thanks
 -Mark

 
  Date: Tue, 28 Jul 2015 07:37:42 -0700
  From: dattathreyulu.p...@capitalone.com
  To: d...@nifi.incubator.apache.org
  Subject: RE: nifi error
 
  thanks mark
 
  after running the
  git config --global core.autocrlf false
 
  and then I ran just the mvn clean install from the nifi folder and it
  worked and the build was successful.
 
 
 
 
 
  --
  View this message in context:
 http://apache-nifi-incubating-developer-list.39713.n7.nabble.com/nifi-error-tp2240p2278.html
  Sent from the Apache NiFi (incubating) Developer List mailing list
 archive at Nabble.com.