Re: Wait/Notify Question

2019-08-19 Thread Chris Lundeberg
Thanks for the feedback, Koji - I appreciate it.  I will submit a Jira for
the below and implement an alternate solution for the potential race
condition.

Thanks!

Chris Lundeberg




On Mon, Aug 19, 2019 at 8:16 PM Koji Kawamura wrote:

> Hi Chris,
>
> You are correct, Wait processor has to rely on an attribute within a
> FlowFile to determine target signal count.
> I think the idea of making Wait be able to fetch target signal count
> from DistributedMapCache is a nice improvement.
>
> Please create a JIRA for further discussion. I guess we will need to
> add a property such as "Fetch Target Signal Count from Cache Service",
> boolean, defaulting to false. If enabled, the Wait processor treats the
> configured "Target Signal Count" value as a key in the
> DistributedMapCache, then fetches the value to use as the target count.
> If the key is not found, the Wait processor transfers the FlowFile to
> the wait relationship.
> https://issues.apache.org/jira/projects/NIFI
>
> Adding FetchDistributedMapCache right before Wait provides the same
> result. But if the Wait processor can fetch it, we can reduce the number
> of fetch operations required to process multiple FlowFiles at Wait.
>
> To avoid the race condition where Wait processes FlowFiles before the
> counting part finishes, I'd use two keys in the counting part:
> a temporary one to accumulate the count, and the final one (the signal
> identifier), written once the counting has finished.
>
> Thanks,
> Koji
>
> On Tue, Aug 20, 2019 at 1:08 AM Chris Lundeberg wrote:
> >
> > Hi all,
> >
> > I wanted to throw out a question to the larger community before I went down
> > a different path.  I might be looking at this wrong or making assumptions I
> > shouldn't.
> >
> > Recently I started working with the Wait and Notify processors a bit more.
> > I have a new flow which is a bit more batch in nature and these processors
> > seem to work nicely for being able to intelligently wait for chunks or
> > files to be processed, before moving on to the next step.  I have one
> > specific pattern that I haven't solved with the inbuilt functionality,
> > which is:
> >
> > 1. I have an incoming zip file from SFTP.  That zip contains n-number of
> > files within and each of those files needs to be split in some way.  I won't
> > know the number of files within the zip.
> >
> > 2.  After they have been split correctly, a few transformations run on each
> > of the files.
> >
> > 3.  At the end of the transformation process, these various files will be
> > merged into 5 specific outbound file formats, to be sent to an outbound
> > SFTP server.  *Note*: I am not splitting and merging the same files back
> > together (I have looked at the fragment index stuff).
> >
> > I found a nice solution for being able to count the number of flowfiles
> > after the split, so I know exactly how many files should be transformed and
> > thus I know what my "Target Signal Count" should be within the Wait
> > processor.  At the moment I have a counting process to (1) Fetch
> > Distributed MapCache, (2) Replace text (incrementing the count number from
> > the fetch, if a number is found), and (3) Put Distributed MapCache.  This
> > process works as expected and I have a valid key/value pair in the MapCache
> > for that particular process (I create a BatchID so it's very specific for
> > each pull from the SFTP processor).  The only way I know how to
> > intelligently provide that information back to the Wait processor is to
> > pull that value with a Fetch Distributed MapCache right before the flowfile
> > enters the Wait processor.  In theory each flowfile waiting would have the
> > same attribute from the Fetch process and each attribute would be the same
> > count.  However this doesn't always work because there could exist a
> > condition where the transformations happen before the counting has been
> > done and published to the MapCache Server.  So in this scenario you end up
> > with some flowfiles having a lower count than others or just not having the
> > "true" count.  Now, I can put additional gates in place such as trying to
> > slow down the flowfiles at specific sections to try and allow the counting
> > to be done first, but it's not a perfect science.
> >
> > I thought ideally it would be good to allow the Wait processor to pull
> > directly from the MapCache if I could provide the key it would need for a
> > lookup, within the "Target Signal Count" field.  It could use the signal
> > coming from Notify to say "I have X number of Notify, for this signal" and
> > use the count value I have set in the MapCache to say "This is the total
> > number of files I need to see from Notify, for that same signal". This way,
> > I could run the Wait processor every few seconds and the chances of running
> > into a miscount condition would be far less.  Is there any way currently
> > where this processor could pull directly from the cache, or does it have to
> > rely on an attribute

Re: Wait/Notify Question

2019-08-19 Thread Koji Kawamura
Hi Chris,

You are correct, Wait processor has to rely on an attribute within a
FlowFile to determine target signal count.
I think the idea of making Wait be able to fetch target signal count
from DistributedMapCache is a nice improvement.

Please create a JIRA for further discussion. I guess we will need to
add a property such as "Fetch Target Signal Count from Cache Service",
boolean, defaulting to false. If enabled, the Wait processor treats the
configured "Target Signal Count" value as a key in the
DistributedMapCache, then fetches the value to use as the target count.
If the key is not found, the Wait processor transfers the FlowFile to
the wait relationship.
https://issues.apache.org/jira/projects/NIFI
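
A minimal sketch of the lookup behaviour proposed above, written in Python rather than NiFi's Java and using a plain dict in place of the DistributedMapCache. The function names `resolve_target_count` and `route`, and the `fetch_from_cache` flag, are hypothetical stand-ins for the suggested property; none of this is existing NiFi code, and the release rule (count reaches target) is simplified.

```python
from typing import Mapping, Optional


def resolve_target_count(configured: str,
                         cache: Mapping[str, str],
                         fetch_from_cache: bool = False) -> Optional[int]:
    """Return the target signal count, or None if it cannot be resolved yet."""
    if not fetch_from_cache:
        # Current behaviour: the configured value is the count itself.
        return int(configured)
    # Proposed behaviour: the configured value is a cache key.
    raw = cache.get(configured)
    if raw is None:
        return None  # key not published yet
    return int(raw)


def route(signal_count: int, target: Optional[int]) -> str:
    """Pick a relationship name for a FlowFile (simplified)."""
    if target is None or signal_count < target:
        return "wait"
    return "success"
```

With the flag enabled and no key in the cache, `route(0, resolve_target_count("batch-1", {}, True))` yields `"wait"`, which matches the "key is not found" case described above.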

Adding FetchDistributedMapCache right before Wait provides the same
result. But if the Wait processor can fetch it, we can reduce the number
of fetch operations required to process multiple FlowFiles at Wait.

To avoid the race condition where Wait processes FlowFiles before the
counting part finishes, I'd use two keys in the counting part:
a temporary one to accumulate the count, and the final one (the signal
identifier), written once the counting has finished.
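
The two-key idea can be sketched as follows, again in Python with a dict standing in for the cache. The `.counting` key suffix and the function names are assumptions made for illustration. How a real flow decides that counting has finished is not covered in this thread; here it is simply the end of the loop.

```python
from typing import Dict, Iterable, Optional


def count_then_publish(cache: Dict[str, str], batch_id: str,
                       items: Iterable[object]) -> int:
    """Accumulate under a temporary key, then publish under the final key."""
    temp_key = f"{batch_id}.counting"  # hypothetical naming scheme
    cache[temp_key] = "0"
    for _ in items:
        cache[temp_key] = str(int(cache[temp_key]) + 1)
    # Only now does the final key (the signal identifier) become visible.
    cache[batch_id] = cache.pop(temp_key)
    return int(cache[batch_id])


def read_target(cache: Dict[str, str], batch_id: str) -> Optional[int]:
    """What a reader on the Wait side would see: None until counting is done."""
    raw = cache.get(batch_id)
    return None if raw is None else int(raw)
```

A reader that only ever looks at the final key sees either nothing or the complete count, never a partial one.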

Thanks,
Koji

On Tue, Aug 20, 2019 at 1:08 AM Chris Lundeberg  wrote:
>
> Hi all,
>
> I wanted to throw out a question to the larger community before I went down
> a different path.  I might be looking at this wrong or making assumptions I
> shouldn't.
>
> Recently I started working with the Wait and Notify processors a bit more.
> I have a new flow which is a bit more batch in nature and these processors
> seem to work nicely for being able to intelligently wait for chunks or
> files to be processed, before moving on to the next step.  I have one
> specific pattern that I haven't solved with the inbuilt functionality,
> which is:
>
> 1. I have an incoming zip file from SFTP.  That zip contains n-number of
> files within and each of those files needs to be split in some way.  I won't
> know the number of files within the zip.
>
> 2.  After they have been split correctly, a few transformations run on each
> of the files.
>
> 3.  At the end of the transformation process, these various files will be
> merged into 5 specific outbound file formats, to be sent to an outbound
> SFTP server.  *Note*: I am not splitting and merging the same files back
> together (I have looked at the fragment index stuff).
>
> I found a nice solution for being able to count the number of flowfiles
> after the split, so I know exactly how many files should be transformed and
> thus I know what my "Target Signal Count" should be within the Wait
> processor.  At the moment I have a counting process to (1) Fetch
> Distributed MapCache, (2) Replace text (incrementing the count number from
> the fetch, if a number is found), and (3) Put Distributed MapCache.  This
> process works as expected and I have a valid key/value pair in the MapCache
> for that particular process (I create a BatchID so it's very specific for
> each pull from the SFTP processor).  The only way I know how to
> intelligently provide that information back to the Wait processor is to
> pull that value with a Fetch Distributed MapCache right before the flowfile
> enters the Wait processor.  In theory each flowfile waiting would have the
> same attribute from the Fetch process and each attribute would be the same
> count.  However this doesn't always work because there could exist a
> condition where the transformations happen before the counting has been
> done and published to the MapCache Server.  So in this scenario you end up
> with some flowfiles having a lower count than others or just not having the
> "true" count.  Now, I can put additional gates in place such as trying to
> slow down the flowfiles at specific sections to try and allow the counting
> to be done first, but it's not a perfect science.
>
> I thought ideally it would be good to allow the Wait processor to pull
> directly from the MapCache if I could provide the key it would need for a
> lookup, within the "Target Signal Count" field.  It could use the signal
> coming from Notify to say "I have X number of Notify, for this signal" and
> use the count value I have set in the MapCache to say "This is the total
> number of files I need to see from Notify, for that same signal". This way,
> I could run the Wait processor every few seconds and the chances of running
> into a miscount condition would be far less.  Is there any way currently
> where this processor could pull directly from the cache, or does it have to
> rely on an attribute within the flowfile itself?  I think it's the latter,
> but I want to make sure someone doesn't have a better idea.
>
> Sorry for the long message. Thanks!
>
>
> Chris Lundeberg


Re: TLS Toolkit - Token length

2019-08-19 Thread Pierre Villard
Hi Andy,

Thanks for your feedback. I filed a JIRA [1] and will work on a PR.

[1] https://issues.apache.org/jira/browse/NIFI-6571
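
For illustration, the fail-fast check proposed in the original message quoted below could look roughly like this. It is a Python sketch, not the TLS Toolkit's Java code; `validate_token` and `start_server` are hypothetical names, and measuring the token as UTF-8 bytes is an assumption.

```python
MIN_TOKEN_BYTES = 16  # minimum size quoted in the client-side error message


def validate_token(token: str) -> None:
    """Reject tokens that a client request would later be denied for."""
    size = len(token.encode("utf-8"))  # assumption: size measured in UTF-8 bytes
    if size < MIN_TOKEN_BYTES:
        raise ValueError(
            f"Token does not meet minimum size of {MIN_TOKEN_BYTES} bytes "
            f"(got {size})"
        )


def start_server(token: str) -> str:
    """Hypothetical server entry point: validate before doing anything else."""
    validate_token(token)
    return "started"
```

Validating at startup moves the failure from the first client request to the operator who chose the token.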

On Wed, Aug 14, 2019 at 7:08 PM, Andy LoPresto wrote:

> Hi Pierre,
>
> I think you are 100% correct that this would be a significant
> improvement. I am in the midst of refactoring the TLS Toolkit completely
> [1], so this is something I will keep in mind for that overhaul. In the
> meantime, if you would like to file a Jira and submit a PR for the current
> instance, that would be helpful to people. Please link the Jira to this
> epic [2] where I am tracking a lot of interrelated TLS improvements.
>
> [1] https://issues.apache.org/jira/browse/NIFI-5462
> [2] https://issues.apache.org/jira/browse/NIFI-5458
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> > On Aug 14, 2019, at 2:46 AM, Pierre Villard wrote:
> >
> > Hey guys,
> >
> > It is possible to start the TLS toolkit in server mode with a token length
> > below the required 16 bytes. But when the client performs the request,
> > it will be denied with the message "Token does not meet minimum size of 16
> > bytes". Would it make sense to just prevent the TLS toolkit from starting in
> > server mode when the token is below 16 bytes?
> >
> > Happy to file a JIRA and submit a PR, just wanted to check I'm not missing
> > an edge case.
> >
> > Thanks,
> > Pierre
>
>


Re: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics

2019-08-19 Thread Yolanda Davis
Hi Mark and Rob

Mark thanks so much for the info on your work and Rob thanks for jumping in
on the UI! I just wanted to add, Mark, that looking at your branch I think
we also may have some opportunities to exchange notes or collaborate on the
backend as well.  The work in the feature branch is still in progress (with
some decoupling to ensure we can allow flexible configuration of models).
Please feel free to review and leave comments under the parent JIRA.  At
the same time I'll take a deeper dive on your branch and perhaps we can
exchange notes on potential areas for improvement/collaboration if it makes
sense?

Thanks Again,

-yolanda


On Mon, Aug 19, 2019 at 3:34 PM Robert Fellows wrote:

> Hey Mark,
>   I've started working on some UI based on the initial commit for this
> proposal. What you have done and what I am working on have a bit of
> overlap, but not much.
> I'm working on getting the predicted count and bytes into the existing
> connection metric display that is already on the canvas. The only overlap
> looks like it might be in the Summary table. I plan on adding a PR for my
> additions, hopefully tomorrow. Maybe once it is up we can discuss how we
> bring them together where it makes sense?
>
> This is the main JIRA case:
> https://issues.apache.org/jira/browse/NIFI-6510
> And this is the subtask that I am working toward:
> https://issues.apache.org/jira/browse/NIFI-6568
>
>
> -- Rob Fellows
>
> On Mon, Aug 19, 2019 at 2:26 PM Owens, Mark wrote:
>
> > The images from the previous email do not appear to be displaying. They
> > can be viewed at:
> > https://github.com/jmark99/nifi-images
> >
> > From: Owens, Mark 
> > Sent: Monday, August 19, 2019 2:25 PM
> > To: dev@nifi.apache.org
> > Subject: RE: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics
> >
> >
> > Hi Yolanda,
> >
> >
> >
> > I've been working on a feature that appears to overlap with the work you
> > are pursuing. Perhaps we should see whether we should try to coordinate
> > our efforts. I've been updating NiFi to predict the time to queue overflow
> > for both flowfiles and bytes and to display that information in the GUI.
> > For the initial attempt, I’ve been using a simple model of straight-line
> > prediction over a sliding window of 15 minutes to predict when flows will
> > fail. This estimate is then displayed on both the NiFi Summary page under
> > the connections tab and in the status history graphs.  Below are examples
> > of what would be displayed to the user.
> >
> >
> >
> > [cid:image001.png@01D55696.E4CCD550]
> >
> >
> >
> > The Connection tab contains a new column on the right that displays the
> > prediction for both flow files and data size. The user can select a maximum
> > time at which specific times are no longer displayed. In this example, if
> > the prediction lies beyond 12 hours then the display simply indicates that
> > the flow is greater than 12 hours away from failure at the moment.
> >
> >
> >
> > [cid:image002.png@01D55697.2C8AC500]
> >
> >
> >
> > This display graphs the prediction for byte overflow over time. Note that
> > if the estimate is greater than the user provided maximum value of interest
> > the graph maxes out at that time, effectively indicating no overflow
> > concerns.
> >
> >
> >
> > [cid:image003.png@01D55697.965C27D0]
> >
> >
> >
> > A similar display for flowfile count is displayed as well.
> >
> >
> >
> > The current state of work can be found at
> > https://github.com/jmark99/nifi/tree/time-to-overflow
> >
> >
> >
> > I welcome your (or any others) feedback on this effort.
> >
> >
> >
> > Thanks,
> > Mark
> >
> >
> >
> > P.S. If the images are not displaying, they can be viewed at
> > https://github.com/jmark99/nifi-images
> >
> >
> >
> >
> >
> >
> >
> > -Original Message-
> > From: Yolanda Davis <yolanda.m.da...@gmail.com>
> > Sent: Monday, August 19, 2019 11:29 AM
> > To: dev@nifi.apache.org
> > Subject: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics
> >
> > Hello All,
> >
> > I just wanted to follow up on the discussion we started a couple of weeks
> > ago concerning an analytics framework for NiFi metrics.  Working with Andy
> > Christianson and Matt Burgess we shaped our ideas and drafted a proposal
> > for this feature on the Apache NiFi Wiki [1]. We've also begun
> > implementing some of these ideas in a feature branch (which is work in
> > progress) [2].  We’d appreciate any questions or feedback you may have.
> >
> > Thanks,
> >
> > -yolanda
> >
> > [1] - https://cwiki.apache.org/confluence/display/NIFI/Operational+Analytics+Framework+for+NiFi
> > [2] - https://github.com/apache/nifi/commits/analytics-framework
> >
> > On Wed, Jul 31, 2019 at 9:58 AM Andy Christianson <aichr...@protonmail.com.invalid> wrote:
> >
> > > As someone who operated a 24/7

Re: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics

2019-08-19 Thread Robert Fellows
Hey Mark,
  I've started working on some UI based on the initial commit for this
proposal. What you have done and what I am working on have a bit of
overlap, but not much.
I'm working on getting the predicted count and bytes into the existing
connection metric display that is already on the canvas. The only overlap
looks like it might be in the Summary table. I plan on adding a PR for my
additions, hopefully tomorrow. Maybe once it is up we can discuss how we
bring them together where it makes sense?

This is the main JIRA case: https://issues.apache.org/jira/browse/NIFI-6510
And this is the subtask that I am working toward:
https://issues.apache.org/jira/browse/NIFI-6568


-- Rob Fellows

On Mon, Aug 19, 2019 at 2:26 PM Owens, Mark  wrote:

> The images from the previous email do not appear to be displaying. They can
> be viewed at:
> https://github.com/jmark99/nifi-images
>
> From: Owens, Mark 
> Sent: Monday, August 19, 2019 2:25 PM
> To: dev@nifi.apache.org
> Subject: RE: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics
>
>
> Hi Yolanda,
>
>
>
> I've been working on a feature that appears to overlap with the work you
> are pursuing. Perhaps we should see whether we should try to coordinate
> our efforts. I've been updating NiFi to predict the time to queue overflow
> for both flowfiles and bytes and to display that information in the GUI.
> For the initial attempt, I’ve been using a simple model of straight line
> prediction over a sliding window of 15 minutes to predict when flows will
> fail. This estimate is then displayed on both the NiFi Summary page under
> the connections tab and in the status history graphs.  Below are examples
> of what would be displayed to the user.
>
>
>
> [cid:image001.png@01D55696.E4CCD550]
>
>
>
> The Connection tab contains a new column on the right that displays the
> prediction for both flow files and data size. The user can select a maximum
> time at which specific times are no longer displayed. In this example, if
> the prediction lies beyond 12 hours then the display simply indicates that
> the flow is greater than 12 hours away from failure at the moment.
>
>
>
> [cid:image002.png@01D55697.2C8AC500]
>
>
>
> This display graphs the prediction for byte overflow over time. Note that
> if the estimate is greater than the user provided maximum value of interest
> the graph maxes out at that time, effectively indicating no overflow
> concerns.
>
>
>
> [cid:image003.png@01D55697.965C27D0]
>
>
>
> A similar display for flowfile count is displayed as well.
>
>
>
> The current state of work can be found at
> https://github.com/jmark99/nifi/tree/time-to-overflow
>
>
>
> I welcome your (or any others) feedback on this effort.
>
>
>
> Thanks,
> Mark
>
>
>
> P.S. If the images are not displaying, they can be viewed at
> https://github.com/jmark99/nifi-images
>
>
>
>
>
>
>
> -Original Message-
> From: Yolanda Davis <yolanda.m.da...@gmail.com>
> Sent: Monday, August 19, 2019 11:29 AM
> To: dev@nifi.apache.org
> Subject: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics
>
> Hello All,
>
> I just wanted to follow up on the discussion we started a couple of weeks
> ago concerning an analytics framework for NiFi metrics.  Working with Andy
> Christianson and Matt Burgess we shaped our ideas and drafted a proposal
> for this feature on the Apache NiFi Wiki [1]. We've also begun
> implementing some of these ideas in a feature branch (which is work in
> progress) [2].  We’d appreciate any questions or feedback you may have.
>
> Thanks,
>
> -yolanda
>
> [1] - https://cwiki.apache.org/confluence/display/NIFI/Operational+Analytics+Framework+for+NiFi
> [2] - https://github.com/apache/nifi/commits/analytics-framework
>
> On Wed, Jul 31, 2019 at 9:58 AM Andy Christianson <aichr...@protonmail.com.invalid> wrote:
>
> > As someone who operated a 24/7 mission-critical NiFi flow, this
> > feature would have been a life saver. If I'm heading home on a Friday,
> > it would be great to have some blinking red lights to let me know that
> > the system predicts that it is going to experience backpressure
> > sometime over the weekend, so that corrective action could be taken
> > before leaving.
> >
> > Since there is support in the community for this, I created a JIRA to
> > track the effort:
> >
> > https://issues.apache.org/jira/browse/NIFI-6510
> >
> > I also created a JIRA to track the remote protocol:
> >
> > https://issues.apache.org/jira/browse/NIFI-6511
> >
> > Regards,
> >
> > Andy
> >
> > Sent from ProtonMail, Swiss-based encrypted email.
> >
> > ‐‐‐ Original Message ‐‐‐
> > On Wednesday, July 31, 2019 6:57 AM, Arpad Boda <ab...@apache.org> wrote:
> >
> > > If you could share a bit more details about your OPC and Modbus
> > > usage, that would be highly appreciated!
> > >
> > > On Wed,

RE: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics

2019-08-19 Thread Owens, Mark
The images from the previous email do not appear to be displaying. They can be
viewed at:
https://github.com/jmark99/nifi-images

From: Owens, Mark 
Sent: Monday, August 19, 2019 2:25 PM
To: dev@nifi.apache.org
Subject: RE: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics


Hi Yolanda,



I've been working on a feature that appears to overlap with the work you are
pursuing. Perhaps we should see whether we should try to coordinate our
efforts. I've been updating NiFi to predict the time to queue overflow for both
flowfiles and bytes and to display that information in the GUI. For the initial
attempt, I’ve been using a simple model of straight line prediction over a 
sliding window of 15 minutes to predict when flows will fail. This estimate is 
then displayed on both the NiFi Summary page under the connections tab and in 
the status history graphs.  Below are examples of what would be displayed to 
the user.



[cid:image001.png@01D55696.E4CCD550]



The Connection tab contains a new column on the right that displays the 
prediction for both flow files and data size. The user can select a maximum 
time at which specific times are no longer displayed. In this example, if the 
prediction lies beyond 12 hours then the display simply indicates that the flow 
is greater than 12 hours away from failure at the moment.



[cid:image002.png@01D55697.2C8AC500]



This display graphs the prediction for byte overflow over time. Note that if 
the estimate is greater than the user provided maximum value of interest the 
graph maxes out at that time, effectively indicating no overflow concerns.



[cid:image003.png@01D55697.965C27D0]



A similar display for flowfile count is displayed as well.



The current state of work can be found at 
https://github.com/jmark99/nifi/tree/time-to-overflow



I welcome your (or any others) feedback on this effort.



Thanks,
Mark



P.S. If the images are not displaying, they can be viewed at 
https://github.com/jmark99/nifi-images







-Original Message-
From: Yolanda Davis <yolanda.m.da...@gmail.com>
Sent: Monday, August 19, 2019 11:29 AM
To: dev@nifi.apache.org
Subject: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics



Hello All,



I just wanted to follow up on the discussion we started a couple of weeks ago 
concerning an analytics framework for NiFi metrics.  Working with Andy 
Christianson and Matt Burgess we shaped our ideas and drafted a proposal for 
this feature on the Apache NiFi Wiki [1]. We've also begun implementing some
of these ideas in a feature branch (which is work in progress) [2].  We’d
appreciate any questions or feedback you may have.



Thanks,



-yolanda



[1] -

https://cwiki.apache.org/confluence/display/NIFI/Operational+Analytics+Framework+for+NiFi

[2] - https://github.com/apache/nifi/commits/analytics-framework



On Wed, Jul 31, 2019 at 9:58 AM Andy Christianson <aichr...@protonmail.com.invalid> wrote:

> As someone who operated a 24/7 mission-critical NiFi flow, this
> feature would have been a life saver. If I'm heading home on a Friday,
> it would be great to have some blinking red lights to let me know that
> the system predicts that it is going to experience backpressure
> sometime over the weekend, so that corrective action could be taken
> before leaving.
>
> Since there is support in the community for this, I created a JIRA to
> track the effort:
>
> https://issues.apache.org/jira/browse/NIFI-6510
>
> I also created a JIRA to track the remote protocol:
>
> https://issues.apache.org/jira/browse/NIFI-6511
>
> Regards,
>
> Andy
>
> Sent from ProtonMail, Swiss-based encrypted email.
>
> ‐‐‐ Original Message ‐‐‐
> On Wednesday, July 31, 2019 6:57 AM, Arpad Boda <ab...@apache.org> wrote:
>
> > If you could share a bit more details about your OPC and Modbus
> > usage, that would be highly appreciated!
> >
> > On Wed, Jul 31, 2019 at 12:01 PM Craig Knell <craig.kn...@gmail.com> wrote:
> >
> > > Sounds great.
> > > Let me know if you need some help
> > > Best regards
> > > Craig
> > >
> > > > On 31 Jul 2019, at 17:31, Arpad Boda <ab...@cloudera.com.invalid> wrote:
> > > > Craig,
> > > > OPC ( https://issues.apache.org/jira/browse/MINIFICPP-819 ) and
> > > > Modbus ( https://issues.apache.org/jira/browse/MINIFICPP-897 ) are on
> > > > the way for MiNiFi C++; hopefully both will be part of the next
> > > > release (0.7.0).
> > > > It's gonna be legen... wait for it! :) Regards, Arpad
> > > >
> > > > > On Wed, Jul 31, 2019 at 2:30 AM Craig Knell <craig.kn...@gmail.com> wrote:
> > > > >
> > > > > Hi Folks
> > > > > That's our use case now. All our Models are run in python.
> > > > > Currently we send events to the ML via http, although this is
> > > > > not optimal
RE: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics

2019-08-19 Thread Owens, Mark
Hi Yolanda,



I've been working on a feature that appears to overlap with the work you are
pursuing. Perhaps we should see whether we should try to coordinate our
efforts. I've been updating NiFi to predict the time to queue overflow for both
flowfiles and bytes and to display that information in the GUI. For the initial
attempt, I’ve been using a simple model of straight line prediction over a 
sliding window of 15 minutes to predict when flows will fail. This estimate is 
then displayed on both the NiFi Summary page under the connections tab and in 
the status history graphs.  Below are examples of what would be displayed to 
the user.
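
A minimal sketch of what a straight-line prediction over a window of samples could look like. It is written in Python and is not taken from the linked branch; the function name `time_to_overflow`, the (seconds, value) sample format, and the use of an ordinary least-squares fit are all assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple


def time_to_overflow(samples: Sequence[Tuple[float, float]],
                     threshold: float) -> Optional[float]:
    """Seconds from the last sample until the fitted line reaches threshold.

    samples: (timestamp_seconds, queue_size) pairs from the sliding window,
    oldest first. Returns None when no overflow is predicted.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, no line to fit
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None  # queue is flat or draining
    intercept = mean_v - slope * mean_t
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - samples[-1][0])
```

The same function would serve both the flowfile-count and the byte-size predictions, with `threshold` set to the corresponding backpressure limit.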



[cid:image001.png@01D55696.E4CCD550]



The Connection tab contains a new column on the right that displays the 
prediction for both flow files and data size. The user can select a maximum 
time at which specific times are no longer displayed. In this example, if the 
prediction lies beyond 12 hours then the display simply indicates that the flow 
is greater than 12 hours away from failure at the moment.
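
The capping behaviour described above might be expressed like this. It is only a sketch: the function name `format_prediction`, the `HH:MM:SS` layout, and the exact "> 12 hours" wording are assumptions, not the strings used in the actual UI.

```python
from typing import Optional


def format_prediction(seconds: Optional[float], max_hours: int = 12) -> str:
    """Render a time-to-overflow estimate, capped at a user-chosen maximum."""
    if seconds is None or seconds > max_hours * 3600:
        # Beyond the window of interest: show only that it is far away.
        return f"> {max_hours} hours"
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```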



[cid:image002.png@01D55697.2C8AC500]



This display graphs the prediction for byte overflow over time. Note that if 
the estimate is greater than the user provided maximum value of interest the 
graph maxes out at that time, effectively indicating no overflow concerns.



[cid:image003.png@01D55697.965C27D0]



A similar display for flowfile count is displayed as well.



The current state of work can be found at 
https://github.com/jmark99/nifi/tree/time-to-overflow



I welcome your (or any others) feedback on this effort.



Thanks,
Mark



P.S. If the images are not displaying, they can be viewed at 
https://github.com/jmark99/nifi-images







-Original Message-
From: Yolanda Davis 
Sent: Monday, August 19, 2019 11:29 AM
To: dev@nifi.apache.org
Subject: Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics



Hello All,



I just wanted to follow up on the discussion we started a couple of weeks ago 
concerning an analytics framework for NiFi metrics.  Working with Andy 
Christianson and Matt Burgess we shaped our ideas and drafted a proposal for 
this feature on the Apache NiFi Wiki [1]. We've also begun implementing some
of these ideas in a feature branch (which is work in progress) [2].  We’d
appreciate any questions or feedback you may have.



Thanks,



-yolanda



[1] -

https://cwiki.apache.org/confluence/display/NIFI/Operational+Analytics+Framework+for+NiFi

[2] - https://github.com/apache/nifi/commits/analytics-framework



On Wed, Jul 31, 2019 at 9:58 AM Andy Christianson <aichr...@protonmail.com.invalid> wrote:

> As someone who operated a 24/7 mission-critical NiFi flow, this
> feature would have been a life saver. If I'm heading home on a Friday,
> it would be great to have some blinking red lights to let me know that
> the system predicts that it is going to experience backpressure
> sometime over the weekend, so that corrective action could be taken
> before leaving.
>
> Since there is support in the community for this, I created a JIRA to
> track the effort:
>
> https://issues.apache.org/jira/browse/NIFI-6510
>
> I also created a JIRA to track the remote protocol:
>
> https://issues.apache.org/jira/browse/NIFI-6511
>
> Regards,
>
> Andy
>
> Sent from ProtonMail, Swiss-based encrypted email.
>
> ‐‐‐ Original Message ‐‐‐
> On Wednesday, July 31, 2019 6:57 AM, Arpad Boda <ab...@apache.org> wrote:
>
> > If you could share a bit more details about your OPC and Modbus
> > usage, that would be highly appreciated!
> >
> > On Wed, Jul 31, 2019 at 12:01 PM Craig Knell <craig.kn...@gmail.com> wrote:
> >
> > > Sounds great.
> > > Let me know if you need some help
> > > Best regards
> > > Craig
> > >
> > > > On 31 Jul 2019, at 17:31, Arpad Boda <ab...@cloudera.com.invalid> wrote:
> > > > Craig,
> > > > OPC ( https://issues.apache.org/jira/browse/MINIFICPP-819 ) and
> > > > Modbus ( https://issues.apache.org/jira/browse/MINIFICPP-897 ) are on
> > > > the way for MiNiFi C++; hopefully both will be part of the next
> > > > release (0.7.0).
> > > > It's gonna be legen... wait for it! :) Regards, Arpad
> > > >
> > > > > On Wed, Jul 31, 2019 at 2:30 AM Craig Knell <craig.kn...@gmail.com> wrote:
> > > > >
> > > > > Hi Folks
> > > > > That's our use case now. All our Models are run in python.
> > > > > Currently we send events to the ML via http, although this is
> > > > > not optimal
> > > > >
> > > > > Our use case is edge ML where we want a lightweight wrapper
> > > > > for a Python code base.
> > > > > Jython, however, does not work with the code base. I'm thinking of
> > > > > changing the interface to something like Redis for pub/sub.
> > > > > I'd also like this to be a push deployment via MiNiFi. Also
> > > > > support for

Wait/Notify Question

2019-08-19 Thread Chris Lundeberg
Hi all,

I wanted to throw out a question to the larger community before I went down
a different path.  I might be looking at this wrong or making assumptions I
shouldn't.

Recently I started working with the Wait and Notify processors a bit more.
I have a new flow which is a bit more batch in nature and these processors
seem to work nicely for being able to intelligently wait for chunks or
files to be processed, before moving on to the next step.  I have one
specific pattern that I haven't solved with the inbuilt functionality,
which is:

1.  I have an incoming zip file from SFTP.  That zip contains n files, and
each of those files needs to be split in some way.  I won't know the number
of files within the zip in advance.

2.  After they have been split correctly, a few transformations run on each
of the files.

3.  At the end of the transformation process, these various files will be
merged into 5 specific outbound file formats, to be sent to an outbound
SFTP server.  *Note*: I am not splitting and merging the same files back
together (I have looked at the fragment index stuff).

I found a nice solution for counting the number of flowfiles after the
split, so I know exactly how many files should be transformed and thus what
my "Target Signal Count" should be within the Wait processor.  At the
moment I have a counting process that runs (1) FetchDistributedMapCache,
(2) ReplaceText (incrementing the count from the fetch, if a number is
found), and (3) PutDistributedMapCache.  This process works as expected and
I have a valid key/value pair in the MapCache for that particular process
(I create a BatchID so it's very specific to each pull from the SFTP
processor).

The only way I know to provide that information back to the Wait processor
is to pull the value with a FetchDistributedMapCache right before the
flowfile enters the Wait processor.  In theory each waiting flowfile would
get the same attribute from the fetch, and each attribute would hold the
same count.  However, this doesn't always work, because the transformations
can finish before the counting has been done and published to the MapCache
server.  In that scenario some flowfiles end up with a lower count than
others, or just don't have the "true" count.  I can put additional gates in
place, such as slowing down the flowfiles at specific sections to let the
counting finish first, but it's not a perfect science.
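
To make the miscount concrete, here is a minimal Python sketch of the
fetch/increment/put counting flow and of a flowfile that fetches its target
count too early.  It is only an illustration: the dict stands in for the
DistributedMapCache, and the function names and the "batch-001" key are
hypothetical, not NiFi APIs.

```python
# Stand-in for the DistributedMapCache: a plain dict (assumption for illustration).
cache = {}


def count_one(batch_id):
    """Mirror the three-step counting flow: fetch, increment, put."""
    current = int(cache.get(batch_id, 0))  # (1) fetch; a missing key counts as 0
    cache[batch_id] = str(current + 1)     # (2) increment and (3) put


def fetch_target(batch_id):
    """Value a flowfile would receive as its target count right before Wait."""
    return int(cache.get(batch_id, 0))


def simulate(batch_id, total_splits, counted_before_fetch):
    """Count some splits, let an early flowfile fetch, then finish counting."""
    for _ in range(counted_before_fetch):
        count_one(batch_id)
    early = fetch_target(batch_id)  # transformed before counting finished
    for _ in range(total_splits - counted_before_fetch):
        count_one(batch_id)
    late = fetch_target(batch_id)   # fetched after counting finished
    return early, late


early, late = simulate("batch-001", total_splits=10, counted_before_fetch=4)
print(early, late)  # the early flowfile carries 4, the late one carries 10
```

The early flowfile would enter Wait with a target of 4 instead of 10, which
is exactly the "lower count than others" case described above.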

I thought ideally it would be good to allow the Wait processor to pull
directly from the MapCache if I could provide the key it would need for a
lookup, within the "Target Signal Count" field.  It could use the signal
coming from Notify to say "I have X number of Notify, for this signal" and
use the count value I have set in the MapCache to say "This is the total
number of files I need to see from Notify, for that same signal". This way,
I could run the Wait processor every few seconds and the chances of running
into a miscount condition would be far lower.  Is there currently any way
for this processor to pull directly from the cache, or does it have to
rely on an attribute within the flowfile itself?  I think it's the latter,
but I want to make sure someone doesn't have a better idea.
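
The lookup proposed above could be sketched roughly as follows.  This is a
hypothetical illustration in Python, not the Wait processor's actual code:
the dict again stands in for the MapCache, the key names are made up, signal
counts are simplified to plain integers, and treating a missing target key
as "keep waiting" is an assumption.

```python
def wait_decision(cache, signal_id, target_key):
    """Return 'success' once the Notify count reaches the cached target, else 'wait'."""
    target = cache.get(target_key)
    if target is None:
        # Target count not published yet (assumed behavior): keep waiting.
        return "wait"
    notified = int(cache.get(signal_id, 0))  # "I have X number of Notify, for this signal"
    return "success" if notified >= int(target) else "wait"


# Hypothetical keys: the final count under "batch-001.count", signals under "batch-001".
cache = {"batch-001.count": "3"}
decisions = []
for _ in range(3):
    # Each Notify increments the signal count for this batch.
    cache["batch-001"] = str(int(cache.get("batch-001", 0)) + 1)
    decisions.append(wait_decision(cache, "batch-001", "batch-001.count"))

print(decisions)  # ['wait', 'wait', 'success']
```

Because the target is read from the cache each time Wait runs, every
flowfile sees the same, current value rather than whatever attribute it
picked up earlier in the flow.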

Sorry for the long message. Thanks!


Chris Lundeberg


Re:[EXT] [DISCUSS] Predictive Analytics for NiFi Metrics

2019-08-19 Thread Yolanda Davis
Hello All,

I just wanted to follow up on the discussion we started a couple of weeks
ago concerning an analytics framework for NiFi metrics.  Working with Andy
Christianson and Matt Burgess we shaped our ideas and drafted a proposal
for this feature on the Apache NiFi Wiki [1]. We've also begun
implementing some of these ideas in a feature branch (which is a work in
progress) [2].  We’d appreciate any questions or feedback you may have.

Thanks,

-yolanda

[1] -
https://cwiki.apache.org/confluence/display/NIFI/Operational+Analytics+Framework+for+NiFi
[2] - https://github.com/apache/nifi/commits/analytics-framework

On Wed, Jul 31, 2019 at 9:58 AM Andy Christianson
 wrote:

> As someone who operated a 24/7 mission-critical NiFi flow, this feature
> would have been a life saver. If I'm heading home on a Friday, it would be
> great to have some blinking red lights to let me know that the system
> predicts that it is going to experience backpressure sometime over the
> weekend, so that corrective action could be taken before leaving.
>
> Since there is support in the community for this, I created a JIRA to
> track the effort:
>
> https://issues.apache.org/jira/browse/NIFI-6510
>
> I also created a JIRA to track the remote protocol:
>
> https://issues.apache.org/jira/browse/NIFI-6511
>
>
> Regards,
>
> Andy
>
>
> Sent from ProtonMail, Swiss-based encrypted email.
>
> ‐‐‐ Original Message ‐‐‐
> On Wednesday, July 31, 2019 6:57 AM, Arpad Boda  wrote:
>
> > If you could share a bit more details about your OPC and Modbus usage,
> > that would be highly appreciated!
> >
> > On Wed, Jul 31, 2019 at 12:01 PM Craig Knell craig.kn...@gmail.com
> > wrote:
> >
> > > Sounds great
> > > Let me know if you need some help
> > > Best regards
> > > Craig
> > >
> > > > On 31 Jul 2019, at 17:31, Arpad Boda ab...@cloudera.com.invalid
> > > > wrote:
> > > > Craig,
> > > > OPC ( https://issues.apache.org/jira/browse/MINIFICPP-819 ) and
> > > > Modbus ( https://issues.apache.org/jira/browse/MINIFICPP-897 ) are
> > > > on the way for MiNiFi c++, hopefully both will be part of next
> > > > release (0.7.0).
> > > > It's gonna be legen... wait for it! :)
> > > > Regards,
> > > > Arpad
> > > >
> > > > > On Wed, Jul 31, 2019 at 2:30 AM Craig Knell craig.kn...@gmail.com
> > > > > wrote:
> > > >
> > > > > Hi Folks
> > > > > That's our use case now. All our Models are run in python.
> > > > > Currently we send events to the ML via http, although this is not
> > > > > optimal
> > > >
> > > > > Our use case is edge ML where we want a lightweight wrapper for
> > > > > Python code base.
> > > > > Jython however does not work with the code base
> > > > > I'm thinking of changing the interface to something like REDIS for
> > > > > pub/sub
> > > > > I'd also like this to be a push deployment via minifi
> > > > > Also support for sensors via protocols via Modbus and OPC would be
> > > > > great
> > > > > Craig
> > > > >
> > > > > > On Wed, Jul 31, 2019 at 1:43 AM Joe Witt joe.w...@gmail.com
> > > > > > wrote:
> > > > > > Definitely something that I think would really help the
> > > > > > community. It might make sense to frame/structure these APIs
> > > > > > such that an internal option could be available to reduce
> > > > > > dependencies and get up and running but that also just as
> > > > > > easily a remote implementation where the engine lives and is
> > > > > > managed externally could also be supported.
> > > > > > Thanks
> > > > > > On Tue, Jul 30, 2019 at 1:40 PM Andy LoPresto alopre...@apache.org
> > > > > > wrote:
> > > > > >
> > > > > > > Yolanda,
> > > > > > > I think this sounds like a great idea and will be very useful to
> > > > > > > admins/users, as well as enabling some interesting next-level
> > > > > > > functionality and insight generation. Thanks for putting this
> > > > > > > out there.
> > > > > > > Andy LoPresto
> > > > > > > alopre...@apache.org
> > > > > > > alopresto.apa...@gmail.com
> > > > > > > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> > > > > > >
> > > > > > > > On Jul 30, 2019, at 5:55 AM, Yolanda Davis
> > > > > > > > <yolanda.m.da...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hello Everyone,
> > > > > > > > I wanted to reach out to the community to discuss potentially
> > > > > > > > enhancing NiFi to include predictive analytics that can help
> > > > > > > > users assess and predict NiFi behavior and performance.
> > > > > > > > Currently NiFi has lots of metrics available for areas
> > > > > > > > including jvm and flow component usage (via component status)
> > > > > > > > as well as provenance data which NiFi makes available either
> > > > > > > > through the UI or
> > > > > > > > reporting tasks (for consumption by other systems). Past
> discussions
> > > > > > > > in