Re: Custom-processor configuration suggestions

2023-09-27 Thread Otto Fowler
```
┌──────────────────────┐
│  processor instance  │──┐
└──────────────────────┘  │
┌──────────────────────┐  │
│  processor instance  │──┤   ┌───────────────────────┐      ┌─────────────────────────┐
└──────────────────────┘  ├──▶│ Configuration service │─────▶│ configuration authority │
┌──────────────────────┐  │   └───────────────────────┘      └─────────────────────────┘
│  processor instance  │──┤
└──────────────────────┘  │
┌──────────────────────┐  │
│  processor instance  │──┘
└──────────────────────┘
```

You can also make a shared service component that loads the configurations
by some means and serves them to the processors.
The service can get the configurations however makes sense for you (from a
REST API, as Joe is saying, or by reading from disk, or something else).
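
A minimal sketch of the shared-service idea, in plain Java and assuming nothing about the actual processors. In NiFi such a component would presumably be written as a controller service; the class name `SharedConfigService`, its `get` and `invalidate` methods, and the loader function are all hypothetical names chosen for illustration, not NiFi API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical names throughout; nothing here is NiFi API.
final class SharedConfigService {
    // The loader stands in for "however makes sense for you": a REST call, a disk read, etc.
    private final Function<String, String> loader;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    SharedConfigService(Function<String, String> loader) {
        this.loader = loader;
    }

    // Processors ask for configuration by a short key instead of having it pasted in.
    Optional<String> get(String key) {
        // computeIfAbsent stores nothing when the loader returns null,
        // so a configuration that is missing now is looked up again next time.
        return Optional.ofNullable(cache.computeIfAbsent(key, loader));
    }

    // Drops a cached entry so the next get() reloads it from the authority.
    void invalidate(String key) {
        cache.remove(key);
    }
}
```

With something like this, a flow writer would only set a short key as a property value, and every processor instance sharing the service would see the same loaded configuration.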





On September 27, 2023 at 4:04:56 PM, Joe Witt (joe.w...@gmail.com) wrote:

Russ

It sounds like what you have is a case of significant reference data you
need made available to various instances of this processor that knows how
to use that reference state to do its function.

This is similar to cases like IP geo enrichment where the dataset on which
you'd make the decision is larger and more importantly subject to change
over time. In such cases the ideal state is:
(A) The reference dataset(s) is hosted at a RESTful endpoint and can be
periodically pulled and stored some place local/easily accessible.
(B) The processor knows where to look for this reference dataset download
and is able to hot reload it on the fly. That includes understanding that
the needed datasets might not yet be available, in which case it should
yield until it sees them and loads them.

Thanks

On Wed, Sep 27, 2023 at 11:51 AM Russell Bateman 
wrote:

> I'm posting this plea for suggestions as I'm short on imagination here.
>
> We have some custom processors that need extraordinary amounts of
> configuration of the sort a flow writer would have to copy and paste
> in--huge amounts of Yaml, regular expressions, etc. This is what our
> flow writers are already doing. It would be easier to insert a filename
> or -path, but...
>
> ...asking a custom processor to perform filesystem I/O is icky because
> of unpredictable filesystem access post installation. Thinking about how
> installation is beyond my control, I don't want to make installation
> messy, etc. Containers, Kubernetes deployment, etc. complicate this.
>
> I thought of wiring /GetFile/ to a subdirectory (problematic, but less
> so?) and accepting files as input to pass on to needy processors who
> would recognize, adopt and incorporate configuration based on
> higher-level and simpler cues posted by flow writers as property values.
>
> Assuming you both grok and are interested in what I'm asking, do you
> have thoughts, cautionary statements or even cat-calls to offer? Maybe
> there are obvious answers I'm just not thinking of.
>
> Profuse thanks,
>
> Russ

Re: Custom-processor configuration suggestions

2023-09-27 Thread Joe Witt
Russ

It sounds like what you have is a case of significant reference data you
need made available to various instances of this processor that knows how
to use that reference state to do its function.

This is similar to cases like IP geo enrichment where the dataset on which
you'd make the decision is larger and more importantly subject to change
over time.  In such cases the ideal state is:
(A) The reference dataset(s) is hosted at a RESTful endpoint and can be
periodically pulled and stored some place local/easily accessible.
(B) The processor knows where to look for this reference dataset download
and is able to hot reload it on the fly. That includes understanding that
the needed datasets might not yet be available, in which case it should
yield until it sees them and loads them.
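
Point (B) can be sketched roughly as follows, assuming the periodically pulled dataset lands in a local file and that comparing modification times is an acceptable change check. The class `ReloadableReference` and its `current` method are hypothetical; an empty result is the signal to yield (a NiFi processor would presumably yield through its process context).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.Optional;

// Hypothetical helper; the caller decides what "yield" means.
final class ReloadableReference {
    private final Path source;        // where the periodically pulled dataset is stored
    private FileTime loadedVersion;   // modification time of the copy held in memory
    private List<String> data;        // the dataset (here simply its lines)

    ReloadableReference(Path source) {
        this.source = source;
    }

    // Returns the current dataset, reloading it if the file changed.
    // Empty means "not available yet: yield and try again later".
    synchronized Optional<List<String>> current() {
        try {
            if (!Files.exists(source)) {
                return Optional.ofNullable(data);  // keep serving the last good copy, if any
            }
            FileTime onDisk = Files.getLastModifiedTime(source);
            if (!onDisk.equals(loadedVersion)) {
                data = List.copyOf(Files.readAllLines(source));
                loadedVersion = onDisk;
            }
        } catch (IOException e) {
            // A failed read leaves the previously loaded dataset in place.
        }
        return Optional.ofNullable(data);
    }
}
```

The pull from the RESTful endpoint in (A) is deliberately left out; any scheduled job that writes the file would do for this sketch.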

Thanks

On Wed, Sep 27, 2023 at 11:51 AM Russell Bateman 
wrote:

> I'm posting this plea for suggestions as I'm short on imagination here.
>
> We have some custom processors that need extraordinary amounts of
> configuration of the sort a flow writer would have to copy and paste
> in--huge amounts of Yaml, regular expressions, etc. This is what our
> flow writers are already doing. It would be easier to insert a filename
> or -path, but...
>
> ...asking a custom processor to perform filesystem I/O is icky because
> of unpredictable filesystem access post installation. Thinking about how
> installation is beyond my control, I don't want to make installation
> messy, etc. Containers, Kubernetes deployment, etc. complicate this.
>
> I thought of wiring /GetFile/ to a subdirectory (problematic, but less
> so?) and accepting files as input to pass on to needy processors who
> would recognize, adopt and incorporate configuration based on
> higher-level and simpler cues posted by flow writers as property values.
>
> Assuming you both grok and are interested in what I'm asking, do you
> have thoughts, cautionary statements or even cat-calls to offer? Maybe
> there are obvious answers I'm just not thinking of.
>
> Profuse thanks,
>
> Russ


Custom-processor configuration suggestions

2023-09-27 Thread Russell Bateman

I'm posting this plea for suggestions as I'm short on imagination here.

We have some custom processors that need extraordinary amounts of 
configuration of the sort a flow writer would have to copy and paste 
in--huge amounts of Yaml, regular expressions, etc. This is what our 
flow writers are already doing. It would be easier to insert a filename 
or -path, but...


...asking a custom processor to perform filesystem I/O is icky because 
of unpredictable filesystem access post installation. Thinking about how 
installation is beyond my control, I don't want to make installation 
messy, etc. Containers, Kubernetes deployment, etc. complicate this.


I thought of wiring /GetFile/ to a subdirectory (problematic, but less 
so?) and accepting files as input to pass on to needy processors who 
would recognize, adopt and incorporate configuration based on 
higher-level and simpler cues posted by flow writers as property values.


Assuming you both grok and are interested in what I'm asking, do you 
have thoughts, cautionary statements or even cat-calls to offer? Maybe 
there are obvious answers I'm just not thinking of.


Profuse thanks,

Russ

Re: Property management - reducing duplication

2023-09-27 Thread Pierre Villard
Hey Bence and team,

I'd definitely be in favor of a better approach here. When removing
variables, I found myself needing to update a lot of copies of
nifi.properties, as well as other configuration files, in many places across
the codebase. I don't know what the best option/approach is here, but having
a single source of truth somewhere and being able to reference it
everywhere, with customization, definitely sounds nice.

Pierre

Le mar. 26 sept. 2023 à 09:19, Simon Bence  a
écrit :

> Hi Team,
>
> I was touching some test-related code the other day, and it brought to
> my attention how many partly duplicated nifi.properties files we have in
> various places in the project.
>
> While I was searching for the value of a given property in these files, it
> got me thinking that when a property changes (for example, as part of
> the 2.x efforts) or is added, it requires changes in multiple places, which
> could lead to oversights and inconsistencies. Additionally, it seems to me
> that duplicating whole configuration files might make one reluctant to
> create specific properties files for specific tests, as in the case of the
> system tests.
>
> I would like to propose a discussion about this, being curious whether the
> community sees any value in improving the configuration management. My
> initial thought is to maintain one single “source of truth” properties
> file and provide some kind of utility that could generate instances as
> needed, allowing properties to be overridden or extended when necessary.
>
> Looking forward to your insights and suggestions.
>
> Regards,
> Bence
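
The "source of truth plus overrides" utility proposed above could look roughly like this. It is only a sketch under assumptions: the class `PropertiesGenerator` and its `generate` method are hypothetical, and the thread does not say how such a utility would actually be built.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.Properties;

// Hypothetical utility; not part of the NiFi codebase.
final class PropertiesGenerator {
    // Loads the single base properties content, then applies per-test
    // overrides (existing keys) and extensions (new keys) on top of it.
    static Properties generate(Reader base, Map<String, String> overrides) throws IOException {
        Properties result = new Properties();
        result.load(base);
        overrides.forEach(result::setProperty);
        return result;
    }
}
```

A test that needs, say, a different web port would pass only that one key in `overrides` and inherit everything else from the base file, instead of carrying its own full copy of nifi.properties.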