Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Boris Tyukin
good to know, thanks Bryan!

On Tue, Jul 10, 2018 at 11:37 AM Bryan Bende  wrote:

> Nested versioning has been possible since the beginning, in 0.1.0.
>
> A common scenario might be to have several teams build different
> versioned flows, and then someone who is in charge of deploying them
> will create another versioned PG that combines the nested versioned
> process groups of each of these teams.
>
> The outer versioned flow in the registry does not fully contain the
> others, just pointers to the actual versioned flows, which technically
> could come from another registry if desired.


Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Kevin Doran
There's a lot of great discussion on this thread.

I’ll add that if you intend to use NiFi Registry with NiFi (which has a lot of 
advantages, some of which have already been discussed), you’ll want to consider 
what is going to work best with NiFi Registry and your flow 
deployment/promotion strategy.

Here are some considerations as of today (NiFi 1.7.0 and NiFi Registry 0.2.0):

•   Using chains of built-in processors inside a process group (PG) will 
work out of the box with any NiFi and NiFi Registry instance, so things become 
very portable. Flow composition via nested/reusable process groups (that is, 
being able to build a PG out of a few processors, save it to NiFi Registry as a 
reusable component, and then import it into other flows or multiple places 
in one flow) is a really powerful capability. Boris, to your point, nesting 
versioned process groups is an available feature (and has been since NiFi 
Registry 0.1, actually).

•   If you use InvokeScriptedProcessor (ISP) or ExecuteScript (ES), changes 
to the Script Body processor property will be saved/read with your flow 
definition in NiFi Registry. If you are invoking an external script using the 
Script File property, however, only the filename will be saved to NiFi 
Registry, so changes to the script file contents need to be versioned and 
synced outside of NiFi Registry.

•   Likewise, if you use custom processors, those need to be 
versioned/deployed/installed on each NiFi separately from your flow definition. 
This is a limitation of NiFi Registry today as it only handles flow 
definitions, although this will probably not be a limitation in the future once 
the Extension Registry capabilities [1] that have been discussed in various 
forums [2] are implemented.

•   Also, not directly related to your question but possibly helpful to folks 
reading this thread: if you haven’t looked at the record-oriented processors 
[3][4], they may solve this problem for you. In places that previously 
required stringing together long chains of processors or using a custom/ES/ISP 
processor, you may now be able to do the equivalent logic with just one or two 
record processors (and very efficiently in terms of performance too!)
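
On the Script File point above: since only the filename travels with the flow 
definition, one way to notice when script contents have drifted between NiFi 
instances is to compare file checksums against a manifest kept in source 
control. This is only a rough sketch of that idea, not a NiFi or NiFi Registry 
feature; the function names and the manifest format are made up for 
illustration.

```python
import hashlib
from pathlib import Path


def script_fingerprint(path):
    """Return the SHA-256 hex digest of a script file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_drift(expected, script_dir):
    """Compare a hypothetical manifest {filename: digest} against script_dir.

    Returns the sorted filenames that are missing or whose contents differ.
    """
    drifted = []
    for name, digest in expected.items():
        candidate = Path(script_dir) / name
        if not candidate.is_file() or script_fingerprint(candidate) != digest:
            drifted.append(name)
    return sorted(drifted)
```

A deployment step could call find_drift() on each node before a flow version 
is promoted and refuse to continue if the returned list is not empty.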

At the end of the day, all these considerations can be overcome, so if you have 
a particular problem to solve, my recommendation is always to use whatever is 
the best fit in terms of simplicity/understandability/maintainability and 
performance. In situations where those factors are more-or-less equal, these 
overall ecosystem factors start to come into play as well. 

I like Mike’s suggestion of sticking to built-in processors when possible, and 
when not possible using ISP/ES to prototype custom logic that you ultimately 
migrate into a custom processor. Once NiFi Registry gets Extension capabilities, 
that workflow should be even better! 

[1] https://cwiki.apache.org/confluence/display/NIFI/Extension+Repositories+%28aka+Extension+Registry%29+for+Dynamically-loaded+Extensions
[2] http://apache-nifi.1125220.n5.nabble.com/DISCUSS-Apache-NiFi-distribution-has-grown-too-large-td20861.html
[3] https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi
[4] https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries

Hope this helps,
Kevin

On 7/10/18, 11:37, "Bryan Bende"  wrote:

Nested versioning has been possible since the beginning, in 0.1.0.

A common scenario might be to have several teams build different
versioned flows, and then someone who is in charge of deploying them
will create another versioned PG that combines the nested versioned
process groups of each of these teams.

The outer versioned flow in the registry does not fully contain the
others, just pointers to the actual versioned flows, which technically
could come from another registry if desired.


Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Bryan Bende
Nested versioning has been possible since the beginning, in 0.1.0.

A common scenario might be to have several teams build different
versioned flows, and then someone who is in charge of deploying them
will create another versioned PG that combines the nested versioned
process groups of each of these teams.

The outer versioned flow in the registry does not fully contain the
others, just pointers to the actual versioned flows, which technically
could come from another registry if desired.
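
To make the "pointers, not copies" idea concrete, here is a toy model in
Python. It is not the NiFi Registry data model or API; the class names,
field names, registry URLs and processor lists are all invented for
illustration. It only shows how an outer snapshot could hold references
to nested versioned flows rather than their contents.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class FlowPointer:
    """Hypothetical reference to a versioned flow stored in some registry."""
    registry_url: str
    bucket: str
    flow: str
    version: int


@dataclass
class ProcessGroup:
    """Simplified PG: holds content, and optionally is itself versioned."""
    name: str
    processors: List[str] = field(default_factory=list)
    children: List["ProcessGroup"] = field(default_factory=list)
    pointer: Optional[FlowPointer] = None  # set when the PG is under version control


def snapshot(pg, is_root=True):
    """Return what would be stored for pg; nested versioned PGs collapse to a pointer."""
    if pg.pointer is not None and not is_root:
        return {"name": pg.name, "pointer": pg.pointer}
    return {
        "name": pg.name,
        "processors": list(pg.processors),
        "children": [snapshot(child, is_root=False) for child in pg.children],
    }


# Two teams version their own flows, possibly in different registries.
team_a = ProcessGroup("team-a-ingest", ["GetFile", "PutHDFS"],
                      pointer=FlowPointer("http://registry-a:18080", "team-a", "ingest", 3))
team_b = ProcessGroup("team-b-enrich", ["LookupRecord"],
                      pointer=FlowPointer("http://registry-b:18080", "team-b", "enrich", 7))

# The deployer's outer PG combines them; its snapshot holds only the pointers.
outer = ProcessGroup("deployment", children=[team_a, team_b])
outer_snapshot = snapshot(outer)
```

In this sketch, changing a team's flow means bumping the version in its
pointer; the outer snapshot never duplicates the team's processors.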

On Tue, Jul 10, 2018 at 11:22 AM, Boris Tyukin  wrote:
> Thanks Bryan. I saw your blog post on that. I think with Registry 0.1 it was
> not possible to version nested PGs within parent PGs, so I could not have a
> "templatized" PG which has its own version and use that PG with other
> versioned PGs. Has that changed now that Registry 0.2 is out?


Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Boris Tyukin
Thanks Bryan. I saw your blog post on that. I think with Registry 0.1 it
was not possible to version nested PGs within parent PGs, so I could not
have a "templatized" PG which has its own version and use that PG with
other versioned PGs. Has that changed now that Registry 0.2 is out?

On Tue, Jul 10, 2018 at 11:08 AM Bryan Bende  wrote:

> Boris,
>
> Regarding templates being limited... templates were really made as a
> way to share example flows, or help with debugging if you need to send
> someone your flow. Unfortunately they turned into a deployment
> mechanism since there wasn't a better solution at the time.
>
> Using NiFi Registry should now be the preferred solution, and you can
> sync changes in-place. You can have many versioned process groups tied
> to the same versioned flow in a registry and update them all.
>
> Thanks,
>
> Bryan


Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Mike Thomsen
As a rule of thumb, I would strongly suggest using the scripting
capabilities wherever something "feels" like a script and you neither need
the best possible performance nor to bring in new Java dependencies. If it
is more of a core component of your business logic, needs to be thoroughly
tested, etc. then building a custom processor bundle probably makes a lot
of sense to lower risk and get the best performance (compiled
Java/Scala/Groovy code vs a script is no contest most of the time).
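
As an illustration of logic that "feels" like a script, the link-extraction
step from James's example fits in a few lines. The sketch below is plain
Python 3 and does not use the NiFi scripting API (no session or flow file
handling); the sample HTML and the example.com URL are invented. Inside
ExecuteScript the Python engine is Jython (Python 2.7), where these modules
are named HTMLParser and urlparse instead, so treat this as a standalone
sketch rather than a drop-in script body.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, suffix=""):
    """Return absolute URLs of links in html whose href ends with suffix."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, h) for h in collector.hrefs if h.endswith(suffix)]


# Made-up index page: two per-year index links and one unrelated link.
index_page = (
    '<a href="2017/index.html">2017</a> '
    '<a href="2018/index.html">2018</a> '
    '<a href="about.txt">about</a>'
)
year_pages = extract_links(index_page, "http://example.com/data/", suffix="index.html")
```

The same function could be applied again to each per-year page to pull out
the per-month links.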

On Mon, Jul 9, 2018 at 3:30 PM James Srinivasan 
wrote:

> Hi all,
>
> I was wondering if there is any general guidance about when to use
> ExecuteScript and when to use a chain of processors? For example, in
> one application I am downloading a HTML index file, extracting the
> links corresponding to more index pages of data per year, fetching
> those pages, extracting some more links per month and then downloading
> the results. I'm currently doing this with a bunch of NiFi processors
> (about 20 in total), whereas I could replace them all by a single
> fairly simple Python or Groovy script called by ExecuteScript.
>
> In another application, I have written a custom processor but probably
> could have written the same code in a script.
>
> Any guidance on how to choose between the three options would be much
> appreciated (yay for choices!)
>
> Thanks very much,
>
> James
>


Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Bryan Bende
Boris,

Regarding templates being limited... templates were really made as a
way to share example flows, or help with debugging if you need to send
someone your flow. Unfortunately they turned into a deployment
mechanism since there wasn't a better solution at the time.

Using NiFi Registry should now be the preferred solution, and you can
sync changes in-place. You can have many versioned process groups tied
to the same versioned flow in a registry and update them all.

Thanks,

Bryan


On Tue, Jul 10, 2018 at 10:50 AM, Boris Tyukin  wrote:
> I like Ed's recommendations and am doing something similar. I use ISPs for
> some repetitive tasks that are used in multiple places / flows. Unfortunately,
> NiFi templates are very limited in use for that purpose (you can only
> import/export them but cannot sync changes in them across flows).
>
> I wanted to use Python, but realized that it was not a good idea because NiFi
> uses Jython, which comes with a bunch of limitations. Groovy was really
> easy to learn for basic scripting.
>
> Boris


Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Boris Tyukin
I like Ed's recommendations and am doing something similar. I use ISPs for
some repetitive tasks that are used in multiple places / flows. Unfortunately,
NiFi templates are very limited in use for that purpose (you can only
import/export them but cannot sync changes in them across flows).

I wanted to use Python, but realized that it was not a good idea because NiFi
uses Jython, which comes with a bunch of limitations. Groovy was really
easy to learn for basic scripting.

Boris

On Tue, Jul 10, 2018 at 10:00 AM Ed B  wrote:

> Hi James,
>
> I have implemented a couple of custom processors, Python- and Groovy-based
> ISP and ES scripts, and obviously implementations using miles-long flows.
> There are a couple of aspects: development, deployment and maintenance.
> Our client considers it a "code change" when you need to deploy a new file
> on a Linux box, so if you write anything in NiFi, it's not a code change and
> doesn't require a complex process of deployment activities. So we stick with
> ISP/ES unless it's not possible at all without Java (custom processor).
> Try to avoid Python, as ES is terribly slow with Python (actually, Jython).
> If your team won't be able to support your code, but they are OK with NiFi,
> avoid ISP/ES/custom processors unless absolutely required. Write long
> flows, structure them well in PGs, etc., because any code requires
> maintenance.
> If you are writing something more like a reusable component, then obviously
> a single component (ISP/ES or custom processor) will make much more sense,
> as the number of processors (duplicated for each usage in different flows)
> will have an impact on memory and CPU.
> There are more, but these are my main considerations.
> Hope this helps.
>
> Regards,
> Ed.


Re: NiFi ExecuteScript vs multiple processors vs custom processor

2018-07-10 Thread Ed B
Hi James,

I have implemented a couple of custom processors, Python- and Groovy-based
ISP and ES scripts, and obviously implementations using miles-long flows.
There are a couple of aspects: development, deployment and maintenance.
Our client considers it a "code change" when you need to deploy a new file
on a Linux box, so if you write anything in NiFi, it's not a code change and
doesn't require a complex process of deployment activities. So we stick with
ISP/ES unless it's not possible at all without Java (custom processor).
Try to avoid Python, as ES is terribly slow with Python (actually, Jython).
If your team won't be able to support your code, but they are OK with NiFi,
avoid ISP/ES/custom processors unless absolutely required. Write long
flows, structure them well in PGs, etc., because any code requires
maintenance.
If you are writing something more like a reusable component, then obviously
a single component (ISP/ES or custom processor) will make much more sense,
as the number of processors (duplicated for each usage in different flows)
will have an impact on memory and CPU.
There are more, but these are my main considerations.
Hope this helps.

Regards,
Ed.

On Mon, Jul 9, 2018 at 3:30 PM James Srinivasan 
wrote:

> Hi all,
>
> I was wondering if there is any general guidance about when to use
> ExecuteScript and when to use a chain of processors? For example, in
> one application I am downloading a HTML index file, extracting the
> links corresponding to more index pages of data per year, fetching
> those pages, extracting some more links per month and then downloading
> the results. I'm currently doing this with a bunch of NiFi processors
> (about 20 in total), whereas I could replace them all by a single
> fairly simple Python or Groovy script called by ExecuteScript.
>
> In another application, I have written a custom processor but probably
> could have written the same code in a script.
>
> Any guidance on how to choose between the three options would be much
> appreciated (yay for choices!)
>
> Thanks very much,
>
> James
>