Re: Use a different runtime

2019-07-12 Thread Andy LoPresto
Mike, 

The short answer is “this is not as simple as replacing one class”. The longer 
answer is that you may be interested in the work by Mark Payne and Sam Hjelmfelt 
to provide an additional runtime in NiFi (referred to as stateless NiFi [1]). 
Hope this helps. 

[1] https://github.com/apache/nifi/pull/3241 


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jul 12, 2019, at 12:12 PM, HWANG, MICHAEL (MICHAEL) 
>  wrote:
> 
> Hello
> 
> I want to use NiFi's UI and APIs (i.e. registry) but not use the NiFi 
> scheduler, controller, and runtime.  I want to prototype hooking in my own 
> scheduler, controller, and runtime.  How involved would this be?  Would it be 
> more involved than replacing the class 
> "org.apache.nifi.web.controller.ControllerFacade" in the nifi-web-api 
> sub-project with my own version?
> 
> I like NiFi's concepts, defined resources, and UI user experience, but 
> unfortunately we have legacy runtime elements, like an orchestrator, that must 
> continue to be used.  It's part of my requirement.
> 
> Any thoughts or guidance would be most appreciated.
> 
> Thanks again
> 
> Mike



Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Andy LoPresto
Adam,

I think your statements about “definitely a total sum of work that is greater” 
are true for a specific audience. As someone who routinely reviews PRs and 
handles release management tasks, I know where I would like to see 
improvements. I also write feature code (though not as often lately), but I 
think I’ve experienced the “contributor” role enough to be able to balance my 
expectations of required work. 

And yes, I’d much rather be responsible for releasing nifi-commons, with its 
self-contained security frameworks and services, on a routine cadence and only 
have to worry about the constrained feature set and repeatable tests, than RM 
the entire application framework once every 3-4 months and have to wait for 
that. 

Jeff,

I think your entire third paragraph (was about to quote a line but the whole 
thing is great) is exactly where I am. Complex changes require complex thought 
from the contributor and from the reviewers. Smaller changes are easier to 
write correctly, review intelligently, and merge quickly. I accept the 
tradeoffs that requires. 


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jul 12, 2019, at 12:33 PM, Jeff  wrote:
> 
> Adam,
> 
> To your point, we currently have the situation you described when
> dependencies exist between NiFi and NiFi Registry, where one must be
> released before the other.
> 
> In my opinion, it is better to be able to verify functionality in smaller
> atomic change-sets and limited scope over a larger change-set.  It may
> initially delay the release windows to coordinate across multiple
> repositories, but over the long term I think it will save more time.  It is
> generally the case that spending more time upfront saves more time in the
> long run.  The same can be said for a mono-repository; split PRs that
> address multiple areas/modules of code into smaller PRs for easier review.
> Both methods encourage the developers and reviewers to think critically
> about the changes being applied.  In a multi-repository setup, that
> critical thinking is basically forced upon the developers and reviewers,
> and I think that is a good thing for the quality of the software.
> 
> Regarding the raising or lowering of the bar for contribution, we should
> find a balance that encourages contribution quality.  Splitting our modules
> across multiple repositories may raise the bar for features or bug fixes
> that span repositories, but a contributor requires knowledge of those
> multiple modules regardless of which repository those modules live in. If a
> change is being made to multiple modules that span more than one repository,
> the only impacts to the current workflow are multiple PRs instead of a single
> PR, and the trade-off between shorter per-PR review time for multiple smaller
> PRs versus longer review time for a single larger PR.
> 
> 
> 
> On Fri, Jul 12, 2019 at 3:12 PM Adam Taft  wrote:
> 
>> Andy - fair points. Note that by definition, the process you describe is
>> harder (requires more maneuvers).  Maybe it's warranted/justified for the
>> desired integrity that you are after, but it's most definitely a total sum
>> of work that is greater.
>> 
>> Your registry example is really good.  In your example, you are proposing a
>> change to the framework and commons repositories before a change to the
>> registry can be finalized.  You'd need the changes to framework and commons
>> to "land" and become released before the final change to the registry was
>> committed.  You'd end up with a small release queued up for the framework
>> (whose release cycle is mostly infrequent) and you wouldn't be able to
>> finish the work on the registry changes until that new function was
>> releasable.  The ability to mark that JIRA ticket as "closed" is delayed
>> because you are waiting for releases from dependent components.
>> 
>> Of course, you can build/test against -SNAPSHOT versions in each of those
>> repositories (which is what Bryan was getting to).  But the registry
>> feature itself can't be totally finalized and is waiting on the release
>> cycle of the slowest of the components.  There are definitely tradeoffs
>> with this direction.
>> 
>> 
>> On Fri, Jul 12, 2019 at 12:42 PM Andy LoPresto 
>> wrote:
>> 
>>> I think by definition, a contribution _must_ fit into a single repository.
>>> This will force developers to carefully consider the boundaries between
>>> modules and build clean abstractions. If you are a new contributor, I would
>>> be surprised if you are making a single (logical) contribution that would
>>> span multiple repositories on the first go. I think enforcing clear
>>> divisions is good for both new and experienced contributors. I also think a
>>> change that requires contributions to multiple repositories should be
>>> subdivided into atomic tasks.
>>>
>>> For example, if someone wants to contribute a new feature to nifi-registry

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Jeff
Adam,

To your point, we currently have the situation you described when
dependencies exist between NiFi and NiFi Registry, where one must be
released before the other.

In my opinion, it is better to be able to verify functionality in smaller
atomic change-sets and limited scope over a larger change-set.  It may
initially delay the release windows to coordinate across multiple
repositories, but over the long term I think it will save more time.  It is
generally the case that spending more time upfront saves more time in the
long run.  The same can be said for a mono-repository; split PRs that
address multiple areas/modules of code into smaller PRs for easier review.
Both methods encourage the developers and reviewers to think critically
about the changes being applied.  In a multi-repository setup, that
critical thinking is basically forced upon the developers and reviewers,
and I think that is a good thing for the quality of the software.

Regarding the raising or lowering of the bar for contribution, we should
find a balance that encourages contribution quality.  Splitting our modules
across multiple repositories may raise the bar for features or bug fixes
that span repositories, but a contributor requires knowledge of those
multiple modules regardless of which repository those modules live in. If a
change is being made to multiple modules that span more than one repository,
the only impacts to the current workflow are multiple PRs instead of a single
PR, and the trade-off between shorter per-PR review time for multiple smaller
PRs versus longer review time for a single larger PR.



On Fri, Jul 12, 2019 at 3:12 PM Adam Taft  wrote:

> Andy - fair points. Note that by definition, the process you describe is
> harder (requires more maneuvers).  Maybe it's warranted/justified for the
> desired integrity that you are after, but it's most definitely a total sum
> of work that is greater.
>
> Your registry example is really good.  In your example, you are proposing a
> change to the framework and commons repositories before a change to the
> registry can be finalized.  You'd need the changes to framework and commons
> to "land" and become released before the final change to the registry was
> committed.  You'd end up with a small release queued up for the framework
> (whose release cycle is mostly infrequent) and you wouldn't be able to
> finish the work on the registry changes until that new function was
> releasable.  The ability to mark that JIRA ticket as "closed" is delayed
> because you are waiting for releases from dependent components.
>
> Of course, you can build/test against -SNAPSHOT versions in each of those
> repositories (which is what Bryan was getting to).  But the registry
> feature itself can't be totally finalized and is waiting on the release
> cycle of the slowest of the components.  There are definitely tradeoffs
> with this direction.
>
>
> On Fri, Jul 12, 2019 at 12:42 PM Andy LoPresto 
> wrote:
>
> > I think by definition, a contribution _must_ fit into a single repository.
> > This will force developers to carefully consider the boundaries between
> > modules and build clean abstractions. If you are a new contributor, I would
> > be surprised if you are making a single (logical) contribution that would
> > span multiple repositories on the first go. I think enforcing clear
> > divisions is good for both new and experienced contributors. I also think a
> > change that requires contributions to multiple repositories should be
> > subdivided into atomic tasks.
> >
> > For example, if someone wants to contribute a new feature to nifi-registry
> > which also requires changes to nifi-commons for the security piece and adds
> > new behavior to the nifi-framework component to consume new changes from
> > Registry, in my mind those are actually 3 atomic changes which, while
> > related and interdependent, can all be contributed as standalone code to
> > their respective repositories in an ordered fashion. I would prefer this
> > over one large commit to a single repository which influences behavior in
> > all three modules and requires one or more reviewers with comprehensive
> > knowledge over all aspects of the project.
> >
> >
> > Andy LoPresto
> > alopre...@apache.org
> > alopresto.apa...@gmail.com
> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> >
> > > On Jul 12, 2019, at 10:49 AM, Adam Taft  wrote:
> > >
> > > Bryan,
> > >
> > > I think both of your points are actually closely related, and they somewhat
> > > speak to my thoughts/concerns about splitting the repository.
> > >
> > > I would argue that one PR that affects multiple modules in a single
> > > repository is _easier_ to review than multiple PRs that affect single
> > > modules.  In the split repository model, if a change affects several
> > > repositories, individual PRs would be issued against each repository.  A
> > > reviewer would not as easily see the context of the changes and may even

Use a different runtime

2019-07-12 Thread HWANG, MICHAEL (MICHAEL)
Hello

I want to use NiFi's UI and APIs (i.e. registry) but not use the NiFi 
scheduler, controller, and runtime.  I want to prototype hooking in my own 
scheduler, controller, and runtime.  How involved would this be?  Would it be 
more involved than replacing the class 
"org.apache.nifi.web.controller.ControllerFacade" in the nifi-web-api 
sub-project with my own version?

I like NiFi's concepts, defined resources, and UI user experience, but 
unfortunately we have legacy runtime elements, like an orchestrator, that must 
continue to be used.  It's part of my requirement.

Any thoughts or guidance would be most appreciated.

Thanks again

Mike


Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Adam Taft
Andy - fair points. Note that by definition, the process you describe is
harder (requires more maneuvers).  Maybe it's warranted/justified for the
desired integrity that you are after, but it's most definitely a total sum
of work that is greater.

Your registry example is really good.  In your example, you are proposing a
change to the framework and commons repositories before a change to the
registry can be finalized.  You'd need the changes to framework and commons
to "land" and become released before the final change to the registry was
committed.  You'd end up with a small release queued up for the framework
(whose release cycle is mostly infrequent) and you wouldn't be able to
finish the work on the registry changes until that new function was
releasable.  The ability to mark that JIRA ticket as "closed" is delayed
because you are waiting for releases from dependent components.

Of course, you can build/test against -SNAPSHOT versions in each of those
repositories (which is what Bryan was getting to).  But the registry
feature itself can't be totally finalized and is waiting on the release
cycle of the slowest of the components.  There are definitely tradeoffs
with this direction.


On Fri, Jul 12, 2019 at 12:42 PM Andy LoPresto  wrote:

> I think by definition, a contribution _must_ fit into a single repository.
> This will force developers to carefully consider the boundaries between
> modules and build clean abstractions. If you are a new contributor, I would
> be surprised if you are making a single (logical) contribution that would
> span multiple repositories on the first go. I think enforcing clear
> divisions is good for both new and experienced contributors. I also think a
> change that requires contributions to multiple repositories should be
> subdivided into atomic tasks.
>
> For example, if someone wants to contribute a new feature to nifi-registry
> which also requires changes to nifi-commons for the security piece and adds
> new behavior to the nifi-framework component to consume new changes from
> Registry, in my mind those are actually 3 atomic changes which, while
> related and interdependent, can all be contributed as standalone code to
> their respective repositories in an ordered fashion. I would prefer this
> over one large commit to a single repository which influences behavior in
> all three modules and requires one or more reviewers with comprehensive
> knowledge over all aspects of the project.
>
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> > On Jul 12, 2019, at 10:49 AM, Adam Taft  wrote:
> >
> > Bryan,
> >
> > I think both of your points are actually closely related, and they somewhat
> > speak to my thoughts/concerns about splitting the repository.
> >
> > I would argue that one PR that affects multiple modules in a single
> > repository is _easier_ to review than multiple PRs that affect single
> > modules.  In the split repository model, if a change affects several
> > repositories, individual PRs would be issued against each repository.  A
> > reviewer would not as easily see the context of the changes and may even
> > consider them out of order.
> >
> > In the single repository model, a PR is atomic. There is no race condition,
> > ordering or loss of context across multiple repositories.
> >
> > This is the concern I was raising for new contributors.  If your
> > contribution doesn't fit neatly into a single repository, then it's quite
> > the tough process to communicate and deal with changes. It will discourage
> > new folks from being involved, because the contribution barrier is raised.
> >
> > It's ideal that changesets are atomic, but you definitely lose this
> > property in a multi-repo scenario.  Imagine rolling back a change, for
> > example, that spans multiple repositories.
> >
> > Adam
> >
> > On Fri, Jul 12, 2019 at 11:27 AM Bryan Bende  wrote:
> >
> >> Two other points to throw out there...
> >>
> >> 1) I think something to consider is how the management of pull
> >> requests would be impacted, since that is the main form of
> >> contribution.
> >>
> >> Separate repos force pull requests to stay scoped to a given module,
> >> making for more straightforward reviews. It also makes it easier to
> >> look at a repo and see what work/contributions are still open,
> >> although I suppose all the PRs in the nifi repo could be labeled by
> >> module and then filtered, but it seems a little more tedious. Just
> >> something to think about.
> >>
> >>
> >> 2) We should also consider how we plan to handle changes across modules.
> >>
> >> As an example, currently we have nifi and nifi-registry in separate
> >> repos, and nifi depends on nifi-registry, but nifi master always stays
> >> on the last release version of nifi-registry.
> >>
> >> So if you are working on a change across both projects, the process is
> >> something like the following...
> >>
> >> - 

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Andy LoPresto
I think by definition, a contribution _must_ fit into a single repository. This 
will force developers to carefully consider the boundaries between modules and 
build clean abstractions. If you are a new contributor, I would be surprised if 
you are making a single (logical) contribution that would span multiple 
repositories on the first go. I think enforcing clear divisions is good for 
both new and experienced contributors. I also think a change that requires 
contributions to multiple repositories should be subdivided into atomic tasks. 

For example, if someone wants to contribute a new feature to nifi-registry 
which also requires changes to nifi-commons for the security piece and adds new 
behavior to the nifi-framework component to consume new changes from Registry, 
in my mind those are actually 3 atomic changes which, while related and 
interdependent, can all be contributed as standalone code to their respective 
repositories in an ordered fashion. I would prefer this over one large commit 
to a single repository which influences behavior in all three modules and 
requires one or more reviewers with comprehensive knowledge over all aspects of 
the project. 
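The "ordered fashion" above amounts to a topological sort of the modules a change touches: each repository's piece can be merged and released only after the pieces it depends on. The Python sketch below (3.9+, standard library `graphlib`) is only an illustration of that idea, not a NiFi tool; the dependency edges are assumptions read loosely from this example, not the real NiFi module graph.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map for the example above: each module lists the
# modules whose changes it consumes. These edges are illustrative assumptions,
# not the real NiFi module graph.
DEPENDS_ON: dict[str, set[str]] = {
    "nifi-commons": set(),
    "nifi-registry": {"nifi-commons"},
    "nifi-framework": {"nifi-commons", "nifi-registry"},
}


def contribution_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order per-repository changes so each lands after its dependencies."""
    # static_order() yields every node after all of its predecessors and
    # raises graphlib.CycleError if the modules depend on each other in a loop.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, module in enumerate(contribution_order(DEPENDS_ON), start=1):
        print(f"{step}. merge and release the {module} change")
```

With these assumed edges the only valid order is nifi-commons, then nifi-registry, then nifi-framework. A `CycleError` would be a sign that the change cannot be split into independently mergeable pieces without first untangling the modules.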


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jul 12, 2019, at 10:49 AM, Adam Taft  wrote:
> 
> Bryan,
> 
> I think both of your points are actually closely related, and they somewhat
> speak to my thoughts/concerns about splitting the repository.
> 
> I would argue that one PR that affects multiple modules in a single
> repository is _easier_ to review than multiple PRs that affect single
> modules.  In the split repository model, if a change affects several
> repositories, individual PRs would be issued against each repository.  A
> reviewer would not as easily see the context of the changes and may even
> consider them out of order.
> 
> In the single repository model, a PR is atomic. There is no race condition,
> ordering or loss of context across multiple repositories.
> 
> This is the concern I was raising for new contributors.  If your
> contribution doesn't fit neatly into a single repository, then it's quite
> the tough process to communicate and deal with changes. It will discourage
> new folks from being involved, because the contribution barrier is raised.
> 
> It's ideal that changesets are atomic, but you definitely lose this
> property in a multi-repo scenario.  Imagine rolling back a change, for
> example, that spans multiple repositories.
> 
> Adam
> 
> On Fri, Jul 12, 2019 at 11:27 AM Bryan Bende  wrote:
> 
>> Two other points to throw out there...
>> 
>> 1) I think something to consider is how the management of pull
>> requests would be impacted, since that is the main form of
>> contribution.
>> 
>> Separate repos force pull requests to stay scoped to a given module,
>> making for more straightforward reviews. It also makes it easier to
>> look at a repo and see what work/contributions are still open,
>> although I suppose all the PRs in the nifi repo could be labeled by
>> module and then filtered, but it seems a little more tedious. Just
>> something to think about.
>> 
>> 
>> 2) We should also consider how we plan to handle changes across modules.
>> 
>> As an example, currently we have nifi and nifi-registry in separate
>> repos, and nifi depends on nifi-registry, but nifi master always stays
>> on the last release version of nifi-registry.
>> 
>> So if you are working on a change across both projects, the process is
>> something like the following...
>> 
>> - Make change in nifi-registry and run a Maven install locally
>> - Change nifi pom to the snapshot version of nifi-registry
>> - Make changes in nifi and stage them in a branch, possibly a draft PR
>> that can't be merged yet
>> - nifi-registry gets released
>> - Put up a PR for the nifi work, bumping the nifi-registry version to
>> released version
>> 
>> I have no issue continuing to work like this, as long as we accept
>> that the complexity of these scenarios will increase with more
>> modules.
>> 
>> An alternative approach would be to allow master of each module to
>> depend on a snapshot of a dependent module. For example, the nifi PR
>> above could be merged before nifi-registry is ever released. It lets
>> the work proceed instead of letting these draft changes build up, and
>> it forces the dependency chain of releases to occur since now you
>> can't release nifi master until nifi-registry is released. The
>> downside is it requires everyone to locally build all the snapshot
>> modules to get the latest changes, even if they aren't working on
>> those other modules, unless there is a way for them to be provided
>> through an apache infra build process.
>> 
>> This second point is less about mono vs multi repo, and more about how
>> to manage development of a change that requires modifying several
>> dependent modules.
>> 
>> On Fri, Jul 12, 2019 at 1:05 PM Kevin 

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Adam Taft
Bryan,

I think both of your points are actually closely related, and they somewhat
speak to my thoughts/concerns about splitting the repository.

I would argue that one PR that affects multiple modules in a single
repository is _easier_ to review than multiple PRs that affect single
modules.  In the split repository model, if a change affects several
repositories, individual PRs would be issued against each repository.  A
reviewer would not as easily see the context of the changes and may even
consider them out of order.

In the single repository model, a PR is atomic. There is no race condition,
ordering or loss of context across multiple repositories.

This is the concern I was raising for new contributors.  If your
contribution doesn't fit neatly into a single repository, then it's quite
the tough process to communicate and deal with changes. It will discourage
new folks from being involved, because the contribution barrier is raised.

It's ideal that changesets are atomic, but you definitely lose this
property in a multi-repo scenario.  Imagine rolling back a change, for
example, that spans multiple repositories.

Adam

On Fri, Jul 12, 2019 at 11:27 AM Bryan Bende  wrote:

> Two other points to throw out there...
>
> 1) I think something to consider is how the management of pull
> requests would be impacted, since that is the main form of
> contribution.
>
> Separate repos force pull requests to stay scoped to a given module,
> making for more straightforward reviews. It also makes it easier to
> look at a repo and see what work/contributions are still open,
> although I suppose all the PRs in the nifi repo could be labeled by
> module and then filtered, but it seems a little more tedious. Just
> something to think about.
>
>
> 2) We should also consider how we plan to handle changes across modules.
>
> As an example, currently we have nifi and nifi-registry in separate
> repos, and nifi depends on nifi-registry, but nifi master always stays
> on the last release version of nifi-registry.
>
> So if you are working on a change across both projects, the process is
> something like the following...
>
> - Make change in nifi-registry and run a Maven install locally
> - Change nifi pom to the snapshot version of nifi-registry
> - Make changes in nifi and stage them in a branch, possibly a draft PR
> that can't be merged yet
> - nifi-registry gets released
> - Put up a PR for the nifi work, bumping the nifi-registry version to
> released version
>
> I have no issue continuing to work like this, as long as we accept
> that the complexity of these scenarios will increase with more
> modules.
>
> An alternative approach would be to allow master of each module to
> depend on a snapshot of a dependent module. For example, the nifi PR
> above could be merged before nifi-registry is ever released. It lets
> the work proceed instead of letting these draft changes build up, and
> it forces the dependency chain of releases to occur since now you
> can't release nifi master until nifi-registry is released. The
> downside is it requires everyone to locally build all the snapshot
> modules to get the latest changes, even if they aren't working on
> those other modules, unless there is a way for them to be provided
> through an apache infra build process.
>
> This second point is less about mono vs multi repo, and more about how
> to manage development of a change that requires modifying several
> dependent modules.
>
> On Fri, Jul 12, 2019 at 1:05 PM Kevin Doran  wrote:
> >
> > Thanks Adam and Edward! This is exactly the type of discussion I was
> > hoping a detailed and specific proposal would generate, so thanks for
> > the input. I'll reply to each of you in turn:
> >
> > Adam,
> >
> > It is true that a repo-per-project approach is not required. I've
> > worked on projects that do it both ways and there are advantages to
> > both.
> >
> > Single-repo was considered, but as one of the primary goals is to cut
> > down on Travis / CI build times, the multi-repo approach seemed to
> > have a big advantage. Personally, I've never found a reliable, stable
> > way to introduce CI builds to a repository with multiple projects that
> > did not require building all the projects in the repository. It's
> > possible to try to use commands to determine which files have changed
> > and infer which project(s) to build from that, but maintaining that
> > logic can get messy. If the logic is wrong, it's possible a project
> > that is not built is broken by a PR. Building everything is not an
> > option for a project of our size, as our builds already time out today.
> > Fast, reliable Travis builds with no false positives / negatives is
> > definitely something NiFi needs, and I think it will be simplest to
> > get there with a multi-repo approach.
> >
> > That said, I agree that the *biggest* win comes from splitting
> > projects, and that splitting repos is a smaller step. I don't feel
> > strongly about it and could live with a 

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Bryan Bende
Two other points to throw out there...

1) I think something to consider is how the management of pull
requests would be impacted, since that is the main form of
contribution.

Separate repos force pull requests to stay scoped to a given module,
making for more straightforward reviews. It also makes it easier to
look at a repo and see what work/contributions are still open,
although I suppose all the PRs in the nifi repo could be labeled by
module and then filtered, but it seems a little more tedious. Just
something to think about.


2) We should also consider how we plan to handle changes across modules.

As an example, currently we have nifi and nifi-registry in separate
repos, and nifi depends on nifi-registry, but nifi master always stays
on the last release version of nifi-registry.

So if you are working on a change across both projects, the process is
something like the following...

- Make change in nifi-registry and run a Maven install locally
- Change nifi pom to the snapshot version of nifi-registry
- Make changes in nifi and stage them in a branch, possibly a draft PR
that can't be merged yet
- nifi-registry gets released
- Put up a PR for the nifi work, bumping the nifi-registry version to
released version
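The rule that nifi master stays on a released nifi-registry version could, for example, be checked mechanically before merge by flagging any `-SNAPSHOT` dependency in a pom.xml. The Python sketch below is hypothetical (no such check is described in this thread). It assumes the POM uses the standard Maven namespace and resolves `${...}` placeholders only from the POM's own `<properties>`, ignoring parent POMs and profiles. The coordinates and versions in the sample POM are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

MAVEN_NS = "http://maven.apache.org/POM/4.0.0"
NS = {"m": MAVEN_NS}


def snapshot_dependencies(pom_xml: str) -> list[str]:
    """List groupId:artifactId:version for dependencies on -SNAPSHOT versions."""
    root = ET.fromstring(pom_xml)

    # Collect <properties> so ${...} placeholders can be resolved locally.
    props: dict[str, str] = {}
    props_el = root.find("m:properties", NS)
    if props_el is not None:
        for child in props_el:
            # ElementTree names tags "{namespace}local"; keep the local part.
            props[child.tag.rsplit("}", 1)[-1]] = (child.text or "").strip()

    def resolve(value: str) -> str:
        # Leave unknown placeholders untouched (they may come from a parent POM).
        return re.sub(r"\$\{([^}]+)\}",
                      lambda m: props.get(m.group(1), m.group(0)), value)

    found = []
    # iter() also visits dependencies under <dependencyManagement> and plugins.
    for dep in root.iter(f"{{{MAVEN_NS}}}dependency"):
        version = resolve(dep.findtext("m:version", "", NS).strip())
        if version.endswith("-SNAPSHOT"):
            group = dep.findtext("m:groupId", "", NS).strip()
            artifact = dep.findtext("m:artifactId", "", NS).strip()
            found.append(f"{group}:{artifact}:{version}")
    return found


# Made-up example: a pom that still points at a registry snapshot.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <nifi.registry.version>0.5.0-SNAPSHOT</nifi.registry.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.nifi.registry</groupId>
      <artifactId>nifi-registry-client</artifactId>
      <version>${nifi.registry.version}</version>
    </dependency>
  </dependencies>
</project>"""

if __name__ == "__main__":
    for coordinate in snapshot_dependencies(SAMPLE_POM):
        print(f"not mergeable yet, depends on {coordinate}")
```

Under the current workflow a non-empty result marks the draft PR that "can't be merged yet". Under the alternative described below, the same check would instead tell the release manager which upstream releases must happen first.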

I have no issue continuing to work like this, as long as we accept
that the complexity of these scenarios will increase with more
modules.

An alternative approach would be to allow master of each module to
depend on a snapshot of a dependent module. For example, the nifi PR
above could be merged before nifi-registry is ever released. It lets
the work proceed instead of letting these draft changes build up, and
it forces the dependency chain of releases to occur since now you
can't release nifi master until nifi-registry is released. The
downside is it requires everyone to locally build all the snapshot
modules to get the latest changes, even if they aren't working on
those other modules, unless there is a way for them to be provided
through an apache infra build process.

This second point is less about mono vs multi repo, and more about how
to manage development of a change that requires modifying several
dependent modules.

On Fri, Jul 12, 2019 at 1:05 PM Kevin Doran  wrote:
>
> Thanks Adam and Edward! This is exactly the type of discussion I was
> hoping a detailed and specific proposal would generate, so thanks for
> the input. I'll reply to each of you in turn:
>
> Adam,
>
> It is true that a repo-per-project approach is not required. I've
> worked on projects that do it both ways and there are advantages to
> both.
>
> Single-repo was considered, but as one of the primary goals is to cut
> down on Travis / CI build times, the multi-repo approach seemed to
> have a big advantage. Personally, I've never found a reliable, stable
> way to introduce CI builds to a repository with multiple projects that
> did not require building all the projects in the repository. It's
> possible to try to use commands to determine which files have changed
> and infer which project(s) to build from that, but maintaining that
> logic can get messy. If the logic is wrong, it's possible a project
> that is not built is broken by a PR. Building everything is not an
> option for a project of our size, as our builds already time out today.
> Fast, reliable Travis builds with no false positives / negatives is
> definitely something NiFi needs, and I think it will be simplest to
> get there with a multi-repo approach.
>
> That said, I agree that the *biggest* win comes from splitting
> projects, and that splitting repos is a smaller step. I don't feel
> strongly about it and could live with a single repo with multiple
> projects (though, for what it's worth, the NiFi umbrella already has
> several repositories and I personally don't feel it has been
> burdensome).
>
> And I agree - let's not start splitting JIRA projects. Let's use
> components or labels or something to differentiate issues under the
> existing NIFI Jira project.
>
>
> Edward,
>
> Thanks. I totally agree and I know others who feel the same way.
> Better-defined boundaries and loosely coupled modules are 100% a
> long-term goal. I think this project restructuring won't solve the
> problem completely (in fact, to your point, it may uncover some
> unfortunate tight-coupling that needs to be reworked on the current
> master before the split can happen), but I do think it will encourage
> developers to more faithfully build to APIs and avoid leaky
> abstractions as there will be more hard division points in the code
> base. Some of those issues might be able to be addressed immediately.
> Others might have to wait for a major version change.
>
> Thanks,
> Kevin
>
> On Fri, Jul 12, 2019 at 1:04 PM Adam Taft  wrote:
> >
> > To be honest and to your point Joe, the thing that optimizes the RM duties
> > should probably be preferred in all of this.  There is so much overhead for
> > the release manager, that lubricating the RM process probably trumps a lot
> 

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Kevin Doran
Thanks Adam and Edward! This is exactly the type of discussion I was
hoping a detailed and specific proposal would generate, so thanks for
the input. I'll reply to each of you in turn:

Adam,

It is true that a repo-per-project approach is not required. I've
worked on projects that do it both ways and there are advantages to
both.

Single-repo was considered, but as one of the primary goals is to cut
down on Travis / CI build times, the multi-repo approach seemed to
have a big advantage. Personally, I've never found a reliable, stable
way to introduce CI builds to a repository with multiple projects that
did not require building all the projects in the repository. It's
possible to try to use commands to determine which files have changed
and infer which project(s) to build from that, but maintaining that
logic can get messy. If the logic is wrong, it's possible a project
that is not built is broken by a PR. Building everything is not an
option for a project of our size, as our builds already time out today.
Fast, reliable Travis builds with no false positives / negatives is
definitely something NiFi needs, and I think it will be simplest to
get there with a multi-repo approach.
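
A minimal sketch of the kind of change-detection logic described above, in Java since that is NiFi's language. It assumes a hypothetical layout with one top-level directory per project (`nifi-api/`, `nifi-framework/`, `nifi-extensions/`) and a hand-maintained table of which projects depend on which; none of these names reflect NiFi's actual structure. The input would be the list of paths a PR touches, for example from `git diff --name-only`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: map the paths changed by a PR to the projects that
// need a CI build. Directory names and dependencies are illustrative only.
final class AffectedProjects {

    // Top-level directory prefix -> project that owns it (assumed layout).
    private static final Map<String, String> PROJECT_ROOTS = new LinkedHashMap<>();

    // Project -> every project that depends on it, directly or transitively.
    // Maintained by hand here; if it goes stale, a broken dependent is skipped.
    private static final Map<String, Set<String>> DEPENDENTS = new LinkedHashMap<>();

    static {
        PROJECT_ROOTS.put("nifi-api/", "nifi-api");
        PROJECT_ROOTS.put("nifi-framework/", "nifi-framework");
        PROJECT_ROOTS.put("nifi-extensions/", "nifi-extensions");

        DEPENDENTS.put("nifi-api", Set.of("nifi-framework", "nifi-extensions"));
        DEPENDENTS.put("nifi-framework", Set.of());
        DEPENDENTS.put("nifi-extensions", Set.of());
    }

    private AffectedProjects() {}

    /** Returns the sorted set of projects to build for the given changed paths. */
    static Set<String> projectsToBuild(List<String> changedPaths) {
        Set<String> toBuild = new TreeSet<>();
        for (String path : changedPaths) {
            String owner = ownerOf(path);
            if (owner == null) {
                // A file outside every known project (root pom.xml, CI config, ...)
                // could affect anything, so fall back to building everything.
                return new TreeSet<>(PROJECT_ROOTS.values());
            }
            toBuild.add(owner);
            toBuild.addAll(DEPENDENTS.getOrDefault(owner, Set.of()));
        }
        return toBuild;
    }

    // Finds the project whose directory prefix matches the path, or null.
    private static String ownerOf(String path) {
        for (Map.Entry<String, String> root : PROJECT_ROOTS.entrySet()) {
            if (path.startsWith(root.getKey())) {
                return root.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(projectsToBuild(List.of("nifi-extensions/foo/Bar.java")));
        System.out.println(projectsToBuild(List.of("nifi-api/src/Processor.java")));
    }
}
```

Running `main` prints `[nifi-extensions]` and then `[nifi-api, nifi-extensions, nifi-framework]`. Falling back to a full build for any unrecognized path trades speed for safety; the `DEPENDENTS` table is the piece that has to be kept in sync by hand, which is the maintenance burden described above.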

That said, I agree that the *biggest* win comes from splitting
projects, and that splitting repos is a smaller step. I don't feel
strongly about it and could live with a single repo with multiple
projects (though, for what it's worth, the NiFi umbrella already has
several repositories and I personally don't feel it has been
burdensome).

And I agree - let's not start splitting JIRA projects. Let's use
components or labels or something to differentiate issues under the
existing NIFI Jira project.


Edward,

Thanks. I totally agree and I know others who feel the same way.
Better-defined boundaries and loosely coupled modules are 100% a
long-term goal. I think this project restructuring won't solve the
problem completely (in fact, to your point, it may uncover some
unfortunate tight-coupling that needs to be reworked on the current
master before the split can happen), but I do think it will encourage
developers to more faithfully build to APIs and avoid leaky
abstractions as there will be more hard division points in the code
base. Some of those issues might be able to be addressed immediately.
Others might have to wait for a major version change.

Thanks,
Kevin

On Fri, Jul 12, 2019 at 1:04 PM Adam Taft  wrote:
>
> To be honest and to your point Joe, the thing that optimizes the RM duties
> should probably be preferred in all of this.  There is so much overhead for
> the release manager, that lubricating the RM process probably trumps a lot
> of my concerns.  I think there's real concern for making the project harder
> for new contributors. But likewise, that concern should be balanced with
> making the project harder for longtime contributors who have pulled the
> cart the most.
>
> I was just at least hoping for a discussion on the concept.  Thanks as
> always for your leadership and contributions to the nifi community.
>
> On Fri, Jul 12, 2019 at 10:48 AM Joe Witt  wrote:
>
> > Ah I agree the JIRA thing would be too heavy handed.  A single JIRA with
> > well defined components tied to 'repos' is good.
> >
> > As far as separate code repos we're talking about different releasable
> > artifacts for which we as a PMC are responsible for the meaning/etc..  As a
> > many time RM I definitely dislike the mono repo construct as I understand
> > it to function.  I prefer repos per source release artifact where all
> > source in that artifact is a function of the release. I am ok with
> > different convenience binaries resulting from a single source release
> > artifact though.
> >
> > Thanks
> >
> > On Fri, Jul 12, 2019 at 12:26 PM Adam Taft  wrote:
> >
> > > I think the concerns around user management are valid, are they not?
> > > Overhead in JIRA goes up (assigning rights to users in JIRA is
> > > multiplied).  Risk to new contributors is high, because each isolated
> > > repository has its own life and code contribution styles.  Maybe the
> > > actual apache infra involvement is low, but the negative effects of
> > > community and source code bifurcation go up.
> > >
> > > Tagging in mono-repos is done by prefixing the name of the component in
> > > the tag name.  Your release sources are still generated from the
> > > component folder (not from the root).
> > >
> > > Modularization (as being proposed) is a good thing, but can be done in a
> > > single repository. It's not a requirement to split up the git project to
> > > get the benefits of modularization.  That's the point I'm hoping is seen
> > > in this.
> > >
> > >
> > >
> > > On Fri, Jul 12, 2019 at 10:08 AM Joe Witt  wrote:
> > >
> > > > to clarify user management for infra is not a prob.  it is an ldap
> > > > group.
> > > >
> > > > repo creation is self service as well and group access is tied to that.
> > > >
> > > > release artifact is the source we produce.  

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Adam Taft
To be honest and to your point Joe, the thing that optimizes the RM duties
should probably be preferred in all of this.  There is so much overhead for
the release manager, that lubricating the RM process probably trumps a lot
of my concerns.  I think there's real concern for making the project harder
for new contributors. But likewise, that concern should be balanced with
making the project harder for longtime contributors who have pulled the
cart the most.

I was just at least hoping for a discussion on the concept.  Thanks as
always for your leadership and contributions to the nifi community.

On Fri, Jul 12, 2019 at 10:48 AM Joe Witt  wrote:

> Ah I agree the JIRA thing would be too heavy handed.  A single JIRA with
> well defined components tied to 'repos' is good.
>
> As far as separate code repos we're talking about different releasable
> artifacts for which we as a PMC are responsible for the meaning/etc..  As a
> many time RM I definitely dislike the mono repo construct as I understand
> it to function.  I prefer repos per source release artifact where all
> source in that artifact is a function of the release. I am ok with
> different convenience binaries resulting from a single source release
> artifact though.
>
> Thanks
>
> On Fri, Jul 12, 2019 at 12:26 PM Adam Taft  wrote:
>
> > I think the concerns around user management are valid, are they not?
> > Overhead in JIRA goes up (assigning rights to users in JIRA is
> > multiplied).  Risk to new contributors is high, because each isolated
> > repository has its own life and code contribution styles.  Maybe the
> > actual apache infra involvement is low, but the negative effects of
> > community and source code bifurcation go up.
> >
> > Tagging in mono-repos is done by prefixing the name of the component in
> > the tag name.  Your release sources are still generated from the
> > component folder (not from the root).
> >
> > Modularization (as being proposed) is a good thing, but can be done in a
> > single repository. It's not a requirement to split up the git project to
> > get the benefits of modularization.  That's the point I'm hoping is seen
> > in this.
> >
> >
> >
> > On Fri, Jul 12, 2019 at 10:08 AM Joe Witt  wrote:
> >
> > > to clarify user management for infra is not a prob.  it is an ldap
> > > group.
> > >
> > > repo creation is self service as well and group access is tied to that.
> > >
> > > release artifact is the source we produce.  this is typically
> > > correlated to a tag of the repo.  if we have all source in one repo it
> > > isn't clear to me how we can maintain that.
> > >
> > > in any event I'm not making a statement of whether to do many repos or
> > > not.  just correcting some potentially misleading claims.
> > >
> > > thanks
> > >
> > > On Fri, Jul 12, 2019, 12:01 PM Adam Taft  wrote:
> > >
> > > > Just as a point of discussion, I'm not entirely sure that splitting
> > > > into multiple physical git repositories is actually adding any
> > > > value.  I think it's worth considering that all the (good) changes
> > > > being proposed could be done under a single mono-repository model.
> > > >
> > > > If we split into multiple repositories, you have substantially
> > > > increased the infra surface area. User account management overhead
> > > > goes up. Support from the infra team goes up. JIRA issue management
> > > > overhead goes up, and misfiled/miscategorized issues become common.
> > > > It becomes harder for community members to interact and engage with
> > > > the project, and the learning curve is steeper for new contributors.
> > > > There are more "side channel" conversations and less transparency
> > > > into the project as a whole. Git history is much harder (or
> > > > impossible) to follow across the entire project. Tracking down bugs
> > > > and performing git blame or git bisect becomes hard.
> > > >
> > > > There's nothing really stopping all of these changes from occurring
> > > > in the existing repo; we don't have to have a maven pom.xml in the
> > > > root of the project repository. It's much easier for contributors to
> > > > just clone a single repository, read the README at the root, and get
> > > > oriented to the project layout.  Output artifacts can still be
> > > > versioned differently (api can have a different version from
> > > > extensions).  "Splitting out" modules can still happen in the
> > > > mono-repository.  Jenkins and friends can be taught the project
> > > > layout.
> > > >
> > > > tl;dr - The changes being proposed can be done in a single
> > > > repository.  Splitting into multiple repositories is adding overhead
> > > > on multiple levels, which might be a sneaky form of muda. [1]
> > > >
> > > > Thanks for reading,
> > > > Adam
> > > >
> > > > [1] https://dzone.com/articles/seven-wastes-software
> > > >
> > > >
> > > > On Thu, Jul 11, 2019 at 11:01 AM Otto Fowler <ottobackwa...@gmail.com>
> > > > wrote:
> > > >
> > > 

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Edward Armes
I think NiFi would really benefit from this. One thing that I think should
be looked into is something I noticed while trying to get to grips with the
NiFi source. At the start of the year I did a small exercise to trace how a
call from the API to start and stop a processor translates into that
processor being scheduled on the underlying thread pool in the core.

While I was doing this I noticed a few things which I think might get in
the way of this. One was that, although there seems to be a lot of good
intention of loose coupling between classes throughout the core/framework
code, in a lot of places there is hard-coded reliance on the actual
implementation of an interface rather than on the interface itself. The
other was that, in the call chains for scheduling a processor, certain
objects seemed to be created unnecessarily and passed around when they
could just be created further down the line.
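
To illustrate the coupling problem described above, here is a simplified sketch. The types (`ProcessorScheduler`, `ThreadPoolScheduler`, `TightController`, `LooseController`) are hypothetical and are not NiFi's real scheduling classes. The tightly coupled controller accepts the interface but casts it to one implementation, so any other implementation (a test stub, an alternative scheduler) fails at runtime; the loosely coupled controller depends only on the interface contract.

```java
import java.util.Objects;

// Hypothetical, heavily simplified types: NOT NiFi's real scheduling classes.
interface ProcessorScheduler {
    void start(String processorId);
    void stop(String processorId);
}

// One concrete implementation, with an extra method the interface lacks.
final class ThreadPoolScheduler implements ProcessorScheduler {
    private int running;

    @Override public void start(String processorId) { running++; }
    @Override public void stop(String processorId) { running--; }

    int runningCount() { return running; }
}

// Tightly coupled: accepts the interface but casts to the implementation,
// so passing any other ProcessorScheduler throws ClassCastException.
final class TightController {
    private final ThreadPoolScheduler scheduler;

    TightController(ProcessorScheduler scheduler) {
        this.scheduler = (ThreadPoolScheduler) scheduler;
    }

    int startProcessor(String processorId) {
        scheduler.start(processorId);
        return scheduler.runningCount(); // reaches past the interface
    }
}

// Loosely coupled: depends only on the interface contract, so a test stub
// or an alternative scheduler can be swapped in.
final class LooseController {
    private final ProcessorScheduler scheduler;

    LooseController(ProcessorScheduler scheduler) {
        this.scheduler = Objects.requireNonNull(scheduler, "scheduler");
    }

    void startProcessor(String processorId) {
        scheduler.start(processorId);
    }
}

final class CouplingDemo {
    public static void main(String[] args) {
        ProcessorScheduler stub = new ProcessorScheduler() {
            @Override public void start(String id) { System.out.println("stub start " + id); }
            @Override public void stop(String id) { System.out.println("stub stop " + id); }
        };
        new LooseController(stub).startProcessor("p1"); // works with any implementation
        try {
            new TightController(stub);
        } catch (ClassCastException e) {
            System.out.println("TightController rejects other implementations");
        }
    }
}
```

The cast in `TightController` compiles without complaint, which is why this kind of hidden dependency is easy to miss until someone tries to substitute a different implementation.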

Aside from the obvious issues around code complexity, I think both of these
give the NiFi core code base quite a high barrier to entry for people
contributing and providing enhancements in general. For example, I looked
to see if I could implement NIFI-966 (Expose Scheduling Strategy in
ProcessContext), and because of the way the ProcessContext is created and
then handed around through the core, there is no good place (that I could
see) to add that information.

I was planning to redo my analysis with a few other API calls and raise a
ticket proposing to simplify the core, but since this proposal has been
created I thought I would mention it here and see what others think.

Regards

Edward

https://issues.apache.org/jira/browse/NIFI-966

On Fri, Jul 12, 2019 at 5:48 PM Joe Witt  wrote:

> Ah I agree the JIRA thing would be too heavy handed.  A single JIRA with
> well defined components tied to 'repos' is good.
>
> As far as separate code repos we're talking about different releasable
> artifacts for which we as a PMC are responsible for the meaning/etc..  As a
> many time RM I definitely dislike the mono repo construct as I understand
> it to function.  I prefer repos per source release artifact where all
> source in that artifact is a function of the release. I am ok with
> different convenience binaries resulting from a single source release
> artifact though.
>
> Thanks
>
> On Fri, Jul 12, 2019 at 12:26 PM Adam Taft  wrote:
>
> > I think the concerns around user management are valid, are they not?
> > Overhead in JIRA goes up (assigning rights to users in JIRA is
> > multiplied).  Risk to new contributors is high, because each isolated
> > repository has its own life and code contribution styles.  Maybe the
> > actual apache infra involvement is low, but the negative effects of
> > community and source code bifurcation go up.
> >
> > Tagging in mono-repos is done by prefixing the name of the component in
> > the tag name.  Your release sources are still generated from the
> > component folder (not from the root).
> >
> > Modularization (as being proposed) is a good thing, but can be done in a
> > single repository. It's not a requirement to split up the git project to
> > get the benefits of modularization.  That's the point I'm hoping is seen
> > in this.
> >
> >
> >
> > On Fri, Jul 12, 2019 at 10:08 AM Joe Witt  wrote:
> >
> > > to clarify user management for infra is not a prob.  it is an ldap
> > > group.
> > >
> > > repo creation is self service as well and group access is tied to that.
> > >
> > > release artifact is the source we produce.  this is typically
> > > correlated to a tag of the repo.  if we have all source in one repo it
> > > isn't clear to me how we can maintain that.
> > >
> > > in any event I'm not making a statement of whether to do many repos or
> > > not.  just correcting some potentially misleading claims.
> > >
> > > thanks
> > >
> > > On Fri, Jul 12, 2019, 12:01 PM Adam Taft  wrote:
> > >
> > > > Just as a point of discussion, I'm not entirely sure that splitting
> > > > into multiple physical git repositories is actually adding any
> > > > value.  I think it's worth considering that all the (good) changes
> > > > being proposed could be done under a single mono-repository model.
> > > >
> > > > If we split into multiple repositories, you have substantially
> > > > increased the infra surface area. User account management overhead
> > > > goes up. Support from the infra team goes up. JIRA issue management
> > > > overhead goes up, and misfiled/miscategorized issues become common.
> > > > It becomes harder for community members to interact and engage with
> > > > the project, and the learning curve is steeper for new contributors.
> > > > There are more "side channel" conversations and less transparency
> > > > into the project as a whole. Git history is much harder (or
> > > > impossible) to follow across the entire project. Tracking down bugs
> > > > and performing git blame or git bisect becomes hard.
> > > >
> > > 

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Adam Taft
I think the concerns around user management are valid, are they not?
Overhead in JIRA goes up (assigning rights to users in JIRA is
multiplied).  Risk to new contributors is high, because each isolated
repository has its own life and code contribution styles.  Maybe the actual
apache infra involvement is low, but the negative effects of community and
source code bifurcation go up.

Tagging in mono-repos is done by prefixing the name of the component in the
tag name.  Your release sources are still generated from the component
folder (not from the root).
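
A rough sketch of the component-prefixed tag convention described above, assuming hypothetical tags of the form `<component>-<version>` (for example `nifi-api-1.10.0`) and one top-level folder per component. The `-source-release.zip` archive name is likewise an assumed naming pattern, used only to show a tag mapping to a single release artifact.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of component-prefixed tags in a mono-repo: a tag such as
// "nifi-api-1.10.0" names a component and its version, and the source release
// is built from that component's folder rather than the repository root.
final class ComponentTag {
    // <component>-<version>, where version is dotted digits (assumed format).
    private static final Pattern TAG =
            Pattern.compile("^(?<component>[a-z][a-z0-9-]*?)-(?<version>\\d+(?:\\.\\d+)*)$");

    final String component;
    final String version;

    private ComponentTag(String component, String version) {
        this.component = component;
        this.version = version;
    }

    // Splits a tag into component and version, rejecting anything else.
    static ComponentTag parse(String tag) {
        Matcher m = TAG.matcher(tag);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a component tag: " + tag);
        }
        return new ComponentTag(m.group("component"), m.group("version"));
    }

    /** Folder the source release is generated from (one folder per component, assumed). */
    String sourceFolder() {
        return component + "/";
    }

    /** Assumed naming pattern for the source release archive. */
    String sourceReleaseName() {
        return component + "-" + version + "-source-release.zip";
    }

    public static void main(String[] args) {
        ComponentTag tag = parse("nifi-api-1.10.0");
        System.out.println(tag.sourceFolder() + " -> " + tag.sourceReleaseName());
    }
}
```

In this sketch each tag still identifies exactly one component's source release, even though all components share one repository.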

Modularization (as being proposed) is a good thing, but can be done in a
single repository. It's not a requirement to split up the git project to
get the benefits of modularization.  That's the point I'm hoping is seen in
this.



On Fri, Jul 12, 2019 at 10:08 AM Joe Witt  wrote:

> to clarify user management for infra is not a prob.  it is an ldap group.
>
> repo creation is self service as well and group access is tied to that.
>
> release artifact is the source we produce.  this is typically correlated to
> a tag of the repo.  if we have all source in one repo it isn't clear to me
> how we can maintain that.
>
> in any event I'm not making a statement of whether to do many repos or not.
> just correcting some potentially misleading claims.
>
> thanks
>
> On Fri, Jul 12, 2019, 12:01 PM Adam Taft  wrote:
>
> > Just as a point of discussion, I'm not entirely sure that splitting into
> > multiple physical git repositories is actually adding any value.  I think
> > it's worth considering that all the (good) changes being proposed could be
> > done under a single mono-repository model.
> >
> > If we split into multiple repositories, you have substantially increased
> > the infra surface area. User account management overhead goes up. Support
> > from the infra team goes up. JIRA issue management overhead goes up, and
> > misfiled/miscategorized issues become common. It becomes harder for
> > community members to interact and engage with the project, and the
> > learning curve is steeper for new contributors. There are more "side
> > channel" conversations and less transparency into the project as a whole.
> > Git history is much harder (or impossible) to follow across the entire
> > project. Tracking down bugs and performing git blame or git bisect
> > becomes hard.
> >
> > There's nothing really stopping all of these changes from occurring in
> > the existing repo; we don't have to have a maven pom.xml in the root of
> > the project repository. It's much easier for contributors to just clone a
> > single repository, read the README at the root, and get oriented to the
> > project layout.  Output artifacts can still be versioned differently (api
> > can have a different version from extensions).  "Splitting out" modules
> > can still happen in the mono-repository.  Jenkins and friends can be
> > taught the project layout.
> >
> > tl;dr - The changes being proposed can be done in a single repository.
> > Splitting into multiple repositories is adding overhead on multiple
> > levels, which might be a sneaky form of muda. [1]
> >
> > Thanks for reading,
> > Adam
> >
> > [1] https://dzone.com/articles/seven-wastes-software
> >
> >
> > On Thu, Jul 11, 2019 at 11:01 AM Otto Fowler 
> > wrote:
> >
> > > I agree that this looks great. I think Mike’s idea is worth considering
> > > as well. I would hope that, as part of this effort, some thought will be
> > > given to enhancing the developer documentation around the modules as
> > > well.
> > >
> > >
> > >
> > >
> > > On July 10, 2019 at 18:15:21, Mike Thomsen (mikerthom...@gmail.com) wrote:
> > >
> > > I agree. It's very well thought out. One change to consider is splitting
> > > the extensions further into two separate repos. One that would serve as a
> > > standard library of sorts for other component developers and another that
> > > would include everything else. Things like the Record API would go into the
> > > former so that we could have a more conservative release schedule going
> > > forward with those components.
> > >
> > > On Wed, Jul 10, 2019 at 4:17 PM Andy LoPresto 
> > > wrote:
> > >
> > > > Thanks Kevin, this looks really promising.
> > > >
> > > > Updating the link here as I think the page may have moved:
> > > >
> > > > https://cwiki.apache.org/confluence/display/NIFI/NiFi+Project+and+Repository+Restructuring
> > > > <https://cwiki.apache.org/confluence/display/NIFI/NiFi+Project+and+Repository+Restructuring>
> > > >
> > > > Andy LoPresto
> > > > alopre...@apache.org
> > > > alopresto.apa...@gmail.com
> > > > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> > > >
> > > > > On Jul 10, 2019, at 12:08 PM, Kevin Doran  wrote:
> > > > >
> > > > > Hi NiFi Dev Community,
> > > > >
> > > > > Jeff Storck, Bryan Bende, and I have been collaborating back and forth
> > > > > on a proposal for how to restructure the 

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Joe Witt
to clarify user management for infra is not a prob.  it is an ldap group.

repo creation is self service as well and group access is tied to that.

release artifact is the source we produce.  this is typically correlated to
a tag of the repo.  if we have all source in one repo it isn't clear to me
how we can maintain that.

in any event I'm not making a statement of whether to do many repos or not.
just correcting some potentially misleading claims.

thanks

On Fri, Jul 12, 2019, 12:01 PM Adam Taft  wrote:

> Just as a point of discussion, I'm not entirely sure that splitting into
> multiple physical git repositories is actually adding any value.  I think
> it's worth considering that all the (good) changes being proposed could be
> done under a single mono-repository model.
>
> If we split into multiple repositories, you have substantially increased
> the infra surface area. User account management overhead goes up. Support
> from the infra team goes up. JIRA issue management overhead goes up, and
> misfiled/miscategorized issues become common. It becomes harder for
> community members to interact and engage with the project, and the learning
> curve is steeper for new contributors. There are more "side channel"
> conversations and less transparency into the project as a whole. Git history
> is much harder (or impossible) to follow across the entire project. Tracking
> down bugs and performing git blame or git bisect becomes hard.
>
> There's nothing really stopping all of these changes from occurring in the
> existing repo; we don't have to have a maven pom.xml in the root of the
> project repository. It's much easier for contributors to just clone a
> single repository, read the README at the root, and get oriented to the
> project layout.  Output artifacts can still be versioned differently (api
> can have a different version from extensions).  "Splitting out" modules can
> still happen in the mono-repository.  Jenkins and friends can be taught the
> project layout.
>
> tl;dr - The changes being proposed can be done in a single repository.
> Splitting into multiple repositories is adding overhead on multiple levels,
> which might be a sneaky form of muda. [1]
>
> Thanks for reading,
> Adam
>
> [1] https://dzone.com/articles/seven-wastes-software
>
>
> On Thu, Jul 11, 2019 at 11:01 AM Otto Fowler 
> wrote:
>
> > I agree that this looks great. I think Mike’s idea is worth considering
> > as well. I would hope that, as part of this effort, some thought will be
> > given to enhancing the developer documentation around the modules as
> > well.
> >
> >
> >
> >
> > On July 10, 2019 at 18:15:21, Mike Thomsen (mikerthom...@gmail.com) wrote:
> >
> > I agree. It's very well thought out. One change to consider is splitting
> > the extensions further into two separate repos. One that would serve as a
> > standard library of sorts for other component developers and another that
> > would include everything else. Things like the Record API would go into the
> > former so that we could have a more conservative release schedule going
> > forward with those components.
> >
> > On Wed, Jul 10, 2019 at 4:17 PM Andy LoPresto 
> > wrote:
> >
> > > Thanks Kevin, this looks really promising.
> > >
> > > Updating the link here as I think the page may have moved:
> > >
> > > https://cwiki.apache.org/confluence/display/NIFI/NiFi+Project+and+Repository+Restructuring
> > > <https://cwiki.apache.org/confluence/display/NIFI/NiFi+Project+and+Repository+Restructuring>
> > >
> > > Andy LoPresto
> > > alopre...@apache.org
> > > alopresto.apa...@gmail.com
> > > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> > >
> > > > On Jul 10, 2019, at 12:08 PM, Kevin Doran  wrote:
> > > >
> > > > Hi NiFi Dev Community,
> > > >
> > > > Jeff Storck, Bryan Bende, and I have been collaborating back and forth
> > > > on a proposal for how to restructure the NiFi source code into smaller
> > > > Maven projects and repositories based on the discussion that took
> > > > place a while back on this thread. I'm reviving this older thread in
> > > > order to share that proposal with the community and generate further
> > > > discussion about solidifying a destination and a plan for how to
> > > > get there.
> > > >
> > > > Specifically, the proposal we've started working on has three parts:
> > > >
> > > > 1. Goals (more or less a summary of the earlier discussion that took
> > > > place on this thread)
> > > > 2. Proposed end state of the new Maven project and repository structure
> > > > 3. Proposed approach for how to get from where we are today to the
> > > > desired end state
> > > >
> > > > The proposal is on the Apache NiFi Wiki [1], so that we can all
> > > > collaborate on it or leave comments there.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/NIFIREG/NiFi+Project+and+Repository+Restructuring
> > > >
> > > > Thanks,
> > > > Kevin, Jeff, and Bryan
> > > >
> > > > On 

Re: [EXT] [discuss] Splitting NiFi framework and extension repos and releases

2019-07-12 Thread Adam Taft
Just as a point of discussion, I'm not entirely sure that splitting into
multiple physical git repositories is actually adding any value.  I think
it's worth considering that all the (good) changes being proposed could be
done under a single mono-repository model.

If we split into multiple repositories, you have substantially increased
the infra surface area. User account management overhead goes up. Support
from the infra team goes up. JIRA issue management overhead goes up, and
misfiled/miscategorized issues become common. It becomes harder for
community members to interact and engage with the project, and the learning
curve is steeper for new contributors. There are more "side channel"
conversations and less transparency into the project as a whole. Git history
is much harder (or impossible) to follow across the entire project. Tracking
down bugs and performing git blame or git bisect becomes hard.

There's nothing really stopping all of these changes from occurring in the
existing repo; we don't have to have a maven pom.xml in the root of the
project repository. It's much easier for contributors to just clone a
single repository, read the README at the root, and get oriented to the
project layout.  Output artifacts can still be versioned differently (api
can have a different version from extensions).  "Splitting out" modules can
still happen in the mono-repository.  Jenkins and friends can be taught the
project layout.

tl;dr - The changes being proposed can be done in a single repository.
Splitting into multiple repositories is adding overhead on multiple levels,
which might be a sneaky form of muda. [1]

Thanks for reading,
Adam

[1] https://dzone.com/articles/seven-wastes-software


On Thu, Jul 11, 2019 at 11:01 AM Otto Fowler 
wrote:

> I agree that this looks great. I think Mike’s idea is worth considering as
> well. I would hope that, as part of this effort, some thought will be given
> to enhancing the developer documentation around the modules as well.
>
>
>
>
> On July 10, 2019 at 18:15:21, Mike Thomsen (mikerthom...@gmail.com) wrote:
>
> I agree. It's very well thought out. One change to consider is splitting
> the extensions further into two separate repos. One that would serve as a
> standard library of sorts for other component developers and another that
> would include everything else. Things like the Record API would go into the
> former so that we could have a more conservative release schedule going
> forward with those components.
>
> On Wed, Jul 10, 2019 at 4:17 PM Andy LoPresto 
> wrote:
>
> > Thanks Kevin, this looks really promising.
> >
> > Updating the link here as I think the page may have moved:
> >
> > https://cwiki.apache.org/confluence/display/NIFI/NiFi+Project+and+Repository+Restructuring
> > <https://cwiki.apache.org/confluence/display/NIFI/NiFi+Project+and+Repository+Restructuring>
> >
> > Andy LoPresto
> > alopre...@apache.org
> > alopresto.apa...@gmail.com
> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> >
> > > On Jul 10, 2019, at 12:08 PM, Kevin Doran  wrote:
> > >
> > > Hi NiFi Dev Community,
> > >
> > > Jeff Storck, Bryan Bende, and I have been collaborating back and forth
> > > on a proposal for how to restructure the NiFi source code into smaller
> > > Maven projects and repositories based on the discussion that took
> > > place a while back on this thread. I'm reviving this older thread in
> > > order to share that proposal with the community and generate further
> > > discussion about solidifying a destination and a plan for how to
> > > get there.
> > >
> > > Specifically, the proposal we've started working on has three parts:
> > >
> > > 1. Goals (more or less a summary of the earlier discussion that took
> > > place on this thread)
> > > 2. Proposed end state of the new Maven project and repository structure
> > > 3. Proposed approach for how to get from where we are today to the
> > > desired end state
> > >
> > > The proposal is on the Apache NiFi Wiki [1], so that we can all
> > > collaborate on it or leave comments there.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/NIFIREG/NiFi+Project+and+Repository+Restructuring
> > >
> > > Thanks,
> > > Kevin, Jeff, and Bryan
> > >
> > > On Thu, May 30, 2019 at 1:31 PM Kevin Doran  wrote:
> > >>
> > >> I am also in favor of splitting the nifi maven project up into smaller
> > >> projects with independent release cycles in order to decouple
> > >> development at well defined boundaries/interfaces and also to
> > >> facilitate code reuse.
> > >>
> > >> In anticipation of eventually working towards a NiFi 2.0 that
> > >> introduces bigger changes for developers and users, I've started work
> > >> on a nifi-commons project in which I've extracted out some of the code
> > >> that originally got ported from NiFi -> NiFi Registry, and now exists
> > >> as similar code in both projects, into a standalone modular library.
> > >> That preliminary work is here on my