Re: [DISCUSS] Closing in on a 0.x release

2016-09-27 Thread Brandon DeVries
I agree sooner rather than later for cutting 0.7.1. I think Mike's question
to some degree was whether or not some of those tickets were worth fixing
in 0.x. For example, I'm not sure how much I care about:

NIFI-2571 deprecate NiFiProperties.getInstance()
NIFI-2163 nifi.sh follow the Linux service spec

On the other hand, there are some I would like to see, even if it's in 0.7.2 or
0.8.0, e.g.:

NIFI-2433 "Primary Node Only" processors
NIFI-2562 PutHDFS data corruption

But, there are a number of things that are currently committed (or have
patch available) that I'd like to see available as soon as possible. So
rather than wait for more "nice to haves", I'd rather address the immediate
needs... Immediately.

Brandon


On Tue, Sep 27, 2016 at 10:15 PM Tony Kurc  wrote:

> I think I brought this up before; I sort of expected we might do more 0.x
> releases. I certainly think the more bugs we can fix, the merrier, and
> it seems like your list is a good initial strawman for a bug fix release if
> we collectively would like to put one together.
>
> While the tickets with work to do on them would be great to have fixed, I
> personally would rather see a release with some fixes and a couple known
> issues than holding off for "perfection", especially as a lot of our effort
> is on 1.x. Are you asking if effort would be wasted if patches were
> developed for the 0.x issues?
>
> Fwiw, I certainly could do the RM work if there is interest/demand signal
> for another 0.x.
>
> On Sep 27, 2016 5:28 PM, "Michael Moser"  wrote:
>
> > All,
> >
> > I would like to start the discussion of making the next official release of
> > the 0.x branch.  I propose that this release be numbered 0.7.1 since it
> > seems that only bug fixes have occurred on the 0.x branch since 0.7.0 was
> > released.
> >
> > The JIRA link [1] below can show you the tickets that have been completed
> > in the 0.x branch.  There are 33 tickets in this list that are resolved.
> >
> > Here is a list of JIRA tickets that are not yet complete that we need to
> > decide what to do with.
> >
> > Patch Available
> > NIFI-2429 PersistentProvenanceRepository
> > NIFI-2774 ConsumeJMS
> >
> > Open against 0.7.0
> > NIFI-2383 ListFiles
> > NIFI-2433 "Primary Node Only" processors (fixed in master but this ticket
> > is for 0.x)
> > NIFI-2798 Zookeeper security upgrade
> > NIFI-2801 Kafka processors documentation
> >
> > Other high priority bugs not yet specifically targeted to the 0.x branch,
> > should we try to work these?
> > NIFI-1696 Event Driven processors
> > NIFI-1912 PutEmail content-type
> > NIFI-2163 nifi.sh follow the Linux service spec
> > NIFI-2409 StoreKiteInDataset invalid URI
> > NIFI-2562 PutHDFS data corruption
> > NIFI-2571 deprecate NiFiProperties.getInstance()
> >
> > -- Mike
> >
> > [1] -
> > https://issues.apache.org/jira/browse/NIFI-2801?jql=project%20%3D%20NIFI%20AND%20fixVersion%20in%20%280.7.1%2C%200.8.0%29
> >
>


Re: [DISCUSS] Closing in on a 0.x release

2016-09-27 Thread Tony Kurc
I think I brought this up before; I sort of expected we might do more 0.x
releases. I certainly think the more bugs we can fix, the merrier, and
it seems like your list is a good initial strawman for a bug fix release if
we collectively would like to put one together.

While the tickets with work to do on them would be great to have fixed, I
personally would rather see a release with some fixes and a couple known
issues than holding off for "perfection", especially as a lot of our effort
is on 1.x. Are you asking if effort would be wasted if patches were
developed for the 0.x issues?

Fwiw, I certainly could do the RM work if there is interest/demand signal
for another 0.x.

On Sep 27, 2016 5:28 PM, "Michael Moser"  wrote:

> All,
>
> I would like to start the discussion of making the next official release of
> the 0.x branch.  I propose that this release be numbered 0.7.1 since it
> seems that only bug fixes have occurred on the 0.x branch since 0.7.0 was
> released.
>
> The JIRA link [1] below can show you the tickets that have been completed
> in the 0.x branch.  There are 33 tickets in this list that are resolved.
>
> Here is a list of JIRA tickets that are not yet complete that we need to
> decide what to do with.
>
> Patch Available
> NIFI-2429 PersistentProvenanceRepository
> NIFI-2774 ConsumeJMS
>
> Open against 0.7.0
> NIFI-2383 ListFiles
> NIFI-2433 "Primary Node Only" processors (fixed in master but this ticket
> is for 0.x)
> NIFI-2798 Zookeeper security upgrade
> NIFI-2801 Kafka processors documentation
>
> Other high priority bugs not yet specifically targeted to the 0.x branch,
> should we try to work these?
> NIFI-1696 Event Driven processors
> NIFI-1912 PutEmail content-type
> NIFI-2163 nifi.sh follow the Linux service spec
> NIFI-2409 StoreKiteInDataset invalid URI
> NIFI-2562 PutHDFS data corruption
> NIFI-2571 deprecate NiFiProperties.getInstance()
>
> -- Mike
>
> [1] -
> https://issues.apache.org/jira/browse/NIFI-2801?jql=project%20%3D%20NIFI%20AND%20fixVersion%20in%20%280.7.1%2C%200.8.0%29
>


Re: Git branch clean-up

2016-09-27 Thread Tony Kurc
Any branches I created can be deleted

On Sep 26, 2016 11:57 AM, "Joe Witt"  wrote:

> Andre
>
> The support branch for 0.7 would happen if there were an incremental
> release for it.
>
> The tags represent the exact released bits (version values are
> set/etc..).  The support/RC ones have the correct snapshot/etc.. on
> them.
>
> The items that are quite old are probably fair game to clean up.
>
> Thanks
> Joe
>
> On Mon, Sep 26, 2016 at 11:43 AM, Andre  wrote:
> > All,
> >
> > Clean-up is almost done, but I have a question.
> >
> > Would anyone have a suggestion on what to do with the multiple RC branches?
> >
> > We currently have some inconsistency around version-related branches, with
> > branches like "support/nifi-0.6.x" or even "release-nifi-0.1.0-rc13" but no
> > branch like "support/nifi-0.7.x".
> >
> > Happy to create the appropriate branches but wondering if we prefer using
> > tags for release control?
> >
> > Tag land looks to be in far better shape, with a reasonably consistent
> > naming convention (i.e. rel/version has been used since 0.6.1 and even
> > before that version).
> >
> > Ready to organise the required changes once we reach an agreement
> >
> > Cheers
> >
> > On Sat, Sep 24, 2016 at 3:25 PM, Andre  wrote:
> >
> >> All,
> >>
> >> Seems like we are making some progress around this.  Some of the branches
> >> have already been deleted and some more have been documented and deleted.
> >>
> >> The following branches belong to closed tickets but had commits ahead of
> >> master, and were left behind until we can confirm they are safe to delete:
> >>
> >> NIFI-274 - bbende
> >> NIFI-376 - mcgilman
> >> NIFI-631 - trkurc
> >> NIFI-640 - mcgilman
> >> NIFI-731 - markap14
> >> NIFI-744 - markap14
> >> NIFI-919 - bbende
> >> NIFI-1073 - trkurc
> >> NIFI-1107 - trkurc
> >>
> >>
> >> The following branches have been documented as part of NIFI-2817 and
> >> deleted:
> >>
> >> ListHDFS - e0d4484ee8e7b80fe3b47d5bb132d95019253a46
> >> NIFI-25 - 0211d0d71561fb63df4a06cd7b1a3b430b7a3e6c
> >> NIFI-259 - 16f185245309ac3bcf9629b1c4d4dc655f18ca68
> >> NIFI-433 - e1e1aecc8b8e6c75f260edf248f046c1557c088b
> >> NIFI-730 - dbf0c7893fef964bfbb3a4c039c756396587ce12
> >> NIFI-810-InputRequirement - 0636f0e731cd28299edd3a6e9db90de5045ab662
> >> NIFI-1085 - 6add372bc142304ef25e0bafb49709654a995f08
> >>
> >> Cheers
> >>
> >> On Fri, Jul 22, 2016 at 8:26 AM, Andre  wrote:
> >>
> >>> Devs,
> >>>
> >>> I was wondering if anyone had any success deleting the stale remote
> >>> branches?
> >>>
> >>> Should we perhaps add the clean-up as part of the 1.0 release cycle?
> >>>
> >>> Cheers
> >>>
> >>> On 7 Jun 2016 22:42, "Joe Witt"  wrote:
> >>>
>  Definitely agree with the spirit of this thread.  I am happy to do a purge
>  on clearly dead branches.
> 
>  Of note there was a decent chunk of time where we could create branches
>  but not remove them.
> 
>  Thanks.
>  Joe
>  On Jun 7, 2016 8:25 AM, "Joe Skora"  wrote:
> 
>  > +1 for cleaning up the *Branches for resolved tickets* group.
>  >
>  > The other groups should probably be reviewed by the last committers
>  > and removed if appropriate.
>  >
>  > A Jira ticket (or tickets) identifying each branch and its last commit
>  > hash will document their removal while also making it easy to recreate
>  > them if needed from the last commit.
>  >
>  >
>  > On Tue, Jun 7, 2016 at 7:58 AM, Andre  wrote:
>  >
>  > > Devs,
>  > >
>  > > It seems like the git tree is filled with stale / legacy branches
>  > > covering issues that have already been resolved.
>  > >
>  > > Unclear ticket number
>  > > Branch name - last committer (date) - Suspected related issue
>  > > - improve-prov-performance - mcgilman (Apr 2015) - suspect it is NIFI-524
>  > > - journaling-prov-repo - markap14 - (Mar 2015) - suspect it is NIFI-388
>  > > - prov-query-language - markap14 (Mar 2015) - suspect it is NIFI-40
>  > > - ListHDFS - markap14 (Apr 2015) - suspect it is NIFI-533
>  > >
>  > > Branches for resolved tickets
>  > > - Issue  (Resolved date)
>  > > - NIFI-25 (Apr 2015)
>  > > - NIFI-259 (Feb 2016)
>  > > - NIFI-274 (Nov 2015)
>  > > - NIFI-376 (Aug 2015)
>  > > - NIFI-433 (May 2015)
>  > > - NIFI-631 (Nov 2015)
>  > > - NIFI-640 (July 2015)
>  > > - NIFI-655 (Dec 2015)
>  > > - NIFI-730 (Nov 2015)
>  > > - NIFI-731 (Jul 2015)
>  > > - NIFI-744 (Aug 2015)
>  > > - NIFI-810-InputRequirement (Oct 2015)
>  > > - NIFI-919 (Sep 2015)
>  > > - NIFI-1073 (Nov 2015)
>  > > - NIFI-1085 (Nov 2015)
>  > > - NIFI-1107 (Feb 2016)
>  > >
>  > >
>  > > Branches with Open tickets
>  > > - 
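
A sketch of the record-then-delete workflow described in this thread, using a
branch and hash from the NIFI-2817 list above as the example:

    # record the branch's last commit hash in the JIRA ticket, then delete it
    git rev-parse origin/NIFI-25    # -> 0211d0d71561fb63df4a06cd7b1a3b430b7a3e6c
    git push origin --delete NIFI-25
    # if ever needed again, the branch can be recreated from the recorded hash
    git checkout -b NIFI-25 0211d0d71561fb63df4a06cd7b1a3b430b7a3e6c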

Re: ExecuteProcess Question

2016-09-27 Thread Lee Laim
Hi Dale,

You could also check the permissions on the script.  I was able to echo
text from a bash script which placed the text in the flowfile content.
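
A quick way to rule that out (the script path here is a hypothetical example):

    # confirm the script is executable by the user NiFi runs as
    ls -l /opt/scripts/myscript.sh
    chmod +x /opt/scripts/myscript.sh
    # run it by hand and confirm output actually goes to stdout
    /opt/scripts/myscript.sh; echo "exit code: $?"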

Thanks,
Lee





On Tue, Sep 27, 2016 at 6:32 PM, Andy LoPresto  wrote:

> Hi Dale,
>
> I just tried to replicate this and I’m not sure I fully understand the
> issue.
>
> You can see the actual contents of my Java class [1], Bash script [2], and
> command-line activity [3] in the Gists provided. I then set up a flow [4]
> which simply executed the script every 5 seconds and logged the output. I
> saw both the output from System.out.println() and System.err.println() as
> the content of the flowfile [5]. I did need to set the RedirectErrorStream
> property in ExecuteProcess to capture the error output as well.
>
> After doing this, I re-read your question and noticed you mention that the
> “echo commands” are not showing up. Does this refer to echo output from the
> bash script itself? I also replicated this [6][7].
>
> If you are referring to output from the Scala code, is this perhaps being
> indirected via a logging mechanism? If you run the command directly from the
> command line, does this output appear in the standard output console?
>
> Can you please clarify what I misunderstood from your question or let me
> know if I missed something. Thanks.
>
> [1] https://gist.github.com/alopresto/f71a85793cabcb22917546b7e504fe00
> [2] https://gist.github.com/alopresto/b166a7a9ccc3347541aaa24410883d4b/c4f2d7e7cda5d40defafe3796f7587915612c5ac
> [3] https://gist.github.com/alopresto/1157967ec70da8a25c176c82911613c8
> [4] https://gist.github.com/alopresto/9ef34727a7e72bb81944c214f921078a
> [5] https://gist.github.com/alopresto/72ba8f56495813e3fd0f0d9b9165d0e7
> [6] https://gist.github.com/alopresto/b166a7a9ccc3347541aaa24410883d4b/ae888959fec099afc951dea23186b681329d4b22
> [7] https://gist.github.com/alopresto/9c93a5b3daed983219f43c7766450e90
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Sep 27, 2016, at 2:40 PM, dale.chang13 
> wrote:
>
> So I have a bash script that I am able to run from the command line, and I
> want to be able to let NiFi call it using the ExecuteProcess processor.
>
> The script itself runs fine from the command line, and it looks like the
> ExecuteProcess is executing the script as well (I have a LogAttribute
> processor as a downstream processor that verifies that no problems were
> executing the script), but neither the Bulletin nor the logs tell me if
> anything is wrong or successful.
>
> I found that the echo commands in the script should write to the NiFi
> FlowFile Content, but I do not see anything show up.
>
>
>
> The script is simply a java -jar xx.jar file, which happens to contain a
> java wrapper class with a main method that calls a scala main object that
> then performs Apache Spark operations.
>
> Any ideas?
>
>
>
> --
> View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/ExecuteProcess-Question-tp13471.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
>
>
>


Re: ExecuteProcess Question

2016-09-27 Thread Andy LoPresto
Hi Dale,

I just tried to replicate this and I’m not sure I fully understand the issue.

You can see the actual contents of my Java class [1], Bash script [2], and 
command-line activity [3] in the Gists provided. I then set up a flow [4] which 
simply executed the script every 5 seconds and logged the output. I saw both 
the output from System.out.println() and System.err.println() as the content of 
the flowfile [5]. I did need to set the RedirectErrorStream property in 
ExecuteProcess to capture the error output as well.
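
As a minimal illustration (the jar path is a hypothetical example), a script
like the following emits flowfile content from both streams once that property
is set:

    #!/bin/bash
    # stdout from the script becomes the flowfile content in ExecuteProcess
    echo "starting job"
    # stderr is captured only when RedirectErrorStream is set to true
    echo "diagnostics go to stderr" >&2
    java -jar /opt/jobs/job.jar   # hypothetical jar invocation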

After doing this, I re-read your question and noticed you mention that the “echo
commands” are not showing up. Does this refer to echo output from the bash 
script itself? I also replicated this [6][7].

If you are referring to output from the Scala code, is this perhaps being 
indirected via a logging mechanism? If you run the command directly from the
command line, does this output appear in the standard output console?

Can you please clarify what I misunderstood from your question or let me know 
if I missed something. Thanks.

[1] https://gist.github.com/alopresto/f71a85793cabcb22917546b7e504fe00
[2] https://gist.github.com/alopresto/b166a7a9ccc3347541aaa24410883d4b/c4f2d7e7cda5d40defafe3796f7587915612c5ac
[3] https://gist.github.com/alopresto/1157967ec70da8a25c176c82911613c8
[4] https://gist.github.com/alopresto/9ef34727a7e72bb81944c214f921078a
[5] https://gist.github.com/alopresto/72ba8f56495813e3fd0f0d9b9165d0e7
[6] https://gist.github.com/alopresto/b166a7a9ccc3347541aaa24410883d4b/ae888959fec099afc951dea23186b681329d4b22
[7] https://gist.github.com/alopresto/9c93a5b3daed983219f43c7766450e90


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Sep 27, 2016, at 2:40 PM, dale.chang13  wrote:
> 
> So I have a bash script that I am able to run from the command line, and I
> want to be able to let NiFi call it using the ExecuteProcess processor.
> 
> The script itself runs fine from the command line, and it looks like the
> ExecuteProcess is executing the script as well (I have a LogAttribute
> processor as a downstream processor that verifies that no problems were
> executing the script), but neither the Bulletin nor the logs tell me if
> anything is wrong or successful.
> 
> I found that the echo commands in the script should write to the NiFi
> FlowFile Content, but I do not see anything show up.
> 
> 
> 
> The script is simply a java -jar xx.jar file, which happens to contain a
> java wrapper class with a main method that calls a scala main object that
> then performs Apache Spark operations.
> 
> Any ideas?
> 
> 
> 
> --
> View this message in context: 
> http://apache-nifi-developer-list.39713.n7.nabble.com/ExecuteProcess-Question-tp13471.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.





ExecuteProcess Question

2016-09-27 Thread dale.chang13
So I have a bash script that I am able to run from the command line, and I
want to be able to let NiFi call it using the ExecuteProcess processor.

The script itself runs fine from the command line, and it looks like the
ExecuteProcess is executing the script as well (I have a LogAttribute
processor as a downstream processor that verifies that no problems were
executing the script), but neither the Bulletin nor the logs tell me if
anything is wrong or successful. 

I found that the echo commands in the script should write to the NiFi
FlowFile Content, but I do not see anything show up.



The script is simply a java -jar xx.jar file, which happens to contain a
java wrapper class with a main method that calls a scala main object that
then performs Apache Spark operations.

Any ideas?



--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/ExecuteProcess-Question-tp13471.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.


[DISCUSS] Closing in on a 0.x release

2016-09-27 Thread Michael Moser
All,

I would like to start the discussion of making the next official release of
the 0.x branch.  I propose that this release be numbered 0.7.1 since it
seems that only bug fixes have occurred on the 0.x branch since 0.7.0 was
released.

The JIRA link [1] below can show you the tickets that have been completed
in the 0.x branch.  There are 33 tickets in this list that are resolved.

Here is a list of JIRA tickets that are not yet complete that we need to
decide what to do with.

Patch Available
NIFI-2429 PersistentProvenanceRepository
NIFI-2774 ConsumeJMS

Open against 0.7.0
NIFI-2383 ListFiles
NIFI-2433 "Primary Node Only" processors (fixed in master but this ticket
is for 0.x)
NIFI-2798 Zookeeper security upgrade
NIFI-2801 Kafka processors documentation

Other high priority bugs not yet specifically targeted to the 0.x branch,
should we try to work these?
NIFI-1696 Event Driven processors
NIFI-1912 PutEmail content-type
NIFI-2163 nifi.sh follow the Linux service spec
NIFI-2409 StoreKiteInDataset invalid URI
NIFI-2562 PutHDFS data corruption
NIFI-2571 deprecate NiFiProperties.getInstance()

-- Mike

[1] -
https://issues.apache.org/jira/browse/NIFI-2801?jql=project%20%3D%20NIFI%20AND%20fixVersion%20in%20%280.7.1%2C%200.8.0%29
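
For anyone who prefers the command line, the same list can also be pulled via
the JIRA REST search endpoint; a sketch using the JQL from [1]:

    curl -G "https://issues.apache.org/jira/rest/api/2/search" \
      --data-urlencode "jql=project = NIFI AND fixVersion in (0.7.1, 0.8.0)" \
      --data-urlencode "fields=key,summary,resolution"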


Re: Closing on solving NIFI-1500 - NiFi requires too much write permissions to bootstrap

2016-09-27 Thread James Wing
Andre,

Is your target for NIFI-1500 the default installation permission scheme, or
just that NiFi does not fail to start without the permissions in a
customized scheme?  Would it be acceptable to distinguish between
permissions required to initialize NiFi the first time, and the permissions
required for ongoing use?

With respect to #3, conf directory permissions -- I like the
./conf/flows/... directory idea (or ./flows/...).  But there are other
writes to ./conf, including:

* Start-time initialization of users.xml, authorizations.xml, conversion
from legacy authorized-users.xml, etc.
* Existing UI for configuration of policies, users, and groups
* Any future UI-driven management options

I believe these can be optional to the experienced admin, but the default
installation requires write access to create and update these files.

Thanks,

James

On Tue, Sep 27, 2016 at 8:13 AM, Andre  wrote:

> devs,
>
> A while ago (0.4.0 IIRC) we had a brief exchange of messages around the
> permissions NiFi requires to run (NIFI-1500).
>
> The debate revolved mostly around 4 things:
>
> 1. write access to $NIFI_HOME/bin
>
> 2. write access to $NIFI_HOME/lib - NIFI-2818  / #1059 (review is welcome)
>
> 3. write access to $NIFI_HOME/conf (i.e. by default the flow is saved under
> "conf/")
>
> 4. write access to $NIFI_HOME/. (i.e. creates the missing repo and working
> folders upon boot).
>
> Good news is that inspection of the code suggests #1 has been solved when
> we re-wrote the nifi.sh script and that #3 and #4 may be solved just by
> small changes to default configuration and documentation.
>
>
> Would anyone have thoughts on what would be the preferred approach to deal
> with issues #3 and #4?
>
>
> IMNSHO, the least impacting way of addressing #3 is to modify the default
> behaviour, and ship NiFi with the following settings:
>
> nifi.flow.configuration.file=./conf/flows/flow.xml.gz
> nifi.flow.configuration.archive.dir=./conf/flows/archive/
>
> instead of the currently used:
>
> nifi.flow.configuration.file=./conf/flow.xml.gz
> nifi.flow.configuration.archive.dir=./conf/archive/
>
> This way, ./conf and all files could be owned by root.root and have fs
> permissions set to 755 (drwxr-xr-x)
>
> while conf/flows could be set to runasuser.runasgroup and fs permissions
> set to 700 (drwx------).
>
>
> #4 Can be solved by modifying the pom files to add the appropriate directory
> structure to the tar and gzip archives (i.e. pre-populating the directory
> structure) or by adjusting the rpm and deb files.
>
>
> We would also update the documentation to ensure people installing NiFi are
> informed of the ideal filesystem permissions.
>
>
> Would everyone be in agreement with this approach?
>
>
> On a related note:
>
> I would truly appreciate it if we could get some eyes over PR-1059 as early as
> possible. Whilst a minor change to the code, I suspect it needs to be well
> thought out and tested before we commit.
>
>
> Cheers
>


Closing on solving NIFI-1500 - NiFi requires too much write permissions to bootstrap

2016-09-27 Thread Andre
devs,

A while ago (0.4.0 IIRC) we had a brief exchange of messages around the
permissions NiFi requires to run (NIFI-1500).

The debate revolved mostly around 4 things:

1. write access to $NIFI_HOME/bin

2. write access to $NIFI_HOME/lib - NIFI-2818  / #1059 (review is welcome)

3. write access to $NIFI_HOME/conf (i.e. by default the flow is saved under
"conf/")

4. write access to $NIFI_HOME/. (i.e. creates the missing repo and working
folders upon boot).

Good news is that inspection of the code suggests #1 has been solved when
we re-wrote the nifi.sh script and that #3 and #4 may be solved just by
small changes to default configuration and documentation.


Would anyone have thoughts on what would be the preferred approach to deal
with issues #3 and #4?


IMNSHO, the least impacting way of addressing #3 is to modify the default
behaviour, and ship NiFi with the following settings:

nifi.flow.configuration.file=./conf/flows/flow.xml.gz
nifi.flow.configuration.archive.dir=./conf/flows/archive/

instead of the currently used:

nifi.flow.configuration.file=./conf/flow.xml.gz
nifi.flow.configuration.archive.dir=./conf/archive/

This way, ./conf and all files could be owned by root.root and have fs
permissions set to 755 (drwxr-xr-x)

while conf/flows could be set to runasuser.runasgroup and fs permissions
set to 700 (drwx------).
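
In shell terms the proposed layout would amount to something like the sketch
below ("runasuser"/"runasgroup" are placeholders for the actual run-as user
and group):

    # conf itself owned by root and read-only for the run-as user
    chown -R root:root "$NIFI_HOME/conf"
    chmod 755 "$NIFI_HOME/conf"
    # flows (and its archive) writable only by the run-as user
    install -d -o runasuser -g runasgroup -m 700 "$NIFI_HOME/conf/flows"
    install -d -o runasuser -g runasgroup -m 700 "$NIFI_HOME/conf/flows/archive"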


#4 Can be solved by modifying the pom files to add the appropriate directory
structure to the tar and gzip archives (i.e. pre-populating the directory
structure) or by adjusting the rpm and deb files.
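
For instance, a packaging scriptlet or install script could pre-create the
working directories; a sketch, with directory names matching the defaults in
nifi.properties and the same placeholder user/group as above:

    # pre-create the repositories so NiFi need not create them at boot
    for d in flowfile_repository content_repository provenance_repository \
             database_repository work logs; do
        install -d -o runasuser -g runasgroup -m 700 "$NIFI_HOME/$d"
    done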


We would also update the documentation to ensure people installing NiFi are
informed of the ideal filesystem permissions.


Would everyone be in agreement with this approach?


On a related note:

I would truly appreciate it if we could get some eyes over PR-1059 as early as
possible. Whilst a minor change to the code, I suspect it needs to be well
thought out and tested before we commit.


Cheers


Re: Questions about heterogeneous cluster and queue problem/bug/oddity in 1.0.0

2016-09-27 Thread Joe Skora
Joe,

Thanks, your tuning comments all make sense.

If they didn't have similar CPU and RAM scales I probably would not
have tried it.  It's only been running a couple of days, but I've already
noticed some anecdotal performance differences.  For instance, the Linux
and OSX nodes appear to process more flow files than the Windows node; I don't
know if that's due to the SSDs or the different file systems.

The cluster runs better than I expected for non-server hardware.  I haven't
hammered it hard yet, but eventually I'll pull together some NiFi
performance stats and system/OS benchmark control numbers.

I had some bad hot spots in the flow, specifically before the ControlRate
and UpdateAttribute processors, so I tried splitting the flow with a
DistributeLoad to 3 instances of each and did the same for the highest
volume PutFile too.  That made a big difference and the hot spots were
gone.  Now there are several warm spots, but the queue sizes are much more
even across the graph and a big influx of files moves more steadily through
the graph instead of racing from one backup to the next.  Does that make
sense?

Joe

On Tue, Sep 27, 2016 at 8:31 AM, Joe Witt  wrote:

> JoeS
>
> I think you are seeing a queue bug that has been corrected or reported
> on the 1.x line.
>
> As for the frankencluster concept i think it is generally fair game.
> There are a number of design reasons, most notably back pressure, that
> make this approach feasible.  So the big ticket items to consider are
> things like
>
> CPU
> Since the model of NiFi is that basically all processes/tasks are
> eligible to run on all nodes, and the number of threads and tasks
> configured per controller and component is applied to all nodes, this
> could be problematic when there is a substantive imbalance of power
> across the various systems.  If this were important to
> improve we could allow node-local overrides of max controller threads.
> That helps a bit but doesn't really solve it.  Again back pressure is
> probably the most effective.  There are probably a number of things we
> could do here if needed.
>
> Disk
> We have to consider the speed, congestion, and storage available on
> the disk(s) and how they're partitioned and such for our various
> repositories.  Again back pressure is one of the more effective
> mechanisms here because it is all about doing as much as you can which
> means other nodes should be able to take on more/less.  Fortunately
> the configuration of the repositories and such here are node-local so
> we can have pretty considerable variety here and things work pretty
> well.
>
> Network
> Back pressure for the win.  Though significant imbalances could lead
> to significant congestion, which could cause inefficiencies in general,
> so we would need to be careful.  That scenario would require wildly
> imbalanced node capabilities and very high rate flows most likely.
>
> Memory
> JVM Heap size variability and/or off heap memory differences could
> cause some nodes to behave wildly different than others in ways that
> back pressure will not necessarily solve.  For instance a node with
> too low heap size for the types of processes in the flow could yield
> order(s) of magnitude lower performance than another node.  We should
> do more for these things.  Users should not have to configure things
> like swapping thresholds for instance.  We should at runtime determine
> and tune those values.  It is simply too hard to find a good magic
> number that predicts the likely number of flow file attributes and
> size that might be needed and those can have a substantial impact on
> heap usage.  Right now we treat swapping on a per queue basis though
> it is configured globally.  If you have, say, just 100 queues each
> holding 1000 flowfiles in memory, you have all the attributes of those
> 100,000 flowfiles in memory.  If each flow file took up just 1KB of
> memory we're talking 100+MB.  Perhaps a slightly odd example but users
> aren't going to go through and think about every queue and the optimal
> global swapping setting.  Though it is an important number.  The
> system should be watching them all and doing this automatically.  That
> could help quite a lot.  We may also end up needing to not even have
> flowfile attributes held in memory though supporting this would
> require API changes to ensure they're only accessed in stream friendly
> ways.  Doing this for all uses of EL is probably pretty
> straightforward but all the direct attribute map accesses would need
> consideration.
>
> ...And we also need to think through things like
>
> OS Differences in accessing resources
> We generally follow "Pure Java (tm)" practices where possible.  So
> this helps a lot.  But still things like accessing specific file paths
> as might be needed in flow configurations themselves (GetFile/PutFile
> for example) could be tricky (but doable).
>
> The protocols used to source data matter a lot
> With all this talk of back pressure keep in mind that how data gets
> into NiFi becomes really critical in these clusters.

Re: Questions about heterogeneous cluster and queue problem/bug/oddity in 1.0.0

2016-09-27 Thread Joe Witt
JoeS

I think you are seeing a queue bug that has been corrected or reported
on the 1.x line.

As for the frankencluster concept i think it is generally fair game.
There are a number of design reasons, most notably back pressure, that
make this approach feasible.  So the big ticket items to consider are
things like

CPU
Since the model of NiFi is that basically all processes/tasks are
eligible to run on all nodes, and the number of threads and tasks
configured per controller and component is applied to all nodes, this
could be problematic when there is a substantive imbalance of power
across the various systems.  If this were important to
improve we could allow node-local overrides of max controller threads.
That helps a bit but doesn't really solve it.  Again back pressure is
probably the most effective.  There are probably a number of things we
could do here if needed.

Disk
We have to consider the speed, congestion, and storage available on
the disk(s) and how they're partitioned and such for our various
repositories.  Again back pressure is one of the more effective
mechanisms here because it is all about doing as much as you can which
means other nodes should be able to take on more/less.  Fortunately
the configuration of the repositories and such here are node-local so
we can have pretty considerable variety here and things work pretty
well.

Network
Back pressure for the win.  Though significant imbalances could lead
to significant congestion, which could cause inefficiencies in general,
so we would need to be careful.  That scenario would require wildly
imbalanced node capabilities and very high rate flows most likely.

Memory
JVM Heap size variability and/or off heap memory differences could
cause some nodes to behave wildly different than others in ways that
back pressure will not necessarily solve.  For instance a node with
too low heap size for the types of processes in the flow could yield
order(s) of magnitude lower performance than another node.  We should
do more for these things.  Users should not have to configure things
like swapping thresholds for instance.  We should at runtime determine
and tune those values.  It is simply too hard to find a good magic
number that predicts the likely number of flow file attributes and
size that might be needed and those can have a substantial impact on
heap usage.  Right now we treat swapping on a per queue basis though
it is configured globally.  If you have, say, just 100 queues each
holding 1000 flowfiles in memory, you have all the attributes of those
100,000 flowfiles in memory.  If each flow file took up just 1KB of
memory we're talking 100+MB.  Perhaps a slightly odd example but users
aren't going to go through and think about every queue and the optimal
global swapping setting.  Though it is an important number.  The
system should be watching them all and doing this automatically.  That
could help quite a lot.  We may also end up needing to not even have
flowfile attributes held in memory though supporting this would
require API changes to ensure they're only accessed in stream friendly
ways.  Doing this for all uses of EL is probably pretty
straightforward but all the direct attribute map accesses would need
consideration.
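
For reference, the global swapping setting in question lives in nifi.properties
and, as described above, is applied to each queue independently (default value
shown):

    # nifi.properties -- global threshold, enforced per queue
    nifi.queue.swap.threshold=20000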

...And we also need to think through things like

OS Differences in accessing resources
We generally follow "Pure Java (tm)" practices where possible.  So
this helps a lot.  But still things like accessing specific file paths
as might be needed in flow configurations themselves (GetFile/PutFile
for example) could be tricky (but doable).

The protocols used to source data matter a lot
With all this talk of back pressure keep in mind that how data gets
into NiFi becomes really critical in these clusters.  If you use
protocols which do not afford fault tolerance and load balancing then
things are not great.  So protocols which have queuing semantics or
feedback mechanisms or let NiFi as the consumer control things will
work out well.  Some portions of JMS are good for this.  Kafka is good
for this.  NiFi's own site-to-site is good for this.

The frankencluster testing is a valuable way to force and think
through interesting issues. Maybe the frankencluster as you have it
isn't realistic but it still exposes the concepts that need to be
thought through for cases that definitely are.

Thanks
Joe

On Tue, Sep 27, 2016 at 7:37 AM, Joe Skora  wrote:
> The images just show what the text described, 13 files queued, EmptyQueue
> returns 0 of 13 removed, and ListQueue returns the queue has no flowfiles.
>
> There were 13 files of 1k sitting in a queue between a SegmentContent and
> ControlRate.  After I sent that email I had to stop/start the processors a
> couple of times for other things and somewhere in the midst of that the
> queue cleared.
>
>
>
> On Mon, Sep 26, 2016 at 11:05 PM, Peter Wicks (pwicks) 
> wrote:
>
>> Joe,
>>
>> I didn’t get the images (might just be my exchange server).

Re: Questions about heterogeneous cluster and queue problem/bug/oddity in 1.0.0

2016-09-27 Thread Joe Skora
The images just show what the text described, 13 files queued, EmptyQueue
returns 0 of 13 removed, and ListQueue returns the queue has no flowfiles.

There were 13 files of 1k sitting in a queue between a SegmentContent and
ControlRate.  After I sent that email I had to stop/start the processors a
couple of times for other things and somewhere in the midst of that the
queue cleared.



On Mon, Sep 26, 2016 at 11:05 PM, Peter Wicks (pwicks) 
wrote:

> Joe,
>
> I didn’t get the images (might just be my exchange server). How many files
> are in the queue? (exact count please)
>
> --Peter
>
> From: Joe Skora [mailto:jsk...@gmail.com]
> Sent: Monday, September 26, 2016 8:20 PM
> To: dev@nifi.apache.org
> Subject: Questions about heterogeneous cluster and queue
> problem/bug/oddity in 1.0.0
>
> I have a 3 node test franken-cluster that I'm abusing for the sake of
> learning.  The systems run Ubuntu 15.04, OS X 10.11.6, and Windows 10, and
> though not identical, each has a quad-core i7 between 2.5 and 3.5 GHz and
> 16GB of RAM.  Two have SSDs and the third has a 7200RPM SATA III drive.
>
> 1) Is there any reason mixing operating systems within the cluster would be
> a bad idea?  Once configured it seems to run OK.
> 2) Will performance disparities affect reliability or performance
> within the cluster?
> 3) Are there ways to configure disparate systems such that they can all
> perform at peak?
>
> The bug or issue I have run into is a queue showing files that can't be
> removed or listed.  Screenshots attached below.  I don't know if it's a
> mixed-OS issue, something I did while torturing the systems (all stayed
> up, this time), or just a weird anomaly.
>
> Regards,
> Joe
>
> Trying to empty queue seen in background
> [Inline image 1]
>
> but the flowfiles cannot be deleted.
> [Inline image 2]
>
> But try to list them and it says there are no files in the queue?
> [Inline image 3]
>