[jira] [Commented] (NIFI-2731) MergeContent default max number of flow files and max number of bins should be smaller

2016-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573037#comment-15573037
 ] 

ASF GitHub Bot commented on NIFI-2731:
--

Github user pvillard31 closed the pull request at:

https://github.com/apache/nifi/pull/1071


> MergeContent default max number of flow files and max number of bins should 
> be smaller
> --
>
> Key: NIFI-2731
> URL: https://issues.apache.org/jira/browse/NIFI-2731
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Joseph Witt
>Assignee: Pierre Villard
> Fix For: 1.1.0
>
>
> Presently there is no default on max entries.  It should probably be set to 
> 1000 by default.  These are flow files and their objects are read into memory 
> and can add up quickly.  Further, if we have 100 default max bins we could 
> end up with 100s of thousands of flow file objects held in memory during 
> common dataflow scenarios.  Recommend moving to max 5 different bins by 
> default and max 1000 flow files by default.
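
The arithmetic behind the description above can be sketched as follows. This is a minimal illustration, not NiFi code: `worst_case_held_flowfiles` is a hypothetical helper, and it assumes the worst case in which every open bin fills to its entry limit at the same time.

```python
from typing import Optional


def worst_case_held_flowfiles(max_bins: int, max_entries: Optional[int]) -> Optional[int]:
    """Upper bound on flow file objects held across all open bins.

    Returns None when max_entries is None (no per-bin limit), because the
    total is then unbounded. Hypothetical helper, not part of NiFi.
    """
    if max_bins < 1:
        raise ValueError("max_bins must be at least 1")
    if max_entries is None:
        return None
    return max_bins * max_entries


print(worst_case_held_flowfiles(100, None))   # None: no ceiling at all
print(worst_case_held_flowfiles(100, 1000))   # 100000
print(worst_case_held_flowfiles(5, 1000))     # 5000
```

Under these assumptions the proposed defaults cap the worst case at 5,000 objects, where the old defaults had no cap.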



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NIFI-2731) MergeContent default max number of flow files and max number of bins should be smaller

2016-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572917#comment-15572917
 ] 

ASF GitHub Bot commented on NIFI-2731:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/1071
  
@pvillard31 looks good. +1 merged to master. Unfortunately I forgot to 
include the magical "This closes #1071." message, though. Can you close this PR?




[jira] [Commented] (NIFI-2731) MergeContent default max number of flow files and max number of bins should be smaller

2016-10-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572914#comment-15572914
 ] 

ASF subversion and git services commented on NIFI-2731:
---

Commit 6e7793305d4ebcf154cf42ae7f6e9e17cab4e857 in nifi's branch 
refs/heads/master from [~pvillard]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=6e77933 ]

NIFI-2731 MergeContent default max number of flow files and max number of bins 
should be smaller




[jira] [Commented] (NIFI-2731) MergeContent default max number of flow files and max number of bins should be smaller

2016-09-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531636#comment-15531636
 ] 

ASF GitHub Bot commented on NIFI-2731:
--

Github user mosermw commented on the issue:

https://github.com/apache/nifi/pull/1071
  
Yes @joewitt you've alleviated my concerns about the default for Max Bins.  
I forgot that default values get serialized to the flow.xml when a new 
processor is dropped on the graph.  Thanks.

I was on board with setting a default for Max Entries and not leaving that 
unbounded.  It certainly can cause trouble otherwise, and new users won't know 
why.




[jira] [Commented] (NIFI-2731) MergeContent default max number of flow files and max number of bins should be smaller

2016-09-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531603#comment-15531603
 ] 

ASF GitHub Bot commented on NIFI-2731:
--

Github user joewitt commented on the issue:

https://github.com/apache/nifi/pull/1071
  
@mosermw Keep in mind changing the default value for number of bins from 
100 to 5 will not change people's existing flows.  I just verified in both the 
flow.xml and a created template that the default value is indeed serialized 
and isn't just some UI trick.  So, it just means that for NEW uses of 
MergeContent the initial default value will be 5.  So I believe the main case 
you're laying out as a point of concern does not apply as it would still behave 
the way you'd prefer for existing uses.  Does that eliminate your concern for 
this part or is there an aspect I am overlooking?
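
The point about serialized defaults can be sketched as a toy model. This is an illustration under stated assumptions, not NiFi's actual persistence code: `add_processor`, `effective_value`, and the two dictionaries are hypothetical names, and the model simply assumes defaults are copied into the saved configuration when a processor is created.

```python
# Hypothetical defaults before and after the change discussed in this thread.
OLD_DEFAULTS = {"Maximum Number of Bins": "100"}
NEW_DEFAULTS = {"Maximum Number of Bins": "5"}


def add_processor(defaults):
    """Model dropping a processor on the graph: the defaults in force at
    that moment are copied into the saved configuration."""
    return dict(defaults)


def effective_value(saved, name, current_defaults):
    """A saved value wins; the current default is only a fallback."""
    return saved.get(name, current_defaults[name])


existing = add_processor(OLD_DEFAULTS)   # created before the upgrade
fresh = add_processor(NEW_DEFAULTS)      # created after the upgrade

print(effective_value(existing, "Maximum Number of Bins", NEW_DEFAULTS))  # 100
print(effective_value(fresh, "Maximum Number of Bins", NEW_DEFAULTS))     # 5
```

In this model an existing processor keeps its saved value of 100, and only newly added processors pick up 5.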

For the maximum number of entries value the previous setting of nothing 
meant there was no defined ceiling on the number of flow files which could be 
merged in a single pass.  This unbounded sort of behavior can be problematic 
from a memory usage perspective so setting a small default value just means it 
will now be defined.  Going from undefined to defined behavior could alter the 
behavior of the flow but, of course, since it was undefined there was no 
guarantee.  It also seems like something that could be easily covered in the 
migration guide, and the resulting behavior is now easy to reason about.  Now, 
if we were going from defined to undefined then I'd certainly think this would 
be inappropriate outside of, probably, a major release.
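
To illustrate what a defined ceiling changes, here is a toy binning loop. It is a sketch only, not MergeContent's algorithm: `fill_bins` is a hypothetical function that starts a new bin once the current one reaches `max_entries`, and `None` stands in for the old unbounded setting.

```python
from typing import Iterable, List, Optional


def fill_bins(items: Iterable[str], max_entries: Optional[int]) -> List[List[str]]:
    """Group items into bins, starting a new bin when the current one is full.

    With max_entries=None a single bin grows without limit.
    """
    if max_entries is not None and max_entries < 1:
        raise ValueError("max_entries must be at least 1 or None")
    bins: List[List[str]] = [[]]
    for item in items:
        # Open a fresh bin only when a ceiling exists and has been reached.
        if max_entries is not None and len(bins[-1]) >= max_entries:
            bins.append([])
        bins[-1].append(item)
    return bins


flowfiles = [f"ff-{i}" for i in range(2500)]
print([len(b) for b in fill_bins(flowfiles, None)])   # [2500]
print([len(b) for b in fill_bins(flowfiles, 1000)])   # [1000, 1000, 500]
```

With no ceiling, everything accumulates in one bin. With a ceiling of 1000, no bin in this sketch ever holds more than 1000 entries.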

As for the target memory size the idea wasn't really about a target memory 
size but rather having a conservative default behavior which the user could 
adjust if they wanted to.  The value of 100 for max bins was also not targeting 
a specific memory size when it was chosen and the decision to have the number 
of entries allowed in a bin by default unbounded was simply a mistake.  What I 
observed was that on a vanilla install on a laptop I could easily create memory 
exhaustion using these defaults and the reason why became pretty clear as I had 
lots of open bins holding lots of flow file objects.  Once I changed to the 
values mentioned here the behavior was far more intuitive.

All that said, MergeContent is a complicated animal and desperately needs a 
custom UI.  That would also serve us quite well but I believe this proposed 
JIRA and PR is a good step unless I am overlooking some important detail.





[jira] [Commented] (NIFI-2731) MergeContent default max number of flow files and max number of bins should be smaller

2016-09-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530925#comment-15530925
 ] 

ASF GitHub Bot commented on NIFI-2731:
--

Github user mosermw commented on the issue:

https://github.com/apache/nifi/pull/1071
  
For backwards compatibility purposes, I think changing the default Maximum 
Number of Bins from 100 to 5 will have a large impact on users.  Right now most 
people just leave this at the default.  If they upgrade NiFi and they were 
regularly using 10 bins, for instance, then after the upgrade their 
MergeContent will behave a LOT differently.

What memory footprint are we targeting for this change?  Even with a 512 
MB heap, 100 bins does not cause problems.

I have really big concerns about this one.  Can we discuss more?




[jira] [Commented] (NIFI-2731) MergeContent default max number of flow files and max number of bins should be smaller

2016-09-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527137#comment-15527137
 ] 

ASF GitHub Bot commented on NIFI-2731:
--

GitHub user pvillard31 opened a pull request:

https://github.com/apache/nifi/pull/1071

NIFI-2731 MergeContent default max number of flow files and max number of 
bins should be smaller



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pvillard31/nifi NIFI-2731

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/1071.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1071


commit 5745a1e6224f1b852071b29b7710d0b5f13f0024
Author: Pierre Villard 
Date:   2016-09-27T19:09:51Z

NIFI-2731 MergeContent default max number of flow files and max number of 
bins should be smaller



