Re: A new resource config proposal

2018-04-18 Thread Karthik Ramasamy
Thanks Ning. This is an excellent idea - simplifying from resource point of
view.

On Wed, Apr 18, 2018 at 10:56 AM, Ning Wang  wrote:

> In the discussions with a few developers, we have some new thoughts about
> the resource config and packing algorithms in heron. Basically we want to
> step back from the current configs and try to come up with a new solution
> from user's point of view.
>
> Here is a brief proposal document for reference and discussion.
>
> https://docs.google.com/document/d/1huySPugRcR5LXmlBxCCtie9grkc-
> X67t57_R_tZM-4w/edit#
>
> Please feel free to comment. Thanks in advance.
>
> Regards,
> --ning
>


Re: Adding source task id to default metrics

2018-04-18 Thread Karthik Ramasamy
I like the idea of both the metrics and it might be great to include them.

Prometheus can aggregate metrics downstream by component-id/source-task etc
It is a nice tool.

cheers
/karthik

On Wed, Apr 18, 2018 at 8:32 PM, Fu Maosong  wrote:

> One concern is that it will significantly increase the number of metrics,
> potentially leading performance concerns.
>
> 2018-04-18 18:58 GMT-07:00 Thomas Cooper :
>
> > Hi All,
> >
> > This started out a quick slack post, then a reasonably sized email and
> now
> > it has headings!
> >
> > *Introduction*
> >
> > I am working on a performance modeling system for Heron. Hopefully this
> > system will be useful for checking proposed plans will meet performance
> > targets and also for checking if currently running physical plans will
> have
> > back pressure issues with higher traffic rates.
> >
> > To do this I need to know what proportion of tuples are routed from each
> > upstream instance to its downstream instances, which is a metric that
> Heron
> > does not provide by default.
> >
> > *Proposal*
> >
> > I have implemented a custom metric to do what I need in my test
> topologies,
> > it is a simple multi-count metric called "__receive-count" where the key
> > now includes the "sourceTaskId" value (which you can get from the tuple
> > instance) as well as the source component name and incoming stream name.
> >
> > This is basically the same as the default "__execute-count" metric but
> the
> > metric name format is
> > "__receive-count///"
> > instead of "__execute-count//"
> >
> > So I see two options:
> >
> >1. Create a new "__receive-count" metric and leave the
> "__execute-count"
> >alone
> >2. Alter "__execute-count" to include the source task ID.
> >
> > *Questions*
> >
> > My first question is weather the metric name is parsed anywhere further
> > down the line, such as aggregating component metrics in the metrics
> > manager? So changing the name would break things?
> >
> > My second is if we do change "__execute-count" should we also add the
> > source task ID to other bolt metrics like "__execute-latency" (it would
> be
> > nice to see how latency changes by source instance --- this is a
> particular
> > issue in two consecutive fields grouped components as instances will
> > receive very different key distributions which could lead to very
> different
> > processing latency).
> >
> > *Implementation*
> >
> > To add this to the default metrics (or change "__execute-count") seems
> like
> > it would be reasonably straight forward (famous last words). We would
> need
> > to modify the `FullBoltMetric` class to include the new metrics (if
> > required) and edit the `FullBoltMetric.executeTuple` method to accept the
> > "sourceTaskId" (which is already available in the
> > "BoltInstance.readTuplesAndExecute" method) as a 4th argument.
> >
> > Obviously, we will need to do the same with the Python implementation.
> Will
> > this also need to be changed in the Storm compatibility layer?
> >
> > *Conclusion*
> >
> > Having the information on where tuples are flowing is really important if
> > we want to be able to do more intelligent routing and adaptive
> auto-scaling
> > in the future and hopefully this one small change/extra metric won't add
> > any significant processing overhead.
> >
> > I look forward to hearing what you think.
> >
> > Cheers,
> >
> > Tom Cooper
> > W: www.tomcooper.org.uk  | Twitter: @tomncooper
> > 
> >
>
>
>
> --
> With my best Regards
> --
> Fu Maosong
> Twitter Inc.
> Mobile: +001-415-244-7520
>


Re: Adding source task id to default metrics

2018-04-18 Thread Fu Maosong
One concern is that it will significantly increase the number of metrics,
potentially leading performance concerns.

2018-04-18 18:58 GMT-07:00 Thomas Cooper :

> Hi All,
>
> This started out a quick slack post, then a reasonably sized email and now
> it has headings!
>
> *Introduction*
>
> I am working on a performance modeling system for Heron. Hopefully this
> system will be useful for checking proposed plans will meet performance
> targets and also for checking if currently running physical plans will have
> back pressure issues with higher traffic rates.
>
> To do this I need to know what proportion of tuples are routed from each
> upstream instance to its downstream instances, which is a metric that Heron
> does not provide by default.
>
> *Proposal*
>
> I have implemented a custom metric to do what I need in my test topologies,
> it is a simple multi-count metric called "__receive-count" where the key
> now includes the "sourceTaskId" value (which you can get from the tuple
> instance) as well as the source component name and incoming stream name.
>
> This is basically the same as the default "__execute-count" metric but the
> metric name format is
> "__receive-count///"
> instead of "__execute-count//"
>
> So I see two options:
>
>1. Create a new "__receive-count" metric and leave the "__execute-count"
>alone
>2. Alter "__execute-count" to include the source task ID.
>
> *Questions*
>
> My first question is weather the metric name is parsed anywhere further
> down the line, such as aggregating component metrics in the metrics
> manager? So changing the name would break things?
>
> My second is if we do change "__execute-count" should we also add the
> source task ID to other bolt metrics like "__execute-latency" (it would be
> nice to see how latency changes by source instance --- this is a particular
> issue in two consecutive fields grouped components as instances will
> receive very different key distributions which could lead to very different
> processing latency).
>
> *Implementation*
>
> To add this to the default metrics (or change "__execute-count") seems like
> it would be reasonably straight forward (famous last words). We would need
> to modify the `FullBoltMetric` class to include the new metrics (if
> required) and edit the `FullBoltMetric.executeTuple` method to accept the
> "sourceTaskId" (which is already available in the
> "BoltInstance.readTuplesAndExecute" method) as a 4th argument.
>
> Obviously, we will need to do the same with the Python implementation. Will
> this also need to be changed in the Storm compatibility layer?
>
> *Conclusion*
>
> Having the information on where tuples are flowing is really important if
> we want to be able to do more intelligent routing and adaptive auto-scaling
> in the future and hopefully this one small change/extra metric won't add
> any significant processing overhead.
>
> I look forward to hearing what you think.
>
> Cheers,
>
> Tom Cooper
> W: www.tomcooper.org.uk  | Twitter: @tomncooper
> 
>



-- 
With my best Regards
--
Fu Maosong
Twitter Inc.
Mobile: +001-415-244-7520


[GitHub] srkukarni commented on issue #2692: Making emit, ack, and fail thread safe

2018-04-18 Thread GitBox
srkukarni commented on issue #2692: Making emit, ack, and fail thread safe
URL: https://github.com/apache/incubator-heron/pull/2692#issuecomment-382591000
 
 
   @nlu90 can you please take a look as well


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sreev closed pull request #2866: create 'asf-site' and redirect files

2018-04-18 Thread GitBox
sreev closed pull request #2866: create 'asf-site' and redirect files
URL: https://github.com/apache/incubator-heron/pull/2866
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.htaccess b/.htaccess
new file mode 100644
index 00..ded074df6f
--- /dev/null
+++ b/.htaccess
@@ -0,0 +1,2 @@
+Redirect /index.html https://apache.github.io/incubator-heron/
+
diff --git 
a/heron/healthmgr/src/java/org/apache/heron/healthmgr/HealthManagerMetrics.java 
b/heron/healthmgr/src/java/org/apache/heron/healthmgr/HealthManagerMetrics.java
deleted file mode 100644
index 1989fec457..00
--- 
a/heron/healthmgr/src/java/org/apache/heron/healthmgr/HealthManagerMetrics.java
+++ /dev/null
@@ -1,264 +0,0 @@
-// Copyright 2018 Twitter. All rights reserved.
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-package org.apache.heron.healthmgr;
-
-import java.io.IOException;
-import java.net.InetAddress;
-import java.net.UnknownHostException;
-import java.time.Duration;
-import java.util.Map.Entry;
-import java.util.logging.Level;
-import java.util.logging.Logger;
-
-import javax.inject.Inject;
-import javax.inject.Singleton;
-
-import com.google.protobuf.Message;
-
-import org.apache.heron.api.metric.MultiCountMetric;
-import org.apache.heron.common.basics.NIOLooper;
-import org.apache.heron.common.basics.SingletonRegistry;
-import org.apache.heron.common.config.SystemConfig;
-import org.apache.heron.common.network.HeronClient;
-import org.apache.heron.common.network.HeronSocketOptions;
-import org.apache.heron.common.network.StatusCode;
-import org.apache.heron.common.utils.metrics.JVMMetrics;
-import org.apache.heron.proto.system.Common;
-import org.apache.heron.proto.system.Metrics;
-
-/**
- * HealthMgr's metrics to be collect
- */
-@Singleton
-public class HealthManagerMetrics implements Runnable, AutoCloseable {
-  public static final String METRICS_THREAD = "HealthManagerMetrics";
-  private static final Logger LOG = 
Logger.getLogger(HealthManagerMetrics.class.getName());
-  private static final String METRICS_MGR_HOST = "127.0.0.1";
-
-  private final String metricsPrefix = "__healthmgr/";
-  private final String metricsSensor = metricsPrefix + "sensor/";
-  private final String metricsDetector = metricsPrefix + "detector/";
-  private final String metricsDiagnoser = metricsPrefix + "diagnoser/";
-  private final String metricsResolver = metricsPrefix + "resolver/";
-  private final String metricsName = metricsPrefix + "customized/";
-  private final JVMMetrics jvmMetrics;
-  private final MultiCountMetric executeSensorCount;
-  private final MultiCountMetric executeDetectorCount;
-  private final MultiCountMetric executeDiagnoserCount;
-  private final MultiCountMetric executeResolverCount;
-  private final MultiCountMetric executeCount;
-
-  private NIOLooper looper;
-  private HeronClient metricsMgrClient;
-
-  /**
-   * constructor to expose healthmgr metrics to local metricsmgr
-   * @param metricsMgrPort local MetricsMgr port
-   * @throws IOException
-   */
-  @Inject
-  public HealthManagerMetrics(int metricsMgrPort) throws IOException {
-jvmMetrics = new JVMMetrics();
-
-executeSensorCount = new MultiCountMetric();
-executeDetectorCount = new MultiCountMetric();
-executeDiagnoserCount = new MultiCountMetric();
-executeResolverCount = new MultiCountMetric();
-executeCount = new MultiCountMetric();
-
-looper = new NIOLooper();
-
-SystemConfig systemConfig =
-(SystemConfig) 
SingletonRegistry.INSTANCE.getSingleton(SystemConfig.HERON_SYSTEM_CONFIG);
-
-HeronSocketOptions socketOptions =
-new HeronSocketOptions(systemConfig.getInstanceNetworkWriteBatchSize(),
-systemConfig.getInstanceNetworkWriteBatchTime(),
-systemConfig.getInstanceNetworkReadBatchSize(),
-systemConfig.getInstanceNetworkReadBatchTime(),
-systemConfig.getInstanceNetworkOptionsSocketSendBufferSize(),
-systemConfig.getInstanceNetworkOptionsSocketReceivedBufferSize(),
-systemConfig.getInstanceNetworkOptionsMaximumPacketSize());
-metricsMgrClient =
-new SimpleMetricsManagerClient(looper, METRICS_MGR_HOST, 
metricsMgrPort, socketOptions);
-
-int interval = (int) 

A new resource config proposal

2018-04-18 Thread Ning Wang
In the discussions with a few developers, we have some new thoughts about
the resource config and packing algorithms in heron. Basically we want to
step back from the current configs and try to come up with a new solution
from user's point of view.

Here is a brief proposal document for reference and discussion.

https://docs.google.com/document/d/1huySPugRcR5LXmlBxCCtie9grkc-X67t57_R_tZM-4w/edit#

Please feel free to comment. Thanks in advance.

Regards,
--ning


[GitHub] nwangtw commented on a change in pull request #2866: create 'asf-site' and redirect files

2018-04-18 Thread GitBox
nwangtw commented on a change in pull request #2866: create 'asf-site' and 
redirect files
URL: https://github.com/apache/incubator-heron/pull/2866#discussion_r182325643
 
 

 ##
 File path: index.html
 ##
 @@ -0,0 +1,8 @@
+
+  
+http://www.example.org;>
 
 Review comment:
   kk thx.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sreev commented on a change in pull request #2866: create 'asf-site' and redirect files

2018-04-18 Thread GitBox
sreev commented on a change in pull request #2866: create 'asf-site' and 
redirect files
URL: https://github.com/apache/incubator-heron/pull/2866#discussion_r18236
 
 

 ##
 File path: index.html
 ##
 @@ -0,0 +1,8 @@
+
+  
+http://www.example.org;>
 
 Review comment:
   this file many not be needed with that content as .htaccess is enough. but 
using it as a place holder.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sreev commented on a change in pull request #2866: create 'asf-site' and redirect files

2018-04-18 Thread GitBox
sreev commented on a change in pull request #2866: create 'asf-site' and 
redirect files
URL: https://github.com/apache/incubator-heron/pull/2866#discussion_r182322042
 
 

 ##
 File path: 
heron/healthmgr/src/java/org/apache/heron/healthmgr/HealthManagerMetrics.java
 ##
 @@ -1,264 +0,0 @@
-// Copyright 2018 Twitter. All rights reserved.
 
 Review comment:
   it came in from merge conflict.
   
   CONFLICT (rename/delete): 
heron/healthmgr/src/java/com/twitter/heron/healthmgr/HealthManagerMetrics.java 
deleted in f444372de2749c8a6953ff98a5178550ec026cd8 and renamed to 
heron/healthmgr/src/java/org/apache/heron/healthmgr/HealthManagerMetrics.java 
in HEAD. Version HEAD of 
heron/healthmgr/src/java/org/apache/heron/healthmgr/HealthManagerMetrics.java 
left in tree.
   
   
   Deleted merge conflict for 
'heron/healthmgr/src/java/org/apache/heron/healthmgr/HealthManagerMetrics.java':
 {local}: created file
 {remote}: deleted
   Use (c)reated or (d)eleted file, or (a)bort? d
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nwangtw commented on issue #2847: Replace CHECK() with CHECK_*() so that when a check is failing, more …

2018-04-18 Thread GitBox
nwangtw commented on issue #2847: Replace CHECK() with CHECK_*() so that when a 
check is failing, more …
URL: https://github.com/apache/incubator-heron/pull/2847#issuecomment-382277581
 
 
   Thanks for reviewing.
   
   This was a super quick PR and incremental improvement. It seems to be more 
time consuming now. I will leave this PR here for now and come back when I have 
the time.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nwangtw commented on a change in pull request #2847: Replace CHECK() with CHECK_*() so that when a check is failing, more …

2018-04-18 Thread GitBox
nwangtw commented on a change in pull request #2847: Replace CHECK() with 
CHECK_*() so that when a check is failing, more …
URL: https://github.com/apache/incubator-heron/pull/2847#discussion_r182320947
 
 

 ##
 File path: heron/common/src/cpp/network/baseconnection.cpp
 ##
 @@ -56,9 +56,10 @@ BaseConnection::BaseConnection(ConnectionEndPoint* 
endpoint, ConnectionOptions*
 }
 
 BaseConnection::~BaseConnection() {
-  CHECK(mState == INIT || mState == DISCONNECTED);
-  bufferevent_free(buffer_);
+  CHECK(mState == INIT || mState == DISCONNECTED)
+  << "Deleting connection object while it is still connected";
   disableRateLimit();  // To free the config object
+  bufferevent_free(buffer_);
 
 Review comment:
   Yeah. It is an unrelated change and has been merged in a different PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nwangtw commented on a change in pull request #2866: create 'asf-site' and redirect files

2018-04-18 Thread GitBox
nwangtw commented on a change in pull request #2866: create 'asf-site' and 
redirect files
URL: https://github.com/apache/incubator-heron/pull/2866#discussion_r182319801
 
 

 ##
 File path: index.html
 ##
 @@ -0,0 +1,8 @@
+
+  
+http://www.example.org;>
 
 Review comment:
   why refresh?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nwangtw commented on a change in pull request #2866: create 'asf-site' and redirect files

2018-04-18 Thread GitBox
nwangtw commented on a change in pull request #2866: create 'asf-site' and 
redirect files
URL: https://github.com/apache/incubator-heron/pull/2866#discussion_r182320076
 
 

 ##
 File path: index.html
 ##
 @@ -0,0 +1,8 @@
+
+  
+http://www.example.org;>
+  
+  
+The url of this site has been changed. Please update your bookmarks!
 
 Review comment:
   maybe "changed to http://...;? I am wondering if apache has a standard 
page with extra features like a better design, localization, etc.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nwangtw commented on a change in pull request #2866: create 'asf-site' and redirect files

2018-04-18 Thread GitBox
nwangtw commented on a change in pull request #2866: create 'asf-site' and 
redirect files
URL: https://github.com/apache/incubator-heron/pull/2866#discussion_r182319743
 
 

 ##
 File path: 
heron/healthmgr/src/java/org/apache/heron/healthmgr/HealthManagerMetrics.java
 ##
 @@ -1,264 +0,0 @@
-// Copyright 2018 Twitter. All rights reserved.
 
 Review comment:
   why this file is here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sreev commented on issue #2866: create 'asf-site' and redirect files

2018-04-18 Thread GitBox
sreev commented on issue #2866: create 'asf-site' and redirect files
URL: https://github.com/apache/incubator-heron/pull/2866#issuecomment-382275780
 
 
   @kramasamy, removed the merge conflict file.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kramasamy commented on issue #2864: S3Uploader improvements

2018-04-18 Thread GitBox
kramasamy commented on issue #2864: S3Uploader improvements
URL: https://github.com/apache/incubator-heron/pull/2864#issuecomment-382275638
 
 
   @ajorgensen - once the ci passes, we can merge it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services