Re: Re: Worker Profiling

2017-08-09 Thread 王 纯超
Hi Bobby,
   I am using Storm 1.1.0. Oh, I forgot to mention: before I turn the feature on, 
the section is still displayed. However, according to kishorvpatil's 
comment (https://issues.apache.org/jira/browse/STORM-1157?jql=text%20~%20%22Worker%20profiling%22), 
it shouldn't be. Is this a problem? In addition, I had initially forgotten to sync 
storm.yaml to all worker nodes. After syncing, the Storm cluster starts, but 
topologies cannot run. I checked the logs and found a lot of Netty errors: 
connections to some workers are refused. Some of my workers have OpenJDK 
installed; I do not know whether there is a causal link here. Also, if I want to 
use a different profiler, how should I set worker.profiler.command? Should it be 
a path relative to $JAVA_HOME/bin? And what about worker.profiler.childopts?
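
For reference, a minimal storm.yaml sketch of the settings involved. This is a 
hedged sketch based on the stock Java Flight Recorder defaults, not the poster's 
actual configuration; a custom profiler would substitute its own script for the 
default command.

# worker.profiler.enabled needs to be true in the cluster configuration for the
# UI profiling buttons to be usable (as generally understood); the other two keys
# show the stock Flight Recorder defaults purely for orientation (assumed values).
worker.profiler.enabled: true
worker.profiler.command: "flight.bash"
worker.profiler.childopts: "-XX:+UnlockCommercialFeatures -XX:+FlightRecorder"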


wangchunc...@outlook.com

From: Bobby Evans
Date: 2017-08-09 22:07
To: user@storm.apache.org
Subject: Re: Worker Profiling
I'm not sure.  That should be enough.  There might be a bug in the ui code 
though.  What version of storm are you using?  Also did you select any workers 
to profile?


- Bobby



On Tuesday, August 8, 2017, 8:53:10 PM CDT, 王 纯超 
wrote:


Hi,
I want to enable Storm's worker profiling feature, but failed. The action 
buttons in Profiling and Debugging section are disabled. Below is the related 
configuration. If my configuration is wrong, would somebody tell me how to do 
this?
[two inline configuration screenshots omitted from the archive]

wangchunc...@outlook.com


Re: Fields grouping

2017-08-09 Thread Jakes John
" Using direct grouping will let the bolt upstream of the ES writing bolt
decide which ES bolt receives a given message. So you could have spouts ->
sorter bolts -> ES bolts, where sorter bolts use direct grouping to
partition the stream by index id in whatever way you need. "
   What is the best way to use direct grouping in a dynamic way?
For eg: The distribution of index ids will be different across time. I
might need more threads for a index during one point while lesser threads
during the other
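
For reference, a minimal sketch of what such a sorter bolt could look like,
assuming Storm 1.x. The component id "es-writer", the field names, and the
hash-based routing are illustrative placeholders, not anything from this
thread; the routing line is exactly where a dynamic, load-aware policy would go.

import java.util.List;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SorterBolt extends BaseRichBolt {
    private OutputCollector collector;
    private List<Integer> esTaskIds; // task ids of the downstream ES-writer bolt

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Ask the TopologyContext for the consumer's task ids ("es-writer" is assumed).
        this.esTaskIds = context.getComponentTasks("es-writer");
    }

    @Override
    public void execute(Tuple input) {
        String indexId = input.getStringByField("indexId");
        // Routing policy goes here; a plain hash keeps the sketch short, but any
        // dynamic scheme (e.g. weighting tasks by recent per-index volume) works.
        int targetTask = esTaskIds.get(Math.floorMod(indexId.hashCode(), esTaskIds.size()));
        collector.emitDirect(targetTask, input, new Values(indexId, input.getValueByField("doc")));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // The stream must be declared as a direct stream for emitDirect to be allowed.
        declarer.declare(true, new Fields("indexId", "doc"));
    }
}

The consuming bolt would then be wired with direct grouping on the topology
builder, e.g. builder.setBolt("es-writer", new EsWriterBolt(), 10).directGrouping("sorter");
(component and class names again illustrative).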

On Tue, Aug 8, 2017 at 1:35 AM, Stig Rohde Døssing  wrote:

> You can implement your own grouping by using direct grouping (from
> http://storm.apache.org/releases/2.0.0-SNAPSHOT/Concepts.html): "*Direct
> grouping*: This is a special kind of grouping. A stream grouped this way
> means that the *producer* of the tuple decides which task of the consumer
> will receive this tuple. Direct groupings can only be declared on streams
> that have been declared as direct streams. Tuples emitted to a direct
> stream must be emitted using one of the emitDirect methods on
> OutputCollector. A bolt can get the task ids of its consumers either by
> using the provided TopologyContext or by keeping track of the output of
> the emit method in OutputCollector (which returns the task ids that the
> tuple was sent to)."
>
> Using direct grouping will let the bolt upstream of the ES writing bolt
> decide which ES bolt receives a given message. So you could have spouts ->
> sorter bolts -> ES bolts, where sorter bolts use direct grouping to
> partition the stream by index id in whatever way you need.
>
> On another note, I want to say that ES' bulk API supports writing to
> multiple indices in one go, so if you haven't already you should benchmark
> to see what the performance penalty of mixing indices in one bulk API call
> would be. If the penalty isn't much, you might be fine with shuffle
> grouping still.
>
> 2017-08-08 2:46 GMT+02:00 Jakes John :
>
>> Hi,
>>
>>    I need to have a streaming pipeline Kafka -> Storm -> Elasticsearch.
>> The volume of messages produced to Kafka is on the order of millions. Hence, I
>> need maximum throughput on the Elasticsearch writes. Each message has
>> an id which is mapped to an Elasticsearch index. The number of possible
>> message ids is less than 50 (which means at most 50 ES indices will be
>> created). I would like to batch the ES writes so that messages are grouped by
>> index *as much as possible*.
>> The problem is that message counts per id are dynamic, and certain ids can
>> have very large message inflows compared to others. The largest message id
>> can have >10x the inflow of the smallest. Hence, shuffle grouping on
>> ids doesn't work here. Partial key grouping also won't work, as I need a
>> larger number of output streams for the largest message ids.
>>
>> e.g., I have 10 tasks that write to ES
>>
>>
>> Say all my messages are spread across 2 message ids - ID1 and ID2 - which I
>> have to write to 2 separate indices in ES
>>
>> Say ID1 has 4 times more messages than ID2 at one instant
>>
>> So, the best possible output would be:
>> the first 8 tasks write messages with ID1 to ES,
>> the last 2 tasks write messages with ID2 to ES.
>>
>>
>> Say at a different instant, ID1 has the same number of messages as ID2
>>
>> So, the best possible output would be:
>> the first 5 tasks write messages with ID1 to ES,
>> the last 5 tasks write messages with ID2 to ES.
>>
>>
>> My grouping requirement is just an optimization, not a hard requirement.
>> What is the best way to group messages *dynamically* on input streams whose
>> message counts vary hugely? Also, I have control over how message ids are
>> created, if that helps the data distribution.
>>
>
>


[CVE-2017-9799] Apache Storm Possible Code Execution As A Different User

2017-08-09 Thread P. Taylor Goetz
Severity: High

Vendor: The Apache Software Foundation

Versions Affected:
Apache Storm 1.0.0, 1.0.1, 1.0.2, 1.0.3
Apache Storm 1.1.0

Description:
It was found that under some situations and configurations of Storm it is 
theoretically possible for the owner of a topology to trick the supervisor into 
launching a worker as a different, non-root, user. In the worst case this could 
lead to secure credentials of the other user being compromised. This 
vulnerability only applies to Apache Storm installations with security 
components enabled.

Mitigation:
Users of the affected versions should apply one of the following mitigations:

- Upgrade to Apache Storm 1.0.4 or later
- Upgrade to Apache Storm 1.1.1 or later

Apache Storm 1.1.1 and 1.0.4 can be downloaded here:

http://storm.apache.org/downloads.html

Credit:
This issue was identified by the Apache Storm PMC

References:
https://github.com/apache/storm/blob/v1.1.1/SECURITY.md 

https://github.com/apache/storm/blob/v1.0.4/SECURITY.md 




Re: Stream groupings

2017-08-09 Thread Bobby Evans
1. PartialKeyGrouping is preferable if you have a data skew problem. But it is 
not a drop-in replacement; you often need an extra layer of bolts to come up 
with the same answer in the end.
2. Yes, it can. We are working on a better option that takes network distance 
and load into account when deciding where to send a message, but that is still 
a work in progress.


- Bobby
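
For reference, a minimal wiring sketch of the "extra layer of bolts" point above,
assuming Storm 1.x. All component and class names (WordSpout, PartialCountBolt,
AggregateCountBolt) are hypothetical placeholders: the partial-key-grouped bolt
produces per-task partial counts, and a second, fields-grouped bolt is needed to
merge them into the same totals a single fields-grouped counter would produce.

import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

// Wiring sketch only; the spout and bolt classes are assumed to exist elsewhere.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new WordSpout(), 2);
// partialKeyGrouping spreads each key over two downstream tasks to soften skew...
builder.setBolt("partial-count", new PartialCountBolt(), 8)
       .partialKeyGrouping("words", new Fields("word"));
// ...so a second, fields-grouped bolt has to recombine the partial counts per key.
builder.setBolt("total-count", new AggregateCountBolt(), 4)
       .fieldsGrouping("partial-count", new Fields("word"));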


On Tuesday, August 8, 2017, 8:24:10 PM CDT, 王 纯超 
wrote:

Hi,
   - Is Partial Key grouping always preferable to Fields grouping?
   - Will Local or shuffle grouping cause load imbalance between tasks?

wangchunc...@outlook.com

Re: Worker Profiling

2017-08-09 Thread Bobby Evans
I'm not sure.  That should be enough.  There might be a bug in the ui code 
though.  What version of storm are you using?  Also did you select any workers 
to profile?


- Bobby


On Tuesday, August 8, 2017, 8:53:10 PM CDT, 王 纯超 
wrote:

Hi,
   I want to enable Storm's worker profiling feature, but have not succeeded. 
The action buttons in the Profiling and Debugging section are disabled. Below is 
the related configuration. If my configuration is wrong, could somebody tell me 
how to do this?

wangchunc...@outlook.com

Re: Re: RE: How to Improve Storm Application's Throughput

2017-08-09 Thread Brian Taylor
Unsubscribe

Sent from BlueMail

On Aug 9, 2017, at 8:40 AM, "Hannum, Daniel"  wrote:
>I think the problem is that capacity of 3.5. It indicates that there's a
>backlog on that bolt: the actual time spent processing in the bolt is small,
>but the total time spent (including wait time) is large. Scale the bolts up,
>scale the spout down, or make the bolt faster.
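
As a rough worked example of how such a capacity number arises (based on how
the Storm UI is generally understood to compute it, capacity ≈ executed count ×
execute latency / window length): a bolt that executed 2,100,000 tuples at 1 ms
each inside a 10-minute (600,000 ms) window gives 2,100,000 × 1 / 600,000 = 3.5,
i.e. it would need roughly 3.5 times its current executors just to keep up with
the incoming rate.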
>
>From: "fanxi...@travelsky.com" 
>Reply-To: "user@storm.apache.org" 
>Date: Tuesday, August 8, 2017 at 11:02 PM
>To: user , libo19 , kabhwan
>
>Subject: Re: RE: How to Improve Storm Application's Throughput
>
>Hi LiBo, Jungtaek:
>
>Yes, Storm tuning depends on the situation. Thank you for your kind advice.
>Below is one of my situations; any hints from you will be appreciated.
>The Storm version is 1.0.0.
>The topology has just a spout which reads messages from Kafka and a bolt
>that parses the messages and puts them into HBase.
>[inline screenshot of the topology statistics omitted from the archive]
>As you can see from the picture above, the Execute latency of the bolt is
>small (0.5 ms), but the Complete latency is much larger (4365 ms), which
>slows down the throughput of the topology.
>Which part consumes so much additional time: the transfer between the spout
>and the bolt, or the ack path? I tried to increase the parallelism of the
>component, but it did not work.
>Is there a tool to analyze where the time goes in general? It would be great
>news to learn of one.
>
>There is another thing to explain in the picture above. It shows a Capacity
>as high as 1.617, but there are 64 instances of this bolt, and the Capacity
>of most of them is low, as the pictures below show.
>[two inline screenshots of per-executor statistics omitted from the archive]
>So another puzzle: both the Execute latency and the Executed count are about
>equal across instances, but the Capacity turns out to be very different. Any
>hints?
>
>The following is another topology.
>[inline screenshot of the topology statistics omitted from the archive]
>The history_Put bolt has both a high Capacity and a large Execute latency;
>would that explain a Complete latency as large as 56964 ms?
>
>Thank you all for your time.
>
>
>
>Joshua
>
>
>
>From: 李波
>Sent: 2017-08-07 16:56
>To: user@storm.apache.org
>Cc:
>jiangyc_cui...@si-tech.com.cn;
>'zhangxl_bds'
>Subject: RE: How to Improve Storm Application's Throughput
>Hello!
>
>Storm performance troubleshooting takes repeated trial and error until you reach
>values that work from experience. My own process is as follows; I hope it helps:
>
>1. Kafka partition number and KafkaSpout's parallelism
>First, confirm whether the delay comes from the KafkaSpout not being able to
>consume fast enough, or from the downstream bolts being too slow, so that the
>backlog also stalls the KafkaSpout upstream.
>If the KafkaSpout cannot keep up, increase the number of Kafka partitions and set
>the KafkaSpout parallelism equal to the partition count, raising both step by
>step until the throughput requirement is met.
>
>2. Bolt's business logic and bolt's parallelism
>First check whether the hardware resources are simply exhausted.
>Next, find out which bolt the congestion is in. The topology visualization in the
>Storm UI helps here: the redder a component, the worse the congestion. Also look
>at the Capacity value; a bolt whose Capacity exceeds 1 will be congested to some
>degree.
>Optimize the processing logic of that bolt, or raise its parallelism to increase
>its processing capacity (provided the hardware is not already at its limit).
>From your description, the later bolts access an external data store to write the
>final results, so check whether those reads and writes have high latency and
>whether the target system is under heavy pressure.
>In most cases the bottleneck is insufficient bolt processing capacity: find the
>pressure point, optimize the business logic, and adjust the bolt parallelism at
>the same time.
>Also note whether the bolt is CPU-bound or depends on external I/O; for
>CPU-bound bolts the only option is to add workers.
>
>3. Config.TOPOLOGY_BACKPRESSURE_ENABLE
>Check whether backpressure is enabled via Config.TOPOLOGY_BACKPRESSURE_ENABLE
>(I believe it only exists from Storm 1.0.0 onward).
>
>4. Netty
>To raise the throughput of Storm's underlying transport, you can modify the Netty
>settings in storm.yaml to increase the Netty send batch size.
>
>5. Executor throughput parameters
>// Network I/O settings
>config.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 1024 * 16); // default is 1024
>config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 1024 * 16); // batched; default is 1024
>config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 1024 * 16); // individual tuples; default is 1024
>config.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 200);
>
>
>
>李波 13813887096 lib...@asiainfo.com
>北京亚信智慧数据科技有限公司
>"AsiaInfo is our home; growth depends on us all."
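
For reference, a minimal sketch of enabling the backpressure option mentioned in
point 3 above, assuming a topology-side Config object as in point 5; this is
illustrative, not part of the original mail.

import org.apache.storm.Config;

Config config = new Config();
// Turn on automatic backpressure (available from Storm 1.0.0 onward) so a
// congested bolt slows the spout down instead of letting queues grow unbounded.
config.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, true);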
>
>From: 王 纯超 [mailto:wangchunc...@outlook.com]
>Sent: August 7, 2017, 10:58
>To: user 
>Cc: 姜艳春jiangyc_cui...@si-tech.com.cn ;
>zhangxl_bds 
>Subject: How to Improve Storm Application's Throughput
>
>Hi,
>
>I am now considering how to improve a Storm application's throughput, because
>I find that the consumption speed of the KafkaSpout is slower than the
>producing speed, and the lag grows larger and larger. Below are the bolt
>statistics. I tried to bring the tuple projection and filtering logic forward
>into a custom scheme, with the intention of reducing network traffic. However,
>after observation, things went contrary to my wishes. Am I going the wrong
>way? Are there any general principles for tuning Storm applications? Or could
>anyone give some suggestions for this specific case?
>[inline screenshot of the bolt statistics omitted from the archive]
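
For reference, a minimal sketch of the kind of custom scheme described above,
assuming Storm 1.x and the storm-kafka spout. The field names, the JSON marker,
and the idea that returning null skips a record are assumptions for
illustration, not details taken from the poster's code.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.storm.spout.Scheme;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class ProjectingScheme implements Scheme {
    @Override
    public List<Object> deserialize(ByteBuffer buffer) {
        String raw = StandardCharsets.UTF_8.decode(buffer).toString();
        // Illustrative filter: drop records that do not carry the marker we care about.
        // (Assumption: the spout in use treats a null return as "nothing to emit".)
        if (!raw.contains("\"eventType\":\"order\"")) {
            return null;
        }
        // Illustrative projection: emit only the raw payload; a real scheme would parse
        // the message and emit just the columns the downstream bolts actually need.
        return new Values(raw);
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("payload");
    }
}

The scheme would then be plugged into the spout configuration, e.g.
spoutConfig.scheme = new SchemeAsMultiScheme(new ProjectingScheme());
(assuming the storm-kafka SpoutConfig).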
>
>wangchunc...@outlook.com


unsubscribe

2017-08-09 Thread Nurul Ferdous
unsubscribe