Re: Metrics Instrumentation

2016-09-21 Thread Otis Gospodnetić
Hi Joaquin,
Hi Cody :)

If you need something that works out of the box, try SPM -
https://sematext.com/spm/

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



On Wed, Sep 21, 2016 at 9:31 AM, Cody Lee  wrote:

> If you need some topology metrics and custom metrics, I’ve updated the
> older storm-metrics-statsd reporter for the 1.x version of Storm:
> https://github.com/platinummonkey/storm-metrics-statsd and provided an
> example working with, for example, Datadog here:
> https://github.com/platinummonkey/datadog-storm-metrics-example
> This would allow you to easily forward to Ganglia, CloudWatch,
> Prometheus, or anything that has a statsd plugin.
>
>
>
> Cody Lee
>
>
>
> *From: *Joaquin Menchaca 
> *Reply-To: *"user@storm.apache.org" 
> *Date: *Wednesday, September 21, 2016 at 12:24 AM
> *To: *"user@storm.apache.org" 
> *Subject: *Metrics Instrumentation
>
>
>
> Does anyone have good notes, suggestions, or docs/blogs/etc. on
> instrumenting Storm? I wanted to try out some stuff, like Ganglia (old
> school), CloudWatch, and Prometheus.
>
>
>
> I saw a link in the 1.0.2 docs, but it returns File Not Found (e.g.
> http://storm.apache.org/releases/1.0.2/storm-metrics-profiling-internal-actions.html)
>
> - Joaquin Menchaca
>
>
> --
>
>
> 是故勝兵先勝而後求戰,敗兵先戰而後求勝。
>


Re: Submitting Topology Questions (2 small questions)

2016-09-21 Thread Ambud Sharma
Just the nimbus address and port
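
A minimal client-side sketch of that (the hostname is a placeholder for your cluster; pre-1.0 releases use `nimbus.host` instead of `nimbus.seeds`):

```yaml
# Minimal storm.yaml for `storm jar ... remote` from a client machine
nimbus.seeds: ["nimbus.example.com"]   # your nimbus host(s)
nimbus.thrift.port: 6627               # 6627 is the default Thrift port
```

As for pointing the client at a non-default file: depending on the version, the `storm` launcher may accept a `--config <name>` option (resolved on its classpath/conf dir) — check `storm help` for what your version supports.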

On Sep 21, 2016 6:50 PM, "Joaquin Menchaca"  wrote:

> What is the minimal storm.yaml configuration I need for `storm jar ...
> remote`?
>
> Is there a command-line option or a way to specify a locally crafted
> storm.yaml, e.g. /tmp/storm.yaml?
>
> --
>
> 是故勝兵先勝而後求戰,敗兵先戰而後求勝。
>


Submitting Topology Questions (2 small questions)

2016-09-21 Thread Joaquin Menchaca
What is the minimal storm.yaml configuration I need for `storm jar ...
remote`?

Is there a command-line option or a way to specify a locally crafted
storm.yaml, e.g. /tmp/storm.yaml?

-- 

是故勝兵先勝而後求戰,敗兵先戰而後求勝。


Re: Cannot submit topology in local mode on Storm 1.0.1

2016-09-21 Thread Jungtaek Lim
Unfortunately, without minimal code that can reproduce this, it's hard for
me to dig further.
I followed the path based on the stack trace, and it would be an
IllegalArgumentException, but it should have a message describing why it
was thrown.
Do you see any more stack trace information there?

- Jungtaek Lim (HeartSaVioR)

On Wed, Sep 21, 2016 at 3:07 PM, Chen Junfeng wrote:

> I have tried on Storm 1.0.2, the same error occurs again:
>
>
>
> 4774 [main] WARN  o.a.s.d.nimbus - Topology submission exception.
> (topology name='DPI') #error {
>
> :cause nil
>
> :via
>
> [{:type org.apache.storm.generated.InvalidTopologyException
>
>:message nil
>
>:at
> [org.apache.storm.daemon.nimbus$fn__7166$exec_fn__2466__auto__$reify__7195$fn__7196
> invoke nimbus.clj 1462]}]
>
> :trace
>
> [[org.apache.storm.daemon.nimbus$fn__7166$exec_fn__2466__auto__$reify__7195$fn__7196
> invoke nimbus.clj 1462]
>
>
> [org.apache.storm.daemon.nimbus$fn__7166$exec_fn__2466__auto__$reify__7195
> submitTopologyWithOpts nimbus.clj 1459]
>
>
> [org.apache.storm.daemon.nimbus$fn__7166$exec_fn__2466__auto__$reify__7195
> submitTopology nimbus.clj 1531]
>
>   [sun.reflect.NativeMethodAccessorImpl invoke0
> NativeMethodAccessorImpl.java -2]
>
>   [sun.reflect.NativeMethodAccessorImpl invoke
> NativeMethodAccessorImpl.java 62]
>
>   [sun.reflect.DelegatingMethodAccessorImpl invoke
> DelegatingMethodAccessorImpl.java 43]
>
>   [java.lang.reflect.Method invoke Method.java 498]
>
>   [clojure.lang.Reflector invokeMatchingMethod Reflector.java 93]
>
>   [clojure.lang.Reflector invokeInstanceMethod Reflector.java 28]
>
>   [org.apache.storm.testing$submit_local_topology invoke testing.clj 301]
>
>   [org.apache.storm.LocalCluster$_submitTopology invoke LocalCluster.clj
> 49]
>
>   [org.apache.storm.LocalCluster submitTopology nil -1]
>
>   [MainTopology main MainTopology.java 182]]}
>
> 4774 [main] ERROR o.a.s.s.o.a.z.s.NIOServerCnxnFactory - Thread
> Thread[main,5,main] died
>
> org.apache.storm.generated.InvalidTopologyException
>
>at
> org.apache.storm.daemon.nimbus$fn__7166$exec_fn__2466__auto__$reify__7195$fn__7196.invoke(nimbus.clj:1462)
> ~[storm-core-1.0.2.jar:1.0.2]
>
>at
> org.apache.storm.daemon.nimbus$fn__7166$exec_fn__2466__auto__$reify__7195.submitTopologyWithOpts(nimbus.clj:1459)
> ~[storm-core-1.0.2.jar:1.0.2]
>
>at
> org.apache.storm.daemon.nimbus$fn__7166$exec_fn__2466__auto__$reify__7195.submitTopology(nimbus.clj:1531)
> ~[storm-core-1.0.2.jar:1.0.2]
>
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_102]
>
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_102]
>
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_102]
>
>at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_102]
>
>at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
> ~[clojure-1.7.0.jar:?]
>
>at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
> ~[clojure-1.7.0.jar:?]
>
>at
> org.apache.storm.testing$submit_local_topology.invoke(testing.clj:301)
> ~[storm-core-1.0.2.jar:1.0.2]
>
>at
> org.apache.storm.LocalCluster$_submitTopology.invoke(LocalCluster.clj:49)
> ~[storm-core-1.0.2.jar:1.0.2]
>
>at org.apache.storm.LocalCluster.submitTopology(Unknown Source)
> ~[storm-core-1.0.2.jar:1.0.2]
>
>at MainTopology.main(MainTopology.java:182) ~[storm-dpi-0.0.1.jar:?]
>
> 4899 [timer] INFO  o.a.s.d.nimbus - Cleaning up api-6-1473314673
>
> 4916 [timer] INFO  o.a.s.d.nimbus -
> ExceptionKeyNotFoundException(msg:api-6-1473314673-stormjar.jar)
>
> 4928 [timer] INFO  o.a.s.d.nimbus -
> ExceptionKeyNotFoundException(msg:api-6-1473314673-stormconf.ser)
>
> 4935 [timer] INFO  o.a.s.d.nimbus -
> ExceptionKeyNotFoundException(msg:api-6-1473314673-stormcode.ser)
>
> 4935 [timer] INFO  o.a.s.d.nimbus - Cleaning up DPI-15-1473765913
>
> 4946 [timer] INFO  o.a.s.d.nimbus -
> ExceptionKeyNotFoundException(msg:DPI-15-1473765913-stormjar.jar)
>
> 4952 [timer] INFO  o.a.s.d.nimbus -
> ExceptionKeyNotFoundException(msg:DPI-15-1473765913-stormconf.ser)
>
> 4958 [timer] INFO  o.a.s.d.nimbus -
> ExceptionKeyNotFoundException(msg:DPI-15-1473765913-stormcode.ser)
>
> 4959 [timer] INFO  o.a.s.d.nimbus - Cleaning up DPI-13-1473749244
>
> 4968 [timer] INFO  o.a.s.d.nimbus -
> ExceptionKeyNotFoundException(msg:DPI-13-1473749244-stormjar.jar)
>
> 4975 [timer] INFO  o.a.s.d.nimbus -
> ExceptionKeyNotFoundException(msg:DPI-13-1473749244-stormconf.ser)
>
> 4981 [timer] INFO  o.a.s.d.nimbus -
> ExceptionKeyNotFoundException(msg:DPI-13-1473749244-stormcode.ser)
>
>
>
>
>
> *From: *Jungtaek Lim 
> *Sent: *September 20, 2016 16:38
> *To: *user@storm.apache.org
> *Subject: *Re: Cannot submit topology in local mode on Storm 1.0.1
>
>
> Hi Chen,
>
> Could you try running your topology with Storm 1.0.2 in local mode? Since
> it's in 

Re: Large number of very small streams

2016-09-21 Thread Ambud Sharma
Two solutions:

1. You can group users by some sort of classification and create topics
based on that; then, for each user, the consumer can check whether it's
interested in the topic and consume or reject the messages.

2. If each user writes a lot of data, then you can use a key-based custom
partitioner.

For consumption, depending on write volume and velocity, you may want to
consider using a database for viewing events per user.
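
The key-based partitioning in option 2 can be sketched in plain Java (the class and method names are illustrative, not from any Storm or Kafka API): hashing the user key means every event for a given user lands on the same partition, so a single consumer — or a single bolt instance, via `fieldsGrouping` on the user-id field — sees that user's whole mini-stream without creating a topic per user.

```java
// Illustrative sketch of key-based partitioning; not a real Kafka/Storm class.
public class UserKeyPartitioner {
    public static int partitionFor(String userKey, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hashCodes
        return Math.floorMod(userKey.hashCode(), numPartitions);
    }
}
```

This is essentially what a keyed Kafka producer does by default, so often you only need to set the message key rather than write a partitioner at all.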

On Sep 21, 2016 3:01 PM, "Ivan Gozali"  wrote:

> Hi everyone,
>
> I'm very new to Storm, and have read various documentation but haven't
> started using it.
>
> I have a use case where I could potentially have many users producing data
> points that are accumulated in one huge, single Kafka topic/Kinesis stream,
> and I was going to use Storm to "route" per-user mini-streams coming from
> this single huge stream to multiple processors.
>
> I was wondering how this use case is typically handled. I was going to
> create a topology (where the spout consumes the big stream) for each user's
> mini-stream that is then pushed to some new derived stream in
> Kinesis/Kafka, but this doesn't seem right, since there could be 100,000s,
> if not 1,000,000s of users and I would be creating 1,000,000 topics.
>
> Thanks in advance for any advice!
>
> --
> Regards,
>
>
> Ivan Gozali
>


Large number of very small streams

2016-09-21 Thread Ivan Gozali
Hi everyone,

I'm very new to Storm, and have read various documentation but haven't
started using it.

I have a use case where I could potentially have many users producing data
points that are accumulated in one huge, single Kafka topic/Kinesis stream,
and I was going to use Storm to "route" per-user mini-streams coming from
this single huge stream to multiple processors.

I was wondering how this use case is typically handled. I was going to
create a topology (where the spout consumes the big stream) for each user's
mini-stream that is then pushed to some new derived stream in
Kinesis/Kafka, but this doesn't seem right, since there could be 100,000s,
if not 1,000,000s of users and I would be creating 1,000,000 topics.

Thanks in advance for any advice!

--
Regards,


Ivan Gozali


Re: Metrics Instrumentation

2016-09-21 Thread Cody Lee
If you need some topology metrics and custom metrics, I’ve updated the older
storm-metrics-statsd reporter for the 1.x version of Storm:
https://github.com/platinummonkey/storm-metrics-statsd and provided an example
working with, for example, Datadog here:
https://github.com/platinummonkey/datadog-storm-metrics-example
This would allow you to easily forward to Ganglia, CloudWatch, Prometheus, or
anything that has a statsd plugin.
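
For the built-in metrics path, Storm 1.x can also ship its internal metrics to a registered metrics consumer; a minimal sketch of the config (the class shown is Storm's bundled logging consumer, and the parallelism value is illustrative):

```yaml
# In storm.yaml (or per-topology via Config) — route built-in metrics
# to a consumer; reporters like the statsd one plug in the same way.
topology.metrics.consumer.register:
  - class: "org.apache.storm.metric.LoggingMetricsConsumer"
    parallelism.hint: 1
```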

Cody Lee

From: Joaquin Menchaca 
Reply-To: "user@storm.apache.org" 
Date: Wednesday, September 21, 2016 at 12:24 AM
To: "user@storm.apache.org" 
Subject: Metrics Instrumentation

Does anyone have good notes, suggestions, or docs/blogs/etc. on instrumenting
Storm? I wanted to try out some stuff, like Ganglia (old school), CloudWatch,
and Prometheus.

I saw a link in the 1.0.2 docs, but it returns File Not Found (e.g. 
http://storm.apache.org/releases/1.0.2/storm-metrics-profiling-internal-actions.html)
- Joaquin Menchaca

--

是故勝兵先勝而後求戰,敗兵先戰而後求勝。


Re: Syncing multiple streams to compute final result from a bolt

2016-09-21 Thread Harsh Choudhary
It is real-time. I get streaming JSONs from Kafka.



On Wed, Sep 21, 2016 at 4:15 AM, Ambud Sharma 
wrote:

> Is this real-time or batch?
>
> If batch this is perfect for MapReduce or Spark.
>
> If real-time then you should use Spark or Storm Trident.
>
> On Sep 20, 2016 9:39 AM, "Harsh Choudhary"  wrote:
>
>> My use case is that I have a JSON which contains an array. I need to
>> split that array into multiple JSONs and do some computations on them.
>> After that, the results from each JSON have to be combined in a further
>> calculation to come up with the final result.
>>
>> *Cheers!*
>>
>> Harsh Choudhary / Software Engineer
>>
>> Blog / express.harshti.me
>>
>>
>> On Tue, Sep 20, 2016 at 9:09 PM, Ambud Sharma 
>> wrote:
>>
>>> What's your use case?
>>>
>>> The complexities can be managed as long as your tuple branching is
>>> reasonable, i.e. one tuple creates several other tuples and you need to
>>> sync results between them.
>>>
>>> On Sep 20, 2016 8:19 AM, "Harsh Choudhary"  wrote:
>>>
 You're right. For that I have to manage the queue and all those
 complexities of timeout. If Storm is not the right place to do this then
 what else?



 On Tue, Sep 20, 2016 at 8:25 PM, Ambud Sharma 
 wrote:

> The correct way is to perform time window aggregation using bucketing.
>
> Use the timestamp on your event, computed from various stages, and send
> it to a single bolt where the aggregation happens. You only emit from this
> bolt once you receive results from both parts.
>
> It's like creating a barrier or the join phase of a fork join pool.
>
> That said, the more important question is: is Storm the right place to do
> this? When you perform time-window aggregation you are susceptible to
> tuple timeouts, and you also have to make sure your aggregation is
> idempotent.
>
> On Sep 20, 2016 7:49 AM, "Harsh Choudhary" 
> wrote:
>
>> But how would that solve the syncing problem?
>>
>>
>>
>> On Tue, Sep 20, 2016 at 8:12 PM, Alberto São Marcos <
>> alberto@gmail.com> wrote:
>>
>>> I would dump the *Bolt-A* results in a shared-data-store/queue and
>>> have a separate workflow with another spout and Bolt-B draining from 
>>> there
>>>
>>> On Tue, Sep 20, 2016 at 9:20 AM, Harsh Choudhary <
>>> shry.ha...@gmail.com> wrote:
>>>
 Hi

 I am thinking of doing the following.

 Spout subscribed to Kafka and get JSONs. Spout emits the JSONs as
 individual tuples.

 Bolt-A has subscribed to the spout. Bolt-A creates multiple JSONs
 from a json and emits them as multiple streams.

 Bolt-B receives these streams and do the computation on them.

 I need to make a cumulative result from all the multiple JSONs
 (which emerged from a single JSON) in a bolt. But a bolt's static
 instance variable is only shared between tasks per worker. How do I
 achieve this syncing?

   --->
 Spout ---> Bolt-A   --->   Bolt-B  ---> Final result
   --->

 The final result is per JSON which was read from Kafka.

 Or is there any other way to achieve this better?

>>>
>>>
>>

>>
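
The barrier/join step Ambud describes earlier in the thread can be sketched in plain Java (names are illustrative, not a Storm API; a real bolt would also need tuple anchoring and a timeout/eviction policy to cope with tuple timeouts): partial results from the split JSONs are bucketed by the id of the original message, and the combined result is produced only once all expected parts have arrived.

```java
import java.util.*;

// Illustrative join buffer: collects partial results keyed by the id of the
// original JSON and emits only when all expected parts have arrived, like the
// join phase of a fork-join pool.
public class JoinBuffer {
    private final Map<String, List<Integer>> pending = new HashMap<>();

    // Returns the aggregated result when the last expected part arrives,
    // otherwise null (still waiting at the barrier).
    public Integer accept(String originId, int expectedParts, int partialResult) {
        List<Integer> parts = pending.computeIfAbsent(originId, k -> new ArrayList<>());
        parts.add(partialResult);
        if (parts.size() < expectedParts) {
            return null;
        }
        pending.remove(originId); // clear the bucket once the join completes
        return parts.stream().mapToInt(Integer::intValue).sum();
    }
}
```

In a topology, this logic would live in the final bolt (the "Final result" stage in the diagram above), with the origin id and part count carried on each tuple emitted by Bolt-A — that avoids relying on static instance variables, which are only shared between tasks in the same worker.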