Re: Metrics Instrumentation

2016-09-20 Thread Erik Weathers
https://github.com/apache/storm/search?utf8=%E2%9C%93=profiling=Code -> https://github.com/apache/storm/blob/0cade64f3a8a6cc83a6ff3098b71979e685e9dec/docs/storm-metrics-profiling-internal-actions.md On Tue, Sep 20, 2016 at 10:24 PM, Joaquin Menchaca wrote: > Anyone have

Metrics Instrumentation

2016-09-20 Thread Joaquin Menchaca
Does anyone have good notes or suggestions, or know of docs/blogs/etc., on instrumenting Storm? I wanted to try out some stuff, like Ganglia (old school), CloudWatch, and Prometheus. I saw a link in the 1.0.2 docs, but it returns File Not Found (e.g. http://storm.apache.org/releases/1.0.2/storm-metrics-

Re: Cannot submit topology in local mode on Storm 1.0.1

2016-09-20 Thread Joaquin Menchaca
What happens if you run it in single / local mode on a supervisor? On Tue, Sep 20, 2016 at 1:37 AM, Jungtaek Lim wrote: > Hi Chen, > > Could you try running your topology with Storm 1.0.2 in local mode? Since > it's in local mode you can easily try it out. > > Thanks, >

Storm 1.0.2 Monitoring Docs

2016-09-20 Thread Joaquin Menchaca
http://storm.apache.org/releases/1.0.2/storm-metrics-profiling-internal-actions.html Not Found The requested URL /releases/1.0.2/storm-metrics-profiling-internal-actions.html was not found on this server. -- 是故勝兵先勝而後求戰,敗兵先戰而後求勝。

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Ambud Sharma
Is this real-time or batch? If batch, this is perfect for MapReduce or Spark. If real-time, then you should use Spark or Storm Trident. On Sep 20, 2016 9:39 AM, "Harsh Choudhary" wrote: > My use case is that I have a json which contains an array. I need to split > that

Re: Who needs more memory?

2016-09-20 Thread Ambud Sharma
Allocate RAM for the workers that are launched on supervisor nodes. Workers do the heavy lifting and are the component that actually runs your topology. On Sep 20, 2016 11:51 AM, "Thomas Cristanis" wrote: > I am using the storm for an academic experiment and have a

Who needs more memory?

2016-09-20 Thread Thomas Cristanis
I am using Storm for an academic experiment and have a question. Where is it necessary to allocate more memory (RAM): ZooKeeper, Nimbus, or the supervisors? Why?

Storm 1.0.2 - KafkaSpout not updating the offset/retrying tuples

2016-09-20 Thread Dominik Safaric
Hi, I’ve implemented a topology consisting of a spout, a processing bolt and a sink bolt pushing data back to Kafka. By examining the logs, I’ve seen that for all of the 12 partitions the totalSpoutLag and totalLatestTimeOffset remain constant (i.e. approximately at 830K), although tuples are

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
My use case is that I have a JSON document which contains an array. I need to split that array into multiple JSON documents and do some computations on them. After that, the results from each of them have to be used together in a further calculation to come up with the final result. *Cheers!* Harsh Choudhary / Software

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Ambud Sharma
What's your use case? The complexities can be managed as long as your tuple branching is reasonable, i.e. 1 tuple creates several other tuples and you need to sync results between them. On Sep 20, 2016 8:19 AM, "Harsh Choudhary" wrote: > You're right. For that I have to

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
You're right. For that I have to manage the queue and all those complexities of timeout. If Storm is not the right place to do this then what else? On Tue, Sep 20, 2016 at 8:25 PM, Ambud Sharma wrote: > The correct way is to perform time window aggregation using

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Ambud Sharma
The correct way is to perform time window aggregation using bucketing. Use the timestamp on your event computed from various stages and send it to a single bolt where the aggregation happens. You only emit from this bolt once you receive results from both parts. It's like creating a barrier or
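The barrier idea above can be sketched in plain Python, outside of Storm. This is only an illustration under stated assumptions, not Storm's API: the class name `BarrierAggregator`, the `expected_parts` set, the `timeout_seconds` parameter and the choice of key (for example an event id or a timestamp bucket) are all hypothetical. It holds partial results per key, releases them once every expected part has arrived, and drops keys that wait longer than the timeout.

```python
import time


class BarrierAggregator:
    """Hold partial results per key until all expected parts arrive.

    Hypothetical helper for illustration; it is not part of Storm.
    """

    def __init__(self, expected_parts, timeout_seconds=30.0, clock=time.monotonic):
        self._expected = set(expected_parts)
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the timeout can be tested
        self._pending = {}   # key -> (first_seen_time, {part_name: value})

    def add(self, key, part_name, value):
        """Record one part. Return all parts once the key is complete, else None."""
        if key not in self._pending:
            self._pending[key] = (self._clock(), {})
        _, parts = self._pending[key]
        parts[part_name] = value
        if self._expected.issubset(parts):
            del self._pending[key]  # barrier released for this key
            return parts
        return None

    def expire(self):
        """Drop keys that waited longer than the timeout; return the dropped keys."""
        now = self._clock()
        stale = [k for k, (seen, _) in self._pending.items() if now - seen > self._timeout]
        for k in stale:
            del self._pending[k]
        return stale
```

In a bolt, `add` would be called for each incoming tuple and `expire` on a periodic tick; what to do with expired keys (fail, retry, log) depends on the topology.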

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
But how would that solve the syncing problem? On Tue, Sep 20, 2016 at 8:12 PM, Alberto São Marcos wrote: > I would dump the *Bolt-A* results in a shared-data-store/queue and have a > separate workflow with another spout and Bolt-B draining from there > > On Tue, Sep 20,

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Alberto São Marcos
I would dump the *Bolt-A* results in a shared-data-store/queue and have a separate workflow with another spout and Bolt-B draining from there On Tue, Sep 20, 2016 at 9:20 AM, Harsh Choudhary wrote: > Hi > > I am thinking of doing the following. > > Spout subscribed to

Re: Cannot submit topology in local mode on Storm 1.0.1

2016-09-20 Thread Jungtaek Lim
Hi Chen, Could you try running your topology with Storm 1.0.2 in local mode? Since it's in local mode you can easily try it out. Thanks, Jungtaek Lim (HeartSaVioR) On Tue, Sep 20, 2016 at 5:26 PM, Chen Junfeng wrote: > My topology runs well in cluster mode but throws exceptions

Cannot submit topology in local mode on Storm 1.0.1

2016-09-20 Thread Chen Junfeng
My topology runs well in cluster mode but throws exceptions when it is submitted in local mode, which makes debugging very difficult. The error is: 4842 [main] INFO o.a.s.l.ThriftAccessLogger - Request ID: 1 access from: principal: operation: submitTopology 4857 [main] WARN o.a.s.d.nimbus -

Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
Hi, I am thinking of doing the following. The spout subscribes to Kafka and gets JSONs. The spout emits the JSONs as individual tuples. Bolt-A subscribes to the spout. Bolt-A creates multiple JSONs from one JSON and emits them as multiple streams. Bolt-B receives these streams and does the computation
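As a rough sketch of the splitting step described above (plain Python, not Storm code): if each piece carries the id of the JSON it came from and the total number of pieces, a downstream step can tell when it has seen them all. The field names `id`, `items`, `parent_id`, `index` and `total` are assumptions made for this example and do not come from the thread.

```python
import json
import uuid


def split_array(raw_json, array_field="items"):
    """Split one JSON document into one message per array element.

    Each message is tagged with the parent id and the total part count.
    Field names are hypothetical and chosen only for this sketch.
    """
    doc = json.loads(raw_json)
    # Fall back to a generated id when the document has none.
    parent_id = doc.get("id") or str(uuid.uuid4())
    elements = doc[array_field]
    return [
        {
            "parent_id": parent_id,
            "index": i,
            "total": len(elements),
            "payload": element,
        }
        for i, element in enumerate(elements)
    ]
```

The receiving side can then group by `parent_id` and compute the final result once it holds `total` pieces for that id.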