Re: error in the cluster

2015-05-15 Thread Matthias J. Sax
You should check the log files to see the complete stack trace of the exceptions. That should held to identify the problem. -Matthias On 05/15/2015 09:52 PM, Hadi Sotudeh wrote: > I've three Vms. > Now, I've run the commands you've said. > -

Re: Can't alllocate memory in the cluster

2015-05-18 Thread Matthias J. Sax
Google is your friend in this case ;) The problem is not related to Storm. You need to reconfigure your OS. https://confluence.atlassian.com/display/FISHKB/java.io.IOException%3A+error%3D12,+Cannot+allocate+memory https://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error

Re: How Storm Process tuples atleast once?

2015-05-18 Thread Matthias J. Sax
I just want to add, that duplicates in the result are easily possible. Let's say, the last bolt (ie, a sink), writes some tuples to a file before acking it. If the write is successful and the bolt fails for some reason before the ack goes through, the tuple will be replayed and written to the file

Re: Question concerning process latency and number of acked

2015-05-21 Thread Matthias J. Sax
Hi, Storm processes multiple tuple in an "overlapping" manner, ie, emitting from spout, network transfer, processing at bolt is fully pipelined and multiple tuples are in the pipeline at the same time. Additionally, this pipeline contains multiple buffers and thread transferring the tuple from buf

Re: Whats the alternative to creating a fat jar for topology deployment

2015-05-21 Thread Matthias J. Sax
You can put your dependency jars into Storm's jar folder (eg /opt/storm-0.9.4/lib/). -Matthias On 05/21/2015 04:16 PM, rajesh_kall...@dellteam.com wrote: > *Dell - Internal Use - Confidential * > > I am aware of the local mode, was looking for alternatives of on the > cluster testing. > > >

Re: Supervisor / Worker information on CustomStreamGrouping

2015-05-22 Thread Matthias J. Sax
Hi, I think it is a ticky problem you want to solve. The TopologyContext object does not give enough information to get it done. Maybe you can get it done, by implementing a custom scheduler. And example how to implement a custom Scheduler is given here: https://xumingming.sinaapp.com/885/twitter

Re: How Spout emits multiple Tuples!

2015-05-22 Thread Matthias J. Sax
It can. You can call collector.emit() any number of times during a single .nextTuple() call. However, it might be bad practice to do it. In your case, I would no recommend to read the whole file in a single .nextTuple() call, but emit only a single tuple. -Matthias On 05/22/2015 06:27 AM, prasad

Aeolus 0.1 available

2015-05-26 Thread Matthias J. Sax
Dear Storm community, we would like to share our project Aeolus with you. While the project is not finished, our first component --- a transparent batching layer --- is available now. Aeolus' batching component, is a transparent layer that can increase Storm's throughput by an order of magnitude

Re: custom scheduler ClassNotFoundException

2015-05-28 Thread Matthias J. Sax
Put our Scheduler into a jar and put the jar into the lib folder of your Storm installation (eg, /opt/storm-0.9.3/lib). -Matthias On 05/28/2015 12:40 PM, Franca van Kaam wrote: > As indicated in the tutorial I put this in the storm.yaml file of the > nimbus node: > |storm.scheduler: ||"storm.Ca

Re: custom scheduler ClassNotFoundException

2015-05-28 Thread Matthias J. Sax
Hi, you build the jar in the wrong way: 2) You need to package the class file, not the source file 2) "CameraScheduler" in in package "storm". Thus "CameraScheulder.class" must in in directory "storm" within the jar The correct command would be: jar cvf CameraScheduler.jar storm/CameraScheu

Re: [DISCUSS] Drop Support for Java 1.6 in Storm 0.10.0

2015-06-01 Thread Matthias J. Sax
I thinks it is a good idea to drop Java 6. It reached it's life cycle already 2 year ago. -Matthias On 06/01/2015 08:37 PM, P. Taylor Goetz wrote: > CC user@ > > I’d like to poll the community about the possibility of dropping support for > Java 1.6 in the Storm 0.10.0 release. To date, we hav

Re: Storm Fails Many A lot of Values

2015-06-04 Thread Matthias J. Sax
Hi, You are looping within "nextTuple()" to emit a tuple for each lines for the whole file. This is "bad practice" because the spout is prevented to take "acks" while "nextTuple()" is executing. I guess, that is the reason why your tuples time out and fail. You should return from "nextTuple()" af

Re: Storm Fails Many A lot of Values

2015-06-05 Thread Matthias J. Sax
ot;data")) { >> >> streamId = tuple.getStringByField("streamId"); >> time = tuple.getStringByField("timestamp"); >> value = Float.parseFloat(tuple.getStringByField("value")); >> >> univer

Re: Storm Fails Many A lot of Values

2015-06-05 Thread Matthias J. Sax
ach emit? > Add return; after emiting tuple os something else i haven’t understood? > Could you post a code sample? > >> On 5 Ιουν 2015, at 14:44, Matthias J. Sax >> mailto:mj...@informatik.hu-berlin.de>> >> wrote: >> >> Hi, >> >> I don

Re: Storm out of memory error while deploying jar

2015-06-05 Thread Matthias J. Sax
I would set the needed JVM arguments in storm.yaml file. This must be done on every worker node. worker.childopts: "-Xmx4096m" ( or maybe supervisor.childopts: "-Xmx496m" ) -Matthias On 06/05/2015 05:11 PM, Prakash Ramesh Dayaramani wrote: > Hi All, > > I am trying to deploy jar

Re: Question on Parallelsim

2015-06-09 Thread Matthias J. Sax
One comment: The suggestion to use a single worker to avoid overhead is basically right. It only has the drawback of coarse grained fault-tolerance -- if the worker JVM goes done, be one bad behaving spout/bolt, all other spouts/bolts die, too. Also keep in mind, that a worker will only process spo

Re: The argument in setSpout()

2015-06-09 Thread Matthias J. Sax
As the name suggests (parallelism_hint), it is the number of parallel spout instances you want to start. Of course, the UDF code must be parallelizable, eg, different instances should emit different data. If the UDF code is not parallelizeable (eg, the UDF reads a single file), using a parallelism

Re: Network topology in Storm?

2015-06-14 Thread Matthias J. Sax
Hi, Storm has no support for this natively. However, you can implement a custom scheduler. See https://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/ -Matthias On 06/14/2015 11:06 AM, Jiaming Lin wrote: > Hi > > I am just starting to learn Storm and streaming

Re: Load balance biased shuffle grouping?

2015-06-14 Thread Matthias J. Sax
Hi, the idea from Mike and Nathan does not apply to your problem because in your case the different execution times do not depend on the tuples but on the executors. Thus, on the producer side you cannot separate "slow" tuples from "fast" tuples. If you can identify the "slow" executors, you can

Re: default scheduler

2015-06-14 Thread Matthias J. Sax
Hi, as far as I understand it, it is a round-robin scheduling over all available workers. -Matthias On 06/14/2015 01:31 PM, Franca van Kaam wrote: > Hello, > > I am trying to understand how the default scheduler distributes the > tasks. Are they evenly distributed throughout the cluster or do

Re: DYNAMIC ADJUSTMENT OF NUMBER OF TASKS

2015-06-19 Thread Matthias J. Sax
Just want to clarify: The number of task is not the number parallel running bolt instances (called executors, which are threads). So I don't understand why you don't want to start with the maximum number of tasks? There should be almost no overhead if you have more tasks than executors (executors c

Re: DYNAMIC ADJUSTMENT OF NUMBER OF TASKS

2015-06-19 Thread Matthias J. Sax
rm. > > Thanks a lot. > > On 19/06/2015 6:59 pm, "Matthias J. Sax" <mailto:mj...@informatik.hu-berlin.de>> wrote: > > Just want to clarify: The number of task is not the number parallel > running bolt instances (called executors, which are thread

Re: DYNAMIC ADJUSTMENT OF NUMBER OF TASKS

2015-06-19 Thread Matthias J. Sax
). -Matthias On 06/19/2015 01:01 PM, Harshit Gupta wrote: > That's what. I want to have an arbitrary degree of parallelism. I don't > wish to hard code it. The current release doesn't allow that, isn't it ? > > On 19/06/2015 8:55 pm, "Matthias J. Sax" <

Re: Rebalancing executors

2015-06-22 Thread Matthias J. Sax
>> So, it appears the expectation is to overprovision the number of tasks, >> start with minimal number of executors, and then grow executors to >> achieve parallelism as workload increases. Is this right ? Yes. This is correct. However, over provisioning the number of tasks does not result in ov

Re: DYNAMIC ADJUSTMENT OF NUMBER OF TASKS

2015-06-23 Thread Matthias J. Sax
ess we could do something like S4 where every key got a new bolt > instance, but then they had a lot of issues with check-pointing all > of these bolt instances and swapping them out. They also didn't > allow for pluggable groupings. Everything was keyed grouping. >

Re: Fwd: Reshare: Uneven distribution with shuffle grouping

2015-06-23 Thread Matthias J. Sax
I don't see any in-balance. The value of "Executed" is 440/460 for each bolt. Thus each bolt processed about the same number of tuples. Shuffle grouping does a round robin distribution and balances the number of tuples sent to each receiver. I you refer to the values "capactiy", "execute latency"

Re: Fwd: Reshare: Uneven distribution with shuffle grouping

2015-06-24 Thread Matthias J. Sax
mailto:ncle...@gmail.com>> wrote: > > Also to clarify, unless you change the sample frequency the counts > in the ui are not precise. Note that they are all multiples of 20. > > On Jun 23, 2015 7:16 AM, "Matthias J. Sax" > <mailto:mj...@in

Re: Fwd: Reshare: Uneven distribution with shuffle grouping

2015-06-25 Thread Matthias J. Sax
ne guide us on how to build such this custom grouping? > > > > On Wed, Jun 24, 2015 at 1:40 PM, Matthias J. Sax > mailto:mj...@informatik.hu-berlin.de>> > wrote: > > Worried might not be the right term. However, as a rule of thumb, > capacity should

Re: Fwd: Reshare: Uneven distribution with shuffle grouping

2015-06-25 Thread Matthias J. Sax
ount the loads of > the bolts it sends to? Is it possible to modify it to not include a key? > > Alternatively, If I partialkeygroup on a unique key would that balance > my load? > > On Thu, Jun 25, 2015 at 2:24 PM, Matthias J. Sax > mailto:mj...@informatik.hu-berlin.de>> >

Re: Fwd: Reshare: Uneven distribution with shuffle grouping

2015-06-29 Thread Matthias J. Sax
t? > > Thanks and Regards > Aditya Rajan > > On Thu, Jun 25, 2015 at 6:05 PM, Matthias J. Sax > mailto:mj...@informatik.hu-berlin.de>> > wrote: > > No. This does not work for you. > > PartialKeyGrouping does a count based load balancing. Thus, it is

Re: Unit Testing Topologies

2015-06-29 Thread Matthias J. Sax
I would recommend to write a unit test for each spout/bolt and an integration test for the whole topology using LocalCluster. -Matthias On 06/29/2015 07:01 AM, Rakeshsharma PR wrote: > > > > > Greetings!! > > > > > > Hi > > > > I am new to storm. I am trying to wri

Re: How to send acknowledgement to clients using storm?

2015-07-01 Thread Matthias J. Sax
Hi, I guess you refer to Apache Storm (not Spark ;)) In Storm, there is no special support to notify your client (as far as I know). The right place to implement the notification should be the method Spout.ack(). I guess you are using a provided KafkaSpout. Thus, you could extend KafkaSpout with

Re: What happens when a Storm supervisor goes down?

2015-07-06 Thread Matthias J. Sax
https://storm.apache.org/documentation/Fault-tolerance.html On 07/06/2015 07:47 AM, Thilina Rathnayake wrote: > I recently got to know about storm and I find it really interesting. I > want to know > what happens when a storm supervisor goes down. > > I am going through the following article to

Re: Problem to recept massive tuples

2015-07-08 Thread Matthias J. Sax
Hi, this sounds weird... Storm should apply back pressure an slow down the spout if bolts cannot keep up... Have you tried in increase the dop of the bolt? Which bolt is the problematic one? NumberAvg or TimeGlobalAvg? You might also checkout out `max spout pending` property to solve the problem:

Re: Problem to recept massive tuples

2015-07-09 Thread Matthias J. Sax
Hi Charlie, yes, if you want to use back preassure, you need to use message-ids for tuples in spouts and in bolts, anchor emitted tuples (input tuples are anchors) and ack processed tuples. On the other hand, I was wondering, if you did try to increase the parallelism of NumberAvgBolt to avoid th

Re: Problem to recept massive tuples

2015-07-09 Thread Matthias J. Sax
ges with my example storm chain and > when I used a complex chain (Asn.1 Decoding and AvroEncoding), i received any > ack messages and i i don't understand why ? > > Best regards, > Charlie > > > De : Matthias J. Sax > Envo

Re: Re-emitting failed tuples

2015-07-15 Thread Matthias J. Sax
What Spout do you use? Failing tuples result in back-calls to Spout.fail(). If you use your own Spout implementation, you need to overwrite this method. The default implementation does nothing. Or do you already use a (so-called) reliable Spout? -Matthias On 07/15/2015 07:37 AM, Rahul wrote: >

Fwd: DELIVERY FAILURE: Error transferring to GAPAR017/SRV/SOCGEN mail.box; Maximum hop count exceeded. Message probably in a routing loop.

2015-07-15 Thread Matthias J. Sax
Hi, I recently get those error messages back, when sending to us...@storm.apache.org Does anyone experience the same problem? Thanks! -Matthias Forwarded Message Subject: DELIVERY FAILURE: Error transferring to GAPAR017/SRV/SOCGEN mail.box; Maximum hop count exceeded. Messa

Re: Problem to submit a topology to my storm cluster

2015-07-15 Thread Matthias J. Sax
Your jar file contains two copies of "defaults.yaml". You need to make sure that there is at max one. Do you include "storm-core.jar" in your own jar? For this case, exclude "defaults.yaml" that is contained in "storm-core.jar" -Matthias On 07/15/2015 02:08 PM, charlie quillard wrote: > Hi, >

Re: Deploying my topology to Remote Cluster (From: Eclipse, To: Cluster)

2015-07-16 Thread Matthias J. Sax
Using Eclipe export will package "storm-core.jar" and it's dependencies into you user jar. "storm-core.jar" contains the duplicate default.yaml class. You should not include "storm-core.jar" and its dependencies after all. You might try "export -> Java -> jar". This only packages your own code. If

Re: Difference in performance on direct grouping

2015-07-20 Thread Matthias J. Sax
Hi, using declareStream() does not necessary declare a direct stream. There are 4 methods: OutputFieldsDeclarer.declare(Fields) OutputFieldsDeclarer.declare(boolean, Fields) OutputFieldsDeclarer.declareStream(String, Fields) OutputFieldsDeclarer.declareStream(String, boolean, Fields) To declare

Fwd: DELIVERY FAILURE: Error transferring to GAPAR017/SRV/SOCGEN mail.box; Maximum hop count exceeded. Message probably in a routing loop.

2015-07-20 Thread Matthias J. Sax
Can anyone take care of this, please? Emails bouncing back each time on user@... (this is kind of annoying) Thank a lot! I really appreciate it! -Matthias Forwarded Message Subject: DELIVERY FAILURE: Error transferring to GAPAR017/SRV/SOCGEN mail.box; Maximum hop count exceed

Re: What happens when a message times out?

2015-07-21 Thread Matthias J. Sax
If the call to "execute()" does loop infinitely, yes; the whole computation stops (or might fail completely with an exception if all buffers are full -> OutOfMemoryException) -Matthias On 07/21/2015 02:05 PM, Ganesh Chandrasekaran wrote: > So let’s say we have a single threaded topology with sin

Re: Storm Logs

2015-07-22 Thread Matthias J. Sax
It's the ID of the task that emitted the tuple. On 07/23/2015 12:08 AM, Kashyap Mhaisekar wrote: > Hi, > > What does the number 3 in the text highlighted indicate - > --- > backtype.storm.daemon.executor - Processing received message > source:* f**etchoffercount:3*, stream: defaul >

Re: Disable rebalancing

2015-07-22 Thread Matthias J. Sax
Storm does not automatically rebalance (ie, it is kind of disabled by default). Rebalancing has to be triggered manually (for example via command line: "storm rebalance ") -Matthias On 07/23/2015 01:16 AM, Thilina Buddhika wrote: > Hi, > > Is there a way to disable rebalancing in Storm? Also und

Re: Process Latency < Execute latency for BaseBasicBolt?

2015-07-27 Thread Matthias J. Sax
Hi, I am not sure, but I doubt that BaseBasicBolt does ack automatically. Process-Latency should be lower, if a tuple is acked before execute finishes. So if you ack within execute, this should be normal. For complete latency at spout you are right. If it should 0, it raises the question if you en

Re: Lost Acks/messages (Possible Buffer size issue)

2015-07-28 Thread Matthias J. Sax
Hi, yes, it is correct to use Spout.ack() to throttle the ingestion rate. However, you must also use Spout.fail(), in case tuples time-out or are failed manually within a bolt. Maybe, that is the reason you are "missing acks". (I personally doubt, that Storm looses any tuples or ack/fail messages

Re: Storm execution

2015-07-29 Thread Matthias J. Sax
Did you package all your classes correctly in your jar file that is submitted to Nimbus? -Matthias On 07/29/2015 09:39 AM, Vamsikrishna Vinjam wrote: > i can run the topologies which are in storm examples..but i can the > topologies which i have written if i try to run them i getting error like >

Re: storm execution

2015-07-30 Thread Matthias J. Sax
It the class "storm.starter.util.StormRunner" contained in the your kafka.jar file? Of not, you need to add it. -Matthias On 07/30/2015 10:51 AM, Vamsikrishna Vinjam wrote: > im trying to run my storm jar file ..but im getting error > > Exception in thread "main" java.lang.NoClassDefFoundError:

Re: The relation between executor and Task

2015-07-30 Thread Matthias J. Sax
Having more tasks than executors is only helpful, if you want to change the parallelism of a spout or bolt during runtime (by using command "storm rebalance"). If you do not want to change parallelism there is no advantage in having more tasks than executors. -Matthias On 07/30/2015 10:24 AM, 鄢来琼

Re: What will happen if Storm doesn't have enough worker slots to run a topology?

2015-08-04 Thread Matthias J. Sax
Storm can place multiple executors into a single worker. Thus, there is no deployment problem, as long as your hardware can handle it. -Matthias On 08/05/2015 03:18 AM, TSD-贾宏超 wrote: > Hi everyone, > What will happen if Storm doesn't have enough worker slots to run a topology? > In my cluster, w

Re: Does Storm support rack awareness?

2015-08-07 Thread Matthias J. Sax
You an implement a custom scheduler to get this done. See an example here: https://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/ -Matthias On 08/07/2015 08:40 AM, 이승진 wrote: > AFAIK, there's no way to deploy a topology into certain type of machines. > > > > Let

Re: one worker per machine per topology, is it still recommended?

2015-08-07 Thread Matthias J. Sax
IMHO, it's a question about fault-tolerance. If you have a single worker per node per topology, the impact in failure case (ie, rack going down) on a topology is low. Of course, all topologies using this failure rack are effected. If you use multiple workers for a single topology on the same supe

Re: Zookeeper : increase tick time

2015-08-27 Thread Matthias J. Sax
Hi, this is a Zookeeper setting (and not a Storm parameter). You need to update ZK config. For example, /opt/zookeeper/conf/zoo.cfg -Matthias On 08/26/2015 04:38 PM, Lina FAHED wrote: > Hello, > > i’m new in Apache Storm, i have a problem that the Zookeeper tick time is > very small for the tre

Re: Zookeeper : increase tick time

2015-08-27 Thread Matthias J. Sax
Hi, this is a Zookeeper setting (and not a Storm parameter). You need to update ZK config. For example, /opt/zookeeper/conf/zoo.cfg -Matthias On 08/26/2015 04:38 PM, Lina FAHED wrote: > Hello, > > i’m new in Apache Storm, i have a problem that the Zookeeper tick time is > very small for the tre

Re: Zookeeper : increase tick time

2015-08-27 Thread Matthias J. Sax
-Matthias On 08/27/2015 10:28 AM, Lina FAHED wrote: > Hi, > > as the Zookeeper is embedded in the Storm version, so, i didn’t found the > zookeeper in /opt/ > so, how to access it ? > > Thanks, > > Lina >> Le 27 août 2015 à 10:19, Matthias J. Sax a >> écri

Re: Zookeeper : increase tick time

2015-08-27 Thread Matthias J. Sax
orm > release i installed, > i didn’t found a path for ZK in order to change its configurations. > > do you think that it would be better to turn on a cluster mode ? > > thanks > > Lina > >> Le 27 août 2015 à 10:31, Matthias J. Sax a >> écrit : >> >&

Re: Zookeeper : increase tick time

2015-08-27 Thread Matthias J. Sax
Server - Created server with > *tickTime 2000* minSessionTimeout 4000 maxSessionTimeout 4 > > so maybe the Client session is timed out because of the ZK « small » > tickTime. I have a doubt that the changes i made are not really > considered by storm. > > Thanks, > > Lina >

Re: Bolt with information on submitting bolt/spout

2015-08-29 Thread Matthias J. Sax
The information you are looking for is provided by each incoming Tuple of Bolt.execute(Tuple). For example, tuple.getSourceComponent() Have a look here for other available methods: https://storm.apache.org/javadoc/apidocs/backtype/storm/tuple/Tuple.html -Matthias On 08/29/2015 02:48 PM, Marc Ro

Re: Average bolt

2015-08-31 Thread Matthias J. Sax
You have two options to determine your producers: 1) each incoming tuples contains meta data you can access. For example: Tuple.getSourceComponent(). See https://storm.apache.org/javadoc/apidocs/backtype/storm/tuple/Tuple.html 2) in Bolt.prepare(...) one parameter is a TopologyContext object. T

Re: Removing duplicates

2015-08-31 Thread Matthias J. Sax
Sounds reasonable. Storm does not provide any help with this. I assume that your sensors attach a timestamp as regular attribute to each tuple. Or do you timestamp your date in spout? -Matthias On 08/31/2015 09:11 AM, Marc Roos wrote: > > I have sensors that seem to be emitting duplicates and no

Re: Is it possible to create a global variable for every executors of same bolt

2015-08-31 Thread Matthias J. Sax
I would not do it this way... If you don't provide a custom scheduler, you don't know if both executors will be deployed to the same worker JVM (actually, the changes are almost zero that this happens...). (Furthermore, you need to do proper synchronization between both executors accessing the sam

Re: Tasks are not starting

2015-09-02 Thread Matthias J. Sax
Without any exception/error message it is hard to tell. What is your cluster setup - Hardware, ie, number of cores per node? - How many node/supervisor are available? - Configured number of workers for the topology? - What is the number of task for each spout/bolt? - What is the number o

Re: Tasks are not starting

2015-09-02 Thread Matthias J. Sax
4 to 41 > > Thanks, > Nick > > 2015-09-02 15:42 GMT-04:00 Matthias J. Sax <mailto:mj...@apache.org>>: > > Without any exception/error message it is hard to tell. > > What is your cluster setup > - Hardware, ie, number of cores per node? >

Re: Tasks are not starting

2015-09-02 Thread Matthias J. Sax
rker > latencies are much lower than the inter-worker latencies? > > Thanks, > Nick > > 2015-09-02 16:27 GMT-04:00 Matthias J. Sax <mailto:mj...@apache.org>>: > > So (for each node) you have 4 cores available for 1 supervisor JVM, 2 > worker JVMs that e

Re: Tasks are not starting

2015-09-03 Thread Matthias J. Sax
thing? Is it just a coincidence that happened in my experiments? > > Thank you, > Nick > > > > 2015-09-02 17:38 GMT-04:00 Matthias J. Sax <mailto:mj...@apache.org>>: > > I agree. The load is not high. > > About higher latencies. How many ackers did you

Re: Nimbus and UI stops without error

2015-09-08 Thread Matthias J. Sax
There is no timeout or anything similar. It must be something different. -Matthias On 09/08/2015 09:18 AM, Chandrashekhar Kotekar wrote: > Hi, > > I have 5 node storm cluster. On one of the node I am running Nimbus, > Supervisor and UI as well. I have observed that Storm nimbus and UI > shuts do

Re: UIs ack statistics are not updated

2015-09-09 Thread Matthias J. Sax
There is no such thing for Bolts. The call to Spout.ack(...) happens after Storm retrieved all acks of all (transitively) anchored tuples. Let's say you have spout -> bolt1 -> bolt2 Spout emit t1 which is processed by bolt1. bolt1 emits t2 (with anchor t1) and acks t1. => there will be no call t

Re: Incorrect values in Storm UI

2015-09-10 Thread Matthias J. Sax
Hi, "emitted" is the number of output tuples the spout/bolt produces, while "transfered" is the number if tuples that get shipped to consumers. If you have two consumer bolts for a single spout/bolt, the "transfered" count will be twice the "emitted" count, because each emitted tuple gets transfer

Re: How to increase the input rate of tuples

2015-09-10 Thread Matthias J. Sax
Hi, You can simple read the file directly in your Spout. This is an implementation that reads multiple files concurrently (with respect to a timestamp attribute that is included in the input record -- of course you can simplify the code if you don't have a timestamp attribute and just want to read

Re: Significance of boolean direct Output Fields declare (boolean direct, Fields fields)

2015-09-11 Thread Matthias J. Sax
Hi Ankur, If you declare a direct stream (setting the flag to true), you need to emit tuples via collector.directEmit(...) methods (collector.emit(...) is not allowed for direct streams). Those methods require to specify the consumer task ID that should receive the tuple. Furthermore, when co

Re: How to increase the input rate of tuples

2015-09-15 Thread Matthias J. Sax
em? > > Thanks again, > Nick > > On Thu, Sep 10, 2015 at 5:15 PM, Matthias J. Sax <mailto:mj...@apache.org>> wrote: > > Hi, > > You can simple read the file directly in your Spout. This is an > implementation that reads multiple files concu

Re: Easy way to compute the complete latency

2015-09-24 Thread Matthias J. Sax
If your clocks are not synchronized you will have a hard time... Why can you not use NTP? The only other idea I have, would be to use a custom scheduler that ensure that all spout and sink instances are running on the same machine, such that they can access the same local clock. But it is not clea

Re: Starting and stopping storm

2015-09-28 Thread Matthias J. Sax
Hi, as always: it depends. ;) Storm itself clear ups its own resources just fine. However, if the running topology needs to clean-up/release resources before it is shut down, Storm is not of any help. Even if there is a Spout/Bolt cleanup() method, Storm does not guarantee that it will be called.

Re: Starting and stopping storm

2015-09-29 Thread Matthias J. Sax
Let me break this down in a more simple fashion. > > I have a Storm Cluster named "The Quiet Storm" ;) here is what it > consists of: > > ** > Server ZK1: Running Zookeeper > Server ZK2: Running Zookeeper > Server ZK3: Running Zookeeper &

Re: Field Group Hash Computation

2015-09-29 Thread Matthias J. Sax
If you can use "partial key grouping" depends on your use case. Think careful before you apply it... Maybe you want to read the research paper about it. It clearly describes when you can use it and when not: https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-bala

Re: Field Group Hash Computation

2015-09-30 Thread Matthias J. Sax
> of my field groups to one bolt thereby causing it to be a > bottleneck. Since I emit string, I guess the hash is on > ArrayList(str1,str2...).hashcode(). This hashcode is coming out same > for different string combinations... > > Thanks > Kashyap &g

Re: Field Group Hash Computation

2015-10-01 Thread Matthias J. Sax
> > Thanks you! > > Kashyap > > On Sep 30, 2015 5:14 AM, "Matthias J. Sax" <mailto:mj...@apache.org>> wrote: > > Yes. That's right. > > "Values" extends ArrayList and does not overwrite .hashCode(). > > -Matthias >

Re: Emit events from hook

2015-10-01 Thread Matthias J. Sax
Hi, I never tried it, but as you add a hook in Bolt.prepare() you should be able to create your own hook class that takes the OutputCollector as constructor parameter and use it later on when the hook is called. -Matthias On 10/01/2015 12:23 AM, Raymond Conn wrote: > Hi all, > Just wanted to fo

Re: Worker Log showing Error java.lang.ClassNotFoundException

2015-10-15 Thread Matthias J. Sax
This question was on SO, too, and got answered already: https://stackoverflow.com/questions/33157303/apache-storm-classnotfoundexception-when-deploying-jar-to-remotecluster/33158308#33158308 On 10/15/2015 09:45 PM, Ankur Garg wrote: > Hi , > > I am trying to deploy my topology bundled as a fa

Re: Shutting and Starting Storm Cluster

2015-10-16 Thread Matthias J. Sax
I would recommend to write a (bash) script. Personally, I use the following assembly of scripts to start/stop a cluster: The two main scripts are "start-storm.sh" and "stop-storm.sh". All other are just helpers. -Matthias cat start-storm.sh > #!/bin/bash > > tools=/home/tools > > # check if

Storm at Stackoverflow

2015-10-21 Thread Matthias J. Sax
Hi, currently, there are two tags (apache-storm and storm) used on SO. I just suggested "apache-storm" to be the main tag and "storm" to be a synonym for it. This enables that all questions get tagged with a unique tag. Old and new questions get re-tag from storm to apache-storm automatically if t

Re: Emitting tuple to upstream node using direct grouping

2015-11-05 Thread Matthias J. Sax
You need to specify a cyclic dataflow: > builder.setSpout("spout", ...); > builder.setBolt("bolt1", ...).directGrouping("spout").directGrouping("bolt1"); > builder.setBolt("bolt2, ...).directGropuing("bolt1"); You can use the default stream. -Matthias On 11/04/2015 09:06 PM, Nathan Leung wrote:

Re: Reading N subfolder from a given directory

2015-11-05 Thread Matthias J. Sax
Hi Raisul, you would need to do something like this: > BoltDeclarer agrBolt = builder.setBolt("mux-segment", new > aggreaterProcess(framerate), 1) > for (int i = 0; i < directoryInarr.length; i++) { > agrBolt = argBolt.fieldsGrouping(boltName+Integer.toString(i), > "stream_"+Integer.toStrin

Re: How to test custom Kryo serializer locally?

2015-11-09 Thread Matthias J. Sax
In local mode, tuples are passed in memory from operator to operator without going through the serialization stack. I would recommend to write a couple of unit tests for your serializer. -Matthias On 11/09/2015 10:13 AM, Stephen Powis wrote: > Sorry I can't answer your question, but if you've wr

Re: Where are emitted tuples queued?

2015-11-23 Thread Matthias J. Sax
There are input and output buffer for each operator. So it could sit on the sender or on the receiver side. Have a look here for more details: http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/ If this is not enough information, just ask again. -Matthias

Re: Global variable in storm

2015-12-07 Thread Matthias J. Sax
Hi, a globally shared state is not supported by Storm. This is true, for many (maybe even all?) large-scale distributed systems, because a shared global stated does not scale! However, if you just want to know the number of processed tuples, you do not need a globally shared state. Actually, Stor

Re: Storm Scheduler

2015-12-08 Thread Matthias J. Sax
Hi Rudraneel, find an good tutorial here: https://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/ Btw: the scheduler is called periodically. -Matthias On 12/09/2015 12:54 AM, Rudraneel chakraborty wrote: > Hello , > > I am trying to develop a custom storm schedu

Re: Strom Started project Git issue

2015-12-20 Thread Matthias J. Sax
It should be > git clone https://github.com/apache/storm.git -Matthias On 12/19/2015 04:26 AM, Thirumeni, Sripriya wrote: > > > Trying to get started on the Strom starter project getting the following > error. Can you help. > > > > https://github.com/apache/storm/tree/master/examples/stor

Re: couldn't extract resources

2015-12-22 Thread Matthias J. Sax
Can you provide more information? Your code? StackTrace? The information you provide is way too limited to give any help. Just one thing: "supervisor still hasn't start" indicated that something goes wrong when instantiating a Spout or Bolt. But it is unclear what does go wrong. Can you run your

Re: topology.ackers set to 0, still acking in downstream bolts?

2015-12-23 Thread Matthias J. Sax
Hi, if you want to disable fault-tolerance, you need to emit tuples in your spout without message IDs -> collector.emit(new Values(...)); // no messageId provided The parameter "topology.acker.executors" sets the number of ackers you want to use in your topology. I am not sure, if setting it to

Re: host for zookeeper

2015-12-26 Thread Matthias J. Sax
Are you trying to set up distributed mode on a single machine? You might need to add a line to /etc/hosts 127.0.0.1 NameOfYourMachine Using "localhost" as name for 127.0.0.1 confuses Storm... -Matthias On 12/25/2015 09:18 PM, sam mohel wrote: > i have in /etc/hosts > #192.168.x.xloca

Re: streamId is null for input Tuple in Bolt

2015-12-28 Thread Matthias J. Sax
Hi, message IDs are only used if you enable fault-tolerance, ie, assign an ID to each tuple in your spouts. You do not assign an ID: "id: {}" >> 6520 [Thread-14-UserContextBolt] INFO b.s.d.executor - Processing >> received message FOR 3 TUPLE: source: *GenericEventSpout*:1, stream: >> *GameEven

Re: how to remove storm ?

2016-01-02 Thread Matthias J. Sax
I guess you downloaded the binary and extracted it into a certain folder. Just delete this folder to uninstall Storm. -Matthias On 01/02/2016 12:48 AM, sam mohel wrote: > i know that question is simple for you but i'm new to storm followed the > instruction to install it , but if i want to upgrad

Re: topology.ackers set to 0, still acking in downstream bolts?

2016-01-04 Thread Matthias J. Sax
to be processed, you can emit them as > unanchored tuples. Since they're not anchored to any spout tuples, they > won't cause any spout tuples to fail if they aren't acked. > > --John > > > > > On Wed, Dec 23, 2015 at 5:01 PM, Matthias J. Sax <mailto:mj

Re: Failed to bind to: 0.0.0.0/0.0.0.0:6703

2016-01-04 Thread Matthias J. Sax
I doubt it is a port problem. 0.0.0.0 is *no* valid IP address. Check your IP configuration. -Matthias On 01/04/2016 04:15 PM, Derek Dagit wrote: >> org.jboss.netty.channel.ChannelException: Failed to bind to: >> 0.0.0.0/0.0.0.0:6703 > > > If you see this, you can use a tool like lsof to find

Re: grouping seeking help

2016-01-05 Thread Matthias J. Sax
If you know the number of words before hand, you can set the parallelism accordingly. Furthermore, some custom grouping should allow you to sent a single tuple to each instance. If you producer has only one instance, a simple shuffle grouping would do the trick. If you don't know the number of wor

Re: Guaranteed message processing with fan-out and caching: any best practices?

2016-01-10 Thread Matthias J. Sax
It is absolutely ok what you are doing. No need to worry about anything. If your tuples really time out, you can increase the timeout via TOPOLOGY_MESSAGE_TIMEOUT_SECS (the default value is 30 seconds). -Matthias On 01/10/2016 06:50 PM, John Yost wrote: > Hi Everyone, > > I have a topology tha

Re: what should be in /etc/hosts ?

2016-01-10 Thread Matthias J. Sax
Hi, if I remember correctly, Storm is pseudo-distibuted mode has problems to deal with "localhost" as machine name. Thus, you need to extend /etc/hosts with the real name of you local machine: 127.0.0.1 localhost 127.0.0.1 nameOfLocalMachine # add this line -Matthias On 01/

Re: Firefox can't establish a connection to the server at nameofmachine:8000.

2016-01-11 Thread Matthias J. Sax
You need to start up logviewer first: bin/storm logviewer -Matthias On 01/11/2016 09:46 AM, researcher cs wrote: > when i clicked port number in storm ui i got the following message > > Unable to connect > Firefox can't establish a connection to the server at nameofmachine:8000. signature.a

  1   2   >