Re: Worker out of memory brings down entire Topology

2015-10-23 Thread Javier Gonzalez
there, but don't think that matters at all. > > On Thu, Oct 22, 2015 at 3:54 PM, Javier Gonzalez <jagon...@gmail.com> > wrote: > >> How many workers do you have configured? Is it possible that your whole >> topology is running within that worker? >> On Oct 22, 2015 6:1

Re: Worker out of memory brings down entire Topology

2015-10-22 Thread Javier Gonzalez
How many workers do you have configured? Is it possible that your whole topology is running within that worker? On Oct 22, 2015 6:15 PM, "Dillian Murphey" wrote: > We have one worker than keeps giving us some problems. First it was out > of memory issues. We're

Re: Storm vs Spark Streaming Tech Evaluation

2015-10-21 Thread Javier Gonzalez
Clojure is, AFAIK, a functional language. ;) On Oct 20, 2015 11:42 PM, "padma priya chitturi" wrote: > Very nice post :) Storm is very good in terms of the capabilities it has. > The only thing is they could have provided API in Scala as well as Python. > Also, when

Re: Does Storm work with Spring

2015-10-19 Thread Javier Gonzalez
ich spout, but if u want to >>>>>>>> go >>>>>>>> with good design, I say just write a small wrapper to read some json >>>>>>>> where >>>>>>>> u can define ur bolts and spouts and use that to build topology (u can

Re: same jar does work well on online storm

2015-10-13 Thread Javier Gonzalez
With no further information, I would suggest checking if there was any error in the submission (the output of the storm jar command). If it did say "topology sumbitted", then check worker logs and see if there's any init errors crashing your workers. On Oct 11, 2015 11:05 PM, "Yang Nian"

Re: zookeeper datadir !

2015-10-10 Thread Javier Gonzalez
Change it. /tmp is not where you want to keep any application data. On Oct 10, 2015 7:54 AM, "researcher cs" wrote: > i'm new to storm and zookeeper just i want to ask about Data Dir in > zoo.cfg > i wrote /tmp/zookeeper > but i read in a site that data dir not to be

Re: zookeeper datadir !

2015-10-10 Thread Javier Gonzalez
, 2015 at 1:17 PM, researcher cs <prog.researc...@gmail.com> wrote: > ok , then i change it to var for example ? or is there any specific dir > for it ? > > On Sat, Oct 10, 2015 at 4:42 PM, Javier Gonzalez <jagon...@gmail.com> > wrote: > >> Change it. /tmp is not whe

Re: Does Storm work with Spring

2015-10-10 Thread Javier Gonzalez
> public void open(Map conf, TopologyContext context,SpoutOutputCollector >>> collector) { >>> >>> LOG.info("Inside the open Method for RabbitListner Spout"); >>> >>> inputManager = (InputQueueManagerImpl) ctx >>> .getBean(InputQueueMa

Re: Does Storm work with Spring

2015-10-09 Thread Javier Gonzalez
IIRC, only if everything you use in your spouts and bolts is serializable. On Oct 6, 2015 11:29 PM, "Ankur Garg" wrote: > Hi Ravi , > > I was able to make an Integration with Spring but the problem is that I > have to autowire for every bolt and spout . That means that even

Re: Exception Stack Trace for Local Cluster

2015-10-06 Thread Javier Gonzalez
If you mean your local desktop machine, you probably need to configure your logging correctly. If you mean running a topology with local submitter in a dev server... Why? :) just run a 1 node storm cluster if you want to do that On Oct 6, 2015 2:07 PM, "Ankur Garg" wrote:

Re: Approach to parallelism

2015-10-05 Thread Javier Gonzalez
much appreciated--thanks! :) > > --John > > > > On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <jagon...@gmail.com> > wrote: > >> I would suggest sticking with a single worker per machine. It makes >> memory allocation easier and it makes inter-component communi

Re: FieldsGrouping at KafkaSpout

2015-10-05 Thread Javier Gonzalez
If I'm reading this correctly, I think you're not getting the result you want - having all tuples with a given key processed in the same bolt2 instance. If you want to have all messages of a given key to be processed in the same Bolt2, you need to do fields grouping from bolt1 to bolt2. By doing

Re: Approach to parallelism

2015-10-05 Thread Javier Gonzalez
find the following links useful: >>> >>> https://gist.github.com/mrflip/5958028 >>> >>> https://wassermelonemann.wordpress.com/2014/01/22/tuning-storm-topologies/ >>> Talk: >>> >>> http://demo.ooyala.com/player.html?width=640=360=Q1eXg5NzpKqUUzBm5W

Re: FieldsGrouping at KafkaSpout

2015-10-05 Thread Javier Gonzalez
ll then forward 'em to Bolt > 2. > > Please confirm if this seems logical and that it should work. I think it > should, but I may be missing something. > > Thanks! :) > > --John > > On Mon, Oct 5, 2015 at 9:20 AM, Javier Gonzalez <jagon...@gmail.com> >

Re: Approach to parallelism

2015-10-03 Thread Javier Gonzalez
I would suggest sticking with a single worker per machine. It makes memory allocation easier and it makes inter-component communication much more efficient. Configure the executors with your parallelism hints to take advantage of all your availabe CPU cores. Regards, JG On Sat, Oct 3, 2015 at

Re: need help in Storm.yaml

2015-09-21 Thread Javier Gonzalez
This is all the configuration options you can set in the storm.yaml file. Of interest to you are the drpc.* keys: https://storm.apache.org/javadoc/apidocs/constant-values.html#backtype.storm.Config Regards, Javier On Mon, Sep 21, 2015 at 7:42 PM, researcher cs wrote:

Re: Implementing multiple(about 1000) storm topologies

2015-09-21 Thread Javier Gonzalez
Off the top of my head, I would: - have a Storm topology ready listening on kafka. If you have a few minutes between kafka event and delivery of processed input to clients, I would rather not waste time starting up the topology. - Not implement 1000 topologies. That's at least 1000 jvms. Is the

Re: Emitting Custom Object as Tuple from spout

2015-09-16 Thread Javier Gonzalez
You emit like this: collector.emit(new Values(yourBeanInstanceHere)); You just need to wrap it in a Values object. Regards, Javier. On Sep 16, 2015 9:37 AM, "Ankur Garg" wrote: > Hi , > > I am new to apache Storm . To understand it I was looking at storm > examples

Re: exception in submitting topology

2015-09-15 Thread Javier Gonzalez
, "researcher cs" <prog.researc...@gmail.com> wrote: > Sorry for my question , i'm beginner , how can i check my supervisors and > worker logs > > On Tue, Sep 15, 2015 at 8:11 AM, Javier Gonzalez <jagon...@gmail.com> > wrote: > >> Check your supervisor and work

Re: Including dependencies in a Storm topology jar

2015-09-14 Thread Javier Gonzalez
We use the shaded jar maven plugin. Just make sure that you mark storm as scope provided (so that you don't get a duplicate storm jar error) and exclude any RSA/DSA/SF signature files from the manifest folder (so that you don't get failed signature check errors). Regards, Javier On Sep 14, 2015

Re: UIs ack statistics are not updated

2015-09-09 Thread Javier Gonzalez
If I am reading your code correctly, it seems you're emitting from the spout without id - therefore, your acking efforts are not being used. You need to do something like: Object id= ; _collector.emit(id,tuple); Regards, Javier On Sep 8, 2015 3:19 PM, "Nick R. Katsipoulakis"

Re: Removing duplicates

2015-08-31 Thread Javier Gonzalez
Storm itself offers nothing towards this. Where to fix it depends on how expensive it is for you. If you can just introduce a new bolt in your topology without a terrible penalty in resources or processing throughput, I would do it that way. You don't have to modify anything other than the

Re: Single message processed by multiple threads

2015-08-27 Thread Javier Gonzalez
Is your message source thread safe? It is possible for four spout threads to read the same message from the same source if the source does not guarantee uniqueness across multiple clients. On Aug 27, 2015 9:59 AM, Ganesh Chandrasekaran gchandraseka...@wayfair.com wrote: Hi all, I am

Re: Storm UI doubts

2015-08-22 Thread Javier Gonzalez
PM, Javier Gonzalez jagon...@gmail.com wrote: Hi Nithesh, Mind that the storm metrics are gathered by sampling, so it isn't unusual that the counts are slightly off. I think the default is 5% sampling, it can be tweaked to more in storm.yaml, but it will impact your performance if you ramp

Re: Storm UI doubts

2015-08-21 Thread Javier Gonzalez
Hi Nithesh, Mind that the storm metrics are gathered by sampling, so it isn't unusual that the counts are slightly off. I think the default is 5% sampling, it can be tweaked to more in storm.yaml, but it will impact your performance if you ramp it up. Regards, Javier

Re: Spring / Trident (NotSerializableException: org.springframework)

2015-08-21 Thread Javier Gonzalez
We had issues with Spring and Storm. What we did is the following: don't do anything on the constructor. Perhaps pass a String with the location of the Spring configuration file. In the prepare or open method (for bolts and spouts respectively) initialize a context with the context file location

Coordination between spouts

2015-08-18 Thread Javier Gonzalez
Hi, I need to coordinate between two spouts, and could use the group's insight into this. The scenario: - One spout (let's call it event spout) receives the incoming data stream. - One spout is the control spout, which will receive messages from a different stream, that can impact the way that

Re: Ack not being called

2015-08-17 Thread Javier Gonzalez
How many ackers have you got configured when you submit your topology? On Aug 17, 2015 5:57 PM, Stuart Perks stup...@hotmail.com wrote: Hi I am attempting to run guaranteed message processing but ACK is not being called. Post on stack overflow if you prefer answer there.

Re: Spout activate/deactivate

2015-08-14 Thread Javier Gonzalez
it. On receiving signal it can block and unblock appropriately the nextTuple() call On Fri, Aug 14, 2015 at 5:34 AM Javier Gonzalez jagon...@gmail.com wrote: I'm trying to ensure everything is processed for coordination with an external system. Therefore, on a given signal, I have to stop the spout

Re: 200 bolts to 5 bolts--decreases throughput

2015-08-14 Thread Javier Gonzalez
You will have a detrimental effect to wiring in boltB, even if it does nothing but ack. Every tuple you have processed from A has to travel to a B bolt, and the ack has to travel back. You could try modifying the number of ackers, and playing with the number of A and B bolts. How many workers do

Re: Spout activate/deactivate

2015-08-14 Thread Javier Gonzalez
, Aug 14, 2015 at 12:05 PM Javier Gonzalez jagon...@gmail.com wrote: I was thinking of using another spout as control channel, and from that spout manipulate the original spout to cause the nextTuple method to not call the next message from the incoming queue (but not block so that acks can

Re: 200 bolts to 5 bolts--decreases throughput

2015-08-14 Thread Javier Gonzalez
! --John On Fri, Aug 14, 2015 at 2:59 PM, Javier Gonzalez jagon...@gmail.com wrote: You will have a detrimental effect to wiring in boltB, even if it does nothing but ack. Every tuple you have processed from A has to travel to a B bolt, and the ack has to travel back. You could try modifying

Re: about worker.childopts

2015-08-14 Thread Javier Gonzalez
Use mem min and max in child opts, so that the low work topologies have little initial memory, but the high volume ones can grow accordingly. On Aug 14, 2015 5:44 AM, jinhong lu lujinho...@gmail.com wrote: Hi, I have got a storm cluster of about 20 machines, 40G mem,24 core. But my cluster

Re: Spout activate/deactivate

2015-08-14 Thread Javier Gonzalez
a subset of those Spouts? $ storm help deactivate Syntax: [storm deactivate topology-name] Deactivates the specified topology's spouts. On Thu, Aug 13, 2015 at 2:12 PM Javier Gonzalez jagon...@gmail.com wrote: On a more broader term, can you share the strategies you've used to pause

Re: Spout activate/deactivate

2015-08-13 Thread Javier Gonzalez
On a more broader term, can you share the strategies you've used to pause (not emit anything else into the topology and not read anything else from the data source) a topology's spouts? Thanks, Javier On Aug 13, 2015 2:53 PM, Javier Gonzalez jagon...@gmail.com wrote: Hi, I have a use case

Spout activate/deactivate

2015-08-13 Thread Javier Gonzalez
Hi, I have a use case where I would need to stop a spout from emitting for a period of time. I'm looking at the activate /deactivate methods, but there's not much information apart from the API and the java base classes have empty implementations. Can anybody shed any insight on how those work?

Re: how to use storm to identify the missing records in data stream

2015-08-11 Thread Javier Gonzalez
Just to make sure I'm understanding correctly: Do you have a single stream of sequential ids or multiple streams that need to be interpolated? Do you receive a stream of ids and emit a stream of timestamped ids? On Aug 11, 2015 5:34 PM, Alec Lee alec.in...@gmail.com wrote: Hello, all Here I

Re: Storm with dynamo - How to increase the throughput?

2015-07-07 Thread Javier Gonzalez
Try the following: - 1 worker per machine (to minimize inter-jvm messaging) and adjust childopts so it takes as much memory as you can without bringing down the machine - as many threads as available cpu cores and no more (to avoid thread context switching) That should give you some reduction of

Re: how to ensure storm not write message twice to local file

2015-07-03 Thread Javier Gonzalez
You need to keep track of state in your bolt before writing. Yes, it is indeed quite a chore. For exactly once, particularly when coordinating with external systems such as databases or output queues, Storm is, ah, not exactly the best fit. Unless keeping track of processed events to avoid

Re: When does nimbus decides that an executor is not alive

2015-06-28 Thread Javier Gonzalez
It could be that heavy usage of an executor's machine prevents the executor from communicating with nimbus, hence it appears dead to nimbus, even though it's still working. I think we saw something like this some time during our PoC development, and it was fixed by allocating more memory to our

Re: When does nimbus decides that an executor is not alive

2015-06-28 Thread Javier Gonzalez
things about Java GC. Thank you for your time. Regards, Nick 2015-06-28 13:02 GMT-04:00 Javier Gonzalez jagon...@gmail.com: Perhaps you could put explicit GC logs in the childopts so that you see if you have GC grinding in the jvm running the worker that gets disconnected. I suggested

Re: When does nimbus decides that an executor is not alive

2015-06-28 Thread Javier Gonzalez
Javier Gonzalez jagon...@gmail.com: It could be that heavy usage of an executor's machine prevents the executor from communicating with nimbus, hence it appears dead to nimbus, even though it's still working. I think we saw something like this some time during our PoC development

Re: Flushing a topology?

2015-06-27 Thread Javier Gonzalez
of time. When the HashSet size becomes 0, returns success. Of course, if flush periods are back-to-back, then start of the next flush period is same as the end of previous period. Thanks, Satish On Wed, Jun 24, 2015 at 4:20 PM, Javier Gonzalez jagon...@gmail.com wrote: Hi Satish

Re: Flushing a topology?

2015-06-24 Thread Javier Gonzalez
that tuple has completely traversed the topology. Isn't that sufficient? On Tue, Jun 23, 2015 at 10:50 PM, Javier Gonzalez jagon...@gmail.com wrote: Hi, Question: how would you implement a flush in a topology: sending a special message to the topology that will in time return with a message

Flushing a topology?

2015-06-23 Thread Javier Gonzalez
Hi, Question: how would you implement a flush in a topology: sending a special message to the topology that will in time return with a message that says everything up to the flush message has finished traversing the topology? (does that make sense?) Regards, Javier

Re: Different Bolt on Different Servers - How to Configure in a single topology

2015-06-22 Thread Javier Gonzalez
- run nimbus in one machine - run supervisor in all four - specify four workers when creating/configuring the topology - submit topology to nimbus. This should result in the topology elements being distributed among all available servers. If you require specific pairings (e.g. spout MUST be in

Re: Storm message sequence

2015-06-09 Thread Javier Gonzalez
Mind that the at least once guarantees applies only to regular processing (i.e. storm will replay tuples that time out). Re-emitting when one of the bolts fails explicitly is your responsibility (on the spout code). On Tue, Jun 9, 2015 at 5:49 PM, Andrew Xor andreas.gramme...@gmail.com wrote:

Re: Question on Parallelsim

2015-06-08 Thread Javier Gonzalez
I would say, configure so that your total parallelism matches the number of cores available (i.e. if you have a topology with X spouts, Y boltAs and Z boltBs, make it so that X+Y+Z = cores available). And one worker per machine, inter-JVM communications are expensive. When you have more bolts and

Re: is this performance number reasonable??

2015-06-05 Thread Javier Gonzalez
Couple of things I'd suggest to check: 1.- Perhaps your data is skewed, i.e. the hash function sends the bulk of the messages to a single bolt? Check in the storm UI the number of executed tuples in each bolt. If this is the case, then all the paralelism you can set won't give you gains. You'd

Re: How Storm Process tuples atleast once?

2015-06-05 Thread Javier Gonzalez
Hi, If it's a custom Spout (i.e. you wrote it) it's completely up to you to re-emit in the event of a failed tuple. You get whatever ID you sent down the topology when you emitted from the spout, so make sure you are able to somehow get the message back from that ID to re-emit. Regards, JG On

Re: Use Java 7 in Apache Storm

2015-06-01 Thread Javier Gonzalez
You only have to make sure JAVA_HOME and your path point to the java8 installation for every storm process (nimbus and supervisor) you start. I've had no problem using storm with Java8. The only times I've had problems is when someone changes JAVA_HOME to java6 or 7 and then storm jar throws the

Re: [DISCUSS] Drop Support for Java 1.6 in Storm 0.10.0

2015-06-01 Thread Javier Gonzalez
No objection here. I work at a big company where upgrades move fast like glaciers ;) and even we are up to java7. Regards, JG On Mon, Jun 1, 2015 at 2:37 PM, P. Taylor Goetz ptgo...@gmail.com wrote: CC user@ I’d like to poll the community about the possibility of dropping support for Java

Re: Best way of scaling with a single spout

2015-05-14 Thread Javier Gonzalez
think the answer to your question hinges off of this statement: “ I believe the farming out of the processing to different nodes is hurting our performance. What makes you believe this? From: Javier Gonzalez jagon...@gmail.com Reply-To: user@storm.apache.org user@storm.apache.org Date

Re: Cleanup method not called for the BaseBasicBolt when the topology is killed

2015-05-14 Thread Javier Gonzalez
Isn't the cleanup method guaranteed to be called only while running as local topology? On May 13, 2015 9:20 AM, Jeffery Maass maas...@gmail.com wrote: Bolts which implement IBolt have a method called cleanup() which is called by the Storm framework.

Best way of scaling with a single spout

2015-05-09 Thread Javier Gonzalez
Hi, I'm currently approaching the design of an application that will have a single source of data from AMPS (high speed pub-sub system like Kafka). We are currently facing the issue that the spout is much faster than the bolts, and I believe the farming out of the processing to different nodes is

Re: Best way of scaling with a single spout

2015-05-09 Thread Javier Gonzalez
and this will give you the capability to use multiple spouts to read from the same topic. Supun.. On Sat, May 9, 2015 at 4:57 PM, Javier Gonzalez jagon...@gmail.com wrote: Hi, I'm currently approaching the design of an application that will have a single source of data from AMPS (high speed pub

Re: Store state data in zookeeper from a spout or bolt

2015-04-18 Thread Javier Gonzalez
It can be done, with the curator api. We did it in the middle of a PoC a month ago or so, to store some history that would be needed to detect if an incoming event was already processed. It performed well. Unfortunately, I can't share any code as that goes against my contract. I think what we did

Re: Multilang Spout's nextTuple() Too Many Emits at Once?

2015-04-07 Thread Javier Gonzalez
Had a similar experience - too many emits would jam the spout and it would never get around to processing the acks received from the bolts. We fixed it by introducing artificial 1ms sleep in the spout processing so that there was enough idle capacity to run the acks. I doubt that's the better

Re: Exactly once transactions and storm

2015-03-03 Thread Javier Gonzalez
also tell you which constraint was violated, you can ignore the unique constraint violations and ack back so the spout will stop retrying. Its not clean but should work. Thanks Parth From: Javier Gonzalez jagon...@gmail.com Reply-To: user@storm.apache.org user@storm.apache.org Date

Re: Exactly once transactions and storm

2015-03-03 Thread Javier Gonzalez
conditional insert/update and if you can use this feature. Thanks Parth From: Javier Gonzalez jagon...@gmail.com Reply-To: user@storm.apache.org user@storm.apache.org Date: Tuesday, March 3, 2015 at 10:43 AM To: user@storm.apache.org user@storm.apache.org Subject: Re: Exactly once

Exactly once transactions and storm

2015-03-03 Thread Javier Gonzalez
Hi guys, We're looking at storm to solve a message processing scenario that needs to be horizontally scalable for high projected volume. The use case goes like this: 1.- receive messages from external source. 2.- generate a set of messages from this external input, based on rules. 3.-