How many workers do you have configured? Is it possible that your whole
topology is running within that worker?
On Oct 22, 2015 6:15 PM, "Dillian Murphey" wrote:
> We have one worker that keeps giving us some problems. First it was out
> of memory issues. We're
Clojure is, AFAIK, a functional language. ;)
On Oct 20, 2015 11:42 PM, "padma priya chitturi"
wrote:
> Very nice post :) Storm is very good in terms of the capabilities it has.
> The only thing is they could have provided API in Scala as well as Python.
> Also, when
ich spout, but if u want to
>>>>>>>> go
>>>>>>>> with good design, I say just write a small wrapper to read some json
>>>>>>>> where
>>>>>>>> u can define ur bolts and spouts and use that to build topology (u can
With no further information, I would suggest checking if there was any
error in the submission (the output of the storm jar command).
If it did say "topology submitted", then check worker logs and see if
there's any init errors crashing your workers.
On Oct 11, 2015 11:05 PM, "Yang Nian"
Change it. /tmp is not where you want to keep any application data.
On Oct 10, 2015 7:54 AM, "researcher cs" wrote:
> I'm new to Storm and ZooKeeper. I want to ask about the dataDir in
> zoo.cfg.
> I wrote /tmp/zookeeper
> but i read in a site that data dir not to be
, 2015 at 1:17 PM, researcher cs <prog.researc...@gmail.com>
wrote:
> OK, then should I change it to /var, for example? Or is there any specific dir
> for it ?
>
> On Sat, Oct 10, 2015 at 4:42 PM, Javier Gonzalez <jagon...@gmail.com>
> wrote:
>
>> Change it. /tmp is not whe
>>> public void open(Map conf, TopologyContext context, SpoutOutputCollector
>>> collector) {
>>>
>>>     LOG.info("Inside the open Method for RabbitListner Spout");
>>>
>>>     inputManager = (InputQueueManagerImpl) ctx
>>>         .getBean(InputQueueMa
IIRC, only if everything you use in your spouts and bolts is serializable.
On Oct 6, 2015 11:29 PM, "Ankur Garg" wrote:
> Hi Ravi ,
>
> I was able to make an Integration with Spring but the problem is that I
> have to autowire for every bolt and spout . That means that even
If you mean your local desktop machine, you probably need to configure your
logging correctly.
If you mean running a topology with local submitter in a dev server... Why?
:) just run a 1 node storm cluster if you want to do that
On Oct 6, 2015 2:07 PM, "Ankur Garg" wrote:
much appreciated--thanks! :)
>
> --John
>
>
>
> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez <jagon...@gmail.com>
> wrote:
>
>> I would suggest sticking with a single worker per machine. It makes
>> memory allocation easier and it makes inter-component communi
If I'm reading this correctly, I think you're not getting the result you
want - having all tuples with a given key processed in the same bolt2
instance.
If you want to have all messages of a given key to be processed in the same
Bolt2, you need to do fields grouping from bolt1 to bolt2. By doing
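Fields grouping is essentially hash partitioning on the declared field(s): the target task is derived from the hash of the field values, so equal keys always land on the same Bolt2 instance. A plain-Java sketch of the idea (not Storm's actual internals; the method name and modulo scheme are illustrative):

```java
import java.util.Objects;

public class FieldsGroupingSketch {
    // Pick a target task index from the grouping field's value.
    // Equal keys always hash to the same task, which is exactly the
    // "all tuples with a given key reach the same bolt instance" property.
    static int targetTask(Object key, int numTasks) {
        return Math.floorMod(Objects.hashCode(key), numTasks);
    }
}
```

With shuffle grouping, by contrast, each tuple is routed independently, so two tuples with the same key may end up in different Bolt2 instances.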
find the following links useful:
>>>
>>> https://gist.github.com/mrflip/5958028
>>>
>>> https://wassermelonemann.wordpress.com/2014/01/22/tuning-storm-topologies/
>>> Talk:
>>>
>>> http://demo.ooyala.com/player.html?width=640=360=Q1eXg5NzpKqUUzBm5W
ll then forward 'em to Bolt
> 2.
>
> Please confirm if this seems logical and that it should work. I think it
> should, but I may be missing something.
>
> Thanks! :)
>
> --John
>
> On Mon, Oct 5, 2015 at 9:20 AM, Javier Gonzalez <jagon...@gmail.com>
>
I would suggest sticking with a single worker per machine. It makes memory
allocation easier and it makes inter-component communication much more
efficient. Configure the executors with your parallelism hints to take
advantage of all your available CPU cores.
Regards,
JG
On Sat, Oct 3, 2015 at
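One way to enforce a single worker per machine is to give each supervisor only one slot in storm.yaml; a sketch (port 6700 is just the conventional first slot, adjust as needed):

```yaml
# storm.yaml on each supervisor node:
# one slot => at most one worker JVM per machine
supervisor.slots.ports:
    - 6700
```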
This is all the configuration options you can set in the storm.yaml file.
Of interest to you are the drpc.* keys:
https://storm.apache.org/javadoc/apidocs/constant-values.html#backtype.storm.Config
Regards,
Javier
On Mon, Sep 21, 2015 at 7:42 PM, researcher cs
wrote:
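For DRPC specifically, the storm.yaml keys would typically look like the following sketch (hostnames are placeholders; verify the exact key names against the constants page above for your Storm version):

```yaml
# storm.yaml: point the topology at your DRPC server(s)
drpc.servers:
    - "drpc-host-1"
# drpc.port: 3772   # the default DRPC port; uncomment to override
```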
Off the top of my head, I would:
- have a Storm topology ready listening on kafka. If you have a few minutes
between kafka event and delivery of processed input to clients, I would
rather not waste time starting up the topology.
- Not implement 1000 topologies. That's at least 1000 jvms. Is the
You emit like this:
collector.emit(new Values(yourBeanInstanceHere));
You just need to wrap it in a Values object.
Regards,
Javier.
On Sep 16, 2015 9:37 AM, "Ankur Garg" wrote:
> Hi ,
>
> I am new to apache Storm . To understand it I was looking at storm
> examples
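Values is little more than an ArrayList<Object> with a varargs constructor, which is why any bean instance can be emitted once wrapped. A minimal plain-Java sketch of the same idea (this is an illustration, not Storm's class):

```java
import java.util.ArrayList;

public class ValuesSketch {
    // Roughly what Storm's Values does: collect the emitted fields,
    // in declaration order, into a List<Object>.
    static class Values extends ArrayList<Object> {
        Values(Object... fields) {
            for (Object f : fields) add(f);
        }
    }
}
```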
, "researcher cs" <prog.researc...@gmail.com> wrote:
> Sorry for my question, I'm a beginner. How can I check my supervisor and
> worker logs
>
> On Tue, Sep 15, 2015 at 8:11 AM, Javier Gonzalez <jagon...@gmail.com>
> wrote:
>
>> Check your supervisor and work
We use the shaded jar maven plugin. Just make sure that you mark storm as
scope provided (so that you don't get a duplicate storm jar error) and
exclude any RSA/DSA/SF signature files from the manifest folder (so that
you don't get failed signature check errors).
Regards,
Javier
On Sep 14, 2015
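A sketch of the relevant pom.xml pieces (plugin and Storm versions are left as placeholders; the filter pattern below is the usual way to drop signature files from the shaded jar):

```xml
<!-- Storm marked provided, so it isn't bundled into the shaded jar -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>${storm.version}</version>
  <scope>provided</scope>
</dependency>

<!-- Shade plugin: exclude signature files to avoid failed signature checks -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
```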
If I am reading your code correctly, it seems you're emitting from the
spout without id - therefore, your acking efforts are not being used. You
need to do something like:
Object id = ...; // a unique message id for this tuple
_collector.emit(id, tuple);
Regards,
Javier
On Sep 8, 2015 3:19 PM, "Nick R. Katsipoulakis"
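The usual pattern around that anchored emit: remember what was emitted under each id, so ack() can forget it and fail() can hand the payload back for re-emitting. A plain-Java sketch of just the bookkeeping (the Storm spout API calls are left out, and PendingTracker is a hypothetical name):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class PendingTracker {
    // id -> tuple payload, kept until Storm acks it
    private final Map<String, Object> pending = new HashMap<>();

    // Call alongside _collector.emit(id, tuple) in nextTuple().
    String emitted(Object tuple) {
        String id = UUID.randomUUID().toString();
        pending.put(id, tuple);
        return id;
    }

    // From the spout's ack(Object msgId): safe to forget the payload.
    void ack(String id) { pending.remove(id); }

    // From the spout's fail(Object msgId): recover the payload to re-emit.
    Object failed(String id) { return pending.remove(id); }

    int inFlight() { return pending.size(); }
}
```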
Storm itself offers nothing towards this. Where to fix it depends on how
expensive it is for you. If you can just introduce a new bolt in your
topology without a terrible penalty in resources or processing throughput,
I would do it that way. You don't have to modify anything other than the
Is your message source thread safe? It is possible for four spout threads
to read the same message from the same source if the source does not
guarantee uniqueness across multiple clients.
On Aug 27, 2015 9:59 AM, Ganesh Chandrasekaran
gchandraseka...@wayfair.com wrote:
Hi all,
I am
Hi Nithesh,
Mind that the storm metrics are gathered by sampling, so it isn't unusual
that the counts are slightly off. I think the default is 5% sampling, it
can be tweaked to more in storm.yaml, but it will impact your performance
if you ramp it up.
Regards,
Javier
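If exact counts matter more to you than the overhead, the knob is, I believe, topology.stats.sample.rate; 1.0 means count every tuple:

```yaml
# storm.yaml (or per-topology config); the default is 0.05, i.e. 5% sampling
topology.stats.sample.rate: 1.0
```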
We had issues with Spring and Storm. What we did is the following: don't do
anything on the constructor. Perhaps pass a String with the location of the
Spring configuration file. In the prepare or open method (for bolts and
spouts respectively) initialize a context with the context file location
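The shape of that pattern, with the Spring calls stubbed out (the class and method names here are illustrative; only the String location crosses serialization, the heavy context is rebuilt inside each worker):

```java
import java.io.Serializable;

public class SpringStyleBolt implements Serializable {
    private final String contextLocation;   // cheap and serializable
    private transient Object springContext; // heavy, rebuilt per worker

    SpringStyleBolt(String contextLocation) {
        // Constructor runs on the submitting machine: store config only,
        // build nothing here.
        this.contextLocation = contextLocation;
    }

    // Stand-in for prepare()/open(): runs inside each worker JVM, where it
    // is safe to build non-serializable state. In the real bolt this would
    // be e.g. new ClassPathXmlApplicationContext(contextLocation).
    void prepare() {
        springContext = "context loaded from " + contextLocation;
    }

    boolean isReady() { return springContext != null; }
}
```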
Hi,
I need to coordinate between two spouts, and could use the group's insight
into this. The scenario:
- One spout (let's call it event spout) receives the incoming data stream.
- One spout is the control spout, which will receive messages from a
different stream, that can impact the way that
How many ackers have you got configured when you submit your topology?
On Aug 17, 2015 5:57 PM, Stuart Perks stup...@hotmail.com wrote:
Hi, I am attempting to use guaranteed message processing, but ack() is not
being called. I've posted on Stack Overflow if you prefer to answer there.
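For reference, the acker count can be set per topology; with zero ackers, ack()/fail() are never invoked. The storm.yaml key is, I believe:

```yaml
# 0 disables acking entirely; if unset, Storm runs one acker per worker
topology.acker.executors: 2
```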
it. On receiving
signal it can block and unblock appropriately the nextTuple() call
On Fri, Aug 14, 2015 at 5:34 AM Javier Gonzalez jagon...@gmail.com
wrote:
I'm trying to ensure everything is processed for coordination with an
external system. Therefore, on a given signal, I have to stop the spout
Wiring in boltB will have a detrimental effect, even if it does
nothing but ack. Every tuple you have processed from A has to travel to a B
bolt, and the ack has to travel back.
You could try modifying the number of ackers, and playing with the number
of A and B bolts. How many workers do
, Aug 14, 2015 at 12:05 PM Javier Gonzalez jagon...@gmail.com
wrote:
I was thinking of using another spout as control channel, and from that
spout manipulate the original spout to cause the nextTuple method to not
call the next message from the incoming queue (but not block so that acks
can
!
--John
On Fri, Aug 14, 2015 at 2:59 PM, Javier Gonzalez jagon...@gmail.com
wrote:
Wiring in boltB will have a detrimental effect, even if it does
nothing but ack. Every tuple you have processed from A has to travel to a B
bolt, and the ack has to travel back.
You could try modifying
Use memory min and max in childopts, so that the low-volume topologies start
with little memory, but the high-volume ones can grow accordingly.
On Aug 14, 2015 5:44 AM, jinhong lu lujinho...@gmail.com wrote:
Hi, I have got a storm cluster of about 20 machines, each with 40G mem and 24 cores.
But my cluster
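In storm.yaml terms, that means putting -Xms/-Xmx in the worker childopts, for example (the numbers are illustrative; per-topology overrides can go in topology.worker.childopts):

```yaml
# small initial heap, generous ceiling: low-volume topologies stay small,
# high-volume ones can grow up to the max
worker.childopts: "-Xms256m -Xmx4g"
```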
a subset of those Spouts?
$ storm help deactivate
Syntax: [storm deactivate topology-name]
Deactivates the specified topology's spouts.
On Thu, Aug 13, 2015 at 2:12 PM Javier Gonzalez jagon...@gmail.com
wrote:
More broadly, can you share the strategies you've used to pause
(not emit anything else into the topology and not read anything else from
the data source) a topology's spouts?
Thanks,
Javier
On Aug 13, 2015 2:53 PM, Javier Gonzalez jagon...@gmail.com wrote:
Hi,
I have a use case
Hi,
I have a use case where I would need to stop a spout from emitting for a
period of time. I'm looking at the activate /deactivate methods, but
there's not much information apart from the API and the java base classes
have empty implementations. Can anybody shed any insight on how those work?
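As far as I understand them, after `storm deactivate` Storm stops calling nextTuple() until the topology is activated again, and the empty base implementations of activate()/deactivate() are hooks for your own state. A plain-Java sketch of a spout that also honors the flag itself (names and queue are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class PausableSpoutSketch {
    private final Queue<String> source = new ArrayDeque<>();
    private volatile boolean active = true;

    // Hooks invoked on `storm activate` / `storm deactivate`.
    void activate()   { active = true; }
    void deactivate() { active = false; }

    void offer(String msg) { source.add(msg); }

    // Stand-in for nextTuple(): emit nothing while deactivated,
    // leaving messages queued at the source.
    String nextTuple() {
        return active ? source.poll() : null;
    }
}
```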
Just to make sure I'm understanding correctly: Do you have a single stream
of sequential ids or multiple streams that need to be interpolated? Do you
receive a stream of ids and emit a stream of timestamped ids?
On Aug 11, 2015 5:34 PM, Alec Lee alec.in...@gmail.com wrote:
Hello, all
Here I
Try the following:
- 1 worker per machine (to minimize inter-jvm messaging) and adjust
childopts so it takes as much memory as you can without bringing down the
machine
- as many threads as available cpu cores and no more (to avoid thread
context switching)
That should give you some reduction of
You need to keep track of state in your bolt before writing. Yes, it is
indeed quite a chore. For exactly once, particularly when coordinating
with external systems such as databases or output queues, Storm is, ah, not
exactly the best fit. Unless keeping track of processed events to avoid
It could be that heavy usage of an executor's machine prevents the executor
from communicating with nimbus, hence it appears dead to nimbus, even
though it's still working. I think we saw something like this some time
during our PoC development, and it was fixed by allocating more memory to
our
things about Java GC.
Thank you for your time.
Regards,
Nick
2015-06-28 13:02 GMT-04:00 Javier Gonzalez jagon...@gmail.com:
Perhaps you could put explicit GC logs in the childopts so that you see
if you have GC grinding in the jvm running the worker that gets
disconnected. I suggested
of time.
When the HashSet size becomes 0, it returns success. Of course, if flush
periods are back-to-back, then the start of the next flush period is the
same as the end of the previous period.
Thanks,
Satish
On Wed, Jun 24, 2015 at 4:20 PM, Javier Gonzalez jagon...@gmail.com
wrote:
Hi Satish
that tuple has completely traversed the topology. Isn't
that sufficient?
On Tue, Jun 23, 2015 at 10:50 PM, Javier Gonzalez jagon...@gmail.com
wrote:
Hi,
Question: how would you implement a flush in a topology: sending a
special message to the topology that will in time return with a message
Hi,
Question: how would you implement a flush in a topology: sending a
special message to the topology that will in time return with a message
that says everything up to the flush message has finished traversing the
topology? (does that make sense?)
Regards,
Javier
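One way to implement that flush, along the lines Satish describes above: record the id of every tuple emitted before the flush marker, and declare the flush complete when the set drains. A plain-Java sketch of the bookkeeping (FlushTracker is a hypothetical name; wiring into the spout's emit/ack is left out):

```java
import java.util.HashSet;
import java.util.Set;

public class FlushTracker {
    private final Set<String> inFlight = new HashSet<>();
    private boolean flushing = false;

    // Record each tuple id as it is emitted.
    void emitted(String id) { inFlight.add(id); }

    // The flush marker was sent: everything currently in flight
    // must drain before the flush is complete.
    void startFlush() { flushing = true; }

    // Call from ack(); returns true once everything emitted before the
    // flush marker has fully traversed the topology.
    boolean acked(String id) {
        inFlight.remove(id);
        return flushing && inFlight.isEmpty();
    }
}
```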
- run nimbus in one machine
- run supervisor in all four
- specify four workers when creating/configuring the topology
- submit topology to nimbus.
This should result in the topology elements being distributed among all
available servers.
If you require specific pairings (e.g. spout MUST be in
Mind that the at-least-once guarantee applies only to regular
processing (i.e. Storm will replay tuples that time out). Re-emitting when
one of the bolts fails explicitly is your responsibility (on the spout
code).
On Tue, Jun 9, 2015 at 5:49 PM, Andrew Xor andreas.gramme...@gmail.com
wrote:
I would say, configure so that your total parallelism matches the number of
cores available (i.e. if you have a topology with X spouts, Y boltAs and Z
boltBs, make it so that X+Y+Z = cores available). And one worker per
machine, inter-JVM communications are expensive. When you have more bolts
and
Couple of things I'd suggest to check:
1.- Perhaps your data is skewed, i.e. the hash function sends the bulk of
the messages to a single bolt? Check in the storm UI the number of executed
tuples in each bolt. If this is the case, then all the parallelism you can
set won't give you gains. You'd
Hi,
If it's a custom Spout (i.e. you wrote it) it's completely up to you to
re-emit in the event of a failed tuple. You get whatever ID you sent down
the topology when you emitted from the spout, so make sure you are able to
somehow get the message back from that ID to re-emit.
Regards,
JG
On
You only have to make sure JAVA_HOME and your path point to the java8
installation for every storm process (nimbus and supervisor) you start.
I've had no problem using Storm with Java 8. The only times I've had
problems are when someone changes JAVA_HOME to Java 6 or 7 and then storm jar
throws the
No objection here. I work at a big company where upgrades move fast like
glaciers ;) and even we are up to java7.
Regards,
JG
On Mon, Jun 1, 2015 at 2:37 PM, P. Taylor Goetz ptgo...@gmail.com wrote:
CC user@
I’d like to poll the community about the possibility of dropping support
for Java
think the answer to your question hinges off of this statement:
“
I believe the farming out of the processing to different nodes is hurting
our performance.
What makes you believe this?
From: Javier Gonzalez jagon...@gmail.com
Reply-To: user@storm.apache.org user@storm.apache.org
Date
Isn't the cleanup method guaranteed to be called only when running as a
local topology?
On May 13, 2015 9:20 AM, Jeffery Maass maas...@gmail.com wrote:
Bolts which implement IBolt have a method called cleanup() which is called
by the Storm framework.
Hi,
I'm currently approaching the design of an application that will have a
single source of data from AMPS (high speed pub-sub system like Kafka). We
are currently facing the issue that the spout is much faster than the
bolts, and I believe the farming out of the processing to different nodes
is
and this will
give you the capability to use multiple spouts to read from the same topic.
Supun..
On Sat, May 9, 2015 at 4:57 PM, Javier Gonzalez jagon...@gmail.com
wrote:
Hi,
I'm currently approaching the design of an application that will have a
single source of data from AMPS (high speed pub
It can be done, with the curator api. We did it in the middle of a PoC a
month ago or so, to store some history that would be needed to detect if an
incoming event was already processed. It performed well. Unfortunately, I
can't share any code as that goes against my contract. I think what we did
Had a similar experience - too many emits would jam the spout and it would
never get around to processing the acks received from the bolts. We fixed
it by introducing artificial 1ms sleep in the spout processing so that
there was enough idle capacity to run the acks. I doubt that's the better
also tell you which constraint was
violated, you can ignore the unique constraint violations and ack back so
the spout will stop retrying.
Its not clean but should work.
Thanks
Parth
From: Javier Gonzalez jagon...@gmail.com
Reply-To: user@storm.apache.org user@storm.apache.org
Date
conditional insert/update and if you can
use this feature.
Thanks
Parth
From: Javier Gonzalez jagon...@gmail.com
Reply-To: user@storm.apache.org user@storm.apache.org
Date: Tuesday, March 3, 2015 at 10:43 AM
To: user@storm.apache.org user@storm.apache.org
Subject: Re: Exactly once
Hi guys,
We're looking at storm to solve a message processing scenario that needs to
be horizontally scalable for high projected volume. The use case goes like
this:
1.- receive messages from external source.
2.- generate a set of messages from this external input, based on rules.
3.-