Storm is not your bottleneck. Check your Storm code to 1) ensure you're
parallelizing your writes and 2) ensure you're batching writes to your
external resources where possible. Some quick napkin math shows you're only
doing 110 writes/s, which seems awfully low.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Hi Kushan,
Depending on the Kafka spout you're using, it could be doing different
things when it fails. However, if it's running reliably, the Cassandra
insertion failures would have forced replays from the spout until the
inserts completed.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> one message Map?
> Can you send the names of the bolts that have processed the message to
> the final bolt, and in the final bolt check whether the list of all
> processing bolts is the same?
>
> What is the best approach here?
>
> Thanks .
> Regards,
> Flor
It's another case of a streaming join. I've done this before; there aren't
too many gotchas, other than that you need a data structure which purges
stale unresolved joins beyond the tuple timeout (I used a Guava cache for
this).
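For the curious, here's roughly what that purge structure looks like (a
minimal sketch, assuming Guava and a String join key; collector is the
bolt's OutputCollector from prepare(), and the names are illustrative, not
our production code):

import backtype.storm.task.OutputCollector;
import backtype.storm.tuple.Tuple;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import java.util.concurrent.TimeUnit;

// Holds the first half of each join, keyed by join key. Entries that age
// out are evicted, and we fail() them so the spout replays the tuple.
RemovalListener<String, Tuple> onEvict = notification -> {
  if (notification.wasEvicted()) {
    collector.fail(notification.getValue());
  }
};
Cache<String, Tuple> pending = CacheBuilder.newBuilder()
    .expireAfterWrite(25, TimeUnit.SECONDS) // safely under the 30s default tuple timeout
    .removalListener(onEvict)
    .build();
// Note: Guava evicts lazily, so call pending.cleanUp() periodically
// (e.g. on a tick tuple) to make sure stale entries actually get failed.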
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
There's no reason you couldn't. If you look in the archives there was
someone else who'd managed to do some video processing with Storm.
If you make things work, consider sharing a blog post -- that'd be really
great stuff! :)
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
the results downstream when the thread returns.
Just make sure that you're synchronizing your OutputCollector when emitting
from a multithreaded context.
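A minimal sketch of the synchronized-emit part (assuming you've saved the
OutputCollector from prepare() into a field; the method and names here are
illustrative):

import backtype.storm.task.OutputCollector;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Called from a worker thread once the async work finishes. The collector
// is not thread-safe, so every emit/ack off the executor thread takes the
// same lock.
private void onAsyncResult(Tuple input, Object result) {
  synchronized (collector) {
    collector.emit(input, new Values(result)); // anchored to the input tuple
    collector.ack(input);
  }
}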
Hope that helps.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
Worth clarifying for anyone else in this thread that an LBQ separating
production from consumption is not a default thing in Storm; it's something
we cooked up to prefetch elements from slow/batching resources.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
essage ready for
processing.
An alternative to this is the NothingEmptyEmitStrategy:
https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/spout/NothingEmptyEmitStrategy.java
if you do see an impact on your throughput -- but I've never needed it.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
...another thread dealing with IO and
asynchronously feeding a concurrent data structure the spout can utilize.
For example, in our internal Amazon SQS client, our IO thread continuously
fetches up to 10 messages per get and shoves them into a
LinkedBlockingQueue (until full, at which point it blocks the IO thread on
the put).
Run your producer code in another thread to fill an LBQ, then poll that
from nextTuple instead.
You should never be blocking yourself inside a spout.
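A rough sketch of the pattern (not our actual SQS client -- fetchBatch()
stands in for whatever blocking call your source provides):

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import java.util.List;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;

public abstract class PrefetchingSpout extends BaseRichSpout {
  private final LinkedBlockingQueue<String> queue =
      new LinkedBlockingQueue<>(1000); // bounded, so the IO thread can block
  private SpoutOutputCollector collector;

  public void open(Map conf, TopologyContext context,
                   SpoutOutputCollector collector) {
    this.collector = collector;
    Thread io = new Thread(() -> {
      try {
        while (true) {
          for (String msg : fetchBatch()) { // blocking IO lives here...
            queue.put(msg);                 // ...and blocks when the queue is full
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // shutting down
      }
    });
    io.setDaemon(true);
    io.start();
  }

  public void nextTuple() {
    String msg = queue.poll(); // never blocks
    if (msg != null) {
      collector.emit(new Values(msg), msg); // msg doubles as the messageId
    }
  }

  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("message"));
  }

  protected abstract List<String> fetchBatch(); // your blocking client call
}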
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
...workers
will receive an executor but the others will not.
It sounds like for your case, shuffle+parallelism is more than sufficient.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Wed, Jul
..."b0" a 'RouterBolt', then having bolt b0
round-robin the received tuples between two streams, then have b1 and b2
shuffle over those streams instead.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
You need only run the existing releases on JDK 7 or 8.
On Jul 14, 2014 7:15 AM, "Haralds Ulmanis" wrote:
> Actually I've now customized Storm a bit and recompiled it, as I needed
> some changes in it.
> But initially I just downloaded and ran it.
>
> On 14 July 2014 14:02, Adrianos Dadis wrote:
...before, the existing schedulers
are in Clojure. It's not impossible to do for sure, but like Andrew said it
might well just be easier to have separate Storm clusters that share a ZK
ensemble.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
On serialization, make sure your custom classes are registered with Kryo;
otherwise it may fall back to (slow) Java serialization.
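For example (a sketch; MyEvent / MyEventSerializer are placeholders for
your own classes):

import backtype.storm.Config;

Config conf = new Config();
conf.registerSerialization(MyEvent.class); // default Kryo field serializer
// or, with a custom Kryo serializer:
// conf.registerSerialization(MyEvent.class, MyEventSerializer.class);
conf.setFallBackOnJavaSerialization(false); // fail fast instead of going slow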
On Jun 25, 2014 10:30 AM, "Robert Turner" wrote:
> Serialisation across workers might be your problem, if you can use the
> "localOrShuffle" grouping and arrange that the number
In a single worker, you don't incur serialization or network overhead.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Tue, Jun 17, 2014 at 11:09 PM, Romain Leroux
wrote:
50
},
"mapped":{
"count":0,
"memoryUsed":0,
"totalCapacity":0
}
},
Do you have a similar bump in direct buffer counts?
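(If you want to pull the same numbers yourself, they come from the JDK's
buffer pool MXBeans -- quick sketch:)

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

for (BufferPoolMXBean pool :
    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
  // prints the "direct" and "mapped" pools: count / memoryUsed / totalCapacity
  System.out.printf("%s: count=%d memoryUsed=%d totalCapacity=%d%n",
      pool.getName(), pool.getCount(), pool.getMemoryUsed(),
      pool.getTotalCapacity());
}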
Michael
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
What kind of issues does Metrics have that lead you to recommend
HdrHistogram?
On Jun 16, 2014 6:57 PM, "Dan" wrote:
> Be careful when using Coda Hale's Metrics package when measuring latency.
> Consider using Gil Tene's
> High Dynamic Range Histogram instead:
>
> http://hdrhistogram.github.io/H
...opportunity to halt and
reactivate)
Kill storm-topology-1
This works for us as we don't have any critical in-memory state that isn't
checkpointed to a persistent store, and the vast majority of our work can
be replayed safely.
Michael
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
A topology can run in as many workers as you assign at launch time, DRPC or
not.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Wed, Jun 11, 2014 at 1:08 PM, Nima Movafaghrad <
bolt's prepare methods. The first executor to start per JVM wins.
Storm has some gotchas (bolts are serialized, so do your init in
prepare()), but in general things that work in a normal Java application
will end up working in Storm.
Michael
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Out of curiosity, what kind of changes have you been making?
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Tue, Jun 10, 2014 at 9:36 PM, Jason Jackson wrote:
> Hi Alex,
Just as a side note, I've seen capacity numbers >6. The calculation for
capacity is somewhat flawed: it does not represent a true percentage
capacity, merely a relative measure against your other bolts.
Maybe that's something we can improve; I'll log a JIRA if there isn't one
already.
#9 - 5 pts
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Mon, Jun 9, 2014 at 2:49 PM, Bryan Stone <
bryan.st...@synapse-wireless.com> wrote:
> #9 – 5 pts
>
> *B
If your game server is just running business logic, a totally stateless set
of servers is really the way to go.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Sun, Jun 8, 2014 at 9:07 P
You can have a loop on a different stream. It's not always the best thing
to do (deadlock possibilities from buffers) but we have a production
topology that has that kind of pattern. In our case, one bolt acts as a
coordinator for recursive search.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
if (!initialized) {
  // do stuff (one-time setup)
  initialized = true;
}
}
}
Until there's a set of lifecycle hooks, that's about as good as I've cared
to make it.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
...using a task hook for
init code (e.g. properties / Guice). Check out BaseTaskHook; it's easy to
extend and can be included pretty easily too:
stormConfig.put(Config.TOPOLOGY_AUTO_TASK_HOOKS,
Lists.newArrayList(MyTaskHook.class.getName()));
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Congrats!
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Thu, May 29, 2014 at 3:01 PM, Derek Dagit wrote:
> Welcome Michael!
>
> --
> Derek
>
>
> On 5/29
Do you have GC logging turned on? With a 60GB heap I could pretty easily
see stop-the-world GCs taking longer than the session timeout.
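If it isn't on, something like this in the topology config will get you GC
logs for the workers (a sketch -- standard HotSpot flags, and pick your own
log path; with multiple workers per host you'll want distinct paths):

import backtype.storm.Config;

Config conf = new Config();
conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
    "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
    + " -Xloggc:/tmp/worker-gc.log");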
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On
We use upstart. Supervisord would also work. Just anything to keep an eye
on it and restart it if it dies (a very rare occurrence).
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On
No reason why you couldn't do it, but as far as I know it hasn't been done
before. You can send any kind of serializable data into a topology. You'd
probably need to emit frames from the spout.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
#9 - 3
#5 - 2
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Fri, May 16, 2014 at 10:51 AM, Benjamin Black wrote:
> #11 - 5 pts
>
>
> On Fri, May 16, 2014 at 7:43
...scenarios).
For development, there shouldn't be any issues foregoing supervision.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Fri, May 2, 2014 at 12:41 PM, P. Taylor Goetz wrote:
In AWS, we're fans of c1.xlarges, m3.xlarges, and c3.2xlarges, but have
seen Storm successfully run on cheaper hardware.
Our Nimbus server is usually bored on an m1.large.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
We don't use /etc/hosts mapping; we only use hostnames/IPs.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Tue, Apr 29, 2014 at 8:29 AM, Derek Dagit wrote:
> I have not tr
delegation
of tuples.
You could really abuse Storm if you wanted and use it as a distributed
application container with threadpools; I've done it. But you're really
going to have a better experience with a webservice if these are live-mode
requests.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
That's correct, the stream will be duplicated in the above case.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Fri, Apr 25, 2014 at 12:44 PM, David Novogrodsky <
david.n
You could set the args in the StormConfig, then they'll show up in the UI
per-topology.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Thu, Apr 17, 2014 at 7:16 PM, Cody A. Ra
Yes. Ultimately, that runs the main method of MyTopology, so just like any
other main method you get String[] args.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Thu, Apr 17, 201
It's much more efficient to deserialize it once and then pass around POJOs.
JSON serialization is slow compared to Kryo. Our topologies tend to take in
JSON, then emit JSON to external systems at later phases, but all
intermediate stages are POJOs.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
messageId is any unique identifier of the message, such that when ack is
called on your spout you're handed back the identifier and can mark the
work as complete at the source, in the case that it supports replay.
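In spout terms, roughly (a sketch -- the client and its methods are
placeholders for whatever your source provides):

public void nextTuple() {
  QueueMessage msg = client.poll(); // hypothetical non-blocking fetch
  if (msg == null) return;
  // The second argument is the messageId Storm hands back to ack()/fail().
  collector.emit(new Values(msg.body()), msg.id());
}

public void ack(Object msgId) {
  client.delete((String) msgId); // done: remove it at the source
}

public void fail(Object msgId) {
  client.release((String) msgId); // make it visible again for replay
}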
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
if an event is
available (and meets your criteria). Storm will handle rate limiting the
spouts with sleeps.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Tue, Mar 18, 2014 at 5:14 PM
...similar to what Ooyala did, customized
for our specific needs; it's an excellent pattern.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Mon, Mar 17, 2014 at 6:21 PM, Chris Bedford wrote
stormConfig.put(Config.TOPOLOGY_AUTO_TASK_HOOKS, listOfHooks);
OR
In prepare(), topologyContext.addTaskHook()
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Wed, Mar 5, 2014 at 1:12 PM, Brian O'Neill wrote:
+1, localOrShuffle will be a winner, as long as it's evenly distributing
work. If one tuple could, say, produce a variable 1-100 resultant tuples
(and those results were expensive enough to process, e.g. IO), it might
well be worth shuffling vs. localOrShuffle.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
What kind of comparisons are you looking for? How they functionally work?
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Wed, Mar 5, 2014 at 9:52 AM, Roberto Coluccio wrote:
...it yourself (ssh as storm, cd storm/daemon, supervise .) and seeing
what kind of errors come up.
Are your disks perhaps full?
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Mon, M
The fact that the process is being killed constantly is a red flag. Also,
why are you running it as a client VM?
Check your nimbus.log to see why it's restarting.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
I'm not seeing much to substantiate that. What size heap are you
running, and is it nearly full? Perhaps attach VisualVM and check for GC
activity.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
Can you do a thread dump and pastebin it? It's a nice first step to figure
this out.
I just checked on our Nimbus and while it's on a larger machine, it's using
<1% CPU. Also look in your logs for any clues.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
I'd recommend just using one Zookeeper instance if they're on the same
physical host. There's no reason why a development ZK ensemble needs 3
nodes.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
m
Right now we're seeing slow, off-heap memory leaks; it's unknown (yet)
whether they're linked to Netty. When the workers inevitably get OOMed, the
topology will rarely recover gracefully, with similar Netty timeouts.
Sounds like we'll be heading back to 0mq.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Are you running Zookeeper on the same machine as the Nimbus box?
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Sun, Mar 2, 2014 at 6:16 PM, Sean Solbak wrote:
> This is the
d. In
either case, it's pushing the management, verification, and reestablishment
of broken connections into the pool (which is also why we have 1 extra conn
-- for when a conn is tied up running a validation query or is being
reestablished).
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
bolt
instances.
supervisor.worker.start.timeout.secs 120
supervisor.worker.timeout.secs 60
I'd try tuning your worker start timeout here. Try setting it up to 300s
and (again) ensuring your prepare method only initializes expensive
resources once, then shares them between instances in the JVM.
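The once-per-JVM part looks something like this (a sketch; ExpensiveClient
is a placeholder for your connection pool, client, etc.):

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.base.BaseRichBolt;
import java.util.Map;

public abstract class ExpensiveResourceBolt extends BaseRichBolt {
  private static volatile ExpensiveClient shared; // one per worker JVM
  protected OutputCollector collector;

  public void prepare(Map conf, TopologyContext context,
                      OutputCollector collector) {
    this.collector = collector;
    if (shared == null) {
      synchronized (ExpensiveResourceBolt.class) {
        if (shared == null) {
          shared = new ExpensiveClient(); // only the first executor pays
        }
      }
    }
  }
}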
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
We've done this with SLF4J and Guava as well, without issues.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Thu, Feb 6, 2014 at 3:03 PM, Mark Greene wrote:
> We had this
What version of ZeroMQ are you running?
You should be running 2.1.7 with Nathan's provided fork of JZMQ.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Fri, Jan 10, 2014 a
Post your code. Even dev mode is far faster for us.
On Jan 10, 2014 8:44 AM, "Klausen Schaefersinho"
wrote:
> > I've benched storm at 1.8 million tuples per second on a big (24 core)
> box using local or shuffle grouping between a spout and bolt.
> Production or development mode?
>
> > you're on
...sure what you mean by
'round-robin fashion to distribute load' -- the ShuffleGrouping will
partition work across tasks on an even basis.
3) Yes, in storm.yaml, supervisor.slots.ports. By default it'll run with 4
slots per machine. See
https://github.com/nathanmarz/storm/blob/master
goes missing somewhere along the line, fail() will be called
after a timeout. If you kill the acker that tuple was tracked with, it's
then up to the message queue or other impl to be able to replay that
message.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
Generally speaking, I don't know of many services that work exceedingly
well over a WAN.
Can you not do the processing at each location and forward it on with a
queue that isn't averse to WAN links?
On Dec 29, 2013 10:03 AM, "Derrick Karimi" wrote:
> Hello,
>
> I have a requirement for real time da
You are not guaranteed that tuples make it, only that if they go missing or
processing fails they will be replayed from the spout ("at least once"
execution).
On Dec 29, 2013 4:47 AM, "Michal Singer" wrote:
> Hi, I am trying to test the guaranteeing of messages on storm:
>
>
>
> I have two nodes:
>
> 1.
Check firewall settings and hosts.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Fri, Dec 27, 2013 at 6:51 AM, 程训焘 wrote:
> Hi, all,
>
> I am having a problem about
ns and in the
> prepare method I will initialize the spring context.
>
> This way, the bolts will call other Spring beans which are not bolts and
> are initialized in Spring. But of course this is a very limited solution.
>
> *From:* Michael Rose [mailto:mich..
Make a base Spring bolt and inject the members in your prepare method.
That's the best I've come up with, as prepare happens server-side, whereas
topology config and static initializers happen client-side at deploy time.
On Dec 25, 2013 7:51 AM, "Michal Singer" wrote:
> Hi, I am trying to understand
The statistics are sampled, but in general should be within +/- 20 tuples
of the true counts.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Mon, Dec 23, 2013 at 9:53 PM, churly lin
The issue is you've multiplied tickTupleMs by 1000 instead of dividing. So
you're probably not waiting long enough ;)
If you need an alternate spout, see storm-contrib. There's a halfway decent
one.
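(For reference: topology.tick.tuple.freq.secs is in seconds, so going from
milliseconds means dividing -- sketch:)

import backtype.storm.Config;

long tickTupleMs = 60000;
Config conf = new Config();
conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS,
    (int) (tickTupleMs / 1000)); // 60, not 60000000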
On Dec 23, 2013 7:58 AM, "Adrian Mocanu" wrote:
> If anyone is interested, I’ve decided to make my own T
https://gist.github.com/Xorlev/8058947
This is the...gist...of it. :) Hope this helps!
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Fri, Dec 20, 2013 at 10:36 AM, Pete Carlson w
with moderately sized
heaps.
I can't say I'd expect the overhead of extra workers to be much more if
you're doing shuffle or fields groupings most of the time anyway.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
  // bolt 1 completed, bolt 2 outstanding
}

Bolt2 (IRichBolt / BaseRichBolt):

def execute(tuple2: Tuple) {
  // If this is the end of the chain you don't need this emit. Otherwise,
  // the acker knows that boltN must ack to complete the tuple tree.
  _collector.emit(tuple2, new Values("foo"))
  _collector.ack(tuple2) // acks this bolt's part of the tuple tree
}
You add it as a task hook, e.g.
Scala:
config.put(Config.TOPOLOGY_AUTO_TASK_HOOKS,
List(classOf[MetricsStormHooks].getName).asJava)
Java:
List<String> taskHooks = new ArrayList<>();
taskHooks.add(MetricsStormHooks.class.getName());
config.put(Config.TOPOLOGY_AUTO_TASK_HOOKS, taskHooks);
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
...using more bolt instances. If your
IOs can be a little more variable, we've found it better to have a pool per
bolt and run fewer bolts. This way, an IO that takes 3x longer won't slow
down other tuples. Of course, this is predicated on not saturating your
downstream service and a desire t
I believe when Jon says "log4j" he means log4j2, yet another successor
to log4j, which claims to solve issues found in logback. I wasn't able to
discern a difference without log4j2's use of the Disruptor (3.x).
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
ng to send 300k
tuple/s/node, so we find the overhead of ackers to be a more than
acceptable cost.
Michael
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com
On Sun, Nov 10, 2013 at 12:1