Re: Running a long task in bolt prepare() method

Xiang Wang Sun, 28 Aug 2016 18:53:56 -0700

Hi All,

Thank you for your suggestions. Sounds like a good solution. I will check
it in detail.
Thanks again.


Best,
Xiang


-------------------------------
Xiang Wang, PhD Candidate
Database Research Group
School of Computer Science and Engineering
University of New South Wales
Sydney, Australia

On Fri, Aug 26, 2016 at 1:31 AM, Hart, James W. <[email protected]> wrote:

> Is each bolt instance on each worker deserializing and indexing the same
> data?  Basically is every bolt doing the same work?  How static or dynamic
> is the data? It seems like relatively or totally static data; if so, the
> process of loading and indexing it should be moved out of Storm
> completely.
>
>
>
> Also, is the entire dataset you're building needed by any bolt at any time?
> Or are only different pieces of it needed by different bolts that are
> segregated by field shuffle grouping?
>
>
>
> There are multiple ways to externalize this, but you probably need in-memory
> access speeds, so I’m thinking Redis.  You could also consider MongoDB or
> Cassandra for this.
>
>
>
> In my opinion doing anything more than trivial work in bolt startup is an
> anti-pattern in storm.  Also creating a thread in the bolt to do the work is
> an anti-pattern in storm.
>
>
>
>
>
> *From:* Simon Cooper [mailto:[email protected]]
> *Sent:* Wednesday, August 24, 2016 5:17 AM
>
> *To:* [email protected]
> *Subject:* RE: Running a long task in bolt prepare() method
>
>
>
> We’re decompressing and deserializing several files, each hundreds of
> megabytes in size, containing data (statistical classifier definitions,
> mostly) that the bolt needs to do its thing. The bolt can’t process events
> without deserializing and indexing the data in those files, which could
> take up to several minutes. This can’t easily be farmed out to an external
> service, due to various processing and infrastructure limitations.
>
>
>
> SimonC
>
>
>
> *From:* Hart, James W. [mailto:[email protected] <[email protected]>]
> *Sent:* 23 August 2016 15:04
> *To:* [email protected]
> *Subject:* RE: Running a long task in bolt prepare() method
>
>
>
> Can you elaborate on what kind of work is being done at startup?
>
>
>
> If you are building some kind of cacheable lookup data, I would build that
> elsewhere in a persistent cache, like redis, and then fetch and access it
> through redis.
>
>
>
> *From:* Simon Cooper [mailto:[email protected]
> <[email protected]>]
> *Sent:* Tuesday, August 23, 2016 9:36 AM
> *To:* [email protected]
> *Subject:* RE: Running a long task in bolt prepare() method
>
>
>
> We’ve got a similar issue, where the prepare() takes a long time (could be
> up to several minutes), and the bolt can’t process tuples until that is
> completed. The topology seems to send in tuples before the prepare is
> completed, and things go wrong
>
>
>
> We’re having to implement our own mechanism for notification – an external
> way for the bolt to report to the spout that it is ready. This is also an
> issue on multi-worker topologies where one of the workers goes down, is
> recreated, and it’s several minutes before it can process tuples.
>
>
>
> It would be good if there was a way for storm to deal with this, so we
> don’t have to implement our own back-channel back to the spout…
>
>
>
> SimonC
>
>
>
> *From:* Andrea Gazzarini [mailto:[email protected] <[email protected]>]
> *Sent:* 23 August 2016 13:08
> *To:* [email protected]
> *Subject:* Re: Running a long task in bolt prepare() method
>
>
>
> Not sure if there's a "built-in" approach in Storm for doing that. After
> making sure there isn't, I'd do the following:
>
>    - I'd start the long task asynchronously in the prepare method and
>    I'd register a callback
>    - if the execute method logic depends on the completion of that task,
>    I'd use a basic state pattern with two states ON/OFF (where the OFF state
>    is basically a NullObject). The callback would be responsible for
>    switching the bolt state from OFF (initial state) to ON (working state)
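
The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: GatedBolt is a hypothetical stand-in class that does not implement Storm's bolt interfaces, its prepare() and execute() merely mirror the roles of the real methods, and the "slow loader" is a placeholder for whatever long initialisation the bolt needs. Returning null in the OFF state is also an assumption; the thread does not settle what should happen to tuples that arrive early.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.function.Supplier;

// Hypothetical stand-in for a bolt. It does NOT implement Storm's bolt
// interfaces; prepare()/execute() only mirror their roles.
class GatedBolt {
    // A state decides what execute() does with a tuple.
    interface State {
        String handle(String tuple);
    }

    // OFF state (null object): nothing is loaded yet, so no result is produced.
    private static final State OFF = tuple -> null;

    // volatile so the thread calling execute() sees the switch made by the loader.
    private volatile State state = OFF;
    private CompletableFuture<Void> loading;

    // Starts the slow load in the background and returns immediately.
    void prepare(Supplier<Map<String, String>> slowLoader) {
        loading = CompletableFuture.supplyAsync(slowLoader)
                .thenAccept(index -> {
                    // Callback: once the index exists, switch from OFF to ON.
                    state = tuple -> index.getOrDefault(tuple, "unknown");
                });
    }

    String execute(String tuple) {
        return state.handle(tuple);
    }

    boolean isReady() {
        return state != OFF;
    }

    // Blocks until the load and the state switch have completed.
    void awaitReady() {
        loading.join();
    }
}

class GatedBoltDemo {
    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        GatedBolt bolt = new GatedBolt();
        bolt.prepare(() -> {
            try {
                release.await(); // simulates the long-running initialisation
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            return Map.of("tuple-1", "class-A");
        });
        System.out.println(bolt.execute("tuple-1")); // OFF state: prints null
        release.countDown();
        bolt.awaitReady();
        System.out.println(bolt.execute("tuple-1")); // ON state: prints class-A
    }
}
```

In a real bolt, the OFF state would still have to decide what to do with each incoming tuple (for example drop it, fail it so it can be replayed, or buffer it); that choice depends on the topology and is not shown here.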
>
> Best,
> Andrea
>
> On 23/08/16 09:12, Xiang Wang wrote:
>
> Hi All,
>
>
>
> I am trying to do a long-running initialisation task in the bolt prepare()
> method in local mode.
>
>
>
> I always get an error like this:
>
> *WARN  o.a.s.s.o.a.z.s.p.FileTxnLog - fsync-ing the write ahead log in
> SyncThread:0 took 1197ms which will adversely effect operation latency. See
> the ZooKeeper troubleshooting guide*
>
>
>
> And then the task fails.
>
>
>
> Could anyone tell me how to fix this problem? Is it good practice to run
> a long-running task in the prepare() method? If not, what is the correct
> way to do it?
>
>
>
> Many thanks for your kind help.
>
>
>
> Best,
>
> Xiang
>
> -------------------------------
>
> Xiang Wang, PhD Candidate
>
> Database Research Group
>
> School of Computer Science and Engineering
>
> The University of New South Wales
>
> SYDNEY, AUSTRALIA
>
>
>
>
