Hi All, Thank you for your suggestions. Sounds like a good solution. I will check it in detail. Thanks again.
Best,
Xiang
-------------------------------
Xiang Wang, PhD Candidate
Database Research Group
School of Computer Science and Engineering
University of New South Wales
Sydney, Australia

On Fri, Aug 26, 2016 at 1:31 AM, Hart, James W. <[email protected]> wrote:

> Is each bolt instance on each worker deserializing and indexing the same
> data? Basically, is every bolt doing the same work? How static or dynamic
> is the data? It seems like relatively or totally static data; if so, the
> process of loading and indexing it should be moved out of Storm
> completely.
>
> Also, is the entire dataset you're building needed by any bolt at any time?
> Or are only different pieces of it needed by different bolts that are
> segregated by field shuffle grouping?
>
> There are multiple ways to externalize this, but you probably need
> in-memory access speeds, so I'm thinking Redis. You could also consider
> MongoDB or Cassandra for this.
>
> In my opinion, doing anything more than trivial work in bolt startup is an
> anti-pattern in Storm. Creating a thread in the bolt to do the work is
> also an anti-pattern in Storm.
>
> *From:* Simon Cooper [mailto:[email protected]]
> *Sent:* Wednesday, August 24, 2016 5:17 AM
> *To:* [email protected]
> *Subject:* RE: Running a long task in bolt prepare() method
>
> We're decompressing and deserializing several hundreds-of-megabytes files
> containing data (statistical classifier definitions, mostly) that the bolt
> needs to do its thing. The bolt can't process events without deserializing
> and indexing the data in those files, which could take anything up to
> several minutes. This can't easily be farmed out to an external service,
> due to various processing and infrastructure limitations.
>
> SimonC
>
> *From:* Hart, James W. [mailto:[email protected] <[email protected]>]
> *Sent:* 23 August 2016 15:04
> *To:* [email protected]
> *Subject:* RE: Running a long task in bolt prepare() method
>
> Can you elaborate on what kind of work is being done at startup?
>
> If you are building some kind of cacheable lookup data, I would build that
> elsewhere in a persistent cache, like Redis, and then fetch and access it
> through Redis.
>
> *From:* Simon Cooper [mailto:[email protected] <[email protected]>]
> *Sent:* Tuesday, August 23, 2016 9:36 AM
> *To:* [email protected]
> *Subject:* RE: Running a long task in bolt prepare() method
>
> We've got a similar issue, where prepare() takes a long time (could be
> up to several minutes), and the bolt can't process tuples until that is
> completed. The topology seems to send in tuples before prepare() has
> completed, and things go wrong.
>
> We're having to implement our own notification mechanism: an external
> way for the bolt to report to the spout that it is ready. This is also an
> issue on multi-worker topologies where one of the workers goes down, is
> recreated, and it's several minutes before it can process tuples.
>
> It would be good if there was a way for Storm to deal with this, so we
> don't have to implement our own back-channel back to the spout…
>
> SimonC
>
> *From:* Andrea Gazzarini [mailto:[email protected] <[email protected]>]
> *Sent:* 23 August 2016 13:08
> *To:* [email protected]
> *Subject:* Re: Running a long task in bolt prepare() method
>
> Not sure if there's a "built-in" approach in Storm for doing that. After
> making sure there isn't, I'd do the following:
>
> - I'd start the long task asynchronously in the prepare method and
>   register a callback.
> - If the execute method logic depends on the completion of that task,
>   I'd use a basic state pattern with two states, ON/OFF (where the OFF
>   state is basically a NullObject). The callback would be responsible for
>   switching the bolt state from OFF (initial state) to ON (working state).
>
> Best,
> Andrea
>
> On 23/08/16 09:12, Xiang Wang wrote:
>
> > Hi All,
> >
> > I am trying to run a long initialisation task in the bolt prepare()
> > method in local mode.
> >
> > I always get an error like this:
> >
> > *WARN o.a.s.s.o.a.z.s.p.FileTxnLog - fsync-ing the write ahead log in
> > SyncThread:0 took 1197ms which will adversely effect operation latency. See
> > the ZooKeeper troubleshooting guide*
> >
> > And then the task fails.
> >
> > Could anyone tell me how to fix this problem? Is it good practice to
> > run a long task in the prepare() method? If not, what is the correct
> > way to do it?
> >
> > Many thanks for your kind help.
> >
> > Best,
> > Xiang
> > -------------------------------
> > Xiang Wang, PhD Candidate
> > Database Research Group
> > School of Computer Science and Engineering
> > The University of New South Wales
> > SYDNEY, AUSTRALIA
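
For reference, Andrea's suggestion in the thread above (start the long task asynchronously in prepare() and flip an OFF/ON state from a callback) could look roughly like the sketch below. It is plain Java and deliberately uses no Storm classes, so the class name AsyncPrepareSketch, the String tuples, and the loader argument are all assumptions made for illustration, not Storm's API. Note that James considers starting a thread inside a bolt an anti-pattern, so this only illustrates the pattern and is not a recommendation.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Plain-Java sketch of the ON/OFF state idea. It deliberately uses no Storm
// classes; AsyncPrepareSketch and its methods are hypothetical names.
class AsyncPrepareSketch {

    // Both states implement the same interface, so execute() never branches.
    interface State {
        String handle(String tuple);
    }

    // OFF is a null object: it ignores the tuple and returns null.
    private static final State OFF = tuple -> null;

    // volatile because the loader thread writes it and the caller reads it.
    private volatile State state = OFF;
    private CompletableFuture<Void> loading;

    // Stands in for prepare(): starts the slow load and returns immediately.
    void prepare(Supplier<Map<String, String>> slowLoader) {
        loading = CompletableFuture.supplyAsync(slowLoader).thenAccept(index -> {
            // Callback: switch from OFF to ON once the index is available.
            state = tuple -> index.getOrDefault(tuple, "unknown");
        });
    }

    // Stands in for execute(): delegates to whichever state is current.
    String execute(String tuple) {
        return state.handle(tuple);
    }

    boolean isReady() {
        return state != OFF;
    }

    // Demo helper only: block until the background load has finished.
    void awaitReady() {
        loading.join();
    }

    public static void main(String[] args) {
        AsyncPrepareSketch bolt = new AsyncPrepareSketch();
        // Hypothetical loader; a real one would deserialize and index files.
        bolt.prepare(() -> Collections.singletonMap("a", "classifier-A"));
        bolt.awaitReady();
        System.out.println(bolt.execute("a"));
    }
}
```

In this sketch the OFF state silently drops tuples. A real bolt would have to decide whether to fail, buffer, or drop tuples that arrive before the load finishes, which is the problem Simon describes with tuples arriving before prepare() has completed.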
