Also, it is usually better to hit the database once and fetch a batch of tuples, which are then serialized (inside the Hibernate context), split across the bolts, and so forth.
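
As a rough illustration of that idea, here is a minimal sketch in plain Java with no Storm or Hibernate dependency. Every name in it (BatchFetchSketch, BatchQuery, UserPojo, fetchAndDetach, split) is hypothetical. The assumption is that BatchQuery stands in for a single Hibernate query returning a whole batch, and that each chunk produced by split would be emitted as one tuple.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch only: none of these names are Storm or Hibernate APIs.
final class BatchFetchSketch {

    // Plain detached value object, safe to serialize into a tuple.
    static final class UserPojo implements Serializable {
        final String name;

        UserPojo(String name) {
            this.name = name;
        }
    }

    // Stand-in for one database round trip that returns a whole batch
    // (assumption: in real code this would be a single Hibernate query).
    interface BatchQuery<E> {
        List<E> fetch(int offset, int batchSize);
    }

    // Makes one call to the query, then converts every entity to a POJO.
    // In real code the conversion would happen while the session is open.
    static <E, P> List<P> fetchAndDetach(BatchQuery<E> query, int offset,
                                         int batchSize, Function<E, P> toPojo) {
        List<P> pojos = new ArrayList<>();
        for (E entity : query.fetch(offset, batchSize)) {
            pojos.add(toPojo.apply(entity));
        }
        return pojos;
    }

    // Splits the batch into chunks of at most chunkSize elements.
    static <P> List<List<P>> split(List<P> batch, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<P>> chunks = new ArrayList<>();
        for (int i = 0; i < batch.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, batch.size());
            chunks.add(new ArrayList<>(batch.subList(i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // In-memory list standing in for a database table.
        List<String> table = List.of("a", "b", "c", "d", "e");
        BatchQuery<String> query = (offset, size) ->
                table.subList(offset, Math.min(offset + size, table.size()));

        List<UserPojo> batch = fetchAndDetach(query, 0, 5, UserPojo::new);
        for (List<UserPojo> chunk : split(batch, 2)) {
            System.out.println(chunk.size() + " POJOs would go into this tuple");
        }
    }
}
```

The point of the sketch is only that the blocking call happens once per batch rather than once per tuple, and that everything leaving the fetch step is already a plain POJO.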

On Fri, May 15, 2015 at 11:03 AM Mason Yu <[email protected]> wrote:

> Blocking is the antithesis to scaling and high performance.
>
> Mason Yu Jr.
> Principal Architect
> Big Data Architects, LLC.
>
> 著名的孫子
>
> On Fri, May 15, 2015 at 9:54 AM, Fan Jiang <[email protected]> wrote:
>
>> Yes, Enno is right about JDBC. Because JDBC is blocking in nature and
>> JDBC operations could be frequently performed when you are working on a
>> RDBMS in Java, limiting them will potentially improve the topology's
>> throughput.
>>
>> Fan
>>
>> 2015-05-15 9:32 GMT-04:00 Enno Shioji <[email protected]>:
>>
>>> JDBC drivers have no facility to make asynchronous requests, so the
>>> thread that's calling it has to wait until the IO call finishes, before
>>> doing anything else. This can be wasteful if there is useful work that
>>> could have been done in the mean time.
>>>
>>> Especially in case of storm, the thread that calls the tasks can be
>>> shared by multiple tasks (depending on the configuration), in which case
>>> there is *probably* useful work that can be done which can't be, because
>>> the thread is "blocked".
>>>
>>> This is not specific to JDBC. Also it's not obvious if you are better
>>> off by not blocking; e.g. if there is no work that can be done with the
>>> thread anyways, you can end up decreasing the overall performance with
>>> the additional overhead.
>>>
>>> On Fri, May 15, 2015 at 1:56 PM, Jeffery Maass <[email protected]> wrote:
>>>
>>>> Fan:
>>>>
>>>> Why are you singling out JDBC operations to avoid? What is it about
>>>> them that is especially "blocking"?
>>>>
>>>> Thank you for your time!
>>>>
>>>> +++++++++++++++++++++
>>>> Jeff Maass <[email protected]>
>>>> linkedin.com/in/jeffmaass
>>>> stackoverflow.com/users/373418/maassql
>>>> +++++++++++++++++++++
>>>>
>>>> On Thu, May 14, 2015 at 9:41 AM, Fan Jiang <[email protected]> wrote:
>>>>
>>>>> One thing to note is that you should try to avoid JDBC operations in a
>>>>> bolt, as they may block the bolt and affect the topology's performance.
>>>>> Try to do the database access asynchronously, or create a separate
>>>>> thread for JDBC operations.
>>>>>
>>>>> 2015-05-14 10:30 GMT-04:00 Mason Yu <[email protected]>:
>>>>>
>>>>>> Interesting..... Hibernate hooks inside a J2ee container or Spring
>>>>>> which requires a specific OR mapping to a 20th century RDBMS.
>>>>>> Storm works in a Linux distributed environment which does not
>>>>>> need a RDBMS. RDBMS's do not work in a distributed environment.
>>>>>>
>>>>>> Mason Yu Jr.
>>>>>> CEO
>>>>>> Big Data Architects, LLC.
>>>>>>
>>>>>> 著名的孫子
>>>>>>
>>>>>> On Thu, May 14, 2015 at 9:58 AM, Stephen Powis <[email protected]> wrote:
>>>>>>
>>>>>>> Hello everyone!
>>>>>>>
>>>>>>> I'm currently toying around with a prototype built ontop of Storm
>>>>>>> and have been running into some not so easy going while trying to
>>>>>>> work with Hibernate and storm. I was hoping to get input on if this
>>>>>>> is just a case of "I'm doing it wrong" or maybe get some useful tips.
>>>>>>>
>>>>>>> In my prototype, I have a need to fan out a single tuple to several
>>>>>>> bolts which do data retrieval from our database in parallel, which
>>>>>>> then get merged back into a single stream. These data retrieval
>>>>>>> bolts all find various hibernate entities and pass them along to the
>>>>>>> merge bolt. We've written a kryo serializer that converts from the
>>>>>>> hibernate entities into POJOs, which get sent to the merge bolt in
>>>>>>> tuples. Once all the tuples get to the merge bolt, it collects them
>>>>>>> all into a single tuple and passes it downstream to a bolt which
>>>>>>> does processing using the entities.
>>>>>>>
>>>>>>> So it looks something like this.
>>>>>>>
>>>>>>>                   ---- (retrieve bolt a) ----
>>>>>>>                 / ---- (retrieve bolt b) ----\
>>>>>>>                /------(retrieve bolt c) -----\
>>>>>>> --- (split bolt)------(retrieve bolt d)-------(merge bolt) ----- (processing bolt)
>>>>>>>
>>>>>>> So dealing with detaching the hibernate entities from the session to
>>>>>>> serialize them, and then further downstream when we want to work
>>>>>>> with the entities again, we have to reattach them to a new
>>>>>>> session....this seems kind of awkward.
>>>>>>>
>>>>>>> Does doing the above make sense? Has anyone attempted to do the
>>>>>>> above? Any tips or things we should watch out for? Basically
>>>>>>> looking for any kind of input for this use case.
>>>>>>>
>>>>>>> Thanks!
>>>>>
>>>>> --
>>>>> Sincerely,
>>>>> Fan Jiang
>>>>>
>>>>> IT Developer at RENCI
>>>>> [email protected]
>>
>> --
>> Sincerely,
>> Fan Jiang
>>
>> IT Developer at RENCI
>> [email protected]
