Also, it is usually better to hit the database once and fetch a batch of
tuples, which are then serialized (inside the Hibernate context),
split across bolts, and so forth.
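
A rough sketch of the batching idea, in plain Java with no Storm or
Hibernate dependencies. The names BatchFetchSketch, fetchInBatches and
fetchBatch are hypothetical and only for illustration. I am assuming the
callback wraps whatever single query loads a whole chunk of rows at once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Storm or Hibernate.
final class BatchFetchSketch {

    // Splits ids into chunks of at most batchSize and calls fetchBatch
    // once per chunk, so the database sees one round trip per chunk
    // instead of one per id.
    static <K, V> List<V> fetchInBatches(List<K> ids, int batchSize,
                                         Function<List<K>, List<V>> fetchBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<V> results = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Assumption: fetchBatch runs a single query for the whole chunk.
            results.addAll(fetchBatch.apply(ids.subList(start, end)));
        }
        return results;
    }
}
```

Whether the rows are converted to POJOs inside the callback or afterwards
is left open here; the point is only that the number of database calls
follows the number of chunks rather than the number of ids.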

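On the earlier suggestion in this thread to run JDBC operations on a
separate thread, here is a minimal sketch using only java.util.concurrent.
BlockingCallOffloader and its submit method are hypothetical names, and the
Supplier stands in for the real blocking database call. How the result gets
back into the topology (emitting, acking) is deliberately not shown:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper, not part of Storm or JDBC.
final class BlockingCallOffloader implements AutoCloseable {

    // Dedicated pool so blocking calls do not occupy the caller's thread.
    private final ExecutorService pool;

    BlockingCallOffloader(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Runs the blocking call on the pool and returns immediately with a
    // future; the caller can keep working and pick up the result later.
    <T> CompletableFuture<T> submit(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, pool);
    }

    @Override
    public void close() {
        // Stops accepting new work; already submitted calls still finish.
        pool.shutdown();
    }
}
```

As noted further down in the thread, this only helps if the calling thread
has other useful work to do; otherwise the extra hand-off is just overhead.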


On Fri, May 15, 2015 at 11:03 AM Mason Yu <[email protected]> wrote:

>
>     Blocking is the antithesis to scaling and high performance.
>
>     Mason Yu Jr.
>     Principal Architect
>     Big Data Architects, LLC.
>
> 著名的孫子
>
> On Fri, May 15, 2015 at 9:54 AM, Fan Jiang <[email protected]> wrote:
>
>> Yes, Enno is right about JDBC. Because JDBC is blocking in nature and
>> JDBC operations may be performed frequently when you are working with an
>> RDBMS in Java, limiting them can improve the topology's
>> throughput.
>>
>> Fan
>>
>> 2015-05-15 9:32 GMT-04:00 Enno Shioji <[email protected]>:
>>
>>> JDBC drivers have no facility to make asynchronous requests, so the
>>> thread that calls the driver has to wait until the I/O call finishes
>>> before doing anything else. This can be wasteful if there is useful work
>>> that could have been done in the meantime.
>>>
>>> Especially in the case of Storm, the thread that runs the tasks can be
>>> shared by multiple tasks (depending on the configuration), in which case
>>> there is *probably* useful work that could be done but isn't, because
>>> the thread is "blocked".
>>>
>>> This is not specific to JDBC. Also, it's not obvious that you are better
>>> off not blocking; e.g. if there is no work that can be done with the
>>> thread anyway, you can end up decreasing the overall performance with the
>>> additional overhead.
>>>
>>> On Fri, May 15, 2015 at 1:56 PM, Jeffery Maass <[email protected]>
>>> wrote:
>>>
>>>> Fan:
>>>>
>>>> Why are you singling out JDBC operations to avoid?  What is it about
>>>> them that is especially "blocking"?
>>>>
>>>> Thank you for your time!
>>>>
>>>> +++++++++++++++++++++
>>>> Jeff Maass <[email protected]>
>>>> linkedin.com/in/jeffmaass
>>>> stackoverflow.com/users/373418/maassql
>>>> +++++++++++++++++++++
>>>>
>>>>
>>>> On Thu, May 14, 2015 at 9:41 AM, Fan Jiang <[email protected]> wrote:
>>>>
>>>>> One thing to note is that you should try to avoid JDBC operations in a
>>>>> bolt, as they may block the bolt and affect the topology's performance.
>>>>> Try to do the database access asynchronously, or create a separate
>>>>> thread for JDBC operations.
>>>>>
>>>>> 2015-05-14 10:30 GMT-04:00 Mason Yu <[email protected]>:
>>>>>
>>>>>> Interesting.....  Hibernate hooks inside a J2EE container or Spring,
>>>>>> which requires a specific O/R mapping to a 20th century RDBMS.
>>>>>> Storm works in a Linux distributed environment which does not
>>>>>> need an RDBMS.  RDBMSs do not work in a distributed environment.
>>>>>>
>>>>>> Mason Yu Jr.
>>>>>> CEO
>>>>>> Big Data Architects, LLC.
>>>>>>
>>>>>> 著名的孫子
>>>>>>
>>>>>> On Thu, May 14, 2015 at 9:58 AM, Stephen Powis <[email protected]> wrote:
>>>>>>
>>>>>>> Hello everyone!
>>>>>>>
>>>>>>> I'm currently toying around with a prototype built on top of Storm
>>>>>>> and have been running into some not-so-easy going while trying to work
>>>>>>> with Hibernate and Storm.  I was hoping to get input on whether this is
>>>>>>> just a case of "I'm doing it wrong" or maybe get some useful tips.
>>>>>>>
>>>>>>> In my prototype, I have a need to fan out a single tuple to several
>>>>>>> bolts which do data retrieval from our database in parallel, which then
>>>>>>> get merged back into a single stream.  These data retrieval bolts all
>>>>>>> find various Hibernate entities and pass them along to the merge bolt.
>>>>>>> We've written a Kryo serializer that converts from the Hibernate
>>>>>>> entities into POJOs, which get sent to the merge bolt in tuples.  Once
>>>>>>> all the tuples get to the merge bolt, it collects them all into a
>>>>>>> single tuple and passes it downstream to a bolt which does processing
>>>>>>> using the entities.
>>>>>>>
>>>>>>> So it looks something like this.
>>>>>>>
>>>>>>>                   /--- (retrieve bolt a) ---\
>>>>>>>                  /---- (retrieve bolt b) ----\
>>>>>>> --- (split bolt) ----- (retrieve bolt c) ----- (merge bolt) --- (processing bolt)
>>>>>>>                  \---- (retrieve bolt d) ----/
>>>>>>>
>>>>>>> So we have to detach the Hibernate entities from the session to
>>>>>>> serialize them, and then further downstream, when we want to work with
>>>>>>> the entities again, we have to reattach them to a new session.  This
>>>>>>> seems kind of awkward.
>>>>>>>
>>>>>>> Does doing the above make sense?  Has anyone attempted to do the
>>>>>>> above?  Any tips or things we should watch out for?  Basically looking
>>>>>>> for any kind of input for this use case.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sincerely,
>>>>> Fan Jiang
>>>>>
>>>>> IT Developer at RENCI
>>>>> [email protected]
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Sincerely,
>> Fan Jiang
>>
>> IT Developer at RENCI
>> [email protected]
>>
>
>
