Re: Synchronized Access to ZooCache Causing Threads to Block

Josh Elser Wed, 12 Feb 2014 20:41:21 -0800

Sick! Thanks for sharing -- feedback is always welcome and appreciated.


On 2/12/14, 8:44 PM, Ariel Valentin wrote:

Josh,

We experimented with 1.5.1 today; our load test numbers seem to indicate a 10x 
performance improvement over 1.5.0 on a single JVM. We are running additional 
experiments over the next few days to see what happens when we move to multiple 
JVMs. Stay tuned.

Thanks,
Ariel
---
Sent from my mobile device. Please excuse any errors.

On Feb 12, 2014, at 6:01 PM, Josh Elser <[email protected]> wrote:

Also, for completeness: I filed ACCUMULO-2362 to work on concurrent accesses to 
the same instance in the same JVM.

Also, I misspoke earlier: much of the lock contention comes out of the Tables 
class, not from the Instance. ZooCache keeps a static map of instance to 
ZooCache, and those ZooCaches are used by a wide range of API calls.
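
To make the contention pattern above concrete, here is a minimal sketch in plain Java. It is not Accumulo's code: CacheRegistrySketch, Cache, getLocked and getConcurrent are hypothetical names chosen for illustration. It contrasts a static map guarded by the class monitor, where every caller queues on one lock, with a concurrent map whose hot path takes no lock at all.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustration only: a hypothetical per-instance cache registry, not Accumulo code. */
final class CacheRegistrySketch {

  /** Hypothetical cache object, one per ZooKeeper connection string. */
  static final class Cache {
    final String zooKeepers;

    Cache(String zooKeepers) {
      this.zooKeepers = zooKeepers;
    }
  }

  // Pattern A: a plain map guarded by the class monitor. Every caller,
  // even one asking for an entry that already exists, waits on the same lock.
  private static final Map<String, Cache> LOCKED = new HashMap<>();

  static synchronized Cache getLocked(String zooKeepers) {
    Cache cache = LOCKED.get(zooKeepers);
    if (cache == null) {
      cache = new Cache(zooKeepers);
      LOCKED.put(zooKeepers, cache);
    }
    return cache;
  }

  // Pattern B: a concurrent map. Looking up an existing entry is lock-free;
  // only the first caller for a given key pays for the atomic insert.
  private static final ConcurrentHashMap<String, Cache> CONCURRENT =
      new ConcurrentHashMap<>();

  static Cache getConcurrent(String zooKeepers) {
    Cache cache = CONCURRENT.get(zooKeepers);
    if (cache != null) {
      return cache;
    }
    // computeIfAbsent runs the constructor at most once per key.
    return CONCURRENT.computeIfAbsent(zooKeepers, Cache::new);
  }

  public static void main(String[] args) throws InterruptedException {
    Thread[] workers = new Thread[8];
    Cache[] seen = new Cache[workers.length];
    for (int i = 0; i < workers.length; i++) {
      final int slot = i;
      workers[i] = new Thread(() -> seen[slot] = getConcurrent("zk1:2181"));
      workers[i].start();
    }
    for (Thread worker : workers) {
      worker.join();
    }
    boolean allSame = true;
    for (Cache cache : seen) {
      allSame &= (cache == seen[0]);
    }
    System.out.println("all threads share one cache: " + allSame);
  }
}
```

Both methods return the same object for the same key; the difference is only in how many threads can be inside the lookup at once. Whether a change along these lines is appropriate for ZooCache itself is a question for the ticket, not something this sketch settles.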

On 2/12/14, 3:58 PM, Josh Elser wrote:
ACCUMULO-1833 was merged into 1.5.1-SNAPSHOT a long time ago. I probably
never cleaned up the branch after I finished the ticket.

I believe John Vines started looking at using Curator, but I think he
decided in the end that there weren't significant gains to be had by
using it. I'm sure he commented on the ticket he had for it.

On 2/12/14, 3:56 PM, Ariel Valentin wrote:
Is the 1833 branch going to be part of 1.5.1?
I recall reading somewhere that there was interest in using Curator to
ameliorate working with zookeeper. Is that still part of the release
roadmap?

Thanks,
Ariel
---
Sent from my mobile device. Please excuse any errors.

On Feb 12, 2014, at 3:13 PM, Josh Elser <[email protected]> wrote:

Great, that helps. Thanks for the info, Ariel!

I think this might be an area we want to revisit in later versions of
Accumulo to make the client API implementations a little more robust
and supportive of concurrent usage.

On 2/12/14, 3:10 PM, Ariel Valentin wrote:
Josh,

The symptom is that we hit a point where a single server seems
"unresponsive" but we do not see anything unusual going on on that
machine and it seems idle. No heavy CPU, no I/O wait, low load average;
however, when we add additional instances of the JVM our capacity seems
to increase linearly.

Based on thread dumps and profiler stats it appears that under "heavy"
load most of our threads are blocked trying to access ZooCache.


Ariel Valentin
e-mail: [email protected]
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin
linkedin: http://www.linkedin.com/profile/view?id=8996534
---------------------------------------
*simplicity *communication
*feedback *courage *respect


On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <[email protected]> wrote:

    Didn't mean to ask about the subject matter, but how you were using
    the API. Are you actually seeing contention on ZooCache?


    On 2/12/14, 1:19 PM, Ariel Valentin wrote:

        Sorry but I am not at liberty to be specific about our business
        problem.

        Typical usage is multiple clients writing data to tables, and
        scanning them to avoid duplicate entries.

        Ariel Valentin
        e-mail: [email protected]
        website: http://blog.arielvalentin.com
        skype: ariel.s.valentin
        twitter: arielvalentin
        linkedin: http://www.linkedin.com/profile/view?id=8996534
        ---------------------------------------
        *simplicity *communication
        *feedback *courage *respect


        On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser <[email protected]> wrote:

             Also, I forgot this part before:

             The ZooCache instance that's used *typically* comes from the
             Instance object that your Connector was created from. In other
             words, if you create multiple Instances (ZooKeeperInstance,
             usually), you can get multiple ZooCaches which means that
             concurrent calls to methods off of those objects should not
             block one another (createScanner off of connector1 from
             instance1 should not block createScanner off of connector2
             from instance2).

             That should be something quick you can play with if you so
             desire.


             On 2/12/14, 9:57 AM, Josh Elser wrote:

                 Yep, you'll likely also block on BatchScanner, anything in
                 TableOperations, and a host of other things.

                 For scanners, there's likely a standing recommendation to
                 amortize the use of those objects (if you want to look up
                 5 ranges, don't make 5 scanners).

                 Creating a cache per member in the work would likely
                 require some kind of Paxos implementation to provide
                 consistency, which is highly undesirable.

                 One thing I'm curious about is the impact of removing
                 ZooCache altogether from things like the client API and
                 seeing what happens. I don't have a good way to measure
                 that impact off the top of my head though.

                 Anyways, is this causing you problems in your usage of the
                 API? Could you elaborate a bit more on the specifics?

                 On Feb 12, 2014 4:48 AM, "Ariel Valentin" <[email protected]> wrote:

                      I have run into a problem related to ACCUMULO-1833,
                      which appears to have addressed the issue for
                      MultiTableBatchWriter; however, I am seeing this issue
                      on the scanner side also:

                      "http-/192.168.220.196:8080-35" daemon prio=10 tid=0x00007f3108038000 nid=0x538a waiting for monitor entry [0x00007f31287d1000]
                         java.lang.Thread.State: BLOCKED (on object monitor)
                          at org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
                          - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for org.apache.accumulo.fate.zookeeper.ZooCache)
                          at org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
                          at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
                          at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
                          at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
                          at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
                          at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)



                      I have not spent enough time reasoning about the code
                      to understand all of the nuances, but I am interested
                      in knowing if there are any mitigating strategies for
                      dealing with this thread contention, e.g. would
                      creating a cache entry for each member of the
                      ZooKeeper ensemble help relieve the strain? Use
                      multiple classloaders? Or is my only option to spawn
                      multiple JVMs?

                      Thanks,

                      Ariel Valentin
                      e-mail: [email protected]
                      website: http://blog.arielvalentin.com
                      skype: ariel.s.valentin
                      twitter: arielvalentin
                      linkedin: http://www.linkedin.com/profile/view?id=8996534
                      ---------------------------------------

                      *simplicity *communication
                      *feedback *courage *respect
