On 2/12/14, 3:10 PM, Ariel Valentin wrote:
Josh,
The symptom is that we hit a point where a single server seems
"unresponsive", but we do not see anything unusual going on in that
machine and it seems idle: no heavy CPU, no I/O wait, low load average.
However, when we add additional JVM instances, our capacity seems
to increase linearly.
Based on thread dumps and profiler stats it appears that under "heavy"
load most of our threads are blocked trying to access ZooCache.
Ariel Valentin
e-mail: [email protected]
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin
linkedin: http://www.linkedin.com/profile/view?id=8996534
---------------------------------------
*simplicity *communication
*feedback *courage *respect
On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <[email protected]> wrote:
Didn't mean to ask about the subject matter, but how you were using
the API. Are you actually seeing contention on ZooCache?
On 2/12/14, 1:19 PM, Ariel Valentin wrote:
Sorry, but I am not at liberty to be specific about our business
problem.
Typical usage is multiple clients writing data to tables and scanning
those tables to avoid duplicate entries.
Ariel Valentin
On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser <[email protected]> wrote:
Also, I forgot this part before:
The ZooCache instance that's used *typically* comes from the
Instance object that your Connector was created from. In other
words, if you create multiple Instances (ZooKeeperInstance,
usually), you can get multiple ZooCaches, which means that concurrent
calls to methods off of those objects should not block one another
(createScanner off of connector1 from instance1 should not block
createScanner off of connector2 from instance2).
That should be something quick you can play with if you so desire.
On 2/12/14, 9:57 AM, Josh Elser wrote:
Yep, you'll likely also block on BatchScanner, anything in
TableOperations, and a host of other things.
For scanners, there's likely a standing recommendation to amortize the
use of those objects (if you want to look up 5 ranges, don't make 5
scanners).
Creating a cache per member in the work would likely require some kind
of Paxos implementation to provide consistency, which is highly
undesirable.
One thing I'm curious about is the impact of removing ZooCache
altogether from things like the client API, to see what happens. I don't
have a good way to measure that impact off the top of my head, though.
Anyways, is this causing you problems in your usage of the API? Could
you elaborate a bit more on the specifics?
On Feb 12, 2014 4:48 AM, "Ariel Valentin" <[email protected]> wrote:
I have run into a problem related to ACCUMULO-1833, which appears to
have addressed the issue for MultiTableBatchWriter; however, I am
seeing this issue on the scanner side also:
"http-/192.168.220.196:8080-35" daemon prio=10 tid=0x00007f3108038000 nid=0x538a waiting for monitor entry [0x00007f31287d1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
    - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for org.apache.accumulo.fate.zookeeper.ZooCache)
    at org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
    at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
    at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
    at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
    at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
    at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
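For anyone who wants to reproduce the shape of that dump outside Accumulo, here is a minimal plain-Java sketch. It assumes nothing about ZooCache's real implementation: SharedCache and ClassMonitorDemo are hypothetical names, and the only thing modelled is a static synchronized accessor, which locks the Class object in the same way the "waiting to lock ... (a java.lang.Class for ...)" line suggests.

```java
import java.util.concurrent.CountDownLatch;

public class ClassMonitorDemo {

    // Hypothetical stand-in for a cache guarded by a class-wide lock; not Accumulo code.
    static final class SharedCache {
        private static SharedCache instance;

        // "static synchronized" locks SharedCache.class, so all callers in the JVM queue here.
        static synchronized SharedCache getInstance() {
            if (instance == null) {
                instance = new SharedCache();
            }
            return instance;
        }
    }

    /** Holds the class monitor on one thread and reports the state of a second caller. */
    static Thread.State observeCallerState() throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (SharedCache.class) { // the same monitor getInstance() needs
                lockHeld.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");

        Thread caller = new Thread(SharedCache::getInstance, "caller");

        holder.start();
        lockHeld.await();   // the monitor is now owned by "holder"
        caller.start();

        // Poll (bounded to about five seconds) until the caller is waiting on the monitor.
        Thread.State state = caller.getState();
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(1);
            state = caller.getState();
        }

        release.countDown(); // let both threads finish
        holder.join();
        caller.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("caller state while lock is held: " + observeCallerState());
    }
}
```

While the holder thread owns the class monitor, the second thread should show up as BLOCKED, and a jstack of the process at that moment should look much like the trace above. The sketch only illustrates the locking pattern; it says nothing about how long ZooCache actually holds the lock.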
I have not spent enough time reasoning about the code to understand
all of the nuances, but I am interested in knowing if there are any
mitigating strategies for dealing with this thread contention, e.g.
would creating a cache entry for each member of the ZooKeeper
ensemble help relieve the strain? Use multiple classloaders? Or is
my only option to spawn multiple JVMs?
Thanks,
Ariel Valentin