Hi Stack -

Do you have any advice on what to look for (or how to sort it) when I do
lsof or netstat? A glance at it doesn't show any "standouts" but then
I'm not entirely sure what to look for. I see lots of connections to
various nodes in the cluster, from any given node, but I suppose that's
quite normal. Ganglia offers no clues either. It's pretty uniform for
all graphs across all servers.
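
One way to sort that output is to group connections by remote host and rank the hosts by unacknowledged send-queue bytes, since a peer that has stopped draining its socket tends to show up there. The following is only a minimal sketch: it assumes the six-column layout that Linux `netstat -tn` prints, the function name `tally_remote_hosts` is hypothetical, and the sample addresses are made up for illustration.

```python
from collections import defaultdict


def tally_remote_hosts(netstat_output, state="ESTABLISHED"):
    """Summarize `netstat -tn`-style text as (host, connections, send_q_bytes).

    Assumed columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State.
    """
    stats = defaultdict(lambda: [0, 0])  # host -> [connections, Send-Q bytes]
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a TCP socket row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != state:
            continue
        # Split "host:port" on the last colon so IPv6-style hosts survive.
        host, _, _port = fields[4].rpartition(":")
        stats[host][0] += 1
        stats[host][1] += int(fields[2])
    rows = [(host, conns, send_q) for host, (conns, send_q) in stats.items()]
    # Largest send queue first, then most connections, then host name.
    return sorted(rows, key=lambda r: (-r[2], -r[1], r[0]))


# Made-up sample in the assumed layout; real input would come from netstat.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.1:50010          10.0.0.2:41234          ESTABLISHED
tcp        0  52480 10.0.0.1:60020          10.0.0.3:51000          ESTABLISHED
tcp        0      0 10.0.0.1:60020          10.0.0.3:51001          ESTABLISHED
tcp        0      0 10.0.0.1:50010          10.0.0.4:41000          TIME_WAIT
"""

for host, conns, send_q in tally_remote_hosts(SAMPLE):
    print(f"{host:<12} conns={conns} send_q={send_q}")
```

A host that keeps appearing at the top of this list across several samples taken while things are slow would be a candidate for the "slow datanode" Stack asks about.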

-geoff

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of
Stack
Sent: Monday, September 12, 2011 9:12 PM
To: [email protected]
Cc: Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
Subject: Re: scanner deadlock?

No slow datanode in your cluster?

When stuff is slow, can you figure out who they are all trying to talk to?

St.Ack

On Mon, Sep 12, 2011 at 8:37 PM, Geoff Hendrey <[email protected]>
wrote:
> OK Guys -
>
> We upgraded to 0.90.4 and made all the suggested config changes. The only
> thing we have not done yet, but will try soon, is switching from OpenJDK
> to the HotSpot JVM. Unfortunately, the problem recurs exactly as before.
> We will test with the HotSpot JVM shortly.
>
> -geoff
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Jean-Daniel Cryans
> Sent: Monday, September 12, 2011 11:44 AM
> To: [email protected]
> Subject: Re: scanner deadlock?
>
>> I thought that as long as I specified neither -client nor -server,
>> Server Class detection would automatically invoke the "-server" option.
>>
>> http://download.oracle.com/javase/6/docs/technotes/guides/vm/server-class.html
>>
>> We are running 12-core AMD Opteron which is AMD64, so according to the
>> guide above, -server is selected automatically. Please let me know if
>> I've misunderstood this. We *definitely* want to be running hotspot!
>
> It's two different JVMs, not a matter of using -client or -server
> (which are just different configurations). What you are running is:
>
> http://openjdk.java.net/
>
> What most people run is:
>
> http://www.oracle.com/us/technologies/java/index.html
>
>>
>> Regarding GC: we are generating GC logs for namenode, datanode, master
>> and regionserver. We do see long GC from time to time. In fact, I played
>> with the mslab option, but didn't find significant improvement. We've
>> seen times on the order of a minute in these logs, and have found no way
>> around it (spent countless days and nights experimenting with different
>> GC parameters, mslab, different heap sizes, etc).
>
> Sometimes it's just a matter of how much data you have in flight.
> That's why I mentioned scanner pre-caching (set via Scan.setCaching),
> because it can potentially load a lot of rows into the RS's heap. More
> concurrent scanners also means more data loaded into memory.
>
> Are you also inserting at the same time? What's your write buffer size?
>
> The discussion in this jira could be relevant:
> https://issues.apache.org/jira/browse/HBASE-3813
>
> A temporary fix got committed in 0.90.3 to make
> ipc.server.max.queue.size configurable.
>
> J-D
>
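
J-D's point about scanner pre-caching and concurrent scanners can be made concrete with rough arithmetic: the data held for scanners grows with the caching value, the row size, and the number of scanners open at once. This is only a back-of-the-envelope sketch, not how HBase accounts for memory. The function name `scanner_heap_estimate` is hypothetical, the numbers are invented for illustration, and per-object overhead is ignored.

```python
def scanner_heap_estimate(caching, avg_row_bytes, concurrent_scanners):
    """Rough bytes of row data in flight: rows per batch x row size x scanners.

    Ignores JVM object overhead and RPC buffering, so treat it as a floor.
    """
    return caching * avg_row_bytes * concurrent_scanners


# Hypothetical workload: caching of 1000 rows, ~10 KiB rows, 50 scanners.
estimate = scanner_heap_estimate(
    caching=1000, avg_row_bytes=10 * 1024, concurrent_scanners=50
)
print(f"{estimate / 2**20:.0f} MiB")  # prints "488 MiB"
```

Halving the caching value or the number of concurrent scanners halves this figure, which is why both are worth checking when region servers show long GC pauses.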
