> HBase and HDFS are not made to span datacenters.

Yes, I know. I think the correct terminology is 3 availability zones
within the same region (ping between them is about 1 ms).

> You could paste the thread dump

This is the full thread dump:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode):

"LruBlockCache.EvictionThread" daemon prio=10 tid=0x0000000052cc8000
nid=0x2859 in Object.wait() [0x0000000044510000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000781b79cf0> (a org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread.run(LruBlockCache.java:519)
        - locked <0x0000000781b79cf0> (a org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread)

"1692369893@qtp-1688716382-1 - Acceptor0
[email protected]:60030" prio=10 tid=0x00002aaab064e000
nid=0x2841 runnable [0x0000000042cf8000]
   java.lang.Thread.State: RUNNABLE
        at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:724)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

"DestroyJavaVM" prio=10 tid=0x00000000520e2000 nid=0x2808 waiting on
condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Low Memory Detector" daemon prio=10 tid=0x000000005216f800 nid=0x2811
runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x000000005216d800 nid=0x2810
waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x0000000052167800 nid=0x280f
waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x0000000052165800 nid=0x280e
waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x0000000052145000 nid=0x280d in
Object.wait() [0x0000000041de9000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007800a43e8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0x00000007800a43e8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x0000000052143000 nid=0x280c
in Object.wait() [0x0000000041ce8000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007800a3618> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x00000007800a3618> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x000000005213c000 nid=0x280b runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00000000520f5000
nid=0x2809 runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00000000520f7000
nid=0x280a runnable

"VM Periodic Task Thread" prio=10 tid=0x000000005217a000 nid=0x2812
waiting on condition

JNI global references: 974

Heap
 PSYoungGen      total 560640K, used 480899K [0x00000007d5560000,
0x0000000800000000, 0x0000000800000000)
  eden space 490368K, 83% used
[0x00000007d5560000,0x00000007ee66ca20,0x00000007f3440000)
  from space 70272K, 99% used
[0x00000007f3440000,0x00000007f78d4590,0x00000007f78e0000)
  to   space 110464K, 0% used
[0x00000007f9420000,0x00000007f9420000,0x0000000800000000)
 PSOldGen        total 266368K, used 155632K [0x0000000780000000,
0x0000000790420000, 0x00000007d5560000)
  object space 266368K, 58% used
[0x0000000780000000,0x00000007897fc3c0,0x0000000790420000)
 PSPermGen       total 33216K, used 18585K [0x000000077ae00000,
0x000000077ce70000, 0x0000000780000000)
  object space 33216K, 55% used
[0x000000077ae00000,0x000000077c0267c0,0x000000077ce70000)

Regards,
Bogdan


On Wed, Sep 21, 2011 at 7:06 PM, Stack <[email protected]> wrote:
> On Wed, Sep 21, 2011 at 1:07 AM, Bogdan Ghidireac <[email protected]> wrote:
>> I have an HBase 0.90.3 fleet with more than 100 hosts distributed
>> across 3 datacenters.
>> Yesterday, we experienced some network issues
>> for several minutes and the region servers from one data center lost
>> the connectivity with the namenode. They started the shutdown sequence
>> but about 20 hosts were unable to complete it successfully. This is
>> bad for us because we have to restart them manually.
>>
>
> HBase and HDFS are not made to span datacenters.
>
>> Is my assumption correct? Should I open a JIRA?
>>
>
> You could paste the thread dump but before doing anything, I'd suggest
> you change your layout so cluster runs inside a single datacenter.
>
> St.Ack
>
