Re: Lost regions question

2013-04-12 Thread Ted Yu
Brennon:
Have you run hbck to diagnose the problem?

Since the issue might have involved HDFS, browsing the DataNode log(s) may
provide some clues as well.

What Hadoop version are you using?

Cheers

On Thu, Apr 11, 2013 at 10:58 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote:

 When you say that the parent regions got reopened, does that mean that you
 did not lose any data (that no data became unreadable)?  The reason I am
 asking is that if the parent got split into daughters, data was then written
 to the daughters, and the daughters' files could not be opened, you could
 have ended up unable to read that data.

 Some logs could tell us what caused the parent to be reopened rather than
 the daughters.  Another thing I would like to ask: was the cluster brought
 down abruptly by killing the RS?

 Which version of HBase?

 Regards
 Ram




 On Fri, Apr 12, 2013 at 11:20 AM, Brennon Church bren...@getjar.com wrote:

  Hello,
 
  I had an interesting problem come up recently.  We have a few thousand
  regions across 8 datanode/regionservers.  I made a change, increasing the
  heap size for Hadoop from 128M to 2048M, which ended up bringing the
  cluster to a complete halt after about 1 hour.  I reverted back to 128M and
  turned things back on again, but didn't realize at the time that I came up
  with 9 fewer regions than I started with.  Upon further investigation, I
  found that all 9 missing regions were from splits that occurred while the
  cluster was running after the heap change and before it came to a halt.
  There was a 10th region (5 splits involved in total) that managed to get
  recovered.  The really odd thing is that in the case of the other 9
  regions, the original parent regions, which as far as I can tell from the
  logs were deleted, were reopened upon restarting things once again.  The
  daughter regions were gone.  Interestingly, I found the orphaned data
  blocks still intact, and in at least some cases I have been able to extract
  the data from them and will hopefully re-add it to the tables.
 
  My question is this: based on the rather muddled description I've given
  above, does anyone know what could possibly have happened here?  My best
  guess is that the bad state HDFS was in caused some critical component of
  the split process to be missed, which resulted in a reference to the
  parent regions sticking around and the references to the daughter regions
  being lost.
 
  Thanks for any insight you can provide.
 
  --Brennon
 
 
 
 



Re: Error while doing multi get from HBase

2013-04-12 Thread anand nalya
Hi Ted,

The region servers are not loaded; they show about 5% CPU usage. The DataNode
is showing around 50% CPU utilization, and disk I/O is around 7Mbps.

There is nothing noticeable in the GC log.

Thanks,
Anand


On 12 April 2013 02:56, Ted Yu yuzhih...@gmail.com wrote:

 How loaded were the region servers when the query was running?

 Did you check the GC log?

 Thanks

 On Thu, Apr 11, 2013 at 8:23 AM, anand nalya anand.na...@gmail.com wrote:

  Hi,
 
  I'm using HBase 0.94.5 with the Thrift server. I'm trying to get rows from
  HBase using
  org.apache.hadoop.hbase.thrift.generated.Hbase.Client.getRows(ByteBuffer,
  List<ByteBuffer>, Map<ByteBuffer, ByteBuffer>), but it returns results
  very slowly (around 2 minutes for 100 rows). For larger numbers of records,
  there is no response.
 
  I have two region servers and a total of 128 regions. The total data size
  is around 250GB (250 million records), uniformly distributed across regions.
 
  The region server only shows the following in its log:
 
  2013-04-11 19:53:44,535 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
  org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call
  multi(org.apache.hadoop.hbase.client.MultiAction@49ac272), rpc version=1,
  client version=29, methodsFingerPrint=-1368823753 from
  192.168.145.195:52277 after 74994 ms, since caller disconnected
      at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3723)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3643)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3626)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3664)
      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4576)
      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4549)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2042)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3516)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
  2013-04-11 19:53:46,121 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
  org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call
  multi(org.apache.hadoop.hbase.client.MultiAction@49ac272), rpc version=1,
  client version=29, methodsFingerPrint=-1368823753 from
  192.168.145.195:52277 after 76580 ms, since caller disconnected
      at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3723)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3643)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3626)
      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3664)
      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4576)
      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4549)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2042)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3516)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
 
 
  Any idea what might be wrong here?
 
  Thanks,
  Anand
 



Re: Error while doing multi get from HBase

2013-04-12 Thread anand nalya
Hi Azuryy,

I'm using the default scanner cache size of 100. For multigets, I've
tried with 1 (13ms), 10 (356ms), 100 (1135ms), 1000 (4330ms), and
1(17744ms) keys. Normal workload will be around 1 keys at a time.
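Dividing each reported latency by its key count makes the trend easier to see. This is only arithmetic on the four timings above; the last entry is left out because its key count did not survive in the archive. Per key, the cost is 13 ms for a single get, about 35.6 ms at 10 keys, 11.35 ms at 100 keys, and 4.33 ms at 1000 keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Per-key cost of the multiget timings reported in this thread. */
final class MultiGetLatency {

    /** Keys per multiget -> total latency in ms, as reported (last, garbled entry omitted). */
    static Map<Integer, Integer> reportedTimings() {
        Map<Integer, Integer> timings = new LinkedHashMap<>();
        timings.put(1, 13);
        timings.put(10, 356);
        timings.put(100, 1135);
        timings.put(1000, 4330);
        return timings;
    }

    /** Average time spent per key in one multiget call. */
    static double perKeyMillis(int keys, int totalMillis) {
        return (double) totalMillis / keys;
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, Integer> e : reportedTimings().entrySet()) {
            System.out.printf("%4d keys: %5d ms total, %6.2f ms/key%n",
                    e.getKey(), e.getValue(), perKeyMillis(e.getKey(), e.getValue()));
        }
    }
}
```

Larger batches amortize per-call overhead, but total time still grows with the batch size, and each key continues to cost several milliseconds.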

Are there any optimizations that can be done for multigets? Is HBase a good
candidate for this use case?

Thanks,
Anand







Re: Error while doing multi get from HBase

2013-04-12 Thread Azuryy Yu
And what's your block cache size?

There are two possible reasons:
1. The result is too big.
2. The GC options are not optimized.

Can you paste your GC options here?

--Send from my Sony mobile.

Re: Error while doing multi get from HBase

2013-04-12 Thread anand nalya
The block cache size is 0.25.

Each row holds around 2KB of data, so result size should not be an issue, at
least while the number of records is below 1000.
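The size argument can be checked with a quick estimate. This sketch takes "around 2KB" to mean 2048 bytes per row and counts only that raw row data. Real responses also carry Thrift/RPC framing and per-cell key overhead, which are not modeled here. Even 1000 rows come to only about 2000 KB.

```java
/** Rough multiget response size, using the ~2 KB-per-row figure from the thread. */
final class ResultSizeEstimate {

    /** Assumed bytes per row: "around 2KB", taken here as 2048 bytes. */
    static final long ROW_BYTES = 2L * 1024;

    /** Raw row payload for a batch; excludes Thrift framing and per-cell key overhead. */
    static long payloadBytes(int rows) {
        return rows * ROW_BYTES;
    }

    public static void main(String[] args) {
        for (int rows : new int[] {1, 10, 100, 1000}) {
            System.out.printf("%4d rows -> %5d KB%n", rows, payloadBytes(rows) / 1024);
        }
    }
}
```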

Also,
HBASE_HEAPSIZE=8000
HBASE_OPTS=-XX:+UseConcMarkSweepGC
HBASE_REGIONSERVER_OPTS=-Xmx4g -Xms4g -Xmn256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
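These settings can be compared with the data volume described earlier in the thread. The sketch below makes two assumptions. First, it takes `-Xmx4g` from `HBASE_REGIONSERVER_OPTS` as the heap the region server actually runs with, rather than `HBASE_HEAPSIZE=8000`. Second, it treats the 250GB as spread evenly over the two region servers, ignoring compression and HFile overhead. On those assumptions, a 1 GB block cache can hold under 1% of each server's data. That suggests uniformly random gets will usually have to read from HDFS rather than from cache.

```java
/** Block cache size versus data per region server, from the numbers in this thread. */
final class BlockCacheCoverage {

    static final long HEAP_MB = 4L * 1024;           // assumed effective heap: -Xmx4g
    static final double BLOCK_CACHE_FRACTION = 0.25;  // "the block cache size is 0.25"
    static final long TOTAL_DATA_MB = 250L * 1024;    // "around 250GB", treated as on-disk bytes
    static final int REGION_SERVERS = 2;              // two region servers

    static long blockCacheMb() {
        return (long) (HEAP_MB * BLOCK_CACHE_FRACTION);
    }

    /** Data per server, assuming the uniform distribution described in the thread. */
    static long dataPerServerMb() {
        return TOTAL_DATA_MB / REGION_SERVERS;
    }

    /** Fraction of one server's data that fits in its block cache at once. */
    static double cachedFraction() {
        return (double) blockCacheMb() / dataPerServerMb();
    }

    public static void main(String[] args) {
        System.out.printf("block cache: %d MB, data per server: %d MB, cacheable: %.2f%%%n",
                blockCacheMb(), dataPerServerMb(), cachedFraction() * 100);
    }
}
```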


Re: Error while doing multi get from HBase

2013-04-12 Thread Azuryy Yu
Your CMS is not tuned; please look up how to tune CMS on the Java website.

Also, Xmn is too small, which is not suitable for frequent multigets. Please
change it to -Xmn1g.
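With `-Xms4g -Xmx4g`, every megabyte given to the young generation is taken from the old generation. The sketch below shows that trade-off for the current `-Xmn256m` and the suggested `-Xmn1g`. It computes the old-generation size and the occupancy at which `-XX:CMSInitiatingOccupancyFraction=70` would start a CMS cycle. One caveat on HotSpot behavior: unless `-XX:+UseCMSInitiatingOccupancyOnly` is also set, the fraction is used only for the first cycle, after which the JVM picks its own start points.

```java
/** Old-generation size and CMS start threshold for the two -Xmn values discussed. */
final class CmsSizing {

    static final long HEAP_MB = 4096;              // -Xmx4g -Xms4g
    static final int CMS_INITIATING_PERCENT = 70;  // -XX:CMSInitiatingOccupancyFraction=70

    /** Old generation = total heap minus the young generation set by -Xmn. */
    static long oldGenMb(long youngGenMb) {
        return HEAP_MB - youngGenMb;
    }

    /** Old-gen occupancy at which the first CMS cycle would be started. */
    static long cmsStartMb(long youngGenMb) {
        return oldGenMb(youngGenMb) * CMS_INITIATING_PERCENT / 100;
    }

    public static void main(String[] args) {
        for (long young : new long[] {256, 1024}) {  // current -Xmn256m vs suggested -Xmn1g
            System.out.printf("-Xmn%dm: old gen %d MB, CMS starts at ~%d MB%n",
                    young, oldGenMb(young), cmsStartMb(young));
        }
    }
}
```

A larger young generation gives the short-lived objects built while serving a multiget more room to die before promotion. The cost is a smaller old generation, here 3072 MB instead of 3840 MB, which still has to hold the 1 GB block cache.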

--Send from my Sony mobile.

Re: Region stuck in transition

2013-04-12 Thread Fabien Chung
Hello,

I'm facing some trouble and I don't know how to figure it out. The HBase
master crashed a couple of days ago. I restarted it, but the -ROOT- region
is still stuck in transition.
I tried restarting the service and deleting/removing all the files in the
HBase folder, but the region is still in transition.

Do you have an idea why?


*Edit: I also tried creating the service on another node, but the result
is the same: the region stays in transition for a while.*


Regards






-- 
Chung Fabien



EFREI Promo 2013
Tel : 06 48 03 54 92


Re: Region stuck in transition

2013-04-12 Thread Kevin O'dell
Hi Fabien,

  How are you doing today?  Have you tried shutting down HBase, going into
zkCli and deleting the znode for unassigned regions, and then restarting
HBase?  It sounds to me like you may have a corrupt state.


-- 
Kevin O'Dell
Systems Engineer, Cloudera


Re: Region stuck in transition

2013-04-12 Thread Ted Yu
Can you pastebin the master log?

What version of HBase are you using?

Thanks



Re: Region stuck in transition

2013-04-12 Thread Ameya Kantikar
We have faced a similar issue before (we are on 0.94.2). The way I resolved
it was by running this in the hbase shell:

assign 'region_name'

Hope this works for you.







hbase-0.94.6.1 balancer issue

2013-04-12 Thread Samir Ahmic
Hi, all

I'm evaluating hbase-0.94.6.1 and I have 48 regions on a 2-node cluster. I
restarted one of the RSs and afterwards tried to balance the cluster by
running balancer from the shell. After running the command, regions were not
distributed to the second RS, and I found this line in the master log:

2013-04-12 16:45:15,589 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=2 *regions=1*
average=0.5 mostloaded=1 leastloaded=0

It looks to me like the balancer is reporting the wrong number of regions,
and that this is why it skips load balancing. In the hbase shell I see all 48
tables that I have, and everything else looks fine.

Has anyone else seen this kind of behavior? Did something change in the
balancer in hbase-0.94.6.1?

Regards
Samir
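The "balanced cluster" decision in that log line can be reproduced with a little arithmetic. This is a minimal Python sketch, not HBase's code: it assumes the master computes average = regions / servers, widens it into a band from floor(average * (1 - slop)) to ceil(average * (1 + slop)) using hbase.regions.slop (assumed default 0.2), and skips balancing when the most and least loaded servers both fall inside that band. The function name needs_balance is hypothetical.

```python
import math


def needs_balance(loads, slop=0.2):
    """Decide whether a set of per-server region counts needs balancing.

    loads: region count per server. slop mirrors hbase.regions.slop (assumed
    default 0.2). Returns (needed, summary) where summary mimics the fields
    printed in the "Skipping load balancing" log line.
    """
    servers = len(loads)
    regions = sum(loads)
    average = regions / servers
    floor = math.floor(average * (1 - slop))
    ceiling = math.ceil(average * (1 + slop))
    most, least = max(loads), min(loads)
    # Balanced if nobody is above the ceiling and nobody is below the floor.
    needed = most > ceiling or least < floor
    summary = (f"servers={servers} regions={regions} average={average} "
               f"mostloaded={most} leastloaded={least}")
    return needed, summary


# One region on two servers: the band is [0, 1], so a 1/0 split counts as balanced.
print(needs_balance([1, 0]))
```

Under these assumptions, loads of [1, 0] return False together with the same servers=2 regions=1 average=0.5 mostloaded=1 leastloaded=0 fields seen in the log, while [48, 0] would be flagged for balancing.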


Re: Lost regions question

2013-04-12 Thread Brennon Church

Hello,

We lost the data when the parent regions got reopened. My guess, and
it's only that, is that the regions were essentially empty when they
started up again in these cases. We definitely lost data from the tables.


I've looked through the hdfs and hbase logs and can't find any obvious
difference between a successful split and these failed ones. All steps
show up the same in all cases. After the "handled split" message that
listed the parent and daughter regions, the next reference is to the
parent regions once again, as hbase is started back up after the
failure. No further reference to the daughters is made.


I couldn't cleanly shut several of the regionservers down, so yes, they
were abruptly killed.


HBase version is 0.92.0, and hadoop is 1.0.1.

Thanks.

--Brennon

On 4/11/13 10:58 PM, ramkrishna vasudevan wrote:

When you say that the parent regions got reopened, does that mean you did
not lose any data (i.e., no data became unreadable)? The reason I am asking
is that if the parent was split into daughters, data was written to the
daughters, and the daughters' files could then not be opened, you could
have ended up unable to read that data.

Some logs could tell us what caused the parent to be reopened rather than
the daughters. Another thing I would like to ask: was the cluster brought
down abruptly by killing the RSs?

Which version of HBase?

Regards
Ram




On Fri, Apr 12, 2013 at 11:20 AM, Brennon Church bren...@getjar.com wrote:


Hello,

I had an interesting problem come up recently. We have a few thousand
regions across 8 datanode/regionservers. I made a change, increasing the
heap size for hadoop from 128M to 2048M, which ended up bringing the cluster
to a complete halt after about 1 hour. I reverted back to 128M and turned
things back on again, but didn't realize at the time that I came up with 9
fewer regions than I started with. Upon further investigation, I found that
all 9 missing regions were from splits that occurred while the cluster was
running after the heap change and before it came to a halt. There was a
10th region (5 splits involved in total) that managed to get recovered.
The really odd thing is that in the case of the other 9 regions, the
original parent regions, which as far as I can tell from the logs were
deleted, were re-opened upon restarting things once again. The daughter
regions were gone. Interestingly, I found the orphaned data blocks still
intact, and in at least some cases have been able to extract the data
from them and will hopefully re-add it to the tables.

My question is this: based on the rather muddled description I've given
above, does anyone know what could have happened here? My best guess is
that the bad state hdfs was in caused some critical component of the split
process to be missed, which resulted in a reference to the parent regions
sticking around and the references to the daughter regions being lost.

Thanks for any insight you can provide.

--Brennon








Re: hbase-0.94.6.1 balancer issue

2013-04-12 Thread Jean-Marc Spaggiari
Hi Samir,

Regions are balanced per table.

So if you have 48 regions within the same table, they should be split about
24 on each server.

But if you have 48 tables with 1 region each, then for each table the
balancer will see only 1 region and will display the message you saw.

Have you looked at the UI? What do you see there? Can you please confirm
whether you have 48 tables or 1 table?

Thanks,

JM
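JM's point about per-table balancing can be illustrated with a small simulation. This Python sketch is hypothetical: it groups region assignments by table and runs a floor/ceiling band check on each table separately (the band formula average * (1 +/- slop) and the slop default of 0.2 are assumptions, not copied from HBase). With 48 single-region tables all on one server, every per-table check sees 1 region on 2 servers and reports "balanced"; a single 48-region table on one server is flagged.

```python
import math
from collections import defaultdict


def needs_balance(loads, slop=0.2):
    # Assumed check: balanced if every server is within
    # [floor(avg * (1 - slop)), ceil(avg * (1 + slop))].
    average = sum(loads) / len(loads)
    floor = math.floor(average * (1 - slop))
    ceiling = math.ceil(average * (1 + slop))
    return max(loads) > ceiling or min(loads) < floor


def tables_needing_balance(assignments, servers, slop=0.2):
    """assignments: iterable of (table, region, server) tuples.

    Returns the sorted names of tables whose own region spread fails the
    check. Servers hosting none of a table's regions count with a load of 0.
    """
    per_table = defaultdict(lambda: dict.fromkeys(servers, 0))
    for table, _region, server in assignments:
        per_table[table][server] += 1
    return sorted(table for table, loads in per_table.items()
                  if needs_balance(list(loads.values()), slop))


servers = ["rs1", "rs2"]
# 48 tables with one region each, all hosted on rs1 (hypothetical names).
many_small = [(f"t{i}", f"t{i},r0", "rs1") for i in range(48)]
# One table with 48 regions, all hosted on rs1.
one_big = [("big", f"big,r{i}", "rs1") for i in range(48)]

print(tables_needing_balance(many_small, servers))  # []
print(tables_needing_balance(one_big, servers))     # ['big']
```

Under these assumptions, the per-table view explains why the master log reports regions=1 even though the cluster holds 48 regions.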





Re: hbase-0.94.6.1 balancer issue

2013-04-12 Thread Jean-Marc Spaggiari
Hi Samir,

Since regions are balanced per table, as soon as you have more than one
region in a table, the balancer will start to balance its regions over
the servers.

You can split some of those tables and you will start to see HBase balance
them. This is normal behavior for 0.94. I don't know about earlier
versions.

Also, are you sure you need 48 tables, rather than fewer tables with more CFs?

JM

2013/4/12 Samir Ahmic ahmic.sa...@gmail.com

 Hi, JM

 I have 48 tables and, as you said, it is 1 region per table since I have not
 reached the split limit yet. So is this normal behavior in version 0.94.6.1?
 And at what point will the balancer start redistributing regions to the
 second server?

 Thanks
 Samir


 



Re: Lost regions question

2013-04-12 Thread ramkrishna vasudevan
Oh, sorry to hear that. But I think the data should still be there in the
system, just not accessible to you. We should be able to bring it back.

The RS and master logs from when the split happened would be of interest.

The main thing, though, would be the logs from when you restarted your
cluster and the master came back up. That is where the system does some
self-rectification if it sees that there were partial splits.

Regards
Ram
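The restart-time "self-rectification" Ram mentions can be pictured with a toy model. In this hypothetical Python sketch, .META. is a dict keyed by region name; a split parent's row is marked split and names its two daughters in splitA/splitB fields (loosely after the info:splitA and info:splitB columns), and a fix-up pass re-adds any daughter the parent names but .META. lacks. It is not HBase's actual code, and the row layout is an assumption.

```python
def fixup_daughters(meta):
    """Re-add daughters referenced by split parents but missing from meta.

    meta: dict of region name -> row dict (hypothetical layout).
    Returns the list of daughter names that were added.
    """
    added = []
    # Iterate over a snapshot because we insert into meta while looping.
    for name, row in list(meta.items()):
        if not row.get("split"):
            continue
        for column in ("splitA", "splitB"):
            daughter = row.get(column)
            if daughter and daughter not in meta:
                meta[daughter] = {"split": False, "parent": name}
                added.append(daughter)
    return added


# Parent split into A and B, but only A made it into .META. before the crash.
meta = {
    "parent": {"split": True, "offline": True, "splitA": "A", "splitB": "B"},
    "A": {"split": False, "parent": "parent"},
}
print(fixup_daughters(meta))  # ['B']
```

In this toy model, a parent row that has lost its split markers gives the fix-up nothing to work with, so the parent would simply be treated as an ordinary region; whether anything like that happened on Brennon's cluster is what the RS and master logs would show.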




Re: Lost regions question

2013-04-12 Thread Ted Yu
Brennon:
Can you try hbck to see whether it can repair the problem?

Thanks




Re: hbase-0.94.6.1 balancer issue

2013-04-12 Thread Samir Ahmic
Thanks for explaining, Jean-Marc.

We have been using 0.90.4 for a very long time, and there balancing was based
on the total number of regions. That is why I was surprised by the balancer
log on 0.94. Well, I'm more of an ops guy than a dev; I handle what others
develop :)

Regards


On Fri, Apr 12, 2013 at 6:24 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi Samir,

 Since regions are balanced per table, as soon as you will have more than
 one region in your table, balancer will start to balance the regions over
 the servers.

 You can split some of those tables and will you start to see HBase balance
 them. This is normal behavior for 0.94. I don't know for versions before
 that.

 Also, are you sure you need 48 tables? And not less tables with more CFs?

 JM

 2013/4/12 Samir Ahmic ahmic.sa...@gmail.com

  Hi, JM
 
  I have 48 tables and as you said it is 1 region per table since i did not
  reach splitting limit yet. So this is normal behavior  in 0.94.6.1
 version
  ?  And at what point balancer will start redistribute regions to second
  server ?
 
  Thanks
  Samir
 
 
  On Fri, Apr 12, 2013 at 6:06 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
   Hi Samir,
  
   Regions are balancer per table.
  
   So if you have 48 regions within the same table, it should be split
 about
   24 on each server.
  
   But if you have 48 tables with 1 region each, the for each table, the
   balancer will see only 1 region and will display the message you saw.
  
   Have you looked at the UI? What do you have in it? Can you please
 confirm
   if yo uhave 48 tables or 1 table?
  
   Thanks,
  
   JM
  
  
   2013/4/12 Samir Ahmic ahmic.sa...@gmail.com
  
Hi, all
   
I'm evaluating hbase-0.94.6.1 and i have 48 regions on 2 node
 cluster.
  I
was restarting on of RSs and after that tried to balance cluster by
   running
balancer from shell. After running command regions were not
 distributed
   to
second RS and i found this line i master log:
   
2013-04-12 16:45:15,589 INFO
  org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=2
 *regions=1
*average=0.5
mostloaded=1 leastloaded=0
   
This look like to me that wrong number of regions is reported by
  balancer
and that cause of  skipping load balancing . In hbase shell i see all
  48
tables that i have and everything else looks fine.
   
Did someone else see this type of behavior ? Did something changed
  around
balancer in hbase-0.94.6.1 ?
   
Regards
Samir
   
  
 



Re: Lost regions question

2013-04-12 Thread Brennon Church
hbck does show the hdfs files there without associated regions.  I 
probably could have recovered had I noticed just after this happened, 
but given that we've been running like this for over a week, and that 
there is the potential for collisions between the missing and new data, 
I'm probably just going to manually reinsert it all using the hdfs files.


Hadoop version is 1.0.1, btw.

Thanks.

--Brennon
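The hbck finding above (HDFS data with no matching region) boils down to a set comparison between region directories on HDFS and region rows in .META.. A minimal hypothetical Python sketch, assuming region directories live at /hbase/<table>/<encoded region name>; the paths and region names are made up, and this is not how hbck itself is implemented.

```python
import posixpath


def compare_regions(region_dirs, meta_regions):
    """Compare HDFS region directories with region rows in .META.

    region_dirs: paths like /hbase/<table>/<encoded region name>.
    meta_regions: (table, encoded region name) pairs read from .META.
    Returns (orphans, missing):
      orphans: directories on HDFS with no .META. row
      missing: .META. rows with no directory on HDFS
    """
    on_hdfs = set()
    for path in region_dirs:
        table_dir, region = posixpath.split(posixpath.normpath(path))
        on_hdfs.add((posixpath.basename(table_dir), region))
    in_meta = set(meta_regions)
    return sorted(on_hdfs - in_meta), sorted(in_meta - on_hdfs)


# Hypothetical example: parent reopened in .META., daughters left behind.
dirs = ["/hbase/events/parent01", "/hbase/events/daughtA", "/hbase/events/daughtB"]
meta = [("events", "parent01")]
orphans, missing = compare_regions(dirs, meta)
print("orphaned region dirs:", orphans)
print(".META. rows without data:", missing)
```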

On 4/11/13 11:05 PM, Ted Yu wrote:

Brennon:
Have you run hbck to diagnose the problem?

Since the issue might have involved hdfs, browsing DataNode log(s) may
provide some clue as well.

What Hadoop version are you using?

Cheers


Re: hbase-0.94.6.1 balancer issue

2013-04-12 Thread Jean-Daniel Cryans
Samir,

When you say "And at what point balancer will start redistribute regions to
second server", do you mean that when you look at the master's web UI you
see that one region server has 0 regions? That would be a problem. Otherwise,
the line you posted in your original message should be repeated for each
table, and globally the regions should all be correctly distributed...
unless there's an edge case where, when you have only tables with 1 region,
it puts them all on the same server :)

Thx,

J-D





Re: hbase-0.94.6.1 balancer issue

2013-04-12 Thread Jean-Marc Spaggiari
I have just created 50 tables, and they got distributed across different
nodes (8) at creation time.

I ran the balancer manually and they are still correctly distributed all
over the cluster.

But Samir tried with only 2 nodes. I don't know whether this might change
the results or not.

JM.




Re: hbase-0.94.6.1 balancer issue

2013-04-12 Thread Ted Yu
Thanks for the replies, Jean-Marc.

HBASE-7060 is related to the scenario Samir described:
"Region load balancing by table does not handle the case where a table's
region count is lower than the number of the RS in the cluster"

It was fixed in 0.94.3.

Cheers



Re: hbase-0.94.6.1 balancer issue

2013-04-12 Thread Samir Ahmic
Hi, J-D

Well, at the moment I have exactly that edge case, with only one region per
table :). As I said, I was using 0.90 for a long time, and regions were
distributed evenly across all RSs regardless of the regions-per-table ratio.
Here is what confused me (as I said, I have a 2-node cluster in distributed mode):

1. start-hbase -> tables (regions) are distributed evenly across the two RSs
   (as expected)
2. stop one RS -> all tables (regions) are moved to the remaining RS (as
   expected)
3. start the RS that was down, then run balancer -> the master logs:
   2013-04-12 19:47:20,725 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=2 regions=1 average=0.5 mostloaded=1 leastloaded=0
   and all tables (regions) stay on one server (this is what I did not
   expect) :)

Here is part of the status 'detailed' output from the shell after I started
the RS that was down and ran the balancer:

hbase(main):001:0> status 'detailed'
version 0.94.6.1
0 regionsInTransition
master coprocessors: []
2 live servers
    172.17.33.2:60020 1365787755294
        requestsPerSecond=0, numberOfOnlineRegions=0, usedHeapMB=38, maxHeapMB=3487
    172.17.33.3:60020 1365777858778
        requestsPerSecond=0, numberOfOnlineRegions=49, usedHeapMB=53, maxHeapMB=3487

So because I have 1 region per table, the regions were not rebalanced after
I started the RS that was down?

Thanks
Samir
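To make the log line above concrete, here is a minimal Java sketch of a
"skip balancing?" check, modeled loosely on the values the LoadBalancer
prints (average, mostloaded, leastloaded). It assumes a floor/ceiling test
around the average with a slop of 0.2; the slop value, the class name and
the method name are assumptions for illustration, and this is not HBase's
actual code. It compares the per-table view of one single-region table on 2
servers with a cluster-wide view of all 48 single-region tables sitting on
one server, which is roughly how balancing on the total region count would
see the same cluster.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a "skip balancing?" check, modeled loosely on the
 * LoadBalancer log values (average, mostloaded, leastloaded).
 * Not HBase's implementation. Assumes a non-empty server array.
 */
public class BalanceCheckSketch {

    // Assumed slop; HBase has a similar setting (hbase.regions.slop).
    static final float SLOP = 0.2f;

    /** True if the most and least loaded servers lie within [floor, ceiling] around the average. */
    static boolean isBalanced(int[] regionsPerServer, float slop) {
        int total = Arrays.stream(regionsPerServer).sum();
        float average = (float) total / regionsPerServer.length;
        int floor = (int) Math.floor(average * (1 - slop));
        int ceiling = (int) Math.ceil(average * (1 + slop));
        int mostLoaded = Arrays.stream(regionsPerServer).max().getAsInt();
        int leastLoaded = Arrays.stream(regionsPerServer).min().getAsInt();
        return mostLoaded <= ceiling && leastLoaded >= floor;
    }

    public static void main(String[] args) {
        // Per-table view: 1 region on 2 servers (servers=2 regions=1 average=0.5).
        int[] perTable = {1, 0};
        // Cluster-wide view: all 48 single-region tables on one of the 2 servers.
        int[] clusterWide = {48, 0};

        System.out.println("per-table balanced: " + isBalanced(perTable, SLOP));
        System.out.println("cluster-wide balanced: " + isBalanced(clusterWide, SLOP));
    }
}
```

Under these assumptions the per-table check prints true (floor 0, ceiling 1,
mostloaded 1), matching "Skipping load balancing because balanced cluster",
while the cluster-wide check prints false (ceiling 29, mostloaded 48).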


On Fri, Apr 12, 2013 at 7:17 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:

 Samir,

 When you ask at what point the balancer will start redistributing regions
 to the second server, do you mean that when you look at the master's web UI
 you see that one region server has 0 regions? That would be a problem.
 Otherwise, the line you posted in your original message should be repeated
 for each table, and globally the regions should all be correctly
 distributed... unless there's an edge case where, when you have only tables
 with 1 region, it puts them all on the same server :)

 Thx,

 J-D


 On Fri, Apr 12, 2013 at 12:37 PM, Samir Ahmic ahmic.sa...@gmail.com
 wrote:

  Thanks for explaining, Jean-Marc.
 
  We have been using 0.90.4 for a very long time, and there balancing was
  based on the total number of regions. That is why I was surprised by the
  balancer log on 0.94. Well, I'm more of an ops guy than a dev; I handle
  what others develop :)
 
  Regards
 
 
  On Fri, Apr 12, 2013 at 6:24 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
   Hi Samir,
  
   Since regions are balanced per table, as soon as you have more than one
   region in a table, the balancer will start to balance those regions
   across the servers.
  
   You can split some of those tables and you will start to see HBase
   balance them. This is normal behavior for 0.94; I don't know about
   earlier versions.
  
   Also, are you sure you need 48 tables, rather than fewer tables with more
   CFs?
  
   JM
  
   2013/4/12 Samir Ahmic ahmic.sa...@gmail.com
  
Hi, JM
   
I have 48 tables and, as you said, it is 1 region per table since I have
not reached the split limit yet. So this is normal behavior in version
0.94.6.1? And at what point will the balancer start redistributing regions
to the second server?
   
Thanks
Samir
   
   
On Fri, Apr 12, 2013 at 6:06 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:
   
 Hi Samir,

 Regions are balanced per table.

 So if you have 48 regions within the same table, they should be split
 about 24 on each server.

 But if you have 48 tables with 1 region each, then for each table the
 balancer will see only 1 region and will display the message you saw.

 Have you looked at the UI? What do you see there? Can you please confirm
 whether you have 48 tables or 1 table?

 Thanks,

 JM


 2013/4/12 Samir Ahmic ahmic.sa...@gmail.com

  Hi, all
 
  I'm evaluating hbase-0.94.6.1 and I have 48 regions on a 2-node cluster.
  I restarted one of the RSs and after that tried to balance the cluster by
  running the balancer from the shell. After running the command, the
  regions were not distributed to the second RS, and I found this line in
  the master log:
 
  2013-04-12 16:45:15,589 INFO org.apache.hadoop.hbase.master.LoadBalancer:
  Skipping load balancing because balanced cluster; servers=2 regions=1
  average=0.5 mostloaded=1 leastloaded=0
 
  This looks to me like the balancer is reporting the wrong number of
  regions, and that this is why it skips load balancing. In the hbase shell
  I see all 48 tables that I have, and everything else looks fine.
 
  Has anyone else seen this type of behavior? Did something change around
  the balancer in hbase-0.94.6.1?
 
  Regards
  Samir
 

   
  
 



Re: hbase-0.94.6.1 balancer issue

2013-04-12 Thread Samir Ahmic
HBASE-7060 explains my case. I'm using 0.94.6.1 and it looks like the issue
is still present.

Thanks for replying, guys
Cheers :)


Re: hbase-0.94.6.1 balancer issue

2013-04-12 Thread Ted Yu
bq. looks like issue is still present.

I think the issue described in HBASE-7060 is slightly different:

bq. For example, the cluster has 100 RS, the table has 50 regions sitting on one RS,

Note: one table had 50 regions
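As a rough worked comparison, assume the same kind of floor/ceiling test
around the per-table average, with a slop s = 0.2 (an assumption for
illustration, not taken from the HBase source). In Samir's case every
single-region table passes the test, whereas the HBASE-7060 example table
would be flagged as unbalanced. That is consistent with the point that the
two situations are different.

```latex
\begin{align*}
\bar{x} &= \frac{\text{regions}}{\text{servers}}, \qquad
\text{floor} = \lfloor \bar{x}(1 - s) \rfloor, \qquad
\text{ceiling} = \lceil \bar{x}(1 + s) \rceil, \qquad s = 0.2 \\[4pt]
\text{Samir (per table):}\quad \bar{x} &= \tfrac{1}{2} = 0.5,\;
\text{floor} = \lfloor 0.4 \rfloor = 0,\;
\text{ceiling} = \lceil 0.6 \rceil = 1,\;
\text{mostloaded} = 1 \le 1,\;
\text{leastloaded} = 0 \ge 0 \;\Rightarrow\; \text{balanced} \\
\text{HBASE-7060:}\quad \bar{x} &= \tfrac{50}{100} = 0.5,\;
\text{floor} = 0,\;
\text{ceiling} = 1,\;
\text{mostloaded} = 50 > 1 \;\Rightarrow\; \text{not balanced}
\end{align*}
```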
