Re: WARN add_table: Missing .regioninfo:.. No server address.. what to do?

2010-08-26 Thread Stack
On Wed, Aug 25, 2010 at 11:22 AM, Stuart Smith stu24m...@yahoo.com wrote:
 Just curious, though, (if it happens again) - assume the regions were invalid 
 - I don't know, maybe it was halfway through splitting something and died - 
 but say they're invalid.


(See if a failed MR task is associated with the bad region.  You could
also tgz the bad region and we can take a look at it for you.)

 Would the best thing to do in that case be a manual deletion of the hdfs 
 directories containing the invalid regions? Would hbase handle that OK?


If it's a 'bad' region, deleting it should be fine.  There'd be no holes in the
loaded table.  But if it's not...

 And a side question that ties a lot of my issues together - I finally have a 
 (somewhat) clean interface that moves the occasional too big file into hdfs, 
 and stores everything else into hbase - I built this up as a layer in java 
 with a metadata/filestore split in hbase (all file metadata is in hbase, 
 files are directed to hbase/hdfs based on size).

 Is there another project that does this? It seems too handy to be the first 
 time someone did this... Or does something like this always end up needing 
 domain-specific tweaks & interfaces?


I haven't heard of a project like this (though as you say, you can't
be the first... maybe you are though?)

 Because once you have huge cells in hbase, it really seems to be unhappy. 
 Especially when a good chunk of your tasks are done as M/R tasks or some 
 layer on top of M/R.


Yeah, I'd imagine so.  At least the default configuration is set for cells
in the 0-50k or so size range.  I'd imagine the configs would need to be pulled
around some if cells are MBs.

 Or would this be a good project to open-source? Or pointless to do so?


Do it on github as Ted suggests.  It'll either flourish and then
you'll have to figure out how to support it, or it'll wither when you
move on (add it to supporting projects on the wiki so it's easier for folks
to find?)

 I guess in the long-run hbase could absorb these requirements with some 
 tweaks of the file format, but I thought it could be nice to do this with a 
 little library layer on top.


You are a good man Stu,
St.Ack
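
[Editor's note: the metadata/filestore split Stuart describes above can be sketched in a few lines. The class name and the 10 MB cutoff below are hypothetical illustrations, not from his code — small payloads go into HBase cell values, large ones go to an HDFS file with only the path recorded in the HBase metadata row.]

```java
// Hypothetical sketch of size-based routing for the hybrid HBase/HDFS store.
// The 10 MB threshold is illustrative; the thread only says "too big" files
// go to hdfs and everything else to hbase.
public class FileRouter {
    public enum Backend { HBASE, HDFS }

    private final long thresholdBytes;

    public FileRouter(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Decide where the file body lives; metadata always goes to HBase. */
    public Backend route(long fileSizeBytes) {
        return fileSizeBytes <= thresholdBytes ? Backend.HBASE : Backend.HDFS;
    }

    public static void main(String[] args) {
        FileRouter router = new FileRouter(10L * 1024 * 1024); // 10 MB cutoff
        System.out.println(router.route(4 * 1024));            // small file
        System.out.println(router.route(200L * 1024 * 1024));  // large file
    }
}
```

The point of keeping the decision in one place is that the cutoff can track whatever cell size the cluster's configuration is comfortable with (per Stack's 0-50k comment above).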


 --- On Mon, 8/23/10, Stack st...@duboce.net wrote:

 From: Stack st...@duboce.net
 Subject: Re: WARN add_table: Missing .regioninfo:.. No server address.. what to do?
 To: user@hbase.apache.org
 Date: Monday, August 23, 2010, 6:08 PM
 On Mon, Aug 23, 2010 at 1:35 PM, Stuart Smith stu24m...@yahoo.com wrote:
 
  Hmm... AFAICT, if the regioninfo file is gone from a region directory (and I
  looked on hdfs, and it is gone), the region is hosed.
 
 Is it a legit region?  Its wholesome looking with hfiles that make sense
 (non-zero)?  My guess is that the regions are incompletes and loadtable is not
 smart enough recognizing them as so.  If you grep your master log for the
 region encoded name, do you find anything?  Maybe this way you can figure its
 provenance?
 
 St.Ack








Re: WARN add_table: Missing .regioninfo:.. No server address.. what to do?

2010-08-26 Thread Stuart Smith
Hey,

Awesome. Well, this is a research project for work, so I have to ask the powers 
that be if it's OK to publish the plumbing parts.

It's really just plumbing though, so from the techy perspective it's not the 
interesting part. So hopefully I can sell it as such (selling my work to the 
boss as not interesting.. hmm... ;) ).

We'll see. I'm not an expert Java coder either, but, hopefully I can get it up 
and stimulate something...

Take care,
  -stu






Out Of Memory on region servers upon bulk import

2010-08-26 Thread Martin Arnandze
Hi,
 I'm doing an experiment on an 8 node cluster, each of which has 6GB of RAM 
allocated to the hbase region server. Basically, I'm doing a bulk import processing 
large files, but some imports require doing gets and scans as well. In the 
master UI I see that the heap used gets very close to the 6GB limit, but I know 
hbase is eager for memory and will use the heap as much as possible. I use block 
caching. Looking at similar posts I see that modifying the handler count and 
memstore upper/lower limits may be key to solving this issue. Nevertheless I 
wanted to ask if there is a way to estimate the extra memory used by hbase that 
makes it crash, and if there are other configuration settings I should be 
looking into to prevent OOME. The job runs correctly for some time but region 
servers eventually crash.

More information about the cluster:

- All nodes have 16GB total memory. 
- 7 nodes running region server (6GB) +  datanodes (1GB) + task trackers (1GB 
Heap).  Map reduce jobs running w/ 756MB tops each.
- 1 node running hbase master (2GB Heap allocated), namenode (4GB), Secondary 
Namenode (4GB), JobTracker (4GB) and Master (2GB). 
- 3 of the nodes have zookeeper running with 512MB Heap

Many thanks,
   Martin



2010-08-26 07:19:14,859 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening table_import,8ded1642-1c52-444a-bfdc-43521b220714-9223370754627999807UbbWwFDcGatAe8OniLMUXoaVeEdOvSkqiwXfJgUxNlt0aosKXsWevrlra8QDbEvTZelj/jLyux8y\x0AcCBiLeHbqg==,1282792675254
java.lang.OutOfMemoryError: Java heap space
   at org.apache.hadoop.hbase.io.hfile.HFile$BlockIndex.readIndex(HFile.java:1538)
   at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:806)
   at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:273)
   at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:129)
   at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:410)
   at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
   at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1636)
   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:321)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1571)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1538)
   at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1458)
   at java.lang.Thread.run(Thread.java:619)


Re: Out Of Memory on region servers upon bulk import

2010-08-26 Thread Stack
On Thu, Aug 26, 2010 at 8:07 AM, Martin Arnandze marnan...@gmail.com wrote:
 Hi,
  I'm doing an experiment on an 8 node cluster, each of which has 6GB of RAM 
 allocated to hbase region server. Basically, doing a bulk import processing 
 large files,


How large?

Unless very large, it should not be OOMEing.

 but some imports require doing gets and scans as well. In the master
 UI I see that the heap used gets very close to the 6GB limit, but I
 know hbase is eager for memory and will use the heap as much as
 possible. I use block caching. Looking at similar posts I see that
 modifying the handler count and memstore upper/lower limits may be
 key to solving this issue. Nevertheless I wanted to ask if there is a
 way to estimate the extra memory used by hbase that makes it crash and
 if there are other configuration settings I should be looking into to
 prevent OOME. The job runs correctly for some time but region servers
 eventually crash.

 More information about the cluster:

 - All nodes have 16GB total memory.
 - 7 nodes running region server (6GB) +  datanodes (1GB) + task trackers (1GB 
 Heap).  Map reduce jobs running w/ 756MB tops each.

Good.  How many MR child tasks can run on each node concurrently?

 - 1 node running hbase master (2GB Heap allocated), namenode (4GB), Secondary 
 Namenode (4GB), JobTracker (4GB) and Master (2GB).
 - 3 of the nodes have zookeeper running with 512MB Heap

 Many thanks,
   Martin



Can we see the lines before the below is thrown?   Also, do a listing
(ls -r) on this region in hdfs and lets see if anything pops out about
files sizes, etc.  You'll need to manually map the below region name
to its encoded name to figure the region but the encoded name should
be earlier in the log.  You'll do something like:

bin/hbase fs -lsr /hbase/table_import/REGION_ENCODED_NAME

Thanks,
St.Ack




 2010-08-26 07:19:14,859 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening table_import,8ded1642-1c52-444a-bfdc-43521b220714-9223370754627999807UbbWwFDcGatAe8OniLMUXoaVeEdOvSkqiwXfJgUxNlt0aosKXsWevrlra8QDbEvTZelj/jLyux8y\x0AcCBiLeHbqg==,1282792675254
 java.lang.OutOfMemoryError: Java heap space
       at org.apache.hadoop.hbase.io.hfile.HFile$BlockIndex.readIndex(HFile.java:1538)
       at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:806)
       at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:273)
       at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:129)
       at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:410)
       at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
       at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1636)
       at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:321)
       at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1571)
       at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1538)
       at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1458)
       at java.lang.Thread.run(Thread.java:619)



Re: RegionServer can't recover after a failure

2010-08-26 Thread Stack
On Thu, Aug 26, 2010 at 8:16 AM, Andrey Timerbaev atimerb...@gmx.net wrote:
 Dear experts,

 Could you kindly suggest, how to help the RegionServer to complete
 initialization in the following situation:

 After a failure of one of the RegionServers, which runs on a dedicated node in
 an HBase/Hadoop cluster (HBase v.0.20.3), the RegionServer can't initialize the
 available tables. The region server's log contains this exception:


You are running transactional hbase?  This is intentional I take it.

 After a look into the HBase source code I found out that the Table not created.
 Call createTable() message appears if the HBaseBackedTransactionLogger is
 unable to find the __GLOBAL_TRX_LOG__ table. But I've got no idea where the
 table should be, whether it is critical, and what I should do in this 
 situation.

Me neither.  Let me poke the transactional fellows and see if they can
offer help.

Thanks,
St.Ack


Re: Out Of Memory on region servers upon bulk import

2010-08-26 Thread Martin Arnandze
I provide the answers below.
Thanks!
  Martin

On Aug 26, 2010, at 11:45 AM, Stack wrote:

 On Thu, Aug 26, 2010 at 8:07 AM, Martin Arnandze marnan...@gmail.com wrote:
 Hi,
  I'm doing an experiment on an 8 node cluster, each of which has 6GB of RAM 
 allocated to hbase region server. Basically, doing a bulk import processing 
 large files,
 
 
 How large?

About 10 million records, each a few KB.

 
 Unless very large, it should not be OOMEing.
 
 but some imports require to do gets and scans as well. In the master
 UI I see that the heap used gets very close to the 6GB limit, but I
 know hbase is eager for memory and will use the heap as much as
 possible. I use block caching. Looking at similar posts I see that
 modifying the handler count and memstore upper/lower limits may be
 key to solving this issue. Nevertheless I wanted to ask if there is a
 way to estimate the extra memory used by hbase that makes it crash and
 if there are other configuration settings I should be looking into to
 prevent OOME. The job runs correctly for some time but region servers
 eventually crash.
 
 More information about the cluster:
 
 - All nodes have 16GB total memory.
 - 7 nodes running region server (6GB) +  datanodes (1GB) + task trackers 
 (1GB Heap).  Map reduce jobs running w/ 756MB tops each.
 
 Good.  How many MR child tasks can run on each node concurrently? 

three mappers and two reducers

 
 - 1 node running hbase master (2GB Heap allocated), namenode (4GB), 
 Secondary Namenode (4GB), JobTracker (4GB) and Master (2GB).
 - 3 of the nodes have zookeeper running with 512MB Heap
 
 Many thanks,
   Martin
 
 
 
 Can we see the lines before the below is thrown?   Also, do a listing
 (ls -r) on this region in hdfs and lets see if anything pops out about
 files sizes, etc.  You'll need to manually map the below region name
 to its encoded name to figure the region but the encoded name should
 be earlier in the log.  You'll do something like:
 
 bin/hbase fs -lsr /hbase/table_import/REGION_ENCODED_NAME

/usr/lib/hadoop-0.20/bin/hadoop fs -lsr /hbase/table_import/1698505444
-rw-r--r--   3 hadoop supergroup   1450 2010-08-25 23:17 
/hbase/table_import/1698505444/.regioninfo
drwxr-xr-x   - hadoop supergroup  0 2010-08-26 11:07 
/hbase/table_import/1698505444/fam
-rw-r--r--   3 hadoop supergroup4244491 2010-08-26 11:07 
/hbase/table_import/1698505444/fam/5785049964186428982
-rw-r--r--   3 hadoop supergroup  180147216 2010-08-26 06:09 
/hbase/table_import/1698505444/fam/705757673046090229


Previous log:

2010-08-26 07:18:14,691 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
region 
table_import,f1cbb42c-b6ae-404d-800c-043da5409441-9223370754623831807WkmpwnRDmveKYzWEfw/tb4GpP9yHDl+/G7OCaZWEgrmGcW+XEF131YDTQwDqZsO93tDicdPcOdRq\x0AU7zDBqoxpA==,1282790086498/1451783432
 available; sequence id is 1518500874
2010-08-26 07:18:14,691 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: 
table_import,d1e50232ac85a0a965e48647de5dc6ce-92233707546248658079F073MJ/gGEEs6mwkLsY/lLH+QvHGVBhBavAz0HSPEEKY+NrjTTzHUJdPtuJ0lXqz2i2Qs2DmFkz\x0A5P2broA7Gg==,128278399
2010-08-26 07:18:14,691 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Creating region 
table_import,d1e50232ac85a0a965e48647de5dc6ce-92233707546248658079F073MJ/gGEEs6mwkLsY/lLH+QvHGVBhBavAz0HSPEEKY+NrjTTzHUJdPtuJ0lXqz2i2Qs2DmFkz\x0A5P2broA7Gg==,128278399,
 encoded=1510556231
2010-08-26 07:18:21,085 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
Cache Stats: Sizes: Total=958.38617MB (1004940736), Free=238.2MB 
(249864000), Max=1196.675MB (1254804736), Counts: Blocks=115717, 
Access=51364517, Hit=231796, Miss=51132721, Evictions=15, Evicted=218920, 
Ratios: Hit Ratio=0.45127649791538715%, Miss Ratio=99.54872131347656%, 
Evicted/Run=14594.6669921875
2010-08-26 07:18:27,659 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
loaded /hbase/table_import/1510556231/fam/2639910770219077750, 
isReference=false, sequence id=1518500860, length=200014693, 
majorCompaction=false
2010-08-26 07:18:35,188 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
region 
table_import,d1e50232ac85a0a965e48647de5dc6ce-92233707546248658079F073MJ/gGEEs6mwkLsY/lLH+QvHGVBhBavAz0HSPEEKY+NrjTTzHUJdPtuJ0lXqz2i2Qs2DmFkz\x0A5P2broA7Gg==,128278399/1510556231
 available; sequence id is 1518500861
2010-08-26 07:18:35,188 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: 
table_import,8ded1642-1c52-444a-bfdc-43521b220714-9223370754627999807UbbWwFDcGatAe8OniLMUXoaVeEdOvSkqiwXfJgUxNlt0aosKXsWevrlra8QDbEvTZelj/jLyux8y\x0AcCBiLeHbqg==,1282792675254
2010-08-26 07:18:35,189 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Creating region 
table_import,8ded1642-1c52-444a-bfdc-43521b220714-9223370754627999807UbbWwFDcGatAe8OniLMUXoaVeEdOvSkqiwXfJgUxNlt0aosKXsWevrlra8QDbEvTZelj/jLyux8y\x0AcCBiLeHbqg==,1282792675254,
 encoded=1698505444
2010-08-26 07:19:14,859 ERROR 

hbase

2010-08-26 Thread Witteveen, Tim
I'm going through the overview-summary instructions for setting up and running 
hbase. Right now I'm running hbase in pseudo-distributed mode, and looking to 
go fully-distributed on 25 nodes. 

Every time I restart hbase, I get:
Couldn't start ZK at requested address of n, instead got: n+1. Aborting. 
Why? Because clients (eg shell) won't be able to find this ZK quorum

If I change the hbase.zookeeper.property.clientPort to the n+1 from the 
message it starts right up.  

Which file do I need to modify to keep this on one port, and what do I need to 
put into it? 

Is this something that should be added to the overview-summary page? 

Thanks,
TimW 
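
[Editor's note: the file in question is conf/hbase-site.xml; a minimal fragment pinning the ZK client port might look like this — the 2181 value is the usual default, shown for illustration.]

```xml
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
```

As the follow-ups show, the real cause here was a second ZooKeeper instance already holding the port, so changing the config only papers over it.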


Re: hbase

2010-08-26 Thread Ted Yu
Use netstat to see who is occupying port n

Maybe HQuorumPeer wasn't stopped from previous run ?




RE: hbase

2010-08-26 Thread Witteveen, Tim
Thanks!  Netstat revealed I was running zookeeper twice.  

I stopped manually starting it, and things are working as expected.  

TimW 




Region splits in 0.89...

2010-08-26 Thread Vidhyashankar Venkataraman
My hbase table issued a mass split after I loaded regions with greater sizes 
than maxfilesize.. (my bad..)

Now, when I try accessing the master through the web interface, it just hangs...

And, if I scan the META, I get the parent regions set to offline.. And the 
child regions have random byte keys (that was part of the HBASE 2515).. But one 
thing that troubles me is that even the split child regions don't seem to have 
been assigned to any server according to META:  I am not sure if this is 
expected..


 ROW: DocDB,03617973,1282331439883.6a66354e2c58779278942a8e5cf152ca.
  column=info:regioninfo, timestamp=1282810230593, value=REGION => {NAME => 'DocDB,03617973,1282331439883.6a66354e2c58779278942a8e5cf152ca.', STARTKEY => '03617973', ENDKEY => '03751972', ENCODED => 6a66354e2c58779278942a8e5cf152ca, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'DocDB', MAX_FILESIZE => '4294967296', FAMILIES => [{NAME => 'bigColumn', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '1048576', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
  column=info:server, timestamp=1282810230593, value=
  column=info:serverstartcode, timestamp=1282810230593, value=
  column=info:splitA, timestamp=1282810230593, value=\x00\x1003684963\x00\x00\x00\x01*\xADr\xB4\x82FDocDB,03617973,1282810229890.57f2eed2d43cece270244a39233f5a5a.\x00\x1003617973\x00\x00\x00\x05\x05DocDB\x00\x00\x00\x00\x00\x03\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x0CMAX_FILESIZE\x00\x00\x00\x0A4294967296\x00\x00\x00\x01\x08\x09bigColumn\x00\x00\x00\x08\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x04NONE\x00\x00\x00\x11REPLICATION_SCOPE\x00\x00\x00\x010\x00\x00\x00\x0BCOMPRESSION\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x011\x00\x00\x00\x03TTL\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x071048576\x00\x00\x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x05false0\xF1\xD4\xBB
  column=info:splitA_checked, timestamp=1282815236171, value=\x01
  column=info:splitB, timestamp=1282810230593, value=\x00\x1003751972\x00\x00\x00\x01*\xADr\xB4\x82FDocDB,03684963,1282810229890.5fe1f7c4de1640642fb8454a1b8c623c.\x00\x1003684963\x00\x00\x00\x05\x05DocDB\x00\x00\x00\x00\x00\x03\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x0CMAX_FILESIZE\x00\x00\x00\x0A4294967296\x00\x00\x00\x01\x08\x09bigColumn\x00\x00\x00\x08\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x04NONE\x00\x00\x00\x11REPLICATION_SCOPE\x00\x00\x00\x010\x00\x00\x00\x0BCOMPRESSION\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x011\x00\x00\x00\x03TTL\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x071048576\x00\x00\x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x05false\xB54N\x1E
  column=info:splitB_checked, timestamp=1282815236219, value=\x01
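
[Editor's note: one way to confirm what the scan above suggests is to pull just the assignment columns from the shell — an empty info:server value for a daughter row means that region has not been assigned to any server yet. Sketch, using the 0.20/0.89-era catalog column names:]

```
hbase> scan '.META.', {COLUMNS => ['info:server', 'info:serverstartcode']}
```

Daughter rows should pick up an info:server value once the master processes the offlined split parents and assigns them out.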


Re: region servers crashing

2010-08-26 Thread Ryan Rawson
Without gc logs you cannot diagnose what you suspect are gc issues...
make sure you are logging and then check them out.  If you are running
a recent JVM you can use -XX:+PrintGCDateStamps and get better log
entries.

Also you cannot swap at all, even 1 page of swapping in a java process
can be killer.  Combined with the hypervisor stealing your CPU you can
have a lot of elapsed wall time with not very many cpu slices being
executed.  Consider vmstat and top to diagnose that one issue.

On the GC issue, the CMSInitiatingOccupancyFraction setting you are using
is kind of low. This means you will kick in the
GC once you hit 50% of your memory usage.  You might consider testing
with that set to a medium level, say 75% or so.

-ryan
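
[Editor's note: in conf/hbase-env.sh, the logging Ryan asks for plus the higher occupancy fraction might look like the following — the log path and the 75 value are illustrative, not settings confirmed in the thread.]

```
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/tmp/gc-regionserver.log"
```

With date-stamped GC log entries it becomes possible to line up the "We slept Nms" warnings below against actual collection pauses.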

On Thu, Aug 26, 2010 at 12:17 PM, Dmitry Chechik dmi...@tellapart.com wrote:
 Hi all,
 We're still seeing these crashes pretty frequently. Attached is the error
 from the regionserver logs as well as a GC dump of the last hour of the
 regionserver:
 2010-08-26 13:34:10,855 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
 157041ms, ten times longer than scheduled: 1
 2010-08-26 13:34:10,925 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
 148602ms, ten times longer than scheduled: 1000
 2010-08-26 13:34:10,925 WARN
 org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to
 master for 148602 milliseconds - retrying
 Since our workload is mostly scans in mapreduce, we've turned off block
 caching as per https://issues.apache.org/jira/browse/HBASE-2252 in case that
 had anything to do with it.
 We've also decreased NewSize and MaxNewSize and decreased
 CMSInitiatingOccupancyFraction, so our GC settings now are:
 -Xmx2000m
   -XX:+UseConcMarkSweepGC
   -XX:CMSInitiatingOccupancyFraction=50
   -XX:NewSize=32m
   -XX:MaxNewSize=32m
   -XX:+DoEscapeAnalysis
   -XX:+AggressiveOpts
   -verbose:gc
   -XX:+PrintGCDetails
   -XX:+PrintGCTimeStamps
 We're running with 2G of RAM.
 Is the solution here only to move to machines with more RAM, or are there
 other GC settings we should look at?
 Thanks,
 - Dmitry
 On Wed, Jul 14, 2010 at 4:39 PM, Dmitry Chechik dmi...@tellapart.com
 wrote:

 We're running with 1GB of heap space.
 Thanks all - we'll look into GC tuning some more.

 On Wed, Jul 14, 2010 at 3:47 PM, Jonathan Gray jg...@facebook.com wrote:

 This doesn't look like a clock skew issue.

 @Dmitry, while you should be running CMS, this is still a garbage
 collector and is still vulnerable to GC pauses.  There are additional
 configuration parameters to tune even more.

 How much heap are you running with on your RSs?  If you are hitting your
 servers with lots of load you should run with 4GB or more.

 Also, having ZK on the same servers as RS/DN is going to create problems
 if you're already hitting your IO limits.

 JG

  -Original Message-
  From: Arun Ramakrishnan [mailto:aramakrish...@languageweaver.com]
  Sent: Wednesday, July 14, 2010 3:33 PM
  To: user@hbase.apache.org
  Subject: RE: region servers crashing
 
  Had a problem that caused issues that looked like this.
 
   2010-07-12 15:10:03,299 WARN org.apache.hadoop.hbase.util.Sleeper: We
  slept
   86246ms, ten times longer than scheduled: 1000
 
  Our problem was with clock skew. We just had to make sure ntp was
  running on all machines and also the timezones detected on all the
  machines were the same.
 
  -Original Message-
  From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-
  Daniel Cryans
  Sent: Wednesday, July 14, 2010 3:11 PM
  To: user@hbase.apache.org
  Subject: Re: region servers crashing
 
  Dmitry,
 
  Your log shows this:
 
   2010-07-12 15:10:03,299 WARN org.apache.hadoop.hbase.util.Sleeper: We
  slept
   86246ms, ten times longer than scheduled: 1000
 
  This is a pause that lasted more than a minute, the process was in
  that state (GC, swapping, mix of all of them) for some reason and it
  was long enough to expire the ZooKeeper session (since from its point
  of view the region server stopped responding).
 
  The NPE is just a side-effect, it is caused by the huge pause.
 
  It's well worth upgrading, but it won't solve your pausing issues. I
  can only recommend closer monitoring, setting swappiness to 0 and
  giving more memory to HBase (if available).
 
  J-D
 
  On Wed, Jul 14, 2010 at 3:03 PM, Dmitry Chechik dmi...@tellapart.com
  wrote:
   Hi all,
   We've been having issues for a few days with HBase region servers
  crashing
   when under load from mapreduce jobs.
   There are a few different errors in the region server logs - I've
  attached a
   sample log of 4 different region servers crashing within an hour of
  each
   other.
   Some details:
   - This happens when a full table scan from a mapreduce is in
  progress.
   - We are running HBase 0.20.3, with a 16-slave cluster, on EC2.
   - Some of the region server errors are NPEs which look a lot
   like https://issues.apache.org/jira/browse/HBASE-2077. I'm not 

Re: region servers crashing

2010-08-26 Thread Ryan Rawson
Ok I didnt see your logs earlier - normally attachments are filtered
out and we use pastebin for logs.

I am not seeing any large pauses in your gc logs.  Not sure if the log
is complete enough or what...

-ryan

On Thu, Aug 26, 2010 at 12:32 PM, Ryan Rawson ryano...@gmail.com wrote:
 Without gc logs you cannot diagnose what you suspect are gc issues...
 make sure you are logging and then check them out.  If you are running
 a recent JVM you can use -XX:+PrintGCDateStamps and get better log
 entries.

 Also you cannot swap at all, even 1 page of swapping in a java process
 can be killer.  Combined with the hypervisor stealing your CPU you can
 have a lot of elapsed wall time with not very many cpu slices being
 executed.  Consider vmstat and top to diagnose that one issue.

 On the GC issue, the one setting you are using which is initiating
 occupancy fraction is set kind of low. This means you will kick in the
 GC once you hit 50% of your memory usage.  You might consider testing
 with that set to a medium level, say 75% or so.

 -ryan

 On Thu, Aug 26, 2010 at 12:17 PM, Dmitry Chechik dmi...@tellapart.com wrote:
 Hi all,
 We're still seeing these crashes pretty frequently. Attached is the error
 from the regionserver logs as well as a GC dump of the last hour of the
 regionserver:
 2010-08-26 13:34:10,855 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
 157041ms, ten times longer than scheduled: 1
 2010-08-26 13:34:10,925 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
 148602ms, ten times longer than scheduled: 1000
 2010-08-26 13:34:10,925 WARN
 org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to
 master for 148602 milliseconds - retrying
 Since our workload is mostly scans in mapreduce, we've turned off block
 caching as per https://issues.apache.org/jira/browse/HBASE-2252 in case that
 had anything to do with it.
 We've also decreased NewSize and MaxNewSize and decreased
 CMSInitiatingOccupancyFraction, so our GC settings now are:
 -Xmx2000m
   -XX:+UseConcMarkSweepGC
   -XX:CMSInitiatingOccupancyFraction=50
   -XX:NewSize=32m
   -XX:MaxNewSize=32m
   -XX:+DoEscapeAnalysis
   -XX:+AggressiveOpts
   -verbose:gc
   -XX:+PrintGCDetails
   -XX:+PrintGCTimeStamps
 We're running with 2G of RAM.
 Is the solution here only to move to machines with more RAM, or are there
 other GC settings we should look at?
 Thanks,
 - Dmitry
 On Wed, Jul 14, 2010 at 4:39 PM, Dmitry Chechik dmi...@tellapart.com
 wrote:

 We're running with 1GB of heap space.
 Thanks all - we'll look into GC tuning some more.

 On Wed, Jul 14, 2010 at 3:47 PM, Jonathan Gray jg...@facebook.com wrote:

 This doesn't look like a clock skew issue.

 @Dmitry, while you should be running CMS, this is still a garbage
 collector and is still vulnerable to GC pauses.  There are additional
 configuration parameters to tune even more.

 How much heap are you running with on your RSs?  If you are hitting your
 servers with lots of load you should run with 4GB or more.

 Also, having ZK on the same servers as RS/DN is going to create problems
 if you're already hitting your IO limits.

 JG

  -Original Message-
  From: Arun Ramakrishnan [mailto:aramakrish...@languageweaver.com]
  Sent: Wednesday, July 14, 2010 3:33 PM
  To: user@hbase.apache.org
  Subject: RE: region servers crashing
 
  Had a problem that caused issues that looked like this.
 
   2010-07-12 15:10:03,299 WARN org.apache.hadoop.hbase.util.Sleeper: We
  slept
   86246ms, ten times longer than scheduled: 1000
 
  Our problem was with clock skew. We just had to make sure ntp was
  running on all machines and also the timezones detected on all the
  machines were the same.
 
  -Original Message-
  From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-
  Daniel Cryans
  Sent: Wednesday, July 14, 2010 3:11 PM
  To: user@hbase.apache.org
  Subject: Re: region servers crashing
 
  Dmitry,
 
  Your log shows this:
 
   2010-07-12 15:10:03,299 WARN org.apache.hadoop.hbase.util.Sleeper: We
  slept
   86246ms, ten times longer than scheduled: 1000
 
  This is a pause that lasted more than a minute, the process was in
  that state (GC, swapping, mix of all of them) for some reason and it
  was long enough to expire the ZooKeeper session (since from its point
  of view the region server stopped responding).
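The Sleeper warning quoted above fires when the actual sleep is more than ten times the scheduled period. A minimal sketch of that condition (not the actual HBase source, just the arithmetic behind the log line):

```java
public class SleeperCheck {
    // The warning fires when the actual sleep exceeds ten times the
    // scheduled period, indicating a long pause (GC, swap, etc.).
    static boolean shouldWarn(long sleptMs, long scheduledMs) {
        return sleptMs > 10 * scheduledMs;
    }

    public static void main(String[] args) {
        long scheduled = 1000;   // scheduled period from the log line
        long slept = 86246;      // actual sleep from the log line
        if (shouldWarn(slept, scheduled)) {
            System.out.println("We slept " + slept
                + "ms, ten times longer than scheduled: " + scheduled);
        }
    }
}
```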
 
  The NPE is just a side-effect, it is caused by the huge pause.
 
  It's well worth upgrading, but it won't solve your pausing issues. I
  can only recommend closer monitoring, setting swappiness to 0 and
  giving more memory to HBase (if available).
 
  J-D
 
  On Wed, Jul 14, 2010 at 3:03 PM, Dmitry Chechik dmi...@tellapart.com
  wrote:
   Hi all,
   We've been having issues for a few days with HBase region servers
  crashing
   when under load from mapreduce jobs.
   There are a few different errors in the region server logs - I've
  attached a
   sample log of 4 different region servers crashing within an hour of
  each
 

Getting data from Hbase from client/remote computer

2010-08-26 Thread Shuja Rehman
Hi All

I am new to the HBase client API and want to know how to get data from HBase
from a client/remote machine. The target is to develop a Java program which
should connect to the HBase server and then get results from it.

Anyone have any example???

Thanks

-- 
Regards
Shuja-ur-Rehman Baig
http://pk.linkedin.com/in/shujamughal
Cell: +92 3214207445


Re: Region splits in 0.89...

2010-08-26 Thread Todd Lipcon
Hey Vidhya,

Anything interesting in the logs on the master?

It's possible that the master is slowly assigning out all the daughter
regions, and it's just taking a really long time since you loaded so many.

-Todd

On Thu, Aug 26, 2010 at 11:59 AM, Vidhyashankar Venkataraman 
vidhy...@yahoo-inc.com wrote:

 My hbase table issued a mass split after I loaded regions with greater
 sizes than maxfilesize.. (my bad..)

 Now, when I try accessing the master through the web interface, it just
 hangs...

 And, if I scan the META, I get the parent regions set to offline.. And the
 child regions have random byte keys (that was part of HBASE-2515).. But
 one thing that troubles me is that even the split child regions don't seem
 to have been assigned to any server according to META:  I am not sure if
 this is expected..


  DocDB,03617973,12column=info:regioninfo,
 timestamp=1282810230593, value=REGION = {NAME =
  82331439883.6a66354e2c587
 'DocDB,03617973,1282331439883.6a66354e2c58779278942a8e5cf152ca.'
  79278942a8e5cf152ca., STARTKEY = '03617973',
 ENDKEY = '03751972', ENCODED =
  
 6a66354e2c58779278942a8e5cf152ca, OFFLINE = true, SPLIT = true, TABLE
   = {{NAME = 'DocDB', MAX_FILESIZE =
 '4294967296', FAMILIES = [{NAME =
'bigColumn', BLOOMFILTER = 'NONE',
 REPLICATION_SCOPE = '0', VERSIONS
  = '1', COMPRESSION = 'NONE', TTL =
 '2147483647', BLOCKSIZE = '1048576
  ', IN_MEMORY = 'false', BLOCKCACHE =
 'false'}]}}
  DocDB,03617973,12column=info:server,
 timestamp=1282810230593, value=
  82331439883.6a66354e2c587
  79278942a8e5cf152ca.
  DocDB,03617973,12  column=info:serverstartcode,
 timestamp=1282810230593, value=
  82331439883.6a66354e2c587
  79278942a8e5cf152ca.
  DocDB,03617973,12column=info:splitA,
 timestamp=1282810230593, value=\x00\x10036849
  82331439883.6a66354e2c587
  63\x00\x00\x00\x01*\xADr\xB4\x82FDocDB,03617973,1282810229890.57f
  79278942a8e5cf152ca.
  2eed2d43cece270244a39233f5a5a.\x00\x1003617973\x00\x00\x00\x05\x0

  5DocDB\x00\x00\x00\x00\x00\x03\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05fals

  e\x00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x0CMAX_FILESIZE

 \x00\x00\x00\x0A4294967296\x00\x00\x00\x01\x08\x09bigColumn\x00\x00\x00\x

 08\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x04NONE\x00\x00\x00\x11REPLICAT

 ION_SCOPE\x00\x00\x00\x010\x00\x00\x00\x0BCOMPRESSION\x00\x00\x00\x04NONE

 \x00\x00\x00\x08VERSIONS\x00\x00\x00\x011\x00\x00\x00\x03TTL\x00\x00\x00\

 x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x071048576\x00\x00\x00

 \x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\
   x05false0\xF1\xD4\xBB
  DocDB,03617973,12column=info:splitA_checked,
 timestamp=1282815236171, value=\x01
  82331439883.6a66354e2c587
  79278942a8e5cf152ca.
  DocDB,03617973,12   column=info:splitB,
 timestamp=1282810230593, value=\x00\x10037519
  82331439883.6a66354e2c587
  72\x00\x00\x00\x01*\xADr\xB4\x82FDocDB,03684963,1282810229890.5fe
  79278942a8e5cf152ca.
  1f7c4de1640642fb8454a1b8c623c.\x00\x1003684963\x00\x00\x00\x05\x0

 5DocDB\x00\x00\x00\x00\x00\x03\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05fals

  e\x00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x0CMAX_FILESIZE

 \x00\x00\x00\x0A4294967296\x00\x00\x00\x01\x08\x09bigColumn\x00\x00\x00\x

 08\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x04NONE\x00\x00\x00\x11REPLICAT

 ION_SCOPE\x00\x00\x00\x010\x00\x00\x00\x0BCOMPRESSION\x00\x00\x00\x04NONE

 \x00\x00\x00\x08VERSIONS\x00\x00\x00\x011\x00\x00\x00\x03TTL\x00\x00\x00\

 x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x071048576\x00\x00\x00

 \x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\
   x05false\xB54N\x1E
  DocDB,03617973,12  column=info:splitB_checked,
 timestamp=1282815236219, value=\x01
  82331439883.6a66354e2c587
  79278942a8e5cf152ca.




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Region splits in 0.89...

2010-08-26 Thread Vidhyashankar Venkataraman
 It's possible that the master is slowly assigning out all the daughter
 regions, and it's just taking a really long time since you loaded so many.
I forgot to add it the last time.. But I think that's what is happening.. 
Master getting choked up because of a flash crowd of splits..
(Anyways, I just wanted to verify if the META table scan was a possible 
output)...

I will reload the data with some altered configs..

Thank you
Vidhya

On 8/26/10 1:16 PM, Todd Lipcon t...@cloudera.com wrote:

Hey Vidhya,

Anything interesting in the logs on the master?

It's possible that the master is slowly assigning out all the daughter
regions, and it's just taking a really long time since you loaded so many.

-Todd

On Thu, Aug 26, 2010 at 11:59 AM, Vidhyashankar Venkataraman 
vidhy...@yahoo-inc.com wrote:

 My hbase table issued a mass split after I loaded regions with greater
 sizes than maxfilesize.. (my bad..)

 Now, when I try accessing the master through the web interface, it just
 hangs...

 And, if I scan the META, I get the parent regions set to offline.. And the
 child regions have random byte keys (that was part of the HBASE 2515).. But
 one thing that troubles me is that even the split child regions don't seem
 to have been assigned to any server according to META:  I am not sure if
 this is expected..


  DocDB,03617973,12column=info:regioninfo,
 timestamp=1282810230593, value=REGION = {NAME =
  82331439883.6a66354e2c587
 'DocDB,03617973,1282331439883.6a66354e2c58779278942a8e5cf152ca.'
  79278942a8e5cf152ca., STARTKEY = '03617973',
 ENDKEY = '03751972', ENCODED =
  
 6a66354e2c58779278942a8e5cf152ca, OFFLINE = true, SPLIT = true, TABLE
   = {{NAME = 'DocDB', MAX_FILESIZE =
 '4294967296', FAMILIES = [{NAME =
'bigColumn', BLOOMFILTER = 'NONE',
 REPLICATION_SCOPE = '0', VERSIONS
  = '1', COMPRESSION = 'NONE', TTL =
 '2147483647', BLOCKSIZE = '1048576
  ', IN_MEMORY = 'false', BLOCKCACHE =
 'false'}]}}
  DocDB,03617973,12column=info:server,
 timestamp=1282810230593, value=
  82331439883.6a66354e2c587
  79278942a8e5cf152ca.
  DocDB,03617973,12  column=info:serverstartcode,
 timestamp=1282810230593, value=
  82331439883.6a66354e2c587
  79278942a8e5cf152ca.
  DocDB,03617973,12column=info:splitA,
 timestamp=1282810230593, value=\x00\x10036849
  82331439883.6a66354e2c587
  63\x00\x00\x00\x01*\xADr\xB4\x82FDocDB,03617973,1282810229890.57f
  79278942a8e5cf152ca.
  2eed2d43cece270244a39233f5a5a.\x00\x1003617973\x00\x00\x00\x05\x0

  5DocDB\x00\x00\x00\x00\x00\x03\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05fals

  e\x00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x0CMAX_FILESIZE

 \x00\x00\x00\x0A4294967296\x00\x00\x00\x01\x08\x09bigColumn\x00\x00\x00\x

 08\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x04NONE\x00\x00\x00\x11REPLICAT

 ION_SCOPE\x00\x00\x00\x010\x00\x00\x00\x0BCOMPRESSION\x00\x00\x00\x04NONE

 \x00\x00\x00\x08VERSIONS\x00\x00\x00\x011\x00\x00\x00\x03TTL\x00\x00\x00\

 x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x071048576\x00\x00\x00

 \x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\
   x05false0\xF1\xD4\xBB
  DocDB,03617973,12column=info:splitA_checked,
 timestamp=1282815236171, value=\x01
  82331439883.6a66354e2c587
  79278942a8e5cf152ca.
  DocDB,03617973,12   column=info:splitB,
 timestamp=1282810230593, value=\x00\x10037519
  82331439883.6a66354e2c587
  72\x00\x00\x00\x01*\xADr\xB4\x82FDocDB,03684963,1282810229890.5fe
  79278942a8e5cf152ca.
  1f7c4de1640642fb8454a1b8c623c.\x00\x1003684963\x00\x00\x00\x05\x0

 5DocDB\x00\x00\x00\x00\x00\x03\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05fals

  e\x00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x0CMAX_FILESIZE

 \x00\x00\x00\x0A4294967296\x00\x00\x00\x01\x08\x09bigColumn\x00\x00\x00\x

 08\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x04NONE\x00\x00\x00\x11REPLICAT

 ION_SCOPE\x00\x00\x00\x010\x00\x00\x00\x0BCOMPRESSION\x00\x00\x00\x04NONE

 \x00\x00\x00\x08VERSIONS\x00\x00\x00\x011\x00\x00\x00\x03TTL\x00\x00\x00\

 x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x071048576\x00\x00\x00

 \x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\
   x05false\xB54N\x1E
  DocDB,03617973,12  column=info:splitB_checked,
 timestamp=1282815236219, value=\x01
  82331439883.6a66354e2c587
  79278942a8e5cf152ca.




--
Todd Lipcon
Software Engineer, Cloudera



Re: Getting data from Hbase from client/remote computer

2010-08-26 Thread Jean-Daniel Cryans
Check the documentation:

http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/package-summary.html#overview

J-D
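The client-package overview linked above boils down to something like this sketch, written against the 0.20-era API. The ZooKeeper quorum host, table, row, family and qualifier names here are all placeholders, not values from this thread:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RemoteGetExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's ZooKeeper quorum
        // (placeholder hostname).
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.set("hbase.zookeeper.quorum", "zk-host.example.com");

        // Open a table and fetch one row (placeholder names).
        HTable table = new HTable(conf, "mytable");
        Result result = table.get(new Get(Bytes.toBytes("row1")));
        byte[] value = result.getValue(Bytes.toBytes("fam"),
                                       Bytes.toBytes("qual"));
        System.out.println(value == null ? "not found"
                                         : Bytes.toString(value));
        table.close();
    }
}
```

Alternatively, dropping the cluster's hbase-site.xml onto the client classpath works instead of setting the quorum in code.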

On Thu, Aug 26, 2010 at 12:41 PM, Shuja Rehman shujamug...@gmail.com wrote:
 Hi All

 I am new to the HBase client API and want to know how to get data from HBase
 from a client/remote machine. The target is to develop a Java program which
 should connect to the HBase server and then get results from it.

 Anyone have any example???

 Thanks

 --
 Regards
 Shuja-ur-Rehman Baig
 http://pk.linkedin.com/in/shujamughal
 Cell: +92 3214207445



jobtracker.jsp

2010-08-26 Thread Venkatesh

 I'm running map/reduce jobs from a java app (table mapper & reducer) in true
distributed
mode..I don't see anything in the jobtracker page..Map/reduce job runs fine..Am I
missing some config?

thanks
venkatesh




Re: jobtracker.jsp

2010-08-26 Thread Jeff Zhang
So what's the log in your client side ?


On Thu, Aug 26, 2010 at 6:23 PM, Venkatesh vramanatha...@aol.com wrote:



  I'm running map/reduce jobs from java app (table mapper  reducer) in true 
 distributed
 mode..I don't see anything in jobtracker page..Map/reduce job runs fine..Am I 
 missing some config?

 thanks
 venkatesh






-- 
Best Regards

Jeff Zhang


RE: Getting data from Hbase from client/remote computer

2010-08-26 Thread xiujin yang

Another way, 

By Rest

http://wiki.apache.org/hadoop/Hbase/Stargate
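The Stargate gateway linked above exposes rows over plain HTTP. A hypothetical request might look like this (it assumes a gateway running on port 8080; the host, table and row names are placeholders):

```shell
# Hypothetical: read one row as JSON through the Stargate REST gateway.
curl -H "Accept: application/json" \
  http://stargate-host.example.com:8080/mytable/row1
```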

Xiujin Yang

 Date: Thu, 26 Aug 2010 13:33:00 -0700
 Subject: Re: Getting data from Hbase from client/remote computer
 From: jdcry...@apache.org
 To: user@hbase.apache.org
 
 Check the documentation:
 
 http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/package-summary.html#overview
 
 J-D
 
 On Thu, Aug 26, 2010 at 12:41 PM, Shuja Rehman shujamug...@gmail.com wrote:
  Hi All
 
  I am new to the HBase client API and want to know how to get data from HBase
  from a client/remote machine. The target is to develop a Java program which
  should connect to the HBase server and then get results from it.
 
  Anyone have any example???
 
  Thanks
 
  --
  Regards
  Shuja-ur-Rehman Baig
  http://pk.linkedin.com/in/shujamughal
  Cell: +92 3214207445
 
  

RE: jobtracker.jsp

2010-08-26 Thread xiujin yang

Hi

When I run Hbase performance, I met the same problem. 
When job are run on local, it don't show up on job list. 

Best

Xiujin Yang. 

 To: user@hbase.apache.org
 Subject: Re: jobtracker.jsp
 Date: Thu, 26 Aug 2010 22:30:09 -0400
 From: vramanatha...@aol.com
 
 
  yeah..log says it's running Locally..i've to figure out why..
 
 2010-08-26 08:49:01,491 INFO Thread-16 org.apache.hadoop.mapred.MapTask - 
 Starting flush of map output
 2010-08-26 08:49:01,578 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - 
 Task:attempt_local_0001_m_00_0 is done. And is in the process of commiting
 2010-08-26 08:49:01,586 INFO Thread-16 
 org.apache.hadoop.mapred.LocalJobRunner -
 2010-08-26 08:49:01,587 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - 
 Task 'attempt_local_0001_m_00_0' done.
 2010-08-26 08:49:01,613 INFO Thread-16 
 org.apache.hadoop.mapred.LocalJobRunner -
 2010-08-26 08:49:01,630 INFO Thread-16 org.apache.hadoop.mapred.Merger - 
 Merging 1 sorted segments
 2010-08-26 08:49:01,640 INFO Thread-16 org.apache.hadoop.mapred.Merger - Down 
 to the last merge-pass, with 0 segments left of total size: 0 bytes
 2010-08-26 08:49:01,640 INFO Thread-16 
 org.apache.hadoop.mapred.LocalJobRunner -
 2010-08-26 08:49:01,658 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - 
 Task:attempt_local_0001_r_00_0 is done. And is in the process of commiting
 2010-08-26 08:49:01,659 INFO Thread-16 
 org.apache.hadoop.mapred.LocalJobRunner - reduce  reduce
 2010-08-26 08:49:01,660 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - 
 Task 'attempt_local_0001_r_00_0' done.
 
  
 
 
  
 
  
 
 -Original Message-
 From: Jeff Zhang zjf...@gmail.com
 To: user@hbase.apache.org
 Sent: Thu, Aug 26, 2010 9:42 pm
 Subject: Re: jobtracker.jsp
 
 
 So what's the log in your client side ?
 
 
 On Thu, Aug 26, 2010 at 6:23 PM, Venkatesh vramanatha...@aol.com wrote:
 
 
 
   I'm running map/reduce jobs from java app (table mapper  reducer) in true 
 distributed
  mode..I don't see anything in jobtracker page..Map/reduce job runs fine..Am 
  I 
 missing some config?
 
  thanks
  venkatesh
 
 
 
 
 
 
 -- 
 Best Regards
 
 Jeff Zhang
 
  
  

Re: Out Of Memory on region servers upon bulk import

2010-08-26 Thread Todd Lipcon
Hi Martin,

Can you paste your conf?

Have you by any chance upped your handler count a lot? Each handler takes up
an amount of RAM equal to the largest Puts you do. With normal write buffer
sizes, you're looking at around 2MB per handler, so while it sounds nice to
bump the handler count up to a really high number, you can get OOMEs like
you're seeing.
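Todd's arithmetic can be sketched as follows: in the worst case every RPC handler holds one full client write buffer at once, so handler count times buffer size bounds that slice of heap. The 100-handler figure below is hypothetical; 2MB is the write buffer size Todd cites:

```java
public class HandlerRam {
    // Worst case: every RPC handler simultaneously holds one full
    // client write buffer's worth of Puts.
    static long worstCaseBytes(int handlers, long writeBufferBytes) {
        return handlers * writeBufferBytes;
    }

    public static void main(String[] args) {
        // Hypothetical: 100 handlers, ~2MB write buffer per handler.
        long bytes = worstCaseBytes(100, 2L * 1024 * 1024);
        System.out.println(bytes / (1024 * 1024) + " MB"); // prints "200 MB"
    }
}
```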

Thanks
-Todd

On Thu, Aug 26, 2010 at 9:31 AM, Martin Arnandze marnan...@gmail.comwrote:

 I provide the answers below.
 Thanks!
  Martin

 On Aug 26, 2010, at 11:45 AM, Stack wrote:

  On Thu, Aug 26, 2010 at 8:07 AM, Martin Arnandze marnan...@gmail.com
 wrote:
  Hi,
   I'm doing an experiment on an 8 node cluster, each of which has 6GB of
 RAM allocated to hbase region server. Basically, doing a bulk import
 processing large files,
 
 
  How large?

 about 10 million records, each a few KB.

 
  Unless very large, it should not be OOMEing.
 
  but some imports require to do gets and scans as well. In the master
  UI I see that the heap used gets very close to the 6GB limit, but I
  know hbase is eager for memory and will use the heap as much as
  possible.I use block caching. Looking at similar posts I see that
  modifying the handler count and memory store upper/ower limits may be
  key to solving this issue. Nevertheless I wanted to ask if there is a
  way to estimate the extra memory used by hbase that makes it crash and
  if there are other configuration settings I should be looking into to
  prevent OOME. The job runs correctly for some time but region servers
  eventually crash.
 
  More information about the cluster:
 
  - All nodes have 16GM total memory.
  - 7 nodes running region server (6GB) +  datanodes (1GB) + task trackers
 (1GB Heap).  Map reduce jobs running w/ 756MB tops each.
 
  Good.  How many MR child tasks can run on each node concurrently?

 three mappers and two reducers

 
  - 1 node running hbase master (2GB Heap allocated), namenode (4GB),
 Secondary Namenode (4GB), JobTracker (4GB) and Master (2GB).
  - 3 of the nodes have zookeeper running with 512MB Heap
 
  Many thanks,
Martin
 
 
 
  Can we see the lines before the below is thrown?   Also, do a listing
  (ls -r) on this region in hdfs and lets see if anything pops out about
  files sizes, etc.  You'll need to manually map the below region name
  to its encoded name to figure the region but the encoded name should
  be earlier in the log.  You'll do something like:
 
  bin/hadoop fs -lsr /hbase/table_import/REGION_ENCODED_NAME

 /usr/lib/hadoop-0.20/bin/hadoop fs -lsr /hbase/table_import/1698505444
 -rw-r--r--   3 hadoop supergroup   1450 2010-08-25 23:17
 /hbase/table_import/1698505444/.regioninfo
 drwxr-xr-x   - hadoop supergroup  0 2010-08-26 11:07
 /hbase/table_importl/1698505444/fam
 -rw-r--r--   3 hadoop supergroup4244491 2010-08-26 11:07
 /hbase/table_import/1698505444/fam/5785049964186428982
 -rw-r--r--   3 hadoop supergroup  180147216 2010-08-26 06:09
 /hbase/table_import/1698505444/fam/705757673046090229


 Previous log:

 010-08-26 07:18:14,691 INFO org.apache.hadoop.hbase.regionserver.HRegion:
 region
 table_import,f1cbb42c-b6ae-404d-800c-043da5409441-9223370754623831807WkmpwnRDmveKYzWEfw/tb4GpP9yHDl+/G7OCaZWEgrmGcW+XEF131YDTQwDqZsO93tDicdPcOdRq\x0AU7zDBqoxpA==,1282790086498/1451783432
 available; sequence id is 1518500874
 2010-08-26 07:18:14,691 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
 table_import,d1e50232ac85a0a965e48647de5dc6ce-92233707546248658079F073MJ/gGEEs6mwkLsY/lLH+QvHGVBhBavAz0HSPEEKY+NrjTTzHUJdPtuJ0lXqz2i2Qs2DmFkz\x0A5P2broA7Gg==,128278399
 2010-08-26 07:18:14,691 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
 Creating region
 table_import,d1e50232ac85a0a965e48647de5dc6ce-92233707546248658079F073MJ/gGEEs6mwkLsY/lLH+QvHGVBhBavAz0HSPEEKY+NrjTTzHUJdPtuJ0lXqz2i2Qs2DmFkz\x0A5P2broA7Gg==,128278399,
 encoded=1510556231
 2010-08-26 07:18:21,085 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
 Total=958.38617MB (1004940736), Free=238.2MB (249864000), Max=1196.675MB
 (1254804736), Counts: Blocks=115717, Access=51364517, Hit=231796,
 Miss=51132721, Evictions=15, Evicted=218920, Ratios: Hit
 Ratio=0.45127649791538715%, Miss Ratio=99.54872131347656%,
 Evicted/Run=14594.6669921875
 2010-08-26 07:18:27,659 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 loaded /hbase/table_import/1510556231/fam/2639910770219077750,
 isReference=false, sequence id=1518500860, length=200014693,
 majorCompaction=false
 2010-08-26 07:18:35,188 INFO org.apache.hadoop.hbase.regionserver.HRegion:
 region
 table_import,d1e50232ac85a0a965e48647de5dc6ce-92233707546248658079F073MJ/gGEEs6mwkLsY/lLH+QvHGVBhBavAz0HSPEEKY+NrjTTzHUJdPtuJ0lXqz2i2Qs2DmFkz\x0A5P2broA7Gg==,128278399/1510556231
 available; sequence id is 1518500861
 2010-08-26 07:18:35,188 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
 

Re: region servers crashing

2010-08-26 Thread Stack
Looking in RS log I see this:

#
2010-08-26 13:34:11,056 WARN
org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 36 on
60020 took 148265ms appending an edit to hlog; editcount=2990
#
2010-08-26 13:34:11,056 WARN
org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 86 on
60020 took 148265ms appending an edit to hlog; editcount=2991
#


That it took near 3 minutes appending the log is probably because we
were under a stop-the-world GC pause.

 Regarding CMSInitiatingOccupancyFraction, we used to have it set to 88% and
 we thought that by setting it lower we'd kick off more frequent (but
 smaller) GC collections, and reduce the chance of any one of them pausing.


Isn't 88% almost the default?  Come down more I'd say.  I see earlier
you ran with 50% CMSInitiatingOccupancyFraction?  That didn't work?
Starting earlier, you'll pay in CPU but should help.  Set
UseCMSInitiatingOccupancyOnly too so the GC will consider your
CMSInitiatingOccupancyFraction setting only before it starts the CMS
full GC [1].

How many cores do you have?  If <= 4, you should add -XX:+CMSIncrementalMode.

Update your JVM.  u21 has some fixes to help with heap fragmentation (
 -XX:+DoEscapeAnalysis  probably won't work on u21, IIRC).

I just noticed you are running on EC2.  Use bigger nodes?

You are also running old hbase which had a particular way of scanning
in a manner that was RAM expensive.  Update to 0.20.6 hbase.

Doing this will have no effect on the zk lease, the thing that is
responsible for servers going down:

 We tried increasing hbase.regionserver.lease.period to 2 minutes but that 
 didn't seem to make a difference here.

Up the ticktime, this setting:

  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>3000</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The number of milliseconds of each tick.  See
    zookeeper.session.timeout description.
    </description>
  </property>

Also up the below (see description for relation between below and the
above ticktime).

  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
    <description>ZooKeeper session timeout.
      HBase passes this to the zk quorum as suggested maximum time for a
      session.  See
      http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
      The client sends a requested timeout, the server responds with the
      timeout that it can give the client. The current implementation
      requires that the timeout be a minimum of 2 times the tickTime
      (as set in the server configuration) and a maximum of 20 times
      the tickTime. Set the zk ticktime with hbase.zookeeper.property.tickTime.
      In milliseconds.
    </description>
  </property>
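The clamping rule described above (granted timeout between 2x and 20x the tickTime) can be sketched as a small calculation; the requested values are just examples:

```java
public class ZkSessionTimeout {
    // ZooKeeper clamps the client's requested session timeout
    // to the range [2 * tickTime, 20 * tickTime].
    static int grantedTimeout(int requestedMs, int tickTimeMs) {
        return Math.max(2 * tickTimeMs,
                        Math.min(20 * tickTimeMs, requestedMs));
    }

    public static void main(String[] args) {
        // With tickTime=3000, the valid range is [6000, 60000] ms,
        // so a requested 60000ms session is granted as-is.
        System.out.println(grantedTimeout(60000, 3000));
    }
}
```

This is why upping the session timeout without also upping the tickTime has no effect past 20x the tickTime.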

St.Ack

1. http://markmail.org/thread/e43gybkrcecg5rxo



 So, if this isn't a GC issue, is there anything else it could be, based on
 the logs?

 Thanks,

 - Dmitry

 On Thu, Aug 26, 2010 at 12:37 PM, Ryan Rawson ryano...@gmail.com wrote:

 Ok I didnt see your logs earlier - normally attachments are filtered
 out and we use pastebin for logs.

 I am not seeing any large pauses in your gc logs.  Not sure if the log
 is complete enough or what...

 -ryan

 On Thu, Aug 26, 2010 at 12:32 PM, Ryan Rawson ryano...@gmail.com wrote:
  Without gc logs you cannot diagnose what you suspect are gc issues...
  make sure you are logging and then check them out.  If you are running
  a recent JVM you can use -XX:+PrintGCDateStamps and get better log
  entries.
 
  Also you cannot swap at all, even 1 page of swapping in a java process
  can be killer.  Combined with the hypervisor stealing your CPU you can
  have a lot of elapsed wall time with not very many cpu slices being
  executed.  Consider vmstat and top to diagnose that one issue.
 
  On the GC issue, the one setting you are using which is initiating
  occupancy fraction is set kind of low. This means you will kick in the
  GC once you hit 50% of your memory usage.  You might consider testing
  with that set to a medium level, say 75% or so.
 
  -ryan
 
  On Thu, Aug 26, 2010 at 12:17 PM, Dmitry Chechik dmi...@tellapart.com
 wrote:
  Hi all,
  We're still seeing these crashes pretty frequently. Attached is the
 error
  from the regionserver logs as well as a GC dump of the last hour of the
  regionserver:
  2010-08-26 13:34:10,855 WARN org.apache.hadoop.hbase.util.Sleeper: We
 slept
  157041ms, ten times longer than scheduled: 1
  2010-08-26 13:34:10,925 WARN org.apache.hadoop.hbase.util.Sleeper: We
 slept
  148602ms, ten times longer than scheduled: 1000
  2010-08-26 13:34:10,925 WARN
  org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to
  master for 148602 milliseconds - retrying
  Since our workload is mostly scans in mapreduce, we've turned off block
  caching as per https://issues.apache.org/jira/browse/HBASE-2252 in case
 that
  had anything to do with it.
  We've also decreased NewSize and MaxNewSize and decreased
  CMSInitiatingOccupancyFraction, 

Re: jobtracker.jsp

2010-08-26 Thread Venkatesh

 Thanks J-D

I figured I didn't have mapred-site.xml in my WEB-INF/classes directory
(classpath)
I copied that from the cluster ..that fixed part of it..Now I don't have
zookeeper in hadoop-env.sh:HADOOP_CLASSPATH
I distinctly looked at this link a while ago.. it didn't have zookeeper listed
.(I've everything else i.e. hbase-*.)
perhaps I had an old link

Can all the config in mapred-site.xml be added to hbase-site.xml?..It kind of
works with them being separate..
just wondering..

Have one more question..I also have trouble stopping
namenode/datanode/jobtracker..to make this classpath effective
Is there a force shutdown option (other than kill -9)?


 venkatesh


 

 

-Original Message-
From: Jean-Daniel Cryans jdcry...@apache.org
To: user@hbase.apache.org
Sent: Fri, Aug 27, 2010 12:10 am
Subject: Re: jobtracker.jsp


HBase needs to know about the job tracker, it could be on the same
machine or distant, and that's taken care by giving HBase mapred's
configurations. Here's the relevant documentation :
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath

J-D

2010/8/26 xiujin yang xiujiny...@hotmail.com:

 Hi

 When I run Hbase performance, I met the same problem.
 When job are run on local, it don't show up on job list.

 Best

 Xiujin Yang.

 To: user@hbase.apache.org
 Subject: Re: jobtracker.jsp
 Date: Thu, 26 Aug 2010 22:30:09 -0400
 From: vramanatha...@aol.com


  yeah..log says it's running Locally..i've to figure out why..

 2010-08-26 08:49:01,491 INFO Thread-16 org.apache.hadoop.mapred.MapTask - 
Starting flush of map output
 2010-08-26 08:49:01,578 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - 
Task:attempt_local_0001_m_00_0 is done. And is in the process of commiting
 2010-08-26 08:49:01,586 INFO Thread-16 
 org.apache.hadoop.mapred.LocalJobRunner 
-
 2010-08-26 08:49:01,587 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - 
Task 'attempt_local_0001_m_00_0' done.
 2010-08-26 08:49:01,613 INFO Thread-16 
 org.apache.hadoop.mapred.LocalJobRunner 
-
 2010-08-26 08:49:01,630 INFO Thread-16 org.apache.hadoop.mapred.Merger - 
Merging 1 sorted segments
 2010-08-26 08:49:01,640 INFO Thread-16 org.apache.hadoop.mapred.Merger - 
 Down 
to the last merge-pass, with 0 segments left of total size: 0 bytes
 2010-08-26 08:49:01,640 INFO Thread-16 
 org.apache.hadoop.mapred.LocalJobRunner 
-
 2010-08-26 08:49:01,658 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - 
Task:attempt_local_0001_r_00_0 is done. And is in the process of commiting
 2010-08-26 08:49:01,659 INFO Thread-16 
 org.apache.hadoop.mapred.LocalJobRunner 
- reduce  reduce
 2010-08-26 08:49:01,660 INFO Thread-16 org.apache.hadoop.mapred.TaskRunner - 
Task 'attempt_local_0001_r_00_0' done.








 -Original Message-
 From: Jeff Zhang zjf...@gmail.com
 To: user@hbase.apache.org
 Sent: Thu, Aug 26, 2010 9:42 pm
 Subject: Re: jobtracker.jsp


 So what's the log in your client side ?


 On Thu, Aug 26, 2010 at 6:23 PM, Venkatesh vramanatha...@aol.com wrote:
 
 
 
   I'm running map/reduce jobs from java app (table mapper  reducer) in true
 distributed
  mode..I don't see anything in jobtracker page..Map/reduce job runs 
  fine..Am 
I
 missing some config?
 
  thanks
  venkatesh
 
 
 



 --
 Best Regards

 Jeff Zhang