Re: question about RegionManager

2010-09-07 Thread Tao Xie
But when I directly load data into HDFS using HDFS API, the disks are
balanced.
I use hadoop-0.20.2.

2010/9/7 Todd Lipcon t...@cloudera.com

 On Mon, Sep 6, 2010 at 9:08 PM, Jonathan Gray jg...@facebook.com wrote:

  You're looking at sizes on disk?  Then this has nothing to do with HBase
  load balancing.
 
  HBase does not move blocks around on the HDFS layer or deal with which
  physical disks are used, that is completely the responsibility of HDFS.
 
  Periodically HBase will perform major compactions on regions which causes
  data to be rewritten.  This creates new files so could change what is in
  HDFS.
 

 There are some bugs in HDFS in 0.20 which can create this out-of-balance
 scenario.

 If you use CDH3b2 you should have a few patches which help to rectify the
 situation, in particular HDFS-611.

 Thanks
 -Todd


 
  JG
 
   -Original Message-
   From: Tao Xie [mailto:xietao.mail...@gmail.com]
   Sent: Monday, September 06, 2010 8:38 PM
   To: user@hbase.apache.org
   Subject: Re: question about RegionManager
  
    Actually, I'm a newbie to HBase. I went to read the region assignment code
    because I met a load imbalance problem in my hbase cluster. I run a 1+6
    node hbase cluster: 1 node as master & client, the other nodes as region
    servers and data nodes. I run YCSB to insert records. During the inserts, I
    find that the data written to the data nodes has different sizes on disk. I
    think HDFS does well at balancing writes, so is this problem due to HBase?

    Btw, a few minutes after the writing finishes, the disks finally get
    balanced. I think maybe there is a LoadBalance-like daemon thread working
    on this. Can anyone explain this? Many thanks.

    After inserting 160M 1k records, my six datanodes are greatly imbalanced.
  
   10.1.0.125: /dev/sdb1 280G   89G  178G  34% /mnt/DP_disk1
  
   10.1.0.125: /dev/sdc1 280G   91G  176G  35% /mnt/DP_disk2
  
   10.1.0.125: /dev/sdd1 280G   91G  176G  34% /mnt/DP_disk3
  
   10.1.0.121: /dev/sdb1 280G   15G  251G   6% /mnt/DP_disk1
  
   10.1.0.121: /dev/sdc1 280G   16G  250G   6% /mnt/DP_disk2
  
   10.1.0.121: /dev/sdd1 280G   15G  251G   6% /mnt/DP_disk3
  
   10.1.0.122: /dev/sdb1 280G   15G  251G   6% /mnt/DP_disk1
  
   10.1.0.122: /dev/sdc1 280G   15G  252G   6% /mnt/DP_disk2
  
   10.1.0.122: /dev/sdd1 280G   13G  253G   5% /mnt/DP_disk3
  
   10.1.0.124: /dev/sdb1 280G   14G  253G   5% /mnt/DP_disk1
  
   10.1.0.124: /dev/sdc1 280G   15G  252G   6% /mnt/DP_disk2
  
   10.1.0.124: /dev/sdd1 280G   14G  253G   6% /mnt/DP_disk3
  
   10.1.0.123: /dev/sdb1 280G   66G  200G  25% /mnt/DP_disk1
  
   10.1.0.123: /dev/sdc1 280G   65G  201G  25% /mnt/DP_disk2
  
   10.1.0.123: /dev/sdd1 280G   65G  202G  25% /mnt/DP_disk3
  
   10.1.0.126: /dev/sdb1 280G   14G  252G   6% /mnt/DP_disk1
  
   10.1.0.126: /dev/sdc1 280G   14G  252G   6% /mnt/DP_disk2
  
   10.1.0.126: /dev/sdd1 280G   13G  253G   5% /mnt/DP_disk3
  
   2010/9/7 Tao Xie xietao.mail...@gmail.com
  
 I have a look at the following method in 0.89. Is the following line
 correct?
   
nRegions *= e.getValue().size();
   
   
 private int regionsToGiveOtherServers(final int numUnassignedRegions,
     final HServerLoad thisServersLoad) {
   SortedMap<HServerLoad, Set<String>> lightServers =
     new TreeMap<HServerLoad, Set<String>>();
   this.master.getLightServers(thisServersLoad, lightServers);
   // Examine the list of servers that are more lightly loaded than this one.
   // Pretend that we will assign regions to these more lightly loaded servers
   // until they reach load equal with ours. Then, see how many regions are left
   // unassigned. That is how many regions we should assign to this server.
   int nRegions = 0;
   for (Map.Entry<HServerLoad, Set<String>> e : lightServers.entrySet()) {
     HServerLoad lightLoad = new HServerLoad(e.getKey());
     do {
       lightLoad.setNumberOfRegions(lightLoad.getNumberOfRegions() + 1);
       nRegions += 1;
     } while (lightLoad.compareTo(thisServersLoad) <= 0
         && nRegions < numUnassignedRegions);
     nRegions *= e.getValue().size();
     if (nRegions >= numUnassignedRegions) {
       break;
     }
   }
   return nRegions;
 }
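
     (To make the question concrete, here is a hypothetical, standalone trace of
     the loop's arithmetic, with server loads modeled as plain ints rather than
     HServerLoad objects. It is only meant to show that the multiplication is
     applied to the running total, which already includes the counts from
     earlier map entries.)

     import java.util.LinkedHashMap;
     import java.util.Map;

     public class RegionCountTrace {
       public static void main(String[] args) {
         int thisServersLoad = 10;          // regions on this (heavier) server
         int numUnassignedRegions = 100;
         // load -> number of servers carrying that load
         // (a stand-in for Set<String>.size() in the real method)
         Map<Integer, Integer> lightServers = new LinkedHashMap<Integer, Integer>();
         lightServers.put(4, 3);            // three servers with 4 regions each
         lightServers.put(7, 2);            // two servers with 7 regions each

         int nRegions = 0;
         for (Map.Entry<Integer, Integer> e : lightServers.entrySet()) {
           int lightLoad = e.getKey();
           do {
             lightLoad++;
             nRegions++;
           } while (lightLoad <= thisServersLoad && nRegions < numUnassignedRegions);
           nRegions *= e.getValue();        // the line Tao is asking about
           System.out.println("after load " + e.getKey() + ": nRegions = " + nRegions);
           if (nRegions >= numUnassignedRegions) {
             break;
           }
         }
       }
     }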
   
   
   
2010/9/7 Jonathan Gray jg...@facebook.com
   
That code does actually exist in the latest 0.89 release.
   
It was a protection put in place to guard against a weird behavior
   that we
had seen during load balancing.
   
As Ryan suggests, this code was in need of a rewrite and was just
committed last week to trunk/0.90.  If you're interested in the new
   load

Re: Limits on HBase

2010-09-07 Thread Himanshu Vashishtha
But yes, you will not have different versions of those objects, as they
are not stored as such in a table. So that's the downside. If your
objects are write-once, read-many types, I think it should work.

Let's see what others say :)

~Himanshu


On Tue, Sep 7, 2010 at 12:49 AM, Himanshu Vashishtha vashishth...@gmail.com
 wrote:

 Assuming you will be using hdfs as the file system: wouldn't saving those
 large objects in the fs and keeping a pointer to them in an hbase table serve
 the purpose?

 [I haven't done it myself but I can't see it not working. In fact, I
 remember reading it somewhere in the list.]
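
 (A minimal sketch of this blob-in-HDFS, pointer-in-HBase idea, written against
 the 0.20-era client API; the table, family, qualifier, and path names here are
 made up for illustration.)

 import java.io.IOException;

 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.util.Bytes;

 public class BlobPointerSketch {
   public static void main(String[] args) throws IOException {
     HBaseConfiguration config = new HBaseConfiguration(); // reads hbase-site.xml
     byte[] blob = new byte[10 * 1024 * 1024];             // pretend: the large object

     // 1. Write the large object straight to HDFS.
     FileSystem fs = FileSystem.get(config);
     Path blobPath = new Path("/blobs/object-2001");       // hypothetical location
     FSDataOutputStream out = fs.create(blobPath);
     out.write(blob);
     out.close();

     // 2. Store only the pointer (the HDFS path) in an HBase table.
     HTable table = new HTable(config, "objectIndex");     // hypothetical table
     Put p = new Put(Bytes.toBytes("object-2001"));
     p.add(Bytes.toBytes("f1"), Bytes.toBytes("path"), Bytes.toBytes(blobPath.toString()));
     table.put(p);
   }
 }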

 ~Himanshu


 On Mon, Sep 6, 2010 at 11:40 PM, William Kang weliam.cl...@gmail.comwrote:

 Hi JG,
 Thanks for your reply. As far as I have read in Hbase's documentation and
 wiki, the cell size is not supposed to be larger than 10 MB. For the row, I
 am not quite sure, but it looks like 256 MB is the upper limit. I am
 considering storing some binary data that used to be stored in an RDBMS BLOB
 field. The size of those binary objects may vary from hundreds of KB to
 hundreds of MB. What would be a good way to use Hbase for this? We really
 want to use hbase to avoid that scaling problem.
 Many thanks.


 William

 On Mon, Sep 6, 2010 at 7:10 PM, Jonathan Gray jg...@facebook.com wrote:

  I'm not sure what you mean by optimized cell size or whether you're
 just
  asking about practical limits?
 
  HBase is generally used with cells in the range of tens of bytes to
  hundreds of kilobytes.  However, I have used it with cells that are
 several
  megabytes, up to about 50MB.  Up at that level, I have seen some weird
  performance issues.
 
  The most important thing is to be sure to tweak all of your settings.
  If
  you have 20MB cells, you need to be sure to increase the flush size
 beyond
  64MB and the split size beyond 256MB.  You also need enough memory to
  support all this large object allocation.
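
  (For reference, a sketch of the two settings mentioned above, assuming the
  0.20-era property names; double-check them against the hbase-default.xml that
  ships with your release before relying on them.)

  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value> <!-- flush at 128MB instead of the 64MB default -->
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value> <!-- split at 1GB instead of the 256MB default -->
  </property>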
 
  And of course, test test test.  That's the easiest way to see if what
 you
  want to do will work :)
 
  When you run into problems, e-mail the list.
 
  As far as row size is concerned, the only issue is that a row can never
  span multiple regions so a given row can only be in one region and thus
 be
  hosted on one server at a time.
 
  JG
 
   -Original Message-
   From: William Kang [mailto:weliam.cl...@gmail.com]
   Sent: Monday, September 06, 2010 1:57 PM
   To: hbase-user
   Subject: Limits on HBase
  
   Hi folks,
   I know this question may have been asked many times, but I am
 wondering
   if
   there is any update on the optimized cell size (in megabytes) and row
   size
   (in megabytes)? Many thanks.
  
  
   William
 





Client Side buffering vs WAL

2010-09-07 Thread Michael Segel

Hi,

Came across a problem that I need to walk through.

On the client side, when you instantiate an HTable object, you can specify 
HTable.setAutoFlush(true/false).  Setting the boolean value to true means that 
when you execute a put(), the write is not buffered on the client and will be 
written directly to HBase. This overrides the client side buffering that you 
can set in your configuration files.
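
(A minimal sketch contrasting the two modes, using the 0.20-era HTable API;
the table, family, and row names are made up for illustration.)

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AutoFlushSketch {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration config = new HBaseConfiguration();
    HTable table = new HTable(config, "myTable");

    // Unbuffered: each put() is sent to the region server immediately, so
    // other clients can see the row as soon as the call returns.
    table.setAutoFlush(true);
    Put p1 = new Put(Bytes.toBytes("row1"));
    p1.add(Bytes.toBytes("f1"), Bytes.toBytes("q"), Bytes.toBytes("v1"));
    table.put(p1);

    // Buffered: puts accumulate in the client-side write buffer and only go
    // out when the buffer fills or flushCommits() is called explicitly.
    table.setAutoFlush(false);
    table.setWriteBufferSize(2 * 1024 * 1024);  // 2MB buffer, for illustration
    Put p2 = new Put(Bytes.toBytes("row2"));
    p2.add(Bytes.toBytes("f1"), Bytes.toBytes("q"), Bytes.toBytes("v2"));
    table.put(p2);
    table.flushCommits();  // push the buffered edits to the cluster
  }
}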


While for many applications it's ok for the app to buffer up its writes, 
there's a set of apps where you don't want to do this: when your app 
writes a record to HBase, you want it exposed ASAP.

On the server side, you have the Write Ahead Log.

If I understand the WAL, it abstracts the actual process of writing to disk so 
that as far as your application is concerned, when you write to the WAL, it's in 
HBase.

So, my question is how long does it take for a record in the WAL to be written 
to Disk?

Also if a record is in the WAL, if I did a get() will the record be found?

It's possible that in an m/r job, client-side buffering could mean that it 
could take a relatively 'long' time to actually have a record written to HBase, 
whereas once the record is written to the WAL, it should be consistent in the 
time it takes to be written to disk for access by other HBase apps.

Or what am I missing?

Thx

-Mike



  

Hbase Backups

2010-09-07 Thread Alexey Kovyrin
Hi guys,

More and more data in our company is moving from mysql tables to hbase
and I am more and more worried about the no-backups situation with
that data. I've started looking for possible solutions to backup the
data and found two major options:
1) distcp of /hbase directory somewhere
2) HBASE-1684

So, I have a few questions for hbase users:
1) How do you backup your small (up to a hundred gb) tables?
2) How do you backup your huge (terabytes in size) tables?

And a question for hbase developers: what kind of problems could a distcp
from a non-locked hbase table cause (there is no way to lock table
writes while backing it up AFAIU)? I understand I could lose writes
made after I begin the backup, but if my distcp takes an hour to
complete, I imagine lots of things will happen on the filesystem
during this period of time. Will hbase be able to recover from this
kind of mess?

Thanks a lot for your comments.

-- 
Alexey Kovyrin
http://kovyrin.net/


Re: Client Side buffering vs WAL

2010-09-07 Thread Jean-Daniel Cryans
I think Lars explains it best:
http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html

Short version: writing to the WAL is a backup solution if the region
server dies, because it's the MemStore that's being used for reads
(not the WAL). If you autoFlush, then everyone can read the data
once your put() call returns without errors.

J-D

On Tue, Sep 7, 2010 at 7:44 AM, Michael Segel michael_se...@hotmail.com wrote:

 Hi,

 Came across a problem that I need to walk through.

 On the client side, when you instantiate an HTable object, you can specify 
 HTable.setAutoFlush(true/false).  Setting the boolean value to true means 
 that when you execute a put(), the write is not buffered on the client and 
 will be written directly to HBase. This overrides the client side buffering 
 that you can set in your configuration files.


 While for many applications its ok for the app to buffer up its writes, 
 however there's a set of apps where you don't want to do this. That is when 
 your app writes a record to HBase, you want it exposed ASAP.

 On the server side, you have the Write Ahead Log.

 If I understand the WAL, it abstracts the actual process of writing to disk 
 so that as far as your application is concerned, when you write to the WAL, 
 its in HBase.

 So, my question is how long does it take for a record in the WAL to be 
 written to Disk?

 Also if a record is in the WAL, if I did a get() will the record be found?

 Its possible that in a m/r job that client side buffering could mean that it 
 could take a relatively 'long' time to actually have a record written to 
 HBase, where as once the record is written to the WAL, it should be 
 consistent in the time it takes to be written to disk for access by other 
 HBase apps.

 Or what am I missing?

 Thx

 -Mike






Re: Hbase Backups

2010-09-07 Thread Jean-Daniel Cryans
If you are asking about current solutions, then yes you can distcp
but I would consider that a last resort solution for the reasons you
described (yes, you could end up with an inconsistent state that
requires manual fixing). Also it completely bypasses row locks.

Another choice is using the Export MR job, using the start time option
to do incremental backups. But then you have to distcp the result of
that MR. And it's not a point in time that you are snapshotting,
since it doesn't lock all rows (and you don't really want that hehe).
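
(As a sketch, assuming the 0.89-era Export argument order of table, output
directory, versions, and start time; run the class with no arguments to
confirm the exact usage on your version. Paths and namenode addresses are
hypothetical.)

# full export of a table, then an incremental one using the start-time argument
bin/hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backups/mytable-full
bin/hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backups/mytable-incr 1 1283900000000

# copy the export output somewhere else, e.g. another cluster
hadoop distcp /backups/mytable-full hdfs://backup-nn:8020/backups/mytable-full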

Since you are on 0.89, you can use cluster replication. This will keep
an almost up-to-date replica on another cluster. Cons are that it
requires another cluster (may be a good thing to have in any case),
and it's still experimental so you could run into issues. See
http://hbase.apache.org/docs/r0.89.20100726/replication.html

In the future there's HBASE-50 that should also be useful.

J-D

On Tue, Sep 7, 2010 at 9:27 AM, Alexey Kovyrin ale...@kovyrin.net wrote:
 Hi guys,

 More and more data in our company is moving from mysql tables to hbase
 and more and more worried I am about the no backups situation with
 that data. I've started looking for possible solutions to backup the
 data and found two major options:
 1) distcp of /hbase directory somewhere
 2) HBASE-1684

 So, I have a few questions for hbase users:
 1) How do you backup your small (up to a hundred gb) tables?
 2) How do you backup your huge (terabytes in size) tables?

 And a question for hbase developers: what kind of problems could cause
 a distcp from a non-locked hbase table (there is no way to lock table
 writes while backing it up AFAIU)? I understand I could lose writes
 made after I begin the backup, but if my distcp takes an hour to
 complete, I imagine lots of things will happen on the filesystem
 during this period of time. Will hbase be able to recover from this
 kind of mess?

 Thanks a lot for your comments.

 --
 Alexey Kovyrin
 http://kovyrin.net/



RE: regionserver skew

2010-09-07 Thread Sharma, Avani
Stack,

I don't think that is my case. I am doing random reads across the namespace, and 
the way the table is designed, they should be distributed across region 
servers. As I understand it, rows are sorted by key, we should design the 
table so that we fetch data across regions, and I have tried to achieve 
that. If there is something else you want me to read, please point me to it. I 
have read the Hbase Architecture doc and also the one Lars George has posted.

I have one 2G file and other smaller ones on the cluster, but currently I am 
fetching data from this 2G lookup only. 
The number of regions is as follows:
Server1: regions=41, 2G heap , also the hbase master, regionserver, namenode, 
tasktracker, jobtracker, datanode
Server2: regions=36, 4G heap , datanode, tasktracker and regionserver
Server3: regions=37 - this server gets 0 requests or 0 hitRatio, 4G heap , 
datanode, tasktracker and regionserver
Total:114

That link mentions that some servers have 0 hitRatio and says that is 
acceptable (?), but that's for inserts - I am not sure if the same applies to reads.
http://search-hadoop.com/m/ESeeZ1B082l
How do I confirm where the .META. table is hosted? Currently, I look at the master log 
and check which machine it is hitting for the .META. table.

My main concern is that before the upgrade to 0.20.6,  .5M rows took 520 
seconds (which you thought was slow) on this 3-node cluster and now, after the 
upgrade and whatever other changes hbase/hdfs went through, it takes nearly an 
hour to do the same (with the same data and same rows being fetched). There is 
something really wrong with HDFS/Hbase here.
I need help with diagnosing this. Let me know if you need any logs from me for 
this. I did send some logs last time. Did you get a chance to look at those?

Thanks.

-Original Message-
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: Monday, September 06, 2010 12:04 PM
To: user@hbase.apache.org
Subject: Re: regionserver skew

On Fri, Sep 3, 2010 at 6:22 PM, Sharma, Avani agsha...@ebay.com wrote:
 I read on the mailing list that the region server that has .META table 
 handles more requests. That sounds okay, but in my case the 3rd regionserver 
 has 0 requests! And I feel that's what slowing down the read performance. 
 Also the hit ratio at the other regionserver is 87% or so. Only the one that 
 hosts .META has 95+% hit ratio.


Are your reads distributed across the whole namespace or are they only
fetching some subset? If a subset, it can be the case that the subset
is totally hosted by a single regionserver and while your test is
running, it's only pulling from this single server.  Is that your case?
 (You do understand how rows are distributed on an hbase cluster?)

Also,  how many regions do you have?  You said you have 2G of data
total at one stage.  That likely does not make for many regions.  If
so, it could also be the case that the server that is not fielding
requests is not actually carrying data, or only a little data.  Is this
your case?

St.Ack


RE: Limits on HBase

2010-09-07 Thread Andrew Purtell
In addition to what Jon said please be aware that if compression is specified 
in the table schema, it happens at the store file level -- compression happens 
after write I/O, before read I/O, so if you transmit a 100MB object that 
compresses to 30MB, the performance impact is that of 100MB, not 30MB. 

I also try not to go above 50MB as largest cell size, for the same reason. I 
have tried storing objects larger than 100MB but this can cause out of memory 
issues on busy regionservers no matter the size of the heap. When/if HBase RPC 
can send large objects in smaller chunks, this will be less of an issue. 
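
(Compression is declared per column family in the table schema; a hedged shell
example, assuming the GZ codec is available on your cluster:)

create 'mytable', {NAME => 'f1', COMPRESSION => 'GZ'}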

Best regards,

- Andy

Why is this email five sentences or less?
http://five.sentenc.es/


--- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote:

 From: Jonathan Gray jg...@facebook.com
 Subject: RE: Limits on HBase
 To: user@hbase.apache.org user@hbase.apache.org
 Date: Monday, September 6, 2010, 4:10 PM
 I'm not sure what you mean by
 optimized cell size or whether you're just asking about
 practical limits?
 
 HBase is generally used with cells in the range of tens of
 bytes to hundreds of kilobytes.  However, I have used
 it with cells that are several megabytes, up to about
 50MB.  Up at that level, I have seen some weird
 performance issues.
 
 The most important thing is to be sure to tweak all of your
 settings.  If you have 20MB cells, you need to be sure
 to increase the flush size beyond 64MB and the split size
 beyond 256MB.  You also need enough memory to support
 all this large object allocation.
 
 And of course, test test test.  That's the easiest way
 to see if what you want to do will work :)
 
 When you run into problems, e-mail the list.
 
 As far as row size is concerned, the only issue is that a
 row can never span multiple regions so a given row can only
 be in one region and thus be hosted on one server at a
 time.
 
 JG
 
  -Original Message-
  From: William Kang [mailto:weliam.cl...@gmail.com]
  Sent: Monday, September 06, 2010 1:57 PM
  To: hbase-user
  Subject: Limits on HBase
  
  Hi folks,
  I know this question may have been asked many times,
 but I am wondering
  if
  there is any update on the optimized cell size (in
 megabytes) and row
  size
  (in megabytes)? Many thanks.
  
  
  William
 






Re: question about RegionManager

2010-09-07 Thread Todd Lipcon
On Mon, Sep 6, 2010 at 11:34 PM, Tao Xie xietao.mail...@gmail.com wrote:

 But when I directly load data into HDFS using HDFS API, the disks are
 balanced.
 I use hadoop-0.20.2.


Yes, the bugs occur when processing a large volume of block deletions. See
HADOOP-5124 and HDFS-611. HBase's compactions cause a larger deletion rate
than typical HDFS usage.

-Todd


 2010/9/7 Todd Lipcon t...@cloudera.com

  On Mon, Sep 6, 2010 at 9:08 PM, Jonathan Gray jg...@facebook.com
 wrote:
 
   You're looking at sizes on disk?  Then this has nothing to do with
 HBase
   load balancing.
  
   HBase does not move blocks around on the HDFS layer or deal with which
   physical disks are used, that is completely the responsibility of HDFS.
  
   Periodically HBase will perform major compactions on regions which
 causes
   data to be rewritten.  This creates new files so could change what is
 in
   HDFS.
  
 
  There are some bugs in HDFS in 0.20 which can create this out-of-balance
  scenario.
 
  If you use CDH3b2 you should have a few patches which help to rectify the
  situation, in particular HDFS-611.
 
  Thanks
  -Todd
 
 
  
   JG
  
-Original Message-
From: Tao Xie [mailto:xietao.mail...@gmail.com]
Sent: Monday, September 06, 2010 8:38 PM
To: user@hbase.apache.org
Subject: Re: question about RegionManager
   
Actually, I'm a newbie of HBase. I went to read the code of assigning
region
because I met a load imbalance problem in my hbase cluster. I run 1+6
nodes
hbase cluster, 1 node as master  client, the other nodes as region
server
and data nodes. I run YCSB to insert records. In the inserting time,
 I
find
the data written to data nodes have different data size on disks.  I
think
HDFS is doing well in balancing write. So is this problem due to
 HBase?
   
Btw, after finished writing for minutes, the disks get balanced
finally. I
think maybe there is a LoadBalance like deamon thread working on
 this.
Can
anyone explain this? Many thanks.
   
After inserting 160M 1k records, my six datanodes are greatly
imbalanced.
   
10.1.0.125: /dev/sdb1 280G   89G  178G  34%
 /mnt/DP_disk1
   
10.1.0.125: /dev/sdc1 280G   91G  176G  35%
 /mnt/DP_disk2
   
10.1.0.125: /dev/sdd1 280G   91G  176G  34%
 /mnt/DP_disk3
   
10.1.0.121: /dev/sdb1 280G   15G  251G   6%
 /mnt/DP_disk1
   
10.1.0.121: /dev/sdc1 280G   16G  250G   6%
 /mnt/DP_disk2
   
10.1.0.121: /dev/sdd1 280G   15G  251G   6%
 /mnt/DP_disk3
   
10.1.0.122: /dev/sdb1 280G   15G  251G   6%
 /mnt/DP_disk1
   
10.1.0.122: /dev/sdc1 280G   15G  252G   6%
 /mnt/DP_disk2
   
10.1.0.122: /dev/sdd1 280G   13G  253G   5%
 /mnt/DP_disk3
   
10.1.0.124: /dev/sdb1 280G   14G  253G   5%
 /mnt/DP_disk1
   
10.1.0.124: /dev/sdc1 280G   15G  252G   6%
 /mnt/DP_disk2
   
10.1.0.124: /dev/sdd1 280G   14G  253G   6%
 /mnt/DP_disk3
   
10.1.0.123: /dev/sdb1 280G   66G  200G  25%
 /mnt/DP_disk1
   
10.1.0.123: /dev/sdc1 280G   65G  201G  25%
 /mnt/DP_disk2
   
10.1.0.123: /dev/sdd1 280G   65G  202G  25%
 /mnt/DP_disk3
   
10.1.0.126: /dev/sdb1 280G   14G  252G   6%
 /mnt/DP_disk1
   
10.1.0.126: /dev/sdc1 280G   14G  252G   6%
 /mnt/DP_disk2
   
10.1.0.126: /dev/sdd1 280G   13G  253G   5%
 /mnt/DP_disk3
   
2010/9/7 Tao Xie xietao.mail...@gmail.com
   
 I have a look at the following method in 0.89. Is the following line
 correct?

 nRegions *= e.getValue().size();


 private int regionsToGiveOtherServers(final int numUnassignedRegions,
     final HServerLoad thisServersLoad) {
   SortedMap<HServerLoad, Set<String>> lightServers =
     new TreeMap<HServerLoad, Set<String>>();
   this.master.getLightServers(thisServersLoad, lightServers);
   // Examine the list of servers that are more lightly loaded than this one.
   // Pretend that we will assign regions to these more lightly loaded servers
   // until they reach load equal with ours. Then, see how many regions are left
   // unassigned. That is how many regions we should assign to this server.
   int nRegions = 0;
   for (Map.Entry<HServerLoad, Set<String>> e : lightServers.entrySet()) {
     HServerLoad lightLoad = new HServerLoad(e.getKey());
     do {
       lightLoad.setNumberOfRegions(lightLoad.getNumberOfRegions() + 1);
       nRegions += 1;
     } while (lightLoad.compareTo(thisServersLoad) <= 0
         && nRegions < numUnassignedRegions);
     nRegions *= e.getValue().size();
     if (nRegions >= numUnassignedRegions) {
       break;
     }
   }
   return nRegions;
 }

stop-hbase.sh takes forever (never stops)

2010-09-07 Thread Jian Lu
Hi, could someone please tell me why stop-hbase.sh has been running for more than 
24 hrs and still hasn't finished?  I was able to start / stop hbase in the past 
two months.  Now it suddenly stops working.

I am running hbase-0.20.4 with a Linux 64-bit CPU / 64-bit operating system.  I 
downloaded hbase-0.20.4 and run it in standalone mode 
(http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description)

Thanks!
Jack.



Re: stop-hbase.sh takes forever (never stops)

2010-09-07 Thread Alexey Kovyrin
Never worked for me (and I believe there was a JIRA for that).

On Tue, Sep 7, 2010 at 5:44 PM, Jian Lu j...@local.com wrote:
 Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and 
 still running?  I was able to started / stopped hbase in the past two months. 
  Now it suddenly stops working.

 I am running hbase-0.20.4 with Linux 64-bit CPU / 64-bit operating system.  I 
 downloaded hbase-0.20.4 and run on a standalone mode 
 (http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description)

 Thanks!
 Jack.





-- 
Alexey Kovyrin
http://kovyrin.net/


Re: stop-hbase.sh takes forever (never stops)

2010-09-07 Thread Stack
Check the master log.  It'll usually say what it's waiting on.  At this
stage, just kill your servers.  Try kill PID first.  If that doesn't
work, try kill -9 PID.  Also, update your hbase to 0.20.6.
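
For example (hypothetical paths and names; the pid files live wherever
HBASE_PID_DIR points, /tmp by default):

tail -100 /path/to/hbase/logs/hbase-$USER-master-`hostname`.log
kill `cat /tmp/hbase-$USER-master.pid`      # clean kill first
kill -9 `cat /tmp/hbase-$USER-master.pid`   # only if the clean kill hangs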
St.Ack

On Tue, Sep 7, 2010 at 2:44 PM, Jian Lu j...@local.com wrote:
 Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and 
 still running?  I was able to started / stopped hbase in the past two months. 
  Now it suddenly stops working.

 I am running hbase-0.20.4 with Linux 64-bit CPU / 64-bit operating system.  I 
 downloaded hbase-0.20.4 and run on a standalone mode 
 (http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description)

 Thanks!
 Jack.




Re: stop-hbase.sh takes forever (never stops)

2010-09-07 Thread Venkatesh

 Don't know if this helps, but here are a couple of reasons why I had the issue 
 and how I resolved it:
- If zookeeper is not running (or does not have quorum) in a cluster setup, 
hbase does not go down. Bring up zookeeper.
- Make sure the pid file is not under /tmp... sometimes files get cleaned out of 
/tmp. Change *env.sh to point to a diff dir (see the example below).
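
A hedged example of the second point, in conf/hbase-env.sh (the directory is
just an example; anything outside /tmp that survives tmp cleaners will do):

# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/var/hbase/pids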


 


 

 

-Original Message-
From: Jian Lu j...@local.com
To: user@hbase.apache.org user@hbase.apache.org
Sent: Tue, Sep 7, 2010 5:44 pm
Subject: stop-hbase.sh takes forever (never stops)


Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and 
still running?  I was able to started / stopped hbase in the past two months.  
Now it suddenly stops working.

I am running hbase-0.20.4 with Linux 64-bit CPU / 64-bit operating system.  I 
downloaded hbase-0.20.4 and run on a standalone mode 
(http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description)

Thanks!
Jack.


 


RE: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

2010-09-07 Thread Buttler, David
Hi Ron,
The first thing that jumps out at me is that you are getting localhost as the 
address for your zookeeper server.  This is almost certainly wrong.  You should 
be getting a list of your zookeeper quorum here.  Until you fix that nothing 
will work.

You need something like the following in your hbase-site.xml file (and your 
hbase-site.xml file should be in the classpath of all of the jobs you expect to 
run against your cluster):
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>the port at which the clients will connect</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-01,node-02,node-03,node-04,node-05</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

Let me know if that helps,
Dave

-Original Message-
From: Taylor, Ronald C [mailto:ronald.tay...@pnl.gov]
Sent: Tuesday, September 07, 2010 3:18 PM
To: 'hbase-u...@hadoop.apache.org'
Cc: Taylor, Ronald C; Witteveen, Tim
Subject: Question on Hbase 0.89 - interactive shell works, programs don't - 
could use help


Hello folks,

We've just installed Hbase 0.89 on a 24-node cluster running Hadoop 0.20.2 here 
at our government lab.

Got a problem. The Hbase interactive shell works fine. I  can create a table 
with a column family, add a couple rows, get the rows back out.  Also, the 
Hbase web site on our cluster at

   http://*h01.emsl.pnl.gov:60010/master.jsp

 doesn't appear (to our untrained eyes) to show anything going wrong

However, the Hbase programs that I used on another cluster that ran an earlier 
version of Hbase no longer run. I altered such a program to use the new API, 
and it compiles fine. However, when I try to run it, I get the error msgs seen 
below.

So - I downloaded the sample 0.89 Hbase program from the Hbase web site and 
tried that, simply altering the table name used to "peptideTable", the column 
family to "f1", and the column to "name".

The interactive shell shows that the table and data are there. But the 
slightly altered program from the Hbase web site, while compiling fine, again 
shows the same errors as I got using my own Hbase program. I've tried running 
the programs in both my own 'rtaylor' account, and in the 'hbase' account - I 
get the same errors.

So my colleague Tim and I think we missed something in the install.

I have appended the test program in full below, followed by the error msgs that 
it generated. Lastly, I have appended a screen dump of the contents of the web 
page at
  http://*h01.emsl.pnl.gov:60010/master.jsp

 on our cluster.

 We would very much appreciate some guidance.

   Cheers,
Ron Taylor
___
Ronald Taylor, Ph.D.
Computational Biology  Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.tay...@pnl.gov


%

Contents of MyLittleHBaseClient.java:


import java.io.IOException;

// javac MyLittleHBaseClient.java
// javac -Xlint MyLittleHBaseClient.java

// java MyLittleHBaseClient

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;


// Class that has nothing but a main.
// Does a Put, Get and a Scan against an hbase table.

public class MyLittleHBaseClient {

public static void main(String[] args) throws IOException {

    // You need a configuration object to tell the client where to connect.
    // When you create a HBaseConfiguration, it reads in whatever you've set
    // into your hbase-site.xml and in hbase-default.xml, as long as these can
    // be found on the CLASSPATH
    HBaseConfiguration config = new HBaseConfiguration();

    // This instantiates an HTable object that connects you to
    // the "myLittleHBaseTable" table.
    HTable table = new HTable(config, "peptideTable");

    // To add to a row, use Put.  A Put constructor takes the name of the row
    // you want to insert into as a byte array.  In HBase, the Bytes class has
    // utility for converting all kinds of java types to byte arrays.  In the
    // below, we are converting the String "myLittleRow" into a byte array to
    // use as a 

RE: stop-hbase.sh takes forever (never stops)

2010-09-07 Thread Jian Lu
Thanks gentlemen!  It works now.  I manually killed the three PIDs found in the /tmp 
dir, and changed all /tmp paths in hbase-env.sh to another dir. Thanks again!

-Original Message-
From: Venkatesh [mailto:vramanatha...@aol.com] 
Sent: Tuesday, September 07, 2010 3:13 PM
To: user@hbase.apache.org
Subject: Re: stop-hbase.sh takes forever (never stops)


 Don't know if this helps..but here are couple of reasons when I had the issue 
 how i resolved it
- If zookeeper is not running (or do not have the quorum) in a cluster setup, 
hbase does not go down..bring up zookeeper
- Make sure pid file is not under /tmp...somtimes files get cleaned out of 
tmp..Change *env.sh to point to diff dir.


 


 

 

-Original Message-
From: Jian Lu j...@local.com
To: user@hbase.apache.org user@hbase.apache.org
Sent: Tue, Sep 7, 2010 5:44 pm
Subject: stop-hbase.sh takes forever (never stops)


Hi, could someone please tell me why stop-hbase.sh takes more than 24 hrs and 
still running?  I was able to started / stopped hbase in the past two months.  
Now it suddenly stops working.

I am running hbase-0.20.4 with Linux 64-bit CPU / 64-bit operating system.  I 
downloaded hbase-0.20.4 and run on a standalone mode 
(http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description)

Thanks!
Jack.


 


Re: Question on Hbase 0.89 - interactive shell works, programs don't - could use help

2010-09-07 Thread Jeff Whiting
 We had a weird problem when we accidentally kept old jars (0.20.4) around and tried to connect to 
hbase 0.89.  Zookeeper would connect but no data would be sent.  That may not be your problem, but 
it is something to watch out for.


~Jeff

On 9/7/2010 4:18 PM, Taylor, Ronald C wrote:

Hello folks,

We've just installed Hbase 0.89 on a 24-node cluster running Hadoop 0.20.2 here 
at our government lab.

Got a problem. The Hbase interactive shell works fine. I  can create a table 
with a column family, add a couple rows, get the rows back out.  Also, the 
Hbase web site on our cluster at

http://h01.emsl.pnl.gov:60010/master.jsp

  doesn't appear (to our untrained eyes) to show anything going wrong

However, the Hbase programs that I used on another cluster that ran an earlier 
version of Hbase no longer run. I altered such a program to use the new API, 
and it compiles fine. However, when I try to run it, I get the error msgs seen 
below.

So - I downloaded the sample 0.89 Hbase program from the Hbase web site and tried that, simply altering the 
table name used to peptideTable, column family to f1, and column to name.

The interactive shell shows that the table and data are there . But the 
slightly altered program from the Hbase web site, while compiling fine, again 
shows the same errors as I got using my own Hbase program. I've tried running 
the programs in both my own 'rtaylor' account, and in the 'hbase' account - I 
get the same errors.

So my colleague Tim and I think we missed something in the install.

I have appended the test program in full below, followed by the error msgs that 
it generated. Lastly, I have appended a screen dump of the contents of the web 
page at
   http://h01.emsl.pnl.gov:60010/master.jsp

  on our cluster.

  We would very much appreciate some guidance.

Cheers,
 Ron Taylor
___
Ronald Taylor, Ph.D.
Computational Biology  Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.tay...@pnl.gov


%

Contents of MyLittleHBaseClient.java:


import java.io.IOException;

// javac MyLittleHBaseClient.java
// javac -Xlint MyLittleHBaseClient.java

// java MyLittleHBaseClient

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;


// Class that has nothing but a main.
// Does a Put, Get and a Scan against an hbase table.

public class MyLittleHBaseClient {

 public static void main(String[] args) throws IOException {

     // You need a configuration object to tell the client where to connect.
     // When you create a HBaseConfiguration, it reads in whatever you've set
     // into your hbase-site.xml and in hbase-default.xml, as long as these can
     // be found on the CLASSPATH
     HBaseConfiguration config = new HBaseConfiguration();

     // This instantiates an HTable object that connects you to
     // the "myLittleHBaseTable" table.
     HTable table = new HTable(config, "peptideTable");

     // To add to a row, use Put.  A Put constructor takes the name of the row
     // you want to insert into as a byte array.  In HBase, the Bytes class has
     // utility for converting all kinds of java types to byte arrays.  In the
     // below, we are converting the String "myLittleRow" into a byte array to
     // use as a row key for our update. Once you have a Put instance, you can
     // adorn it by setting the names of columns you want to update on the row,
     // the timestamp to use in your update, etc. If no timestamp, the server
     // applies current time to the edits.
     //
     Put p = new Put(Bytes.toBytes("2001"));

     // To set the value you'd like to update in the row 'myLittleRow', specify
     // the column family, column qualifier, and value of the table cell you'd
     // like to update.  The column family must already exist in your table
     // schema.  The qualifier can be anything.  All must be specified as byte
     // arrays as hbase is all about byte arrays.  Lets pretend the table
     // 'myLittleHBaseTable' was created with a family 'myLittleFamily'.
     //
     p.add(Bytes.toBytes("f1"), Bytes.toBytes("name"),
       Bytes.toBytes("p2001"));

     // Once you've adorned your Put instance with all the updates you want to
     // make, to commit it do the following (The HTable#put method takes the
     // Put instance you've been building and pushes the changes you made into
     // hbase)

Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help

2010-09-07 Thread Taylor, Ronald C

J-D, David, and Jeff,

Thanks for getting back to me so quickly. Problem has been resolved. I added
   /home/hbase/hbase/conf
 to my CLASSPATH var,

 and made sure that both these files:
  hbase-default.xml
 and
  hbase-site.xml

 in the
/home/hbase/hbase/conf
 directory use the values below for setting the quorum (using the h02,h03, etc 
nodes on our cluster):

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>h02,h03,h04,h05,h06,h07,h08,h09,h10</value>

    <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
    </description>
  </property>

This appears to have fixed the problem. Thanks again.
Ron

___
Ronald Taylor, Ph.D.
Computational Biology  Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.tay...@pnl.gov


-Original Message-
From: Buttler, David [mailto:buttl...@llnl.gov]
Sent: Tuesday, September 07, 2010 3:24 PM
To: user@hbase.apache.org; 'hbase-u...@hadoop.apache.org'
Cc: Witteveen, Tim
Subject: RE: Question on Hbase 0.89 - interactive shell works, programs don't - 
could use help

Hi Ron,
The first thing that jumps out at me is that you are getting localhost as the 
address for your zookeeper server.  This is almost certainly wrong.  You should 
be getting a list of your zookeeper quorum here.  Until you fix that nothing 
will work.

You need something like the following in your hbase-site.xml file (and your 
hbase-site.xml file should be in the classpath of all of the jobs you expect to 
run against your cluster):
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>the port at which the clients will connect</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-01,node-02,node-03,node-04,node-05</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

Let me know if that helps,
Dave

-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel 
Cryans
Sent: Tuesday, September 07, 2010 3:23 PM
To: user@hbase.apache.org
Subject: Re: Question on Hbase 0.89 - interactive shell works, programs don't - 
could use help

Your client is trying to connect to a local zookeeper ensemble (grep for 
connectString in the message). This means that the client doesn't know about 
the proper configurations in order to connect to the cluster. Either put your 
hbase-site.xml on the client's classpath or set the proper settings on the 
HBaseConfiguration object.

J-D

On Tue, Sep 7, 2010 at 3:18 PM, Taylor, Ronald C ronald.tay...@pnl.gov wrote:

 Hello folks,

 We've just installed Hbase 0.89 on a 24-node cluster running Hadoop 0.20.2 
 here at our government lab.

 Got a problem. The Hbase interactive shell works fine. I  can create a
 table with a column family, add a couple rows, get the rows back out.
 Also, the Hbase web site on our cluster at

   http://h01.emsl.pnl.gov:60010/master.jsp

  doesn't appear (to our untrained eyes) to show anything going wrong

 However, the Hbase programs that I used on another cluster that ran an 
 earlier version of Hbase no longer run. I altered such a program to use the 
 new API, and it compiles fine. However, when I try to run it, I get the error 
 msgs seen below.

 So - I downloaded the sample 0.89 Hbase program from the Hbase web site and 
 tried that, simply altering the table name used to peptideTable, column 
 family to f1, and column to name.

 The interactive shell shows that the table and data are there . But the 
 slightly altered program from the Hbase web site, while compiling fine, again 
 shows the same errors as I got using my own Hbase program. I've tried running 
 the programs in both my own 'rtaylor' account, and in the 'hbase' account - I 
 get the same errors.

 So my colleague Tim and I think we missed something in the install.

 I have appended the test program in full below, followed by the error
 msgs that it generated. Lastly, I have appended a screen dump of the
 contents of the web page at
  http://h01.emsl.pnl.gov:60010/master.jsp

  on 

RE: Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help

2010-09-07 Thread Buttler, David
Are you sure you want 9 peers in zookeeper?  I think the standard advice is to 
have:
* 1 peer for clusters of size < 10
* 5 peers for medium size clusters (10-40)
* 1 peer per rack for large clusters

9 seems like overkill for a cluster that has 25 nodes.  Zookeeper should 
probably have its own disk on each device (which will reduce your potential 
storage space), and it has to write to disk on every peer before a zookeeper 
write will succeed -- more peers means that the cost per write is higher.

Dave



-Original Message-
From: Taylor, Ronald C [mailto:ronald.tay...@pnl.gov]
Sent: Tuesday, September 07, 2010 4:40 PM
To: 'user@hbase.apache.org'
Cc: Taylor, Ronald C; Witteveen, Tim
Subject: Solved - Question on Hbase 0.89 - interactive shell works, programs 
don't - could use help


J-D, David, and Jeff,

Thanks for getting back to me so quickly. Problem has been resolved. I added
   /home/hbase/hbase/conf
 to my CLASSPATH var,

 and made sure that both these files:
  hbase-default.xml
 and
  hbase-site.xml

 in the
/home/hbase/hbase/conf
 directory use the values below for setting the quorum (using the h02,h03, etc 
nodes on our cluster):

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>h02,h03,h04,h05,h06,h07,h08,h09,h10</value>

    <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
    </description>
  </property>

This appears to have fixed the problem. Thanks again.
Ron

___
Ronald Taylor, Ph.D.
Computational Biology  Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.tay...@pnl.gov


-Original Message-
From: Buttler, David [mailto:buttl...@llnl.gov]
Sent: Tuesday, September 07, 2010 3:24 PM
To: user@hbase.apache.org; 'hbase-u...@hadoop.apache.org'
Cc: Witteveen, Tim
Subject: RE: Question on Hbase 0.89 - interactive shell works, programs don't - 
could use help

Hi Ron,
The first thing that jumps out at me is that you are getting localhost as the 
address for your zookeeper server.  This is almost certainly wrong.  You should 
be getting a list of your zookeeper quorum here.  Until you fix that nothing 
will work.

You need something like the following in your hbase-site.xml file (and your 
hbase-site.xml file should be in the classpath of all of the jobs you expect to 
run against your cluster):
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>the port at which the clients will connect</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-01,node-02,node-03,node-04,node-05</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

Let me know if that helps,
Dave

-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel 
Cryans
Sent: Tuesday, September 07, 2010 3:23 PM
To: user@hbase.apache.org
Subject: Re: Question on Hbase 0.89 - interactive shell works, programs don't - 
could use help

Your client is trying to connect to a local zookeeper ensemble (grep for 
connectString in the message). This means that the client doesn't know about 
the proper configurations in order to connect to the cluster. Either put your 
hbase-site.xml on the client's classpath or set the proper settings on the 
HBaseConfiguration object.

J-D

On Tue, Sep 7, 2010 at 3:18 PM, Taylor, Ronald C ronald.tay...@pnl.gov wrote:

 Hello folks,

 We've just installed Hbase 0.89 on a 24-node cluster running Hadoop 0.20.2 
 here at our government lab.

 Got a problem. The Hbase interactive shell works fine. I  can create a
 table with a column family, add a couple rows, get the rows back out.
 Also, the Hbase web site on our cluster at

   http://*h01.emsl.pnl.gov:60010/master.jsp

  doesn't appear (to our untrained eyes) to show anything going wrong

 However, the Hbase programs that I used on another cluster that ran an 
 earlier version of Hbase no longer run. I altered such a program to use the 
 new API, and it compiles fine. However, when I try to run it, I get the error 
 msgs seen below.

 So - I 

RE: Solved - Question on Hbase 0.89 - interactive shell works, programs don't - could use help

2010-09-07 Thread Taylor, Ronald C

Thanks - I'll talk to Tim as to cutting down on the zookeeper peers. At the 
moment we at least don't have to worry about storage space - we have 25 Tb of 
disk on each node - 600 Tb total to play with, which is plenty for us. (I'd 
trade some of that disk capacity for more RAM per node, but have to work with 
the cluster we were given for testing purposes - hopefully we'll expand in the 
future.)

Ron

___
Ronald Taylor, Ph.D.
Computational Biology  Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.tay...@pnl.gov

-Original Message-
From: Buttler, David [mailto:buttl...@llnl.gov]
Sent: Tuesday, September 07, 2010 4:47 PM
To: user@hbase.apache.org
Cc: Witteveen, Tim
Subject: RE: Solved - Question on Hbase 0.89 - interactive shell works, 
programs don't - could use help

Are you sure you want 9 peers in zookeeper?  I think the standard advice is to 
have:
* 1 peer for clusters of size < 10
* 5 peers for medium size clusters (10-40)
* 1 peer per rack for large clusters

9 seems like overkill for a cluster that has 25 nodes.  Zookeeper should 
probably have its own disk on each device (which will reduce your potential 
storage space), and it has to write to disk on every peer before a zookeeper 
write will succeed -- more peers means that the cost per write is higher.

Dave



-Original Message-
From: Taylor, Ronald C [mailto:ronald.tay...@pnl.gov]
Sent: Tuesday, September 07, 2010 4:40 PM
To: 'user@hbase.apache.org'
Cc: Taylor, Ronald C; Witteveen, Tim
Subject: Solved - Question on Hbase 0.89 - interactive shell works, programs 
don't - could use help


J-D, David, and Jeff,

Thanks for getting back to me so quickly. Problem has been resolved. I added
   /home/hbase/hbase/conf
 to my CLASSPATH var,

 and made sure that both these files:
  hbase-default.xml
 and
  hbase-site.xml

 in the
/home/hbase/hbase/conf
 directory use the values below for setting the quorum (using the h02,h03, etc 
nodes on our cluster):

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>h02,h03,h04,h05,h06,h07,h08,h09,h10</value>

    <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
    </description>
  </property>

This appears to have fixed the problem. Thanks again.
Ron

___
Ronald Taylor, Ph.D.
Computational Biology  Bioinformatics Group Pacific Northwest National 
Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.tay...@pnl.gov


-Original Message-
From: Buttler, David [mailto:buttl...@llnl.gov]
Sent: Tuesday, September 07, 2010 3:24 PM
To: user@hbase.apache.org; 'hbase-u...@hadoop.apache.org'
Cc: Witteveen, Tim
Subject: RE: Question on Hbase 0.89 - interactive shell works, programs don't - 
could use help

Hi Ron,
The first thing that jumps out at me is that you are getting localhost as the 
address for your zookeeper server.  This is almost certainly wrong.  You should 
be getting a list of your zookeeper quorum here.  Until you fix that nothing 
will work.

You need something like the following in your hbase-site.xml file (and your 
hbase-site.xml file should be in the classpath of all of the jobs you expect to 
run against your cluster):
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>the port at which the clients will connect</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-01,node-02,node-03,node-04,node-05</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
  For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
  By default this is set to localhost for local and pseudo-distributed modes
  of operation. For a fully-distributed setup, this should be set to a full
  list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
  this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>

Let me know if that helps,
Dave

-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel 
Cryans
Sent: Tuesday, September 07, 2010 3:23 PM
To: user@hbase.apache.org
Subject: Re: Question on Hbase 0.89 - interactive shell works, programs don't - 
could use help

Your client is trying to connect to a local zookeeper ensemble (grep for 
connectString in the message). This means that the client doesn't know about 
the 

Re: Limits on HBase

2010-09-07 Thread William Kang
Hi,
Thanks for your reply. How about the row size? I read that a row should not
be larger than the HDFS file size on the region server, which is 256M by
default. Is that right? Many thanks.


William

On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell apurt...@apache.org wrote:

 In addition to what Jon said please be aware that if compression is
 specified in the table schema, it happens at the store file level --
 compression happens after write I/O, before read I/O, so if you transmit a
 100MB object that compresses to 30MB, the performance impact is that of
 100MB, not 30MB.

 I also try not to go above 50MB as largest cell size, for the same reason.
 I have tried storing objects larger than 100MB but this can cause out of
 memory issues on busy regionservers no matter the size of the heap. When/if
 HBase RPC can send large objects in smaller chunks, this will be less of an
 issue.

 Best regards,

- Andy

 Why is this email five sentences or less?
 http://five.sentenc.es/


 --- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote:

  From: Jonathan Gray jg...@facebook.com
  Subject: RE: Limits on HBase
  To: user@hbase.apache.org user@hbase.apache.org
  Date: Monday, September 6, 2010, 4:10 PM
  I'm not sure what you mean by
  optimized cell size or whether you're just asking about
  practical limits?
 
  HBase is generally used with cells in the range of tens of
  bytes to hundreds of kilobytes.  However, I have used
  it with cells that are several megabytes, up to about
  50MB.  Up at that level, I have seen some weird
  performance issues.
 
  The most important thing is to be sure to tweak all of your
  settings.  If you have 20MB cells, you need to be sure
  to increase the flush size beyond 64MB and the split size
  beyond 256MB.  You also need enough memory to support
  all this large object allocation.
 
  And of course, test test test.  That's the easiest way
  to see if what you want to do will work :)
 
  When you run into problems, e-mail the list.
 
  As far as row size is concerned, the only issue is that a
  row can never span multiple regions so a given row can only
  be in one region and thus be hosted on one server at a
  time.
 
  JG
 
   -Original Message-
   From: William Kang [mailto:weliam.cl...@gmail.com]
   Sent: Monday, September 06, 2010 1:57 PM
   To: hbase-user
   Subject: Limits on HBase
  
   Hi folks,
   I know this question may have been asked many times,
  but I am wondering
   if
   there is any update on the optimized cell size (in
  megabytes) and row
   size
   (in megabytes)? Many thanks.
  
  
   William
 







Re: thrift for hbase in CDH3 broken ?

2010-09-07 Thread Igor Ranitovic

Jinsong Hu wrote:

I tried, this doesn't work. I noticed
$transport->open();
is missing in this code. So I added it.


Yup. Sorry about that. Copy and paste error :(

The following code first successfully prints all tables, then the getRow()
line throws an exception, even though with the ruby client the row data is there:




  $transport->open();

  my @names = $client->getTableNames();

  print Dumper(@names);
  print "\n";

  my $row = $client->getRow('table12345', 'key123');

  print Dumper($row);
  print "\n";

  $transport->close();



So you can scan the META table on the master, but can't fetch a row from a RS.
Are there any firewalls in place? Are you running thrift servers on the 
same nodes as region servers? What kind of exception do you get?


i.


--
From: Igor Ranitovic irani...@gmail.com
Sent: Friday, September 03, 2010 11:45 AM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?


Not sure what the test code is... would this test your setup?


#!/usr/bin/env perl

use strict;
use warnings;

use Thrift::BinaryProtocol;
use Thrift::BufferedTransport;
use Thrift::Socket;
use Hbase::Hbase;
use Data::Dumper;

my $sock = Thrift::Socket->new('127.0.0.1', '9090');
$sock->setRecvTimeout(6);
my $transport = Thrift::BufferedTransport->new($sock);
my $protocol = Thrift::BinaryProtocol->new($transport);
my $client = Hbase::HbaseClient->new($protocol);

my $row = $client->getRow('table_test', 'row_123');
print Dumper($row);

$transport->close();


BTW, I am not sure why you would want to use java to talk to HBase 
via the thrift server.


i.


Jinsong Hu wrote:

by the way, does anybody have a perl version of the test code ?

Jimmy

--
From: Jinsong Hu jinsong...@hotmail.com
Sent: Friday, September 03, 2010 11:17 AM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?



I tried your code and indeed it works, but the Java version doesn't
work, so it looks like it is a bug in the Java library supplied with
the thrift-0.2.0 release.

Jimmy.
--
From: Alexey Kovyrin ale...@kovyrin.net
Sent: Friday, September 03, 2010 12:31 AM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?


yes, Centos 5.5 + CDH3b2

On Fri, Sep 3, 2010 at 3:26 AM, Jinsong Hu jinsong...@hotmail.com 
wrote:

are you using CDH3 distribution ?

Jinsong


--
From: Alexey Kovyrin ale...@kovyrin.net
Sent: Friday, September 03, 2010 12:04 AM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?


http://github.com/kovyrin/hbase-thrift-client-examples - just wrote
this example and tested it in our cluster, works as expected.
For this to work you'd need to install rubygems and thrift gem (gem
install thrift).

On Fri, Sep 3, 2010 at 12:01 AM, Jinsong Hu jinsong...@hotmail.com
wrote:


Can you send me some ruby test code so I can try it against the latest
CDH3?

Jimmy.

--
From: Alexey Kovyrin ale...@kovyrin.net
Sent: Thursday, September 02, 2010 8:15 PM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?


We use it in Scribd.com. All clients are ruby web apps.

On Thu, Sep 2, 2010 at 10:49 PM, Todd Lipcon 
t...@cloudera.com wrote:


On Thu, Sep 2, 2010 at 5:35 PM, Jinsong Hu 
jinsong...@hotmail.com

wrote:


Yes, I confirmed that it is indeed the thrift server,
and the fact that the API

List<byte[]> tableNamesList = client.getTableNames();

    for (byte[] name : tableNamesList) {
        System.out.println(new String(name));
    }

successfully printed all table names shows that it is indeed the thrift
server. If it were Hue, it wouldn't print the table names.

Ah, sorry, I missed that in your original message. Not sure what's up,
then - we don't have any changes in CDH that would affect this. Anyone
here used thrift on 0.89.20100621?

-Todd




Jimmy.

--
From: Todd Lipcon t...@cloudera.com
Sent: Thursday, September 02, 2010 5:18 PM

To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?


 Hi Jinsong,


Are you sure that the port you're connecting to is indeed the thrift
server?

Unfortunately both the HBase thrift server and the Hue namenode plugin
listen on port 9090, so you might be having an issue where your HBase
client is trying to connect to the Namenode server instead of HBase.

You can verify the ports using a command like /sbin/fuser -n tcp 9090
to see which pid has it open, then cross reference against sudo jps.


Thanks
-Todd

On Thu, Sep 2, 2010 at 4:40 PM, Jinsong Hu 
jinsong...@hotmail.com

wrote:

 Hi, There,


 I am trying to test and see if thrift for hbase works. I followed the
example from

http://www.workhabit.com/labs/centos-55-and-thriftscribe
http://incubator.apache.org/thrift/

RE: Limits on HBase

2010-09-07 Thread Jonathan Gray
You can go way beyond the max region split / split size.  HBase will never 
split the region once it is a single row, even if beyond the split size.

Also, if you're using large values, you should have region sizes much larger 
than the default.  It's common to run with 1-2GB regions in many cases.

What you may have seen are recommendations that if your cell values are 
approaching the default block size on HDFS (64MB), you should consider putting 
the data directly into HDFS rather than HBase.

JG
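
To make those numbers concrete, here is a minimal sketch (my assumption: the
0.20/0.89-era property names; the values are only illustrative) of the two
settings being discussed. They normally belong in hbase-site.xml on the
region servers; a plain Hadoop Configuration object is used here only to
document the keys and example values.

import org.apache.hadoop.conf.Configuration;

public class LargeCellTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Flush memstores at 256MB instead of the 64MB default, so a handful
    // of multi-megabyte cells does not force a flush after every few writes.
    conf.setLong("hbase.hregion.memstore.flush.size", 256L * 1024 * 1024);

    // Split regions at 1GB instead of the 256MB default ("1-2GB regions").
    conf.setLong("hbase.hregion.max.filesize", 1024L * 1024 * 1024);

    System.out.println("flush size = "
        + conf.getLong("hbase.hregion.memstore.flush.size", -1)
        + ", max region size = "
        + conf.getLong("hbase.hregion.max.filesize", -1));
  }
}

Larger cells also mean larger allocations on the region servers, so as
noted elsewhere in this thread the heap has to be sized accordingly.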

 -Original Message-
 From: William Kang [mailto:weliam.cl...@gmail.com]
 Sent: Tuesday, September 07, 2010 7:36 PM
 To: user@hbase.apache.org; apurt...@apache.org
 Subject: Re: Limits on HBase
 
 Hi,
 Thanks for your reply. How about the row size? I read that a row should
 not
 be larger than the hdfs file on region server which is 256M in default.
 Is
 it right? Many thanks.
 
 
 William
 
 On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell apurt...@apache.org
 wrote:
 
  In addition to what Jon said please be aware that if compression is
  specified in the table schema, it happens at the store file level --
  compression happens after write I/O, before read I/O, so if you
 transmit a
  100MB object that compresses to 30MB, the performance impact is that
 of
  100MB, not 30MB.
 
  I also try not to go above 50MB as largest cell size, for the same
 reason.
  I have tried storing objects larger than 100MB but this can cause out
 of
  memory issues on busy regionservers no matter the size of the heap.
 When/if
  HBase RPC can send large objects in smaller chunks, this will be less
 of an
  issue.
 
  Best regards,
 
 - Andy
 
  Why is this email five sentences or less?
  http://five.sentenc.es/
 
 
  --- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote:
 
   From: Jonathan Gray jg...@facebook.com
   Subject: RE: Limits on HBase
   To: user@hbase.apache.org user@hbase.apache.org
   Date: Monday, September 6, 2010, 4:10 PM
   I'm not sure what you mean by
   optimized cell size or whether you're just asking about
   practical limits?
  
   HBase is generally used with cells in the range of tens of
   bytes to hundreds of kilobytes.  However, I have used
   it with cells that are several megabytes, up to about
   50MB.  Up at that level, I have seen some weird
   performance issues.
  
   The most important thing is to be sure to tweak all of your
   settings.  If you have 20MB cells, you need to be sure
   to increase the flush size beyond 64MB and the split size
   beyond 256MB.  You also need enough memory to support
   all this large object allocation.
  
   And of course, test test test.  That's the easiest way
   to see if what you want to do will work :)
  
   When you run into problems, e-mail the list.
  
   As far as row size is concerned, the only issue is that a
   row can never span multiple regions so a given row can only
   be in one region and thus be hosted on one server at a
   time.
  
   JG
  
-Original Message-
From: William Kang [mailto:weliam.cl...@gmail.com]
Sent: Monday, September 06, 2010 1:57 PM
To: hbase-user
Subject: Limits on HBase
   
Hi folks,
I know this question may have been asked many times,
   but I am wondering
if
there is any update on the optimized cell size (in
   megabytes) and row
size
(in megabytes)? Many thanks.
   
   
William
  
 
 
 
 
 


Re: thrift for hbase in CDH3 broken ?

2010-09-07 Thread Jinsong Hu
There is no firewall. As you can see, on the same client machine I am able
to get the ruby version of the code to work. This confirms that the thrift
server is not the problem. Basically I am just trying to fetch the same row
of data as the ruby program does.

I am not running the thrift server on the same machine as a region server.
I am running the thrift server on a standalone machine that is configured
to point to the zookeeper for the hbase cluster.

Since the ruby version of the client code works, I would assume that the
thrift server is not the problem. I also tried the java version and it
doesn't work either. In the previous post somebody asked why I use java:
the reason is that I want to test whether the thrift server works. I never
managed to get java working, even until now.

Have you gotten the perl version to work? Have you been able to read a row
of data using perl?


Jimmy.
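
For reference, here is a minimal Java sketch of the same test the ruby and
perl clients perform, assuming the Thrift 0.2.0-era generated bindings
(org.apache.hadoop.hbase.thrift.generated) and a thrift server listening on
127.0.0.1:9090; the table name and row key are the placeholders used earlier
in this thread, not real data.

import java.util.List;

import org.apache.hadoop.hbase.thrift.generated.Hbase;
import org.apache.hadoop.hbase.thrift.generated.TRowResult;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ThriftGetRowTest {
  public static void main(String[] args) throws Exception {
    TTransport transport = new TSocket("127.0.0.1", 9090);
    TBinaryProtocol protocol = new TBinaryProtocol(transport);
    Hbase.Client client = new Hbase.Client(protocol);
    transport.open();

    // getTableNames() only needs the master / META side; this is the call that works.
    for (byte[] name : client.getTableNames()) {
      System.out.println(new String(name));
    }

    // getRow() has to reach a region server; this is the call that fails here.
    List<TRowResult> rows =
        client.getRow("table12345".getBytes(), "key123".getBytes());
    System.out.println("rows returned: " + rows.size());

    transport.close();
  }
}

Comparing where this throws (open(), getTableNames() or getRow()) and the
exact exception text against the working perl run should narrow down whether
the problem is in the Java thrift library or somewhere else.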

--
From: Igor Ranitovic irani...@gmail.com
Sent: Tuesday, September 07, 2010 8:18 PM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?


Jinsong Hu wrote:

I tried; this doesn't work. I noticed
$transport->open();
is missing in this code, so I added it.


Yup. Sorry about that. Copy and paste error :(

The following code first successfully prints all tables, then throws an
exception at the getRow() line, even though with the ruby client the row
data is there.




  $transport->open();

  my @names = $client->getTableNames();

  print Dumper(@names);
  print "\n";

  my $row = $client->getRow('table12345', 'key123');

  print Dumper($row);
  print "\n";

  $transport->close();



So you can scan the META table on the master, but can't fetch a row from a RS.
Are there any firewalls in place? Are you running thrift servers on the
same nodes as region servers? What kind of exception do you get?


i.


--
From: Igor Ranitovic irani...@gmail.com
Sent: Friday, September 03, 2010 11:45 AM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?


Not sure what the test code is... would this test your setup?


#!/usr/bin/env perl

use strict;
use warnings;

use Thrift::BinaryProtocol;
use Thrift::BufferedTransport;
use Thrift::Socket;
use Hbase::Hbase;
use Data::Dumper;

my $sock = Thrift::Socket->new('127.0.0.1', '9090');
$sock->setRecvTimeout(6);
my $transport = Thrift::BufferedTransport->new($sock);
my $protocol = Thrift::BinaryProtocol->new($transport);
my $client = Hbase::HbaseClient->new($protocol);

my $row = $client->getRow('table_test', 'row_123');
print Dumper($row);

$transport->close();


BTW, I am not sure why you would want to use Java to talk to HBase
via the Thrift server.


i.


Jinsong Hu wrote:

by the way, does anybody have a perl version of the test code ?

Jimmy

--
From: Jinsong Hu jinsong...@hotmail.com
Sent: Friday, September 03, 2010 11:17 AM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?



I tried your code and indeed it works, but the Java version doesn't
work, so it looks like it is a bug in the Java library supplied with
the thrift-0.2.0 release.

Jimmy.
--
From: Alexey Kovyrin ale...@kovyrin.net
Sent: Friday, September 03, 2010 12:31 AM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?


yes, Centos 5.5 + CDH3b2

On Fri, Sep 3, 2010 at 3:26 AM, Jinsong Hu jinsong...@hotmail.com 
wrote:

are you using CDH3 distribution ?

Jinsong


--
From: Alexey Kovyrin ale...@kovyrin.net
Sent: Friday, September 03, 2010 12:04 AM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?


http://github.com/kovyrin/hbase-thrift-client-examples - just wrote
this example and tested it in our cluster, works as expected.
For this to work you'd need to install rubygems and thrift gem (gem
install thrift).

On Fri, Sep 3, 2010 at 12:01 AM, Jinsong Hu 
jinsong...@hotmail.com

wrote:


Can you send me some ruby test code so I can try it against the latest
CDH3?

Jimmy.

--
From: Alexey Kovyrin ale...@kovyrin.net
Sent: Thursday, September 02, 2010 8:15 PM
To: user@hbase.apache.org
Subject: Re: thrift for hbase in CDH3 broken ?


We use it in Scribd.com. All clients are ruby web apps.

On Thu, Sep 2, 2010 at 10:49 PM, Todd Lipcon t...@cloudera.com 
wrote:


On Thu, Sep 2, 2010 at 5:35 PM, Jinsong Hu 
jinsong...@hotmail.com

wrote:


Yes, I confirmed that it is indeed the thrift server,
and the fact that the API

List<byte[]> tableNamesList = client.getTableNames();

    for (byte[] name : tableNamesList) {
        System.out.println(new String(name));
    }

successfully printed all table names shows that it is indeed the thrift
server. If it were Hue, it wouldn't print the table names.

Ah, sorry, I missed that in your original message. Not 

Re: Limits on HBase

2010-09-07 Thread William Kang
Hi,
What does the performance look like if we put large cells in HDFS vs the
local file system? Random access to HDFS would be slow, right?


William

On Tue, Sep 7, 2010 at 11:30 PM, Jonathan Gray jg...@facebook.com wrote:

 You can go way beyond the max region split / split size.  HBase will never
 split the region once it is a single row, even if beyond the split size.

 Also, if you're using large values, you should have region sizes much
 larger than the default.  It's common to run with 1-2GB regions in many
 cases.

 What you may have seen are recommendations that if your cell values are
 approaching the default block size on HDFS (64MB), you should consider
 putting the data directly into HDFS rather than HBase.

 JG

  -Original Message-
  From: William Kang [mailto:weliam.cl...@gmail.com]
  Sent: Tuesday, September 07, 2010 7:36 PM
  To: user@hbase.apache.org; apurt...@apache.org
  Subject: Re: Limits on HBase
 
  Hi,
  Thanks for your reply. How about the row size? I read that a row should
  not
  be larger than the hdfs file on region server which is 256M in default.
  Is
  it right? Many thanks.
 
 
  William
 
  On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell apurt...@apache.org
  wrote:
 
   In addition to what Jon said please be aware that if compression is
   specified in the table schema, it happens at the store file level --
   compression happens after write I/O, before read I/O, so if you
  transmit a
   100MB object that compresses to 30MB, the performance impact is that
  of
   100MB, not 30MB.
  
   I also try not to go above 50MB as largest cell size, for the same
  reason.
   I have tried storing objects larger than 100MB but this can cause out
  of
   memory issues on busy regionservers no matter the size of the heap.
  When/if
   HBase RPC can send large objects in smaller chunks, this will be less
  of an
   issue.
  
   Best regards,
  
  - Andy
  
   Why is this email five sentences or less?
   http://five.sentenc.es/
  
  
   --- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote:
  
From: Jonathan Gray jg...@facebook.com
Subject: RE: Limits on HBase
To: user@hbase.apache.org user@hbase.apache.org
Date: Monday, September 6, 2010, 4:10 PM
I'm not sure what you mean by
optimized cell size or whether you're just asking about
practical limits?
   
HBase is generally used with cells in the range of tens of
bytes to hundreds of kilobytes.  However, I have used
it with cells that are several megabytes, up to about
50MB.  Up at that level, I have seen some weird
performance issues.
   
The most important thing is to be sure to tweak all of your
settings.  If you have 20MB cells, you need to be sure
to increase the flush size beyond 64MB and the split size
beyond 256MB.  You also need enough memory to support
all this large object allocation.
   
And of course, test test test.  That's the easiest way
to see if what you want to do will work :)
   
When you run into problems, e-mail the list.
   
As far as row size is concerned, the only issue is that a
row can never span multiple regions so a given row can only
be in one region and thus be hosted on one server at a
time.
   
JG
   
 -Original Message-
 From: William Kang [mailto:weliam.cl...@gmail.com]
 Sent: Monday, September 06, 2010 1:57 PM
 To: hbase-user
 Subject: Limits on HBase

 Hi folks,
 I know this question may have been asked many times,
but I am wondering
 if
 there is any update on the optimized cell size (in
megabytes) and row
 size
 (in megabytes)? Many thanks.


 William
   
  
  
  
  
  



Re: Limits on HBase

2010-09-07 Thread Ryan Rawson
There are 2 definitions of random access:
1) within a file (hdfs can be less than ideal)
2) randomly getting an entire file (not usually considered random gets)

For the latter, streaming an entire file from HDFS is actually pretty
good.  You can see a substantial percentage (think 80%+) of the raw
disk performance.  I benched HDFS and got 90MB/sec some time last year
just writing raw files.

-ryan
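
As a concrete illustration of the "put very large values straight into HDFS"
advice, here is a minimal sketch (assuming the Hadoop 0.20-era FileSystem
API; the paths are placeholders) of writing a large object into HDFS and
streaming it back sequentially, which is the access pattern being described.

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsLargeObject {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    Path blob = new Path("/blobs/object-123");

    // Write: one sequential stream into HDFS; a large object fills whole blocks.
    InputStream local = new FileInputStream("/tmp/object-123.bin");
    OutputStream out = fs.create(blob, true);
    IOUtils.copyBytes(local, out, conf, true);   // closes both streams

    // Read: stream the whole object back; sequential reads run close to raw disk speed.
    InputStream in = fs.open(blob);
    IOUtils.copyBytes(in, System.out, conf, false);
    in.close();
  }
}

A common pattern is then to keep only the HDFS path (plus metadata) in an
HBase cell, so HBase does the indexing and HDFS does the bulk I/O.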


On Tue, Sep 7, 2010 at 9:07 PM, William Kang weliam.cl...@gmail.com wrote:
 Hi,
 What does the performance look like if we put large cells in HDFS vs the
 local file system? Random access to HDFS would be slow, right?


 William

 On Tue, Sep 7, 2010 at 11:30 PM, Jonathan Gray jg...@facebook.com wrote:

 You can go way beyond the max region split / split size.  HBase will never
 split the region once it is a single row, even if beyond the split size.

 Also, if you're using large values, you should have region sizes much
 larger than the default.  It's common to run with 1-2GB regions in many
 cases.

 What you may have seen are recommendations that if your cell values are
 approaching the default block size on HDFS (64MB), you should consider
 putting the data directly into HDFS rather than HBase.

 JG

  -Original Message-
  From: William Kang [mailto:weliam.cl...@gmail.com]
  Sent: Tuesday, September 07, 2010 7:36 PM
  To: user@hbase.apache.org; apurt...@apache.org
  Subject: Re: Limits on HBase
 
  Hi,
  Thanks for your reply. How about the row size? I read that a row should
  not
  be larger than the hdfs file on region server which is 256M in default.
  Is
  it right? Many thanks.
 
 
  William
 
  On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell apurt...@apache.org
  wrote:
 
   In addition to what Jon said please be aware that if compression is
   specified in the table schema, it happens at the store file level --
   compression happens after write I/O, before read I/O, so if you
  transmit a
   100MB object that compresses to 30MB, the performance impact is that
  of
   100MB, not 30MB.
  
   I also try not to go above 50MB as largest cell size, for the same
  reason.
   I have tried storing objects larger than 100MB but this can cause out
  of
   memory issues on busy regionservers no matter the size of the heap.
  When/if
   HBase RPC can send large objects in smaller chunks, this will be less
  of an
   issue.
  
   Best regards,
  
      - Andy
  
   Why is this email five sentences or less?
   http://five.sentenc.es/
  
  
   --- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote:
  
From: Jonathan Gray jg...@facebook.com
Subject: RE: Limits on HBase
To: user@hbase.apache.org user@hbase.apache.org
Date: Monday, September 6, 2010, 4:10 PM
I'm not sure what you mean by
optimized cell size or whether you're just asking about
practical limits?
   
HBase is generally used with cells in the range of tens of
bytes to hundreds of kilobytes.  However, I have used
it with cells that are several megabytes, up to about
50MB.  Up at that level, I have seen some weird
performance issues.
   
The most important thing is to be sure to tweak all of your
settings.  If you have 20MB cells, you need to be sure
to increase the flush size beyond 64MB and the split size
beyond 256MB.  You also need enough memory to support
all this large object allocation.
   
And of course, test test test.  That's the easiest way
to see if what you want to do will work :)
   
When you run into problems, e-mail the list.
   
As far as row size is concerned, the only issue is that a
row can never span multiple regions so a given row can only
be in one region and thus be hosted on one server at a
time.
   
JG
   
 -Original Message-
 From: William Kang [mailto:weliam.cl...@gmail.com]
 Sent: Monday, September 06, 2010 1:57 PM
 To: hbase-user
 Subject: Limits on HBase

 Hi folks,
 I know this question may have been asked many times,
but I am wondering
 if
 there is any update on the optimized cell size (in
megabytes) and row
 size
 (in megabytes)? Many thanks.


 William