performance regression after hbase restart

2011-01-20 Thread Tao Xie
hi, I know regions will be reassigned when the hbase cluster restarts. My regionserver and my datanode sit on the same physical node. In my tests, after I restart the hbase cluster, performance drops; I guess this is due to a data locality problem. But in a further experiment, I increase the

impact of total region numbers?

2011-01-17 Thread Tao Xie
For example, I have a fixed total amount of data and I can tune hbase.hregion.max.filesize to increase/decrease the total region number, right? I want to know if the region number has a performance impact on random read tests. I observed that in my ycsb test, with larger hfile size, I got better throughput and smaller
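
As a rough illustration of the relationship asked about above: for a fixed amount of data, raising hbase.hregion.max.filesize should lower the region count roughly in proportion. The sketch below is back-of-the-envelope arithmetic only. The function name and the data sizes are made up for illustration, and real clusters will usually hold more regions than this estimate, since a region splits in two once it exceeds the limit and regions are rarely full.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_region_count(total_bytes: int, max_filesize_bytes: int) -> int:
    """Lower-bound estimate of the region count for a table.

    Assumes every region is filled right up to max_filesize_bytes, which
    real regions are not, so treat the result as a floor.
    """
    if total_bytes < 0 or max_filesize_bytes <= 0:
        raise ValueError("sizes must be non-negative and the limit positive")
    # Even an empty table has one region.
    return max(1, math.ceil(total_bytes / max_filesize_bytes))


if __name__ == "__main__":
    total = 100 * GIB  # hypothetical data set size
    for limit in (256 * MIB, 1 * GIB, 4 * GIB):
        regions = estimate_region_count(total, limit)
        print(f"max.filesize={limit // MIB} MiB -> at least {regions} regions")
```

Fewer regions means fewer store files and block indexes per regionserver, which is one plausible reason for the throughput difference, though the estimate itself says nothing about read performance.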

Re: impact of total region numbers?

2011-01-17 Thread Tao Xie
to increase hbase.hregion.memstore.flush.size to keep the number of HFile generations smaller. Thanks, -- Tatsuya Kawano (Mr.) Tokyo, Japan On Jan 18, 2011, at 11:20 AM, Tao Xie xietao.mail...@gmail.com wrote: For example, I have total some data and I can tune hbase.hregion.max.filesize

Re: Will all HFiles managed by a regionserver kept open

2011-01-14 Thread Tao Xie
retrieving data from disk is the most dominant element, until you are fully cached in which case other factors inside the regionserver become dominant. at this point copying memory, gc, algorithmic complexity, etc become important. On Wed, Jan 12, 2011 at 10:54 PM, Tao Xie xietao.mail...@gmail.com

Will all HFiles managed by a regionserver kept open

2011-01-12 Thread Tao Xie
hi, I know that generally a regionserver manages HRegions and that in the HDFS layer the data in an HRegion is stored in HFile format. I want to know whether HFiles are all kept open and things like block indexes are all loaded first to improve lookup performance? If so, what will happen if this exceeds the memory limit? Thanks.

Re: Will all HFiles managed by a regionserver kept open

2011-01-12 Thread Tao Xie
includes loading up of the file index and metadata. In our experience, this overhead has been small. It's currently not accounted for in our general memory-counting. We should for sure add it. St.Ack On Wed, Jan 12, 2011 at 7:51 PM, Tao Xie xietao.mail...@gmail.com wrote: hi, I know generally

does hbase has row cache?

2010-12-15 Thread Tao Xie
I see there is a block cache percentage configuration in hbase-site.xml. I wonder if there is a row cache that stores k,v pairs. Thanks.

Re: NoServerForRegionException when intensive insertions

2010-12-12 Thread Tao Xie
, it can take a while for regions to re-online. There could be another issue in the way of the region re-onlining. Grepping around in the logs as per above should give a clue. St.Ack On Thu, Dec 9, 2010 at 10:00 PM, Tao Xie xietao.mail...@gmail.com wrote: hi, all I met this exception when I

NoServerForRegionException when intensive insertions

2010-12-09 Thread Tao Xie
hi, all I met this exception when I was doing intensive insertions using YCSB. Can anybody give me some clues on this? I use hbase 0.20.6. com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server -- nothing found, no 'location' returned,

Re: HBase random access in HDFS and block indices

2010-11-02 Thread Tao Xie
I read the code and my understanding is that when a RS starts, the StoreFiles of each Region will be instantiated. Then HFile.Reader.loadFileInfo() will read the index and file info. So each StoreFile is opened only once and the block indexes are cached. The cache misses are for blocks. I mean for random Get

Re: Nodes up, Master sees 0 ReigonServers

2010-10-26 Thread Tao Xie
I once had the same problem. In the end I found the RSs were not started. 2010/10/26 Bradford Stephens bradfordsteph...@gmail.com Hey datamigos, I'm having trouble getting a finicky .20.6 cluster to behave. The Master, Zookeeper, and ReigonServers all seem to be happy -- except the Master doesn't see

Re: The hfile.block.cache.size = 0 performance is better than default(0.2) in random read? Is it possible?

2010-10-20 Thread Tao Xie
I also got a similar result with YCSB. I disabled the block cache (set to 0) and got better throughput than with the default. In my case my dataset is 160M records and the block cache hit ratio is very low, so frequent cache eviction causes long pauses. 2010/10/21 Ryan Rawson ryano...@gmail.com Our own
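
To illustrate why the hit ratio can be so low when the data set is far larger than the cache, here is a minimal simulation sketch. It makes two simplifying assumptions that are not taken from the thread: a plain LRU cache stands in for the block cache, and reads are uniformly random over blocks (YCSB workloads are often skewed, which would raise the hit ratio). The function name and all sizes are hypothetical.

```python
import random
from collections import OrderedDict


def simulate_lru_hit_ratio(num_blocks: int, cache_blocks: int,
                           num_reads: int, seed: int = 0) -> float:
    """Return the fraction of uniformly random block reads served from an LRU cache."""
    if num_blocks <= 0 or cache_blocks < 0 or num_reads <= 0:
        raise ValueError("num_blocks and num_reads must be positive, cache_blocks >= 0")
    rng = random.Random(seed)
    cache = OrderedDict()  # keys are block ids, ordered from least to most recently used
    hits = 0
    for _ in range(num_reads):
        block = rng.randrange(num_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / num_reads


if __name__ == "__main__":
    # A cache holding 10% of the blocks: almost every miss also triggers an eviction.
    ratio = simulate_lru_hit_ratio(num_blocks=1000, cache_blocks=100, num_reads=20000)
    print(f"hit ratio: {ratio:.3f}")
```

Under these assumptions the steady-state hit ratio approaches cache_blocks / num_blocks, so a cache that holds only a small fraction of the data mostly churns. The model says nothing about GC pauses, which the thread identifies as the actual cost of that churn.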

hmaster reports 0 region servers

2010-10-14 Thread Tao Xie
I applied the patch for HBASE-2939. (The patch is for 0.89 but my code is 0.20.6; I checked the patch and found it only changed one connection thread at the client side to a pool strategy.) But when I rebuild the source and start the hbase cluster, the master cannot recognize the regionservers though they are

Question regarding data location in hdfs after hbase restarts

2010-10-11 Thread Tao Xie
hi, all I set hdfs replica=1 when running hbase. And a DN and a RS co-exist on each slave node. So the data in the regions managed by a RS will be stored on its local data node, right? But when I restart hbase and the hbase client does gets on the RS, the datanode will read data from remote data nodes. Does that

a zookeeper question

2010-09-28 Thread Tao Xie
Maybe a stupid question. I have set export HBASE_MANAGES_ZK=true and provide one ZK in hbase-site.xml. In my example, I only set the server sr114 as zk. But I still find zookeeper will check other quorum servers. I wonder where it reads the server list from. I am confused about this. Can anybody give me a

Re: a zookeeper question

2010-09-28 Thread Tao Xie
Resolved. A stupid error I made. Sorry for this. 2010/9/28 Tao Xie xietao.mail...@gmail.com Maybe a stupid question. I have set export HBASE_MANAGES_ZK=true and provide one ZK in hbase-site.xml. In my example, I only set the server sr114 as zk. But I still find zookeeper will check other

anybody running ycsb?

2010-09-25 Thread Tao Xie
I want to reproduce the results in the ycsb paper. I run hbase 0.20.6 and hadoop 0.20.2. My cluster is like this: 1 node as HMaster + ZK; 6 nodes as DN, RS; 1 node as HBase client. I think this environment is something like the one used by the paper. When I run tests like workloadb with 100

block cache

2010-09-19 Thread Tao Xie
Now my scenario is running ycsb doing heavy reads. I compared the results of setting hfile.block.cache.size to 0.2 with 0. I found that with the factor 0 the hbase metric 'get_avg_time' is even smaller. Maybe I should turn off the block cache in such a scenario. I wonder if there are performance tests show

Re: block cache

2010-09-19 Thread Tao Xie
to take as long as 500 ms. I will attach a snippet of that if necessary. Thanks. 2010/9/19 Ryan Rawson ryano...@gmail.com What does your GC situation look like? On Sun, Sep 19, 2010 at 1:05 AM, Tao Xie xietao.mail...@gmail.com wrote: Now my scenario is running ycsb doing heavy read. I

Re: block cache

2010-09-19 Thread Tao Xie
Here is the gc log: http://pastebin.com/1bGZvMri 2010/9/19 Ryan Rawson ryano...@gmail.com I'd love to see a GC log, and yes it can be possible for ParNew to take a long long time. Thanks, -ryan On Sun, Sep 19, 2010 at 1:20 AM, Tao Xie xietao.mail...@gmail.com wrote: At first when I

how about zookeeper overhead?

2010-09-13 Thread Tao Xie
I see the following recommendation in http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements : It is recommended to run a ZooKeeper quorum of 3, 5 or 7 machines, and give each ZooKeeper server around 1GB of RAM, and if possible, its own dedicated disk. For very heavily loaded

ycsb test on hbase

2010-09-09 Thread Tao Xie
hi, all I use YCSB to measure the insert/read latency of hbase. I found that 0 records are inserted for up to 10 seconds at a time during the insertion procedure. See the following result at second 1514. I want to know why this occurs. Is this due to compaction? And I also want to know why the ops/sec

Re: ycsb test on hbase

2010-09-09 Thread Tao Xie
probably be smoother, but do you really have a use case that requires it or just poking? J-D On Thu, Sep 9, 2010 at 7:32 PM, Tao Xie xietao.mail...@gmail.com wrote: hi, all I use YCSB to measure the insert/read latency of hbase. I found there will be 0 records inserted in up to 10 seconds

Re: question about RegionManager

2010-09-07 Thread Tao Xie
change what is in HDFS. There are some bugs in HDFS in 0.20 which can create this out-of-balance scenario. If you use CDH3b2 you should have a few patches which help to rectify the situation, in particular HDFS-611. Thanks -Todd JG -Original Message- From: Tao Xie

Re: question about RegionManager

2010-09-06 Thread Tao Xie
I have a look at the following method in 0.89. Is the following line correct? nRegions *= e.getValue().size(); private int regionsToGiveOtherServers(final int numUnassignedRegions, final HServerLoad thisServersLoad) { SortedMap<HServerLoad, Set<String>> lightServers = new

Re: question about RegionManager

2010-09-06 Thread Tao Xie
280G 14G 252G 6% /mnt/DP_disk1 10.1.0.126: /dev/sdc1 280G 14G 252G 6% /mnt/DP_disk2 10.1.0.126: /dev/sdd1 280G 13G 253G 5% /mnt/DP_disk3 2010/9/7 Tao Xie xietao.mail...@gmail.com I have a look at the following method in 0.89