re: Hmaster and HRegionServer disappearance reason to ask

2012-07-05 Thread Gaojinchao
Did you check http://hbase.apache.org/book.html#perf.os.swap? -----Original Message----- From: Pablo Musa [mailto:pa...@psafe.com] Sent: 2012-07-06 5:38 To: user@hbase.apache.org Subject: RE: Hmaster and HRegionServer disappearance reason to ask I am having the same problem. I tried N different things but I

re: HBase 0.94.0 is available for download

2012-05-16 Thread Gaojinchao
Great job! -----Original Message----- From: lars hofhansl [mailto:lhofha...@yahoo.com] Sent: 2012-05-16 13:08 To: hbase-user Subject: ANN: HBase 0.94.0 is available for download The HBase Team is pleased to announce the release of HBase 0.94.0. Download it from your favorite Apache mirror [1]. HBase 0.94.0 is

re: gc pause killing regionserver

2012-03-08 Thread Gaojinchao
We encountered the same thing. We set swappiness to 0, but swap was still being used, so we disabled swap. -----Original Message----- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: 2012-03-09 6:29 To: user@hbase.apache.org Subject: Re: gc pause killing regionserver When
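
As a quick way to confirm what the kernel is actually using, the swappiness value can be read back from procfs. The following is a minimal sketch, assuming a Linux host where `/proc/sys/vm/swappiness` is readable; the class and method names are hypothetical and not part of HBase. As the message above notes, a value of 0 did not stop swapping in that environment, so this only verifies the setting, not the absence of swap activity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SwappinessSketch {
    // Parse the single integer the kernel exposes, ignoring the trailing newline.
    static int parseSwappiness(String fileContent) {
        return Integer.parseInt(fileContent.trim());
    }

    // Read and parse the value from the given file.
    static int readSwappiness(Path path) throws IOException {
        return parseSwappiness(new String(Files.readAllBytes(path)));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Linux procfs location of the vm.swappiness setting.
        Path path = Paths.get("/proc/sys/vm/swappiness");
        if (Files.isReadable(path)) {
            System.out.println("vm.swappiness = " + readSwappiness(path));
        } else {
            System.out.println("swappiness file not available on this system");
        }
    }
}
```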

Re: HBase coprocessors blog posted

2012-02-01 Thread Gaojinchao
Great job! We will use the feature! -----Original Message----- From: Mingjie Lai [mailto:m...@apache.org] Sent: 2012-02-01 16:26 To: user@hbase.apache.org; d...@hbase.apache.org Subject: HBase coprocessors blog posted Hi hbasers. A hbase blog regarding coprocessors has been posted to apache blog site. Here is

Re: HBase 0.92.0 is available for download

2012-01-23 Thread Gaojinchao
Good job! Good luck! :) 0.92.0 is released at Chinese New Year (Year of the Dragon)! May this new year bring more success to our community! -----Original Message----- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: 2012-01-24 7:57 To: Hbase-User; gene...@hadoop.apache.org Subject: ANN: HBase

A question about HBase accesses the HDFS

2012-01-08 Thread Gaojinchao
In the DN logs, there are a lot of 48 millis timeout exceptions when some scans finish, but the region server has no exceptions. I analyzed the flow of HBase and Hadoop and found that we are using the API readBuffer(byte buf[], int off, int len), which means we need to read the whole block of data; if we stop

Re: Read speed down after long running

2011-12-28 Thread Gaojinchao
I think you need to check the thread dumps (client and RS) and the resources (memory, IO and network) of your cluster. -----Original Message----- From: Lars H [mailto:lhofha...@yahoo.com] Sent: 2011-12-28 0:32 To: user@hbase.apache.org Cc: hbase-u...@hadoop.apache.org Subject: Re: Read speed down after long running When

Re: ANN: HBase 0.90.5RC0 available for download

2011-12-22 Thread Gaojinchao
We are using this version. It is fine. -----Original Message----- From: Ramkrishna S Vasudevan [mailto:ramkrishna.vasude...@huawei.com] Sent: 2011-12-23 10:01 To: user@hbase.apache.org; 'lars hofhansl' Subject: RE: ANN: HBase 0.90.5RC0 available for download +1.. -----Original Message----- From:

FeedbackRe: Suspected memory leak

2011-12-03 Thread Gaojinchao
Thank you for your help. This issue appears to be a configuration problem: 1. The HBase client uses the NIO (socket) API, which uses direct memory. 2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if no full GC occurs, direct memory cannot be reclaimed. Unfortunately, using GC
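
To make the distinction in point 1 concrete, the sketch below allocates one direct (off-heap) buffer and one ordinary heap buffer with the standard `java.nio.ByteBuffer` API. It is an illustration only, not the HBase client code: the class name and buffer size are hypothetical. Direct buffers are counted against the direct-memory limit rather than the Java heap, which is why heap-focused monitoring can miss their growth.

```java
import java.nio.ByteBuffer;

public class DirectBufferSketch {
    // Allocate either a direct (off-heap) or a heap-backed buffer of the given size.
    static ByteBuffer allocate(int bytes, boolean direct) {
        return direct ? ByteBuffer.allocateDirect(bytes) : ByteBuffer.allocate(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer offHeap = allocate(1024, true);
        ByteBuffer onHeap = allocate(1024, false);
        // isDirect() reports which kind of memory backs each buffer.
        System.out.println("offHeap.isDirect() = " + offHeap.isDirect());
        System.out.println("onHeap.isDirect()  = " + onHeap.isDirect());
    }
}
```

The limit itself is set on the JVM command line with `-XX:MaxDirectMemorySize=<size>`; a suitable value depends on the deployment and is not given in this thread.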

Suspected memory leak

2011-11-29 Thread Gaojinchao
In the HBaseClient process, I found that the heap has been growing. I used the command 'cat smaps' to get the heap size. It seems that after the thread pool in HTable has released its unused threads, if you use the putlist API to put data again, memory increases. Who has experience with this case? Below

Re: A question about dfs.replication.min setting.

2011-11-26 Thread Gaojinchao
are available to satisfy dfs.replication.min - and thereby cause things to timeout/fail. (Think the problem is to do with use of sync, but am not sure yet -- general writes work properly with that config, by retrying enough times to get locations). On 26-Nov-2011, at 12:14 PM, Gaojinchao wrote

Re: snappy compression

2011-11-25 Thread Gaojinchao
in applying the SNAPPY patch @ https://issues.apache.org/jira/browse/HBASE-3691 to 0.90.3 ? 2011/11/25 Gaojinchao gaojinc...@huawei.com: You can search the mailing list for the topic Snappy for 0.90.4. -----Original Message----- From: saurabh@gmail.com [mailto:saurabh@gmail.com] On Behalf Of Sam Seigal Sent: 2011-11

A question about dfs.replication.min setting.

2011-11-25 Thread Gaojinchao
When HBase uses HDFS as its file system, how should we set dfs.replication.min? Can anyone share relevant experience? Currently in our environment we use the default values: dfs.replication: 3, dfs.replication.min: 1. I found some blocks lost when the IO is very busy.

re: How to set region parameter?

2011-11-09 Thread Gaojinchao
You can search the mailing list for the topic region size/count per regionserver. http://search-hadoop.com/ -----Original Message----- From: 吕鹏 [mailto:lvpengd...@gmail.com] Sent: 2011-11-10 10:26 To: user@hbase.apache.org Subject: Re: How to set region parameter? Thanks a lot for your help. But for question 2 and 3, the

Hmaster can't start for the latest trunk version

2011-10-30 Thread Gaojinchao
With the latest trunk version, the master throws these logs: 2011-10-31 00:09:09,549 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.io.FileNotFoundException: File does not exist: hdfs://C3S31:9000/hbase at

Re: A requirement to change time of the Hbase cluster.

2011-10-25 Thread Gaojinchao
Perhaps we should add an option to support incremental metadata. All timestamps are incremental, so this data does not rely on the system time. -----Original Message----- From: Gaojinchao [mailto:gaojinc...@huawei.com] Sent: 2011-10-25 12:33 To: user@hbase.apache.org Subject: A requirement to change time

Re: A requirement to change time of the Hbase cluster.

2011-10-25 Thread Gaojinchao
servers so that you have an accurate clock for your network... You shouldn't have to restart your cluster unless your clocks are all way off... Sent from a remote device. Please excuse any typos... Mike Segel On Oct 25, 2011, at 5:14 AM, Gaojinchao gaojinc...@huawei.com wrote: Perhaps we should

A requirement to change time of the Hbase cluster.

2011-10-24 Thread Gaojinchao
Hi all, We have a requirement to change the time of the HBase cluster. The scenario is that the cluster changes its NTP server (my customer may do this). We are ready to do this: 1. stop the cluster 2. change the ntp server 3. start the cluster. But the cluster may move to one ntp server which system is

A question about system time.

2011-09-20 Thread Gaojinchao
1. When HBase has run for a long time, we need to change the system time to two hours before the current time. How can I do this? 2. One region server loses its connection to the NTP server and needs a long time to recover, and the system time is modified to before the current time. Who has experience in

Re: Calculating the optimal number of regions (WAS - Re: big compaction queue size)

2011-09-08 Thread Gaojinchao
was fixed (it should be by family), but this would have an effect too. Else Active Regions = (Hlognumber*hdfsblock) / (flush.size / (2~3)) Same comments. J-D 2011/9/6 Gaojinchao gaojinc...@huawei.com: Hi J-D Should we can give a formula about active regions per node and up to book ? I
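
The "Else" branch quoted above can be worked through numerically. The following is a minimal sketch, not HBase code: the class and method names are hypothetical, and the inputs (32 HLogs, a 64 MB HDFS block, a 64 MB flush size, and a divisor of 2 or 3 from the "2~3" range) are assumed purely for illustration.

```java
public class ActiveRegionsSketch {
    // Else-branch of the quoted formula:
    //   Active Regions = (Hlognumber * hdfsblock) / (flush.size / (2~3))
    // Integer arithmetic is used, so the result is rounded down.
    static long activeRegions(long hlogNumber, long hdfsBlockBytes,
                              long flushSizeBytes, long divisor) {
        return (hlogNumber * hdfsBlockBytes) / (flushSizeBytes / divisor);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical inputs; real values come from the cluster configuration.
        System.out.println("divisor 2: " + activeRegions(32, 64 * mb, 64 * mb, 2));
        System.out.println("divisor 3: " + activeRegions(32, 64 * mb, 64 * mb, 3));
    }
}
```

With these assumed inputs the estimate is 64 active regions for a divisor of 2 and 96 for a divisor of 3, so the choice within the "2~3" range changes the result by half.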

RE: Site and Book updated

2011-09-07 Thread Gaojinchao
Everything will be fine when you get used to it. My company logo changed as well. :) -----Original Message----- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: 2011-09-08 7:10 To: user@hbase.apache.org Subject: Re: Site and Book updated the logo don't look too bad, does it? On Wed, Sep 7,

Re: big compaction queue size

2011-09-06 Thread Gaojinchao
Did you check whether your cluster has flushed many small HFiles? How much memory is used? It seems that the region server has too many active regions. You can use these commands to check: grep "Finished memstore flush" *regionserver* -c grep "global heap pressure" *regionserver* -c -----Original Message----- From: Xu-Feng

Re: big compaction queue size

2011-09-06 Thread Gaojinchao
Hi J-D, should we give a formula for active regions per node and add it to the book? I think many people encounter the same problem. I think the formula is: If( (Hlognumber*hdfsblock) (HBASE_HEAPSIZE *memstore.lowerLimit) ) Active Regions = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size

Hbase can't balance.

2011-09-04 Thread Gaojinchao
Version: 0.90.4 Cluster: 40 boxes. As I saw in the logs below, the balancer couldn't work because of a dead RS. I dug deeper and found two issues: 1. ShutdownHandler didn't clear numProcessing when dealing with some exceptions. It seems that whatever the exception, we should clear the flag or close

Re: TestMasterFailover fails occasionally

2011-08-18 Thread Gaojinchao
Want to make an issue Gaojinchao? St.Ack On Wed, Aug 17, 2011 at 1:21 AM, Gaojinchao gaojinc...@huawei.com wrote: It seems a bug. The root in RIT can't be moved.. In the failover process, it enforces root on-line. But not clean zk node. test will wait forever.  void processFailover() throws

TestMasterFailover fails occasionally

2011-08-17 Thread Gaojinchao
It seems to be a bug. The root in RIT can't be moved. In the failover process, it forces root online but does not clean the ZK node, so the test will wait forever. void processFailover() throws KeeperException, IOException, InterruptedException { // we enforce on-line root. HServerInfo hsi =

re: Root table couldn't be opened

2011-08-16 Thread Gaojinchao
, Aug 15, 2011 at 9:23 PM, Gaojinchao gaojinc...@huawei.com wrote: Why did the master replay its logs if it did not exit? Sorry. Which logs? Zk is expired because of gc. But region server isn't shutdown. Right, but it probably went down soon after it came out of GC, right? (I like how you

Re: Root table couldn't be opened

2011-08-15 Thread Gaojinchao
committed today. We run 0.90.4 patched with hbase-4168. Please try the patch. On Aug 10, 2011, at 7:05 PM, Gaojinchao gaojinc...@huawei.com wrote: In my cluster(version 0.90.3) , The root table couldn't be opened when one region server crashed because of gc. The logs show: // Master

re: Sequential column reading in the big row. Is it possible?

2011-08-15 Thread Gaojinchao
You can refer to the setBatch API. It sets the maximum number of values to return for each call to next(). -----Original Message----- From: Andrey Gomzin [mailto:gomzind...@gmail.com] Sent: 2011-08-15 16:16 To: user@hbase.apache.org Subject: Sequential column reading in the big row. Is it possible? Hi! I use

Re: A question about timestamp

2011-08-15 Thread Gaojinchao
@hbase.apache.org Subject: Re: A question about timestamp On Sun, Aug 14, 2011 at 6:50 PM, Gaojinchao gaojinc...@huawei.com wrote: Sometimes we may change time zone for our produce, Hbase will can't work well. How you mean? HBase default timestamping uses this function: http://download.oracle.com/javase/1,5.0/docs

re: Root table couldn't be opened

2011-08-15 Thread Gaojinchao
:05 PM, Gaojinchao gaojinc...@huawei.com wrote: In my cluster(version 0.90.3) , The root table couldn't be opened when one region server crashed because of gc. The logs show: // Master assigned the root table to 82 2011-07-28 21:34:34,710 DEBUG

A question about timestamp

2011-08-14 Thread Gaojinchao
Sometimes we may change the time zone for our product, and HBase will not work well. Can we add a switch that uses UTC time?
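
For background on the question, the sketch below shows how epoch-millisecond timestamps behave across time zones in Java. It assumes, without confirmation from this snippet, that cell timestamps are epoch milliseconds of the kind returned by `System.currentTimeMillis()`; the class and method names are hypothetical. Under that assumption the stored value does not depend on the zone, and only the wall-clock rendering changes.

```java
import java.time.Instant;
import java.time.ZoneId;

public class EpochZoneSketch {
    // Wall-clock hour of the given epoch instant as seen in the given zone.
    static int hourIn(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId)).getHour();
    }

    // Epoch value recovered after viewing the instant in a zone; the zone does not alter it.
    static long epochAfterZoneView(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneId.of(zoneId))
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("same epoch value: " + (now == epochAfterZoneView(now, "Asia/Shanghai")));
        System.out.println("hour in UTC: " + hourIn(now, "UTC"));
        System.out.println("hour in Asia/Shanghai: " + hourIn(now, "Asia/Shanghai"));
    }
}
```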

Root table couldn't be opened

2011-08-10 Thread Gaojinchao
In my cluster (version 0.90.3), the root table couldn't be opened when one region server crashed because of GC. The logs show: // Master assigned the root table to 82 2011-07-28 21:34:34,710 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on

Re: Design/Schema questions

2011-07-26 Thread Gaojinchao
Hi Stack: I have a similar case. If there are a lot of columns (about several thousand) in one column family, does it affect the throughput (we use the put/scan API)? Do you have experience with this? Thanks. -----Original Message----- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent:

Re: What will cause Hbase losing data?

2011-07-21 Thread Gaojinchao
In my experience, data is lost when the region server crashes and the HLog can't be saved to HDFS. -----Original Message----- From: seven garfee [mailto:garfee.se...@gmail.com] Sent: 2011-07-21 15:10 To: user@hbase.apache.org Subject: What will cause Hbase losing data? I know that ,if not using hadoop-0.20-append

Hmaster crashed

2011-07-18 Thread Gaojinchao
I verified the issue HBASE-4064 and created about 100K regions. The master couldn't start up. The logs showed a ZK session exception. Can anyone give me a hint? Thanks. Logs: 2011-07-18 16:11:15,432 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6-0x2313bf64d1d Retrieved 93

RE: Difficulty in use Hbase fully distributed mode

2011-07-18 Thread Gaojinchao
You need to use NTP. See: http://hbase.apache.org/book/os.html#ntp Hi Gan, Thanks for your reply.. Now Hmaster and one of my region server working properly but another one region server(slave) not working . i am using the three node cluster master: namenode ,

Hlog appending takes more time

2011-06-14 Thread Gaojinchao
In my performance cluster (HBase version 0.90.3), there are a lot of WARN logs, as below: 2011-05-21 11:35:10,325 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 43 on 20020 took 11569 ms appending an edit to hlog; editcount=23862, len~=4.8k 2011-05-21 11:50:23,610 WARN

Re: A question about LeaseExpiredException

2011-06-12 Thread Gaojinchao
: A question about LeaseExpiredException Usually that happens when the region server is considered dead and the master moves the logs away, there should be clues in the master log and probably more relevant info in the region server log either after or before what you pasted. J-D 2011/6/8 Gaojinchao

Re: a question about log level

2011-06-09 Thread Gaojinchao
On Wed, Jun 8, 2011 at 9:02 PM, Gaojinchao gaojinc...@huawei.com wrote: How should we set the log level for production ? Do anyone have some experience? I want to use information.

a question about log level

2011-06-08 Thread Gaojinchao
How should we set the log level for production? Does anyone have experience with this? I want to use information.

re: about disposing Hbase process

2011-05-31 Thread Gaojinchao
)? 3. Has anyone used the configuration below in production? -----Original Message----- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: 2011-06-01 4:13 To: user@hbase.apache.org Subject: Re: about disposing Hbase process Sorry Gao, what is your question? St.Ack 2011/5/31 Gaojinchao gaojinc

about unit test coverage

2011-05-27 Thread Gaojinchao
I use the Clover plugin to get HBase unit test coverage, but the reported result is not correct. My HBase version is 0.90.3, and in my pom file I define the Clover plugin like this: ... <build> <pluginManagement> <plugins> <plugin> <groupId>com.atlassian.maven.plugins</groupId>

Re: About RegionServer checkin

2011-05-26 Thread Gaojinchao
Sorry for my poor English. Let me describe it again: The master adds a region server to onlineServers in two cases: 1. A machine is added to the cluster, either at cluster startup or when a new machine joins. The master gets the region server information from the regionServerStartup API and adds it to onlineServers

Re: About RegionServer checkin

2011-05-26 Thread Gaojinchao
,150900,1305944335445.70541f0abda274708e12570c52aa7f1d. Can you paste more log before the above line ? Master log around 2011-05-23 10:21:37,400 would help too. On Thu, May 26, 2011 at 2:35 AM, Gaojinchao gaojinc...@huawei.com wrote: Sorry, I hate my poor English . I give a description again: Master add regionserver

Re: About RegionServer checkin

2011-05-26 Thread Gaojinchao
, regionCount=1344, userLoad=true For the above log, ServerManager would execute this in recordNewServer(): LOG.info("Registering server=" + serverName); this.onlineServers.put(serverName, hsl); It was called by regionServerReport(). 2011/5/26 Gaojinchao gaojinc...@huawei.com Thanks

Re: About RegionServer checkin

2011-05-25 Thread Gaojinchao
PM, Gaojinchao gaojinc...@huawei.com wrote: If Region server checkin after call this.serverManager.waitForRegionServers(). It seems that regionServerReport shouldn't add Region server to onlineServers. Otherwise The region may be opened again. I don't follow? Usually what happens

Re: a question storefileIndexSize

2011-05-25 Thread Gaojinchao
storefileIndexSize 2011/5/24 Gaojinchao gaojinc...@huawei.com: Stack, Thanks for your reply. block size is default. My Key length is 26 bytes and value is 300~400 bytes. Is it big keys and small values ? Looks like you have 'small' keys. It looks like the index is about 1MB per storefile

Re: about TestRollingRestart

2011-05-25 Thread Gaojinchao
is a nice ugly test!) St.Ack On Tue, May 24, 2011 at 2:31 AM, Gaojinchao gaojinc...@huawei.com wrote: hbase.master.assignment.timeoutmonitor.timeout should be set higher in TestRollingRestart case. It is killed sometimes when we run all case. This is my analysis,. Is there anyone who encounter

a question storefileIndexSize

2011-05-24 Thread Gaojinchao
My observation is that storefileIndexSize is large. Is there a way to reduce it ? Region server metric: requests=11447, regions=10394, stores=10394, storefiles=3103, storefileIndexSize=3717, memstoreSize=1002, compactionQueueSize=1234, flushQueueSize=0, usedHeap=6916, maxHeap=8165,

Re: a question storefileIndexSize

2011-05-24 Thread Gaojinchao
storefileIndexSize What Ted says or you could change the hfile block size; currently its 64k. Make it bigger? Do you have big keys and small values? If so, can you make do with smaller keys? That would help with index size too. St.Ack On Tue, May 24, 2011 at 5:29 AM, Gaojinchao gaojinc...@huawei.com

About RegionServer checkin

2011-05-24 Thread Gaojinchao
If a region server checks in after the call to this.serverManager.waitForRegionServers(), it seems that regionServerReport shouldn't add the region server to onlineServers. Otherwise the region may be opened again. In my cluster: 2011-05-23 10:56:30,726 INFO org.apache.hadoop.hbase.master.HMaster: Master

Re: Hmaster has some warn logs.

2011-05-13 Thread Gaojinchao
-----Original Message----- From: Gaojinchao [mailto:gaojinc...@huawei.com] Sent: 2011-05-13 8:33 To: user@hbase.apache.org Subject: re: Hmaster has some warn logs. I can't reproduce it; I need to dig through the logs and code. -----Original Message----- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: 2011-05

Re: A question about client

2011-05-10 Thread Gaojinchao
On Mon, May 9, 2011 at 5:22 AM, Gaojinchao gaojinc...@huawei.com wrote: I used ycsb to put data and threw exception. Who can give me some suggestion? Hbase Code: // Cut the cache so that we only get the part that could contain // regions that match our key

Re: Hmaster is OutOfMemory

2011-05-10 Thread Gaojinchao
If the cluster has 100K regions, then on a cluster restart the master will need a lot of memory. -----Original Message----- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: 2011-05-10 13:58 To: user@hbase.apache.org Subject: Re: Hmaster is OutOfMemory 2011/5/9 Gaojinchao gaojinc...@huawei.com: Hbase

A question about client

2011-05-09 Thread Gaojinchao
I used YCSB to put data and it threw an exception. Can anyone give me some suggestions? HBase code: // Cut the cache so that we only get the part that could contain // regions that match our key SoftValueSortedMap<byte[], HRegionLocation> matchingRegions =

Re: Hmaster is OutOfMemory

2011-05-09 Thread Gaojinchao
-D 2011/5/8 Gaojinchao gaojinc...@huawei.com: Hbase version 0.90.2: Hmaster has 8G memory, It seems like not enough ? why it needs so much memory?(50K region) Other issue. Log is error: see http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9 should be see http://wiki.apache.org/hadoop

A question about unit test

2011-05-05 Thread Gaojinchao
A test case timed out. Who has experience with this? Thanks. <message priority="info"><![CDATA[[exec] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.101 sec]]></message> <message priority="info"><![CDATA[[exec] Running

A question for release 0.90.3

2011-05-04 Thread Gaojinchao
Our release is meant for production deployment next week. I have merged some issues into 0.90.2 and verified them. Can version 0.90.3 be released this week? If it can, I will use 0.90.3 and verify it next week.

Re: A question about Metrics

2011-05-02 Thread Gaojinchao
see same over here. Please file an issue. If you have a patch too, that'd be great. St.Ack On Thu, Apr 28, 2011 at 8:55 PM, Gaojinchao gaojinc...@huawei.com wrote: request always is zero in WebUI for region server Metrics request=0.0, regions=36, stores=36, storefiles=148, storefileIndexSize

About parameter

2011-04-28 Thread Gaojinchao
In my test cluster (one HMaster and two region servers), the META table can't be assigned. I found that the assignment of the meta region timed out and it was reopened. I think we should make the default value of hbase.master.assignment.timeoutmonitor.timeout bigger. 2011-04-14 11:48:19,240 INFO

A question about create table with regions in hbase version 0.90.3

2011-04-25 Thread Gaojinchao
I merged issue HBASE-3744 into 0.90.2 and tested it. I found that creating a table fails when a region server shuts down. Should it retry putting the meta data? public static void addRegionToMeta(CatalogTracker catalogTracker, HRegionInfo regionInfo) throws IOException { Put put

Re: A question about create table with regions in hbase version 0.90.3

2011-04-25 Thread Gaojinchao
shutting down ? Thanks 2011/4/25 Gaojinchao gaojinc...@huawei.com I merge issue HBASE-3744 to 0.90.2 and test it. Find that Creating table fails when region server shutdown Does it need try to one more times for putting Meta data? public static void addRegionToMeta(CatalogTracker catalogTracker

Re: A question about create table with regions in hbase version 0.90.3

2011-04-25 Thread Gaojinchao
...@gmail.com] Sent: 2011-04-25 21:25 To: user@hbase.apache.org Subject: Re: A question about create table with regions in hbase version 0.90.3 Can you give more detail as to how many region servers were shutting down ? Thanks 2011/4/25 Gaojinchao gaojinc...@huawei.com I merge issue HBASE-3744 to 0.90.2

Re: Creating table with regions failed when zk crashed.

2011-04-23 Thread Gaojinchao
here? As much as I enjoy reading logs, I also enjoy short descriptions of the context of what I'm looking at. J-D On Thu, Apr 21, 2011 at 8:36 PM, Gaojinchao gaojinc...@huawei.com wrote: Is there any issue about this ? 2011-04-21 14:48:24,676 INFO org.apache.hadoop.hbase.regionserver.HRegion

Re: hbase 0.90.2 - incredibly slow response

2011-04-22 Thread Gaojinchao
It seems like my case. My test data: Puts: 75090 ops/s, average latency: 2.7 ms. Scan: 494 ops/s, average latency: 1356 ms. (HMaster 1 name node, 3 zoo keeper, 7 Region server/Data node) From my tests, some schemas may be slower in version 0.90.2. How did you design your schema? If there is any

Creating table with regions failed when zk crashed.

2011-04-21 Thread Gaojinchao
Is there any issue about this ? 2011-04-21 14:48:24,676 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed wfan_1,3238613814230765,1303367938566.63a83bdc9b55ec115cbc9b4bbe318214. 2011-04-21 14:48:24,676 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 4 on

re: A question about Hmaster startup.

2011-04-19 Thread Gaojinchao
, 2011 at 9:26 PM, Gaojinchao gaojinc...@huawei.com wrote: Sorry. My question is: If HMaster is started after NN without starting DN in Hbase 0.90.2 then HMaster is not able to start due to AlreadyCreatedException for /hbase/hbase.version. In hbase version 0.90.1, It will wait for data node

Re: A question about Hmaster startup.

2011-04-19 Thread Gaojinchao
) at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:91) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:347) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283) -----Original Message----- From: Gaojinchao [mailto:gaojinc...@huawei.com

A question about Hmaster restarted.

2011-04-18 Thread Gaojinchao
I created a table with some regions. The HMaster crashed because one region server crashed. I dug into the code. It may be a bug. Startup and create table both use this code. In the startup case it needs to shut itself down, but create table needs to reassign. long maxWaitTime = System.currentTimeMillis() +

A question about Hmaster startup.

2011-04-18 Thread Gaojinchao
Why was this code deleted? // Are there any data nodes up yet? // Currently the safe mode check falls through if the namenode is up but no // datanodes have reported in yet. try { while (dfs.getDataNodeStats().length == 0) { LOG.info("Waiting for dfs to come up...");

Re: A question about Hmaster startup.

2011-04-18 Thread Gaojinchao
as part of HBASE-1816 Master rewrite, and in the comments I can read: + Move fs methods out of HMaster to FSUtils. And if you you look at FSUtils, you'll see how it's done now. J-D On Mon, Apr 18, 2011 at 8:48 PM, Gaojinchao gaojinc...@huawei.com wrote: Why delete this code ? // Are there any data

Re: a lots of error about Region has been OPEN for too long

2011-04-16 Thread Gaojinchao
is sitting in the region server's queue of regions to be opened. Very often, when there's a lot of regions to open (and worse if the RS has to replay recovered edits), some regions in that state will timeout. This needs more thinking. J-D 2011/4/13 Gaojinchao gaojinc...@huawei.com: In hbase

Re: a lots of error about Region has been OPEN for too long

2011-04-15 Thread Gaojinchao
Gaojinchao gaojinc...@huawei.com: In hbase version 0.90.1 . Is there any experience ? Hmaster Logs : 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything 2011-04-08 16:33

Re: a lots of error about Region has been OPEN for too long

2011-04-15 Thread Gaojinchao
been OPEN for too long Yeah it's not clear from the logs why it did that, and looking through my logs I can't see any occurrence. Can you reproduce it easily? J-D On Fri, Apr 15, 2011 at 12:46 AM, Gaojinchao gaojinc...@huawei.com wrote: Thanks for your reply. Hbase version : 0.90.1 I find

a lots of error about Region has been OPEN for too long

2011-04-13 Thread Gaojinchao
In HBase version 0.90.1. Does anyone have experience with this? HMaster logs: 2011-04-08 16:33:09,384 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything 2011-04-08 16:33:09,384 ERROR

Re: some regions can't be assigned

2011-04-06 Thread Gaojinchao
it's VERY uncommon to run into that sort of situation, but you are very welcomed to try to root them out and submit the relevant jiras. J-D 2011/3/18 Gaojinchao gaojinc...@huawei.com: My cluster is 1'master and 2 region servers(RS1, RS2) In scenario as follow: 1、 start up the cluster

Re: Master can't exit when open port failed

2011-04-06 Thread Gaojinchao
and startServiceThreads should throw an exception. About port use: can HBase use the port reuse feature? -----Original Message----- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: 2011-04-07 11:48 To: user@hbase.apache.org Cc: Gaojinchao; Chenjian Subject: Re: Master can't exit when open port failed

A question about region hot spot

2011-04-01 Thread Gaojinchao
In HBase version 0.20.6, contiguous regions are not assigned to the same region server. This keeps the daughters of a split off the same region server and avoids hot spots, so performance can improve. In version 0.90.1, a daughter is opened on the region server where its parent was opened. In

RE: A lot of data is lost when name node crashed

2011-03-31 Thread Gaojinchao
Thanks, please submit a patch and I can try to test it. Jira is: https://issues.apache.org/jira/browse/HBASE-3722 -----Original Message----- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: 2011-04-01 1:20 To: Gaojinchao; user@hbase.apache.org Subject: Re: A lot of data is lost

A lot of data is lost when name node crashed

2011-03-29 Thread Gaojinchao
I ran some performance tests on HBase version 0.90.1. When the name node crashed, I found some data lost. I'm not sure exactly what caused it. It seems that log splitting failed. I think the master should shut itself down when HDFS crashes. The logs are: 2011-03-22 13:21:55,056 WARN

Re: Hmaster had crashed as disabling table

2011-03-28 Thread Gaojinchao
{ LOG.info("Master startup proceeding: master failover"); this.assignmentManager.processFailover(); -- when the master restarts or fails over, it will refresh user regions. } -----Original Message----- From: Gaojinchao [mailto:gaojinc...@huawei.com] Sent: 2011-03-28 11:41 To: user@hbase.apache.org Subject

Hmaster had crashed as disabling table

2011-03-27 Thread Gaojinchao
Operation steps: 1. Start up the cluster with HA master. 2. The active master crashed while it was creating a table with regions. 3. The backup master became active. 4. I wanted to drop the table. 5. The active master crashed. I can't drop the table whatever I do. The log is: 2011-03-28 10:51:58,347 INFO

it does't use the parameter about hbase.regionserver.flushlogentries

2011-03-22 Thread Gaojinchao
I want to improve write performance by adjusting hbase.regionserver.flushlogentries, but it seems to have no effect. In version 0.90.1, why was this parameter removed from hbase-default.xml?

some regions can't be assigned

2011-03-18 Thread Gaojinchao
My cluster has 1 master and 2 region servers (RS1, RS2). The scenario is as follows: 1. start up the cluster and create a table with some regions 2. kill RS1 and RS2 3. wait 30 minutes 4. start up RS1 5. wait about 3 hours 6. start up RS2. I find that some regions can't be assigned. The master print

It is a bug in function balanceCluster

2011-03-04 Thread Gaojinchao
In the balanceCluster function, it should be "leastloaded=" + serversByLoad.firstKey().getLoad().getNumberOfRegions()) if(serversByLoad.lastKey().getLoad().getNumberOfRegions() <= max && serversByLoad.firstKey().getLoad().getNumberOfRegions() >= min) { // Skipped because no server outside

all Regions is lost when the cluster restarted

2011-03-03 Thread Gaojinchao
HBase version: 0.90.1 HDFS version: 0.20.1 (which doesn't have the append feature). How can I fix these lost regions? I checked with hbck; the log is: Summary: -ROOT- is okay. Number of regions: 1 Deployed on: c3s7.site:60020 .META. is okay. Number of regions: 1 Deployed on:

about hbase load-balancing

2011-03-02 Thread Gaojinchao
HBase version: 0.90.0, 0.90.1. My cluster has 5 RS and 1 HMaster. I create a table with multiple regions (50 regions per RS), which improves parallelism. But when one RS is restarted, a large number of adjacent regions may be reassigned to the restarted one, which reduces parallelism. eg:

How to set up multiple HBase Masters for higher availability in hbase version 0.90 ?

2011-02-16 Thread Gaojinchao
Hi, my cluster needs to use the high availability feature, but the backup HMaster always prints logs like: 2011-02-16 19:23:21,158 INFO org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized 2011-02-16 19:23:21,158 DEBUG org.apache.hadoop.hbase.master.HMaster: HMaster started in backup