[jira] [Created] (HBASE-5797) Add the active master address on the web of regionserver

2012-04-15 Thread chunhui shen (Created) (JIRA)
Add the active master address on the web of regionserver


 Key: HBASE-5797
 URL: https://issues.apache.org/jira/browse/HBASE-5797
 Project: HBase
  Issue Type: Improvement
Reporter: chunhui shen
Assignee: chunhui shen


Add the active master's address to the regionserver web UI, so we can easily see 
the master's server name and jump from the regionserver web UI to the master web UI.





[jira] [Created] (HBASE-5689) Skip RecoveredEdits may cause data loss

2012-03-30 Thread chunhui shen (Created) (JIRA)
Skip RecoveredEdits may cause data loss
---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen


Let's look at the following scenario:
1. Region is on server A.
2. Put KV (r1-v1) to the region.
3. Move the region from server A to server B.
4. Put KV (r2-v2) to the region.
5. Move the region from server B to server A.
6. Put KV (r3-v3) to the region.
7. kill -9 server B and restart it.
8. kill -9 server A and restart it.
9. Scan the region: we only get two KVs (r1-v1, r2-v2); the third KV (r3-v3) is lost.

Let's analyse the above scenario in the code:
1. The edit logs of KV (r1-v1) and KV (r3-v3) are both recorded in the same hlog file on server A.
2. When we split server B's hlog file in ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
3. When we split server A's hlog file in ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
4. However, RecoveredEdits file f2 will be skipped when the region is initialized in 
HRegion#replayRecoveredEditsIfAny (a small illustration follows the code below):
{code}
for (Path edits: files) {
  if (edits == null || !this.fs.exists(edits)) {
    LOG.warn("Null or non-existent edits file: " + edits);
    continue;
  }
  if (isZeroLengthThenDelete(this.fs, edits)) continue;

  if (checkSafeToSkip) {
    Path higher = files.higher(edits);
    long maxSeqId = Long.MAX_VALUE;
    if (higher != null) {
      // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
      String fileName = higher.getName();
      maxSeqId = Math.abs(Long.parseLong(fileName));
    }
    if (maxSeqId <= minSeqId) {
      String msg = "Maximum possible sequenceid for this log is " + maxSeqId
          + ", skipped the whole file, path=" + edits;
      LOG.debug(msg);
      continue;
    } else {
      checkSafeToSkip = false;
    }
  }
{code}
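To make the skip decision concrete, here is a small, self-contained sketch (this is not HBase code; the file names, sequence ids and minSeqId are made up for illustration). It mimics the check above: a recovered-edits file is skipped whenever the next file's name-derived sequence id is <= the region's flushed minSeqId, even though the skipped file may still contain a later edit from a different hlog, which is exactly how r3-v3 disappears.

{code}
import java.util.*;

public class SkipRecoveredEditsSketch {
  public static void main(String[] args) {
    // Hypothetical recovered-edits files, keyed by the sequence id in the
    // file name, each holding the sequence ids of the edits it contains.
    TreeMap<Long, long[]> files = new TreeMap<>();
    files.put(100L, new long[] {100L, 300L}); // split of server A's hlog: r1-v1, r3-v3
    files.put(200L, new long[] {200L});       // split of server B's hlog: r2-v2

    long minSeqId = 250L;          // sequence id already flushed by the region
    boolean checkSafeToSkip = true;
    List<Long> replayed = new ArrayList<>();

    for (Long name : files.navigableKeySet()) {
      if (checkSafeToSkip) {
        Long higher = files.higherKey(name);
        long maxSeqId = (higher != null) ? higher : Long.MAX_VALUE;
        if (maxSeqId <= minSeqId) {
          System.out.println("skip whole file " + name
              + " (assumed max seqid " + maxSeqId + " <= " + minSeqId + ")");
          continue;                // <-- edit 300 in file 100 is silently dropped
        }
        checkSafeToSkip = false;
      }
      for (long seq : files.get(name)) {
        if (seq > minSeqId) replayed.add(seq);
      }
    }
    System.out.println("replayed edits: " + replayed); // [] : seqid 300 was lost
  }
}
{code}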

 






[jira] [Created] (HBASE-5672) TestLruBlockCache#testBackgroundEvictionThread fails occasionally

2012-03-28 Thread chunhui shen (Created) (JIRA)
TestLruBlockCache#testBackgroundEvictionThread fails occasionally
-

 Key: HBASE-5672
 URL: https://issues.apache.org/jira/browse/HBASE-5672
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen


We find that TestLruBlockCache#testBackgroundEvictionThread fails occasionally.

I think it's a problem of the test case, because runEviction() only calls evictionThread.evict():
{code}
public void evict() {
  synchronized (this) {
    this.notify(); // FindBugs NN_NAKED_NOTIFY
  }
}
{code}
However, when we call evictionThread.evict() in TestLruBlockCache#testBackgroundEvictionThread, 
the evictionThread may not have entered run() yet, so the notify() is lost and no eviction happens.
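Below is a minimal sketch (not the LruBlockCache code; names like enteredRun are made up) of one way the test could avoid the race: the eviction thread publishes a signal once it has entered run(), and evict() waits for that signal before calling notify(), so the wake-up can no longer be lost.

{code}
import java.util.concurrent.CountDownLatch;

public class EvictionThreadSketch {
  private final CountDownLatch enteredRun = new CountDownLatch(1);
  private final Thread thread = new Thread(this::run, "eviction-thread");

  private synchronized void run() {
    enteredRun.countDown();          // signal that we are about to wait
    while (!Thread.currentThread().isInterrupted()) {
      try {
        wait();                      // woken up by evict()
      } catch (InterruptedException e) {
        return;
      }
      System.out.println("evicting...");
    }
  }

  public void start() { thread.start(); }

  public void evict() throws InterruptedException {
    enteredRun.await();              // don't notify before the thread is waiting
    synchronized (this) {
      notify();
    }
  }

  public static void main(String[] args) throws Exception {
    EvictionThreadSketch s = new EvictionThreadSketch();
    s.start();
    s.evict();                       // now guaranteed not to lose the wake-up
    Thread.sleep(200);               // let the eviction run once
    s.thread.interrupt();            // stop the sketch thread
  }
}
{code}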





[jira] [Created] (HBASE-5624) Aborting regionserver when splitting region, may cause daughter region not assigned by ServerShutdownHandler.

2012-03-22 Thread chunhui shen (Created) (JIRA)
Aborting regionserver when splitting region, may cause daughter region not 
assigned by ServerShutdownHandler.
-

 Key: HBASE-5624
 URL: https://issues.apache.org/jira/browse/HBASE-5624
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen


If a region is splitting while the regionserver is stopping, the following code may 
be executed in SplitTransaction#openDaughters:
{code}
// TODO: Is this check needed here?
if (stopped || stopping) {
  // add 2nd daughter first (see HBASE-4335)
  MetaEditor.addDaughter(server.getCatalogTracker(),
      b.getRegionInfo(), null);
  MetaEditor.addDaughter(server.getCatalogTracker(),
      a.getRegionInfo(), null);
  LOG.info("Not opening daughters " +
      b.getRegionInfo().getRegionNameAsString() +
      " and " +
      a.getRegionInfo().getRegionNameAsString() +
      " because stopping=" + stopping + ", stopped=" + stopped);
}
{code}

So, for the two daughter regions, their locations are both null in .META.

When ServerShutdownHandler processes the dead server, it will not assign these two 
daughter regions, since their location (info:server) is null in .META. and they are 
therefore not returned by MetaReader.getServerUserRegions().






[jira] [Created] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-12 Thread chunhui shen (Created) (JIRA)
Multi concurrent flushcache() for one region could cause data loss
--

 Key: HBASE-5568
 URL: https://issues.apache.org/jira/browse/HBASE-5568
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen


We can call HRegion#flushcache() concurrently now, through HRegionServer#splitRegion 
or HRegionServer#flushRegion via HBaseAdmin.
However, we find that if HRegion#internalFlushcache() is called concurrently by 
multiple threads, HRegion.memstoreSize is calculated wrongly.
At the end of HRegion#internalFlushcache() we do this.addAndGetGlobalMemstoreSize(-flushsize), 
but flushsize may not be the actual memstore size that was flushed to HDFS. This can make 
HRegion.memstoreSize negative and prevent the final flush when we close the region.
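The accounting race can be shown with a small self-contained sketch (not HRegion code; the sizes and sleeps are only for illustration): two concurrent flushes each snapshot the current memstore size and both subtract their snapshot at the end, so the overlapping bytes are subtracted twice and the counter goes negative.

{code}
import java.util.concurrent.atomic.AtomicLong;

public class ConcurrentFlushAccountingSketch {
  static final AtomicLong memstoreSize = new AtomicLong();

  static void put(long bytes) { memstoreSize.addAndGet(bytes); }

  static void flush() {
    long flushSize = memstoreSize.get();       // snapshot, like internalFlushcache()
    try { Thread.sleep(50); } catch (InterruptedException ignored) {}
    memstoreSize.addAndGet(-flushSize);        // addAndGetGlobalMemstoreSize(-flushsize)
  }

  public static void main(String[] args) throws Exception {
    put(100);                                   // 100 bytes in the memstore
    Thread f1 = new Thread(ConcurrentFlushAccountingSketch::flush);
    Thread f2 = new Thread(ConcurrentFlushAccountingSketch::flush);
    f1.start(); f2.start();
    f1.join(); f2.join();
    // Both flushes snapshot ~100 and both subtract it: the result is about -100,
    // which in HRegion would block the final flush on close.
    System.out.println("memstoreSize = " + memstoreSize.get());
  }
}
{code}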

Logs in RS for region e9d827913a056e696c39bc569ea3

2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Started memstore flush for 
writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
f99f., current region memstore size 128.0m
2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
a3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, 
memsize=59.6m, filesize=31.2m
2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Started memstore flush for 
writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
f99f., current region memstore size 134.8m
2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
a3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, 
memsize=68.5m, filesize=26.6m
2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of ~128.1m for region 
writetest1,,1331454657410.e9d827913a
056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction 
requested=false
2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Started memstore flush for 
writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
f99f., current region memstore size 6.8m
2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
a3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, 
memsize=3.1m, filesize=1.6m
2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
a3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, 
memsize=3.6m, filesize=1.4m
2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of ~134.8m for region 
writetest1,,1331454657410.e9d827913a
056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction 
requested=true
2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
a3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, 
memsize=47.4k, filesize=25.6k
2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
a3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, 
memsize=47.8k, filesize=19.3k
2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of ~6.8m for region 
writetest1,,1331454657410.e9d827913a05
6e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true





[jira] [Created] (HBASE-5563) HRegionInfo#compareTo add the comparison of regionId

2012-03-11 Thread chunhui shen (Created) (JIRA)
HRegionInfo#compareTo add the comparison of regionId


 Key: HBASE-5563
 URL: https://issues.apache.org/jira/browse/HBASE-5563
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen


In the case where one region is assigned multiple times, we can find two regions with 
the same table name, the same startKey and the same endKey but different regionIds, so 
these two regions are the same in a TreeMap (compareTo ignores regionId) but different 
in a HashMap.
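A minimal sketch of the problem and the proposed fix (this is a stand-in class, not HRegionInfo itself): if compareTo ignores regionId while equals/hashCode include it, the two regions collapse in a TreeSet but not in a HashSet; adding regionId as the final tiebreaker makes the two collections agree.

{code}
import java.util.*;

public class RegionInfoCompareSketch implements Comparable<RegionInfoCompareSketch> {
  final String table, startKey, endKey;
  final long regionId;

  RegionInfoCompareSketch(String table, String startKey, String endKey, long regionId) {
    this.table = table; this.startKey = startKey; this.endKey = endKey; this.regionId = regionId;
  }

  @Override public int compareTo(RegionInfoCompareSketch o) {
    int c = table.compareTo(o.table);
    if (c != 0) return c;
    c = startKey.compareTo(o.startKey);
    if (c != 0) return c;
    c = endKey.compareTo(o.endKey);
    if (c != 0) return c;
    return Long.compare(regionId, o.regionId);   // the added tiebreaker
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof RegionInfoCompareSketch)) return false;
    RegionInfoCompareSketch r = (RegionInfoCompareSketch) o;
    return table.equals(r.table) && startKey.equals(r.startKey)
        && endKey.equals(r.endKey) && regionId == r.regionId;
  }

  @Override public int hashCode() {
    return Objects.hash(table, startKey, endKey, regionId);
  }

  public static void main(String[] args) {
    RegionInfoCompareSketch a = new RegionInfoCompareSketch("t", "a", "b", 1L);
    RegionInfoCompareSketch b = new RegionInfoCompareSketch("t", "a", "b", 2L);
    System.out.println(new HashSet<>(Arrays.asList(a, b)).size()); // 2
    System.out.println(new TreeSet<>(Arrays.asList(a, b)).size()); // 2 with the tiebreaker,
                                                                   // would be 1 without it
  }
}
{code}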





[jira] [Created] (HBASE-5528) Retry splitting log if failed in the process of ServerShutdownHandler, and abort master when retries exhausted

2012-03-05 Thread chunhui shen (Created) (JIRA)
Retry splitting log if failed in the process of ServerShutdownHandler, and 
abort master when retries exhausted
--

 Key: HBASE-5528
 URL: https://issues.apache.org/jira/browse/HBASE-5528
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5528.patch

We retry log splitting if it fails in splitLogAfterStartup when the master starts.
However, there is no retry for failed log splitting in ServerShutdownHandler.

Also, if we ultimately fail to split the log, we should abort the master even if the 
filesystem is OK, to prevent data loss.
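A hedged sketch of the proposed behaviour (splitLog() and abortMaster() here are placeholders, not the real HBase API): retry log splitting a bounded number of times inside the shutdown handler and abort the master if every attempt fails, rather than continuing and assigning regions whose edits were never replayed.

{code}
import java.io.IOException;

public class RetrySplitLogSketch {
  static final int MAX_RETRIES = 3;

  static void splitLogWithRetries(String serverName) {
    IOException last = null;
    for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
      try {
        splitLog(serverName);        // placeholder for the real split call
        return;                      // success
      } catch (IOException e) {
        last = e;
        System.out.println("split log attempt " + attempt + " failed: " + e.getMessage());
      }
    }
    // Retries exhausted: aborting is safer than assigning regions whose
    // edits were never replayed (that would be silent data loss).
    abortMaster("failed to split log for " + serverName, last);
  }

  static void splitLog(String serverName) throws IOException {
    throw new IOException("simulated failure");   // stand-in for real splitting
  }

  static void abortMaster(String reason, Throwable cause) {
    System.out.println("ABORT MASTER: " + reason);
  }

  public static void main(String[] args) {
    splitLogWithRetries("regionserver-1");
  }
}
{code}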





[jira] [Created] (HBASE-5501) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-29 Thread chunhui shen (Created) (JIRA)
Handle potential data loss due to concurrent processing of processFaileOver and 
ServerShutdownHandler
-

 Key: HBASE-5501
 URL: https://issues.apache.org/jira/browse/HBASE-5501
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen


In a live cluster, we do the following steps:
1. Kill the master.
2. Start the master; the master is initializing.
3. The master completes splitLog.
4. Kill the META server.
5. The master starts assigning ROOT and META.
6. Now META region data can be lost, since we may assign the META region before the 
ServerShutdownHandler has finished splitting the log of the dead META server.





[jira] [Created] (HBASE-5454) Refuse operations from Admin befor master is initialized

2012-02-22 Thread chunhui shen (Created) (JIRA)
Refuse operations from Admin befor master is initialized


 Key: HBASE-5454
 URL: https://issues.apache.org/jira/browse/HBASE-5454
 Project: HBase
  Issue Type: Improvement
Reporter: chunhui shen


In our testing environment, while the master was initializing we found conflicts 
between master#assignAllUserRegions and an EnableTable event: assigning a region 
threw an exception and the master aborted itself.

We think we'd better refuse operations from Admin, such as CreateTable, EnableTable, 
etc., until the master is initialized. It would reduce such errors.
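A minimal sketch of the proposed guard (checkInitialized() is used here as an illustrative name, not necessarily the real method): every admin-facing operation rejects requests until the master has finished initializing.

{code}
import java.io.IOException;

public class MasterAdminGuardSketch {
  private volatile boolean initialized = false;

  private void checkInitialized() throws IOException {
    if (!initialized) {
      throw new IOException("Master is initializing; retry the admin operation later");
    }
  }

  public void createTable(String tableName) throws IOException {
    checkInitialized();                 // refuse CreateTable while initializing
    System.out.println("creating table " + tableName);
  }

  public void enableTable(String tableName) throws IOException {
    checkInitialized();                 // refuse EnableTable while initializing
    System.out.println("enabling table " + tableName);
  }

  void finishInitialization() { initialized = true; }

  public static void main(String[] args) throws IOException {
    MasterAdminGuardSketch master = new MasterAdminGuardSketch();
    try {
      master.enableTable("t1");         // rejected: master not initialized yet
    } catch (IOException expected) {
      System.out.println(expected.getMessage());
    }
    master.finishInitialization();
    master.enableTable("t1");           // now allowed
  }
}
{code}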






[jira] [Created] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)

2012-02-16 Thread chunhui shen (Created) (JIRA)
StartupBulkAssigner would cause a lot of timeout on RIT when assigning large 
numbers of regions (timeout = 3 mins)
--

 Key: HBASE-5422
 URL: https://issues.apache.org/jira/browse/HBASE-5422
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen


In our production environment we see a lot of RIT timeouts when the cluster comes up; 
there are about 70,000 regions in the cluster (25 regionservers).

First, look at the following logs for region 33cf229845b1009aa8a3f7b0f85c9bd0.

Master's log:
2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x348f4a94723da5 Async create of unassigned node for 
33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 
2012-02-13 18:07:42,560 DEBUG 
org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: 
rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
state=OFFLINE, ts=1329127661409, 
server=r03f11025.yh.aliyun.com,60020,1329127549907 
2012-02-13 18:07:42,996 DEBUG 
org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: 
rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
state=OFFLINE, ts=1329127661409 
2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out: 
item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
state=PENDING_OPEN, ts=1329127662996
2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been PENDING_OPEN for too long, reassigning 
region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, 
server=r03f11025.yh.aliyun.com,60020,1329127549907, 
region=33cf229845b1009aa8a3f7b0f85c9bd0 
2012-02-13 18:38:07,310 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 
2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x348f4a94723da5 Deleting existing unassigned node for 
33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 
2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 
33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 
2012-02-13 18:38:07,573 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on 
r03f11025.yh.aliyun.com,60020,1329127549907 
2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
No previous transition plan was found (or we are ignoring an existing plan) for 
item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so 
generated a random one; 
hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, 
dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) 
available servers 
2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to 
r01b05043.yh.aliyun.com,60020,1329127549041 
2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out: 
item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
state=PENDING_OPEN, ts=1329132528086 
2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been PENDING_OPEN for too long, reassigning 
region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 



Regionserver's log
2012-02-13 18:07:43,537 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
2012-02-13 18:11:16,560 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open 
of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 





From the RS's log we can see that more than 3 minutes passed between receiving the 
openRegion request and starting to process it, which causes the RIT timeout in the 
master for this region.

Looking at the code of StartupBulkAssigner, we find that regionPlans are not added 
when assigning regions; therefore, when one region is opened, the master does not 
updateTimers for the other regions whose destination is the same server.
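The effect can be seen in a self-contained sketch (not the AssignmentManager code; the names and timings are illustrative): when a region opens, in-transition regions headed for the same server only get their timestamps refreshed if a region plan recording that destination exists, so with no plans the timeout monitor fires even though the server is just working through a long open queue.

{code}
import java.util.*;

public class BulkAssignTimerSketch {
  static final long TIMEOUT_MS = 180_000;   // the 3-minute RIT timeout

  static class Rit { final String dest; long ts; Rit(String d, long t) { dest = d; ts = t; } }

  static final Map<String, Rit> regionsInTransition = new HashMap<>();
  static final Map<String, String> regionPlans = new HashMap<>();  // region -> planned destination

  // Called when one region finishes opening on 'server': refresh the timestamp
  // of every other in-transition region planned for the same server.
  static void updateTimers(String server, long now) {
    for (Map.Entry<String, Rit> e : regionsInTransition.entrySet()) {
      if (server.equals(regionPlans.get(e.getKey()))) {
        e.getValue().ts = now;        // the server is alive, just working through its queue
      }
    }
  }

  public static void main(String[] args) {
    regionsInTransition.put("regionA", new Rit("rs1", 0));
    regionsInTransition.put("regionB", new Rit("rs1", 0));
    // StartupBulkAssigner case: nothing was added to regionPlans.

    updateTimers("rs1", 100_000);     // regionA opened, but nothing gets refreshed
    regionsInTransition.remove("regionA");

    long now = 200_000;
    for (Map.Entry<String, Rit> e : regionsInTransition.entrySet()) {
      if (now - e.getValue().ts > TIMEOUT_MS) {
        System.out.println(e.getKey() + " timed out in transition -> reassigned");
        // With regionPlans populated, regionB's ts would be 100_000 and no timeout fires.
      }
    }
  }
}
{code}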


[jira] [Created] (HBASE-5423) Regionserver may block on waitOnAllRegionsToClose when aborting

2012-02-16 Thread chunhui shen (Created) (JIRA)
Regionserver may block on waitOnAllRegionsToClose when aborting
---

 Key: HBASE-5423
 URL: https://issues.apache.org/jira/browse/HBASE-5423
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen


If closeRegion throws any exception (it could be caused by the FS) while the RS is 
aborting, the RS will block forever in waitOnAllRegionsToClose().
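One possible direction, shown as a hedged sketch (field and method names are illustrative, not the real HRegionServer API): while aborting, bound the wait instead of spinning until onlineRegions becomes empty, so a region whose close already failed cannot hang the shutdown forever.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WaitOnRegionsSketch {
  final Map<String, Object> onlineRegions = new ConcurrentHashMap<>();
  volatile boolean aborted = false;

  // Bounded wait while aborting, instead of spinning until onlineRegions is empty.
  void waitOnAllRegionsToClose() {
    long deadline = System.currentTimeMillis() + 3_000;   // short bound for the sketch
    while (!onlineRegions.isEmpty()) {
      if (aborted && System.currentTimeMillis() > deadline) {
        System.out.println("Aborting; giving up on " + onlineRegions.size()
            + " region(s) whose close failed");
        return;                                            // don't block forever
      }
      try { Thread.sleep(100); } catch (InterruptedException e) { return; }
    }
  }

  public static void main(String[] args) {
    WaitOnRegionsSketch rs = new WaitOnRegionsSketch();
    rs.onlineRegions.put("regionA", new Object());   // closeRegion threw, so it was never removed
    rs.aborted = true;
    rs.waitOnAllRegionsToClose();                     // returns after the bound instead of hanging
  }
}
{code}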





[jira] [Created] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region is assigned before completing split log, it would cause data loss

2012-01-10 Thread chunhui shen (Created) (JIRA)
Concurrent processing of processFaileOver and ServerShutdownHandler  may cause 
region is assigned before completing split log, it would cause data loss
---

 Key: HBASE-5179
 URL: https://issues.apache.org/jira/browse/HBASE-5179
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen


If the master's failover processing and ServerShutdownHandler processing happen 
concurrently, the following case may occur:
1. The master completes splitLogAfterStartup().
2. Regionserver A restarts, and ServerShutdownHandler starts processing it.
3. The master starts rebuildUserRegions, and regionserver A is considered a dead server.
4. The master starts to assign the regions of regionserver A because it is a dead 
server by step 3.

However, while step 4 (assigning regions) is being done, ServerShutdownHandler may 
still be splitting the log. Therefore, it may cause data loss.





[jira] [Created] (HBASE-5165) Concurrent processing of DeleteTableHandler and ServerShutdownHandler may cause deleted region to assign again

2012-01-09 Thread chunhui shen (Created) (JIRA)
Concurrent processing of DeleteTableHandler and ServerShutdownHandler may cause 
deleted region to assign again
--

 Key: HBASE-5165
 URL: https://issues.apache.org/jira/browse/HBASE-5165
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: chunhui shen


Concurrent processing of DeleteTableHandler and ServerShutdownHandler may cause the 
following situation:
1. The table has already been disabled.
2. ServerShutdownHandler is doing MetaReader.getServerUserRegions.
3. While step 2 is processing or has just completed, DeleteTableHandler starts to 
delete the region (remove the region from META and delete the region from the FS).
4. DeleteTableHandler sets the table enabled.
5. ServerShutdownHandler starts to assign the region which has already been deleted 
by DeleteTableHandler.

The result of the above operations is an invalid record in .META. that can't be fixed 
by hbck.





[jira] [Created] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-03 Thread chunhui shen (Created) (JIRA)
MajorCompaction may affect scan's correctness
-

 Key: HBASE-5121
 URL: https://issues.apache.org/jira/browse/HBASE-5121
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen


In our test, there are keyvalues of two families for one row.

But we find an infrequent problem in scan's next() if a majorCompaction happens 
concurrently.
In two consecutive client calls to scan.next():
1. The first next() returns a result where family A is null.
2. The second next() returns a result where family B is null.
The two next() results have the same row.

If there are more families, I think the scenario will be even stranger...

We find the reason is that storescanner.peek() is changed after the majorCompaction if 
there are delete-type KeyValues.
This change means the PriorityQueue<KeyValueScanner> of the RegionScanner's heap is no 
longer guaranteed to be sorted.
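The underlying hazard can be illustrated with plain java.util.PriorityQueue (this is not HBase code): the queue orders elements only at insertion time, so if an element's comparison key changes afterwards, as a StoreScanner's peek() can change after a major compaction, the heap is no longer sorted and elements come out in the wrong order.

{code}
import java.util.PriorityQueue;

public class MutatedHeapSketch {
  static class Scanner implements Comparable<Scanner> {
    String peek;
    Scanner(String peek) { this.peek = peek; }
    @Override public int compareTo(Scanner o) { return peek.compareTo(o.peek); }
    @Override public String toString() { return peek; }
  }

  public static void main(String[] args) {
    PriorityQueue<Scanner> heap = new PriorityQueue<>();
    Scanner famA = new Scanner("row1/famA");
    Scanner famB = new Scanner("row1/famB");
    heap.add(famA);
    heap.add(famB);

    // Simulate a major compaction shifting what famA's scanner now points at.
    famA.peek = "row5/famA";

    // The heap was never re-ordered, so famA can still come out first even
    // though famB's key is now smaller: the same kind of inversion that makes
    // one next() miss family A and the following next() miss family B.
    System.out.println(heap.poll());   // may print row5/famA
    System.out.println(heap.poll());   // then row1/famB
  }
}
{code}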





[jira] [Created] (HBASE-5100) Rollback of split would cause closed region to opened

2011-12-27 Thread chunhui shen (Created) (JIRA)
Rollback of split would cause closed region to opened 
--

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen


If the master sends close region to the RS while the region's split transaction is 
happening concurrently, it may cause a closed region to be reopened.

See the detailed code in SplitTransaction#createDaughters
{code}
List<StoreFile> hstoreFilesToSplit = null;
try {
  hstoreFilesToSplit = this.parent.close(false);
  if (hstoreFilesToSplit == null) {
    // The region was closed by a concurrent thread.  We can't continue
    // with the split, instead we must just abandon the split.  If we
    // reopen or split this could cause problems because the region has
    // probably already been moved to a different server, or is in the
    // process of moving to a different server.
    throw new IOException("Failed to close region: already closed by " +
      "another thread");
  }
} finally {
  this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
}
{code}

When rolling back, the JournalEntry.CLOSED_PARENT_REGION entry causes 
this.parent.initialize() to run.

Although this region is not onlined in the regionserver, it may bring some potential 
problems.

For example, in our environment the closed parent region was rolled back successfully, 
and then compaction and split started again.

The parent region is f892dd6107b6b4130199582abc78e9c1

master log
{code}
2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
 src=dw87.kgb.sqa.cm4,60020,1324827866085, 
dest=dw80.kgb.sqa.cm4,60020,1324827865780
2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Starting unassignment of region 
writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 (offlining)
2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, 
load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region 
writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling new unassigned node: 
/hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
(region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
 server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_CLOSING, 
server=dw87.kgb.sqa.cm4,60020,1324827866085, 
region=f892dd6107b6b4130199582abc78e9c1
2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_CLOSED, 
server=dw87.kgb.sqa.cm4,60020,1324827866085, 
region=f892dd6107b6b4130199582abc78e9c1
2011-12-26 00:24:45,349 DEBUG 
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
event for f892dd6107b6b4130199582abc78e9c1
2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Forcing OFFLINE; 
was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 state=CLOSED, ts=1324830285347
2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
region=f892dd6107b6b4130199582abc78e9c1
2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Found an existing plan for 
writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 destination server is + serverName=dw80.kgb.sqa.cm4,60020,1324827865780, 
load=(requests=0, regions=1, usedHeap=0, maxHeap=0)
2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Using pre-existing plan for region 
writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.;
 

[jira] [Created] (HBASE-5020) MetaReader#fullScan doesn't stop scanning when vistor returns false in 0.90 version

2011-12-13 Thread chunhui shen (Created) (JIRA)
MetaReader#fullScan doesn't  stop scanning when vistor returns false in 0.90 
version


 Key: HBASE-5020
 URL: https://issues.apache.org/jira/browse/HBASE-5020
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
 Attachments: hbase-5020.patch

In current 0.90 code,
{code}
 public static void fullScan(CatalogTracker catalogTracker,
  final Visitor visitor, final byte [] startrow)
  throws IOException {
HRegionInterface metaServer =
  catalogTracker.waitForMetaServerConnectionDefault();
Scan scan = new Scan();
if (startrow != null) scan.setStartRow(startrow);
scan.addFamily(HConstants.CATALOG_FAMILY);
long scannerid = metaServer.openScanner(
HRegionInfo.FIRST_META_REGIONINFO.getRegionName(), scan);
try {
  Result data;
  while((data = metaServer.next(scannerid)) != null) {
if (!data.isEmpty()) visitor.visit(data);
  }
} finally {
  metaServer.close(scannerid);
}
return;
  }
{code}

If visitor.visit(data) returns false, the scan does not stop.
However, that is not what the description of Visitor says:
{code}
public interface Visitor {
/**
 * Visit the catalog table row.
 * @param r A row from catalog table
 * @return True if we are to proceed scanning the table, else false if
 * we are to stop now.
 */
public boolean visit(final Result r) throws IOException;
  }
{code}


I think this is an omission; trunk does not have this hole.
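A minimal sketch of the fix the Visitor contract implies (plain strings stand in for catalog Result rows; the real change would go in MetaReader#fullScan): the scan loop must stop as soon as visit() returns false.

{code}
import java.util.Arrays;
import java.util.List;

public class FullScanStopSketch {
  interface Visitor {
    /** @return true to keep scanning, false to stop now. */
    boolean visit(String row);
  }

  static void fullScan(List<String> metaRows, Visitor visitor) {
    for (String row : metaRows) {
      if (row.isEmpty()) continue;
      if (!visitor.visit(row)) {
        break;                      // the 0.90 loop ignored this return value and kept scanning
      }
    }
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("region1", "region2", "region3");
    fullScan(rows, row -> {
      System.out.println("visited " + row);
      return !row.equals("region2");   // ask the scan to stop after region2
    });
    // Prints region1 and region2 only; the unpatched loop would also visit region3.
  }
}
{code}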





[jira] [Created] (HBASE-4988) MetaServer crash cause all splitting regionserver abort

2011-12-08 Thread chunhui shen (Created) (JIRA)
MetaServer crash cause all splitting regionserver abort
---

 Key: HBASE-4988
 URL: https://issues.apache.org/jira/browse/HBASE-4988
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen


If the meta server crashes now, all the splitting regionservers will abort themselves, 
because of this code:
{code}
this.journal.add(JournalEntry.PONR);
MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
    this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
{code}
If the JournalEntry is PONR, the split's rollback will abort the regionserver itself.

That is terrible in a heavy-write environment when the meta server crashes.






[jira] [Created] (HBASE-4899) Region would be assigned twice easily with continually killing server and moving region in testing environment

2011-11-29 Thread chunhui shen (Created) (JIRA)
Region would be assigned twice easily with continually  killing server and 
moving region in testing environment
---

 Key: HBASE-4899
 URL: https://issues.apache.org/jira/browse/HBASE-4899
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen


Before assigning a region in ServerShutdownHandler#process, it checks whether the 
region is in RIT; however, this check doesn't work as expected in the following case:
1. Move region A from server B to server C.
2. Kill server B.
3. Start server B immediately.

Let's see what happens in the code for the above case:
{code}
for step 1:
1.1 server B closes region A
1.2 master setOffline for region A
    (AssignmentManager#setOffline: this.regions.remove(regionInfo))
1.3 server C starts to open region A (not completed)

for step 3:
master ServerShutdownHandler#process() for server B
{
  ...
  splitlog()
  ...
  List<RegionState> regionsInTransition =
    this.services.getAssignmentManager()
      .processServerShutdown(this.serverName);
  ...
  // Skip regions that were in transition unless CLOSING or PENDING_CLOSE
  ...
  assign region
}
{code}

In fact, when ServerShutdownHandler#process() runs 
this.services.getAssignmentManager().processServerShutdown(this.serverName), region A 
is in RIT (step 1.3 is not completed), but the returned List<RegionState> 
regionsInTransition doesn't contain it, because region A has been removed from 
AssignmentManager.regions by AssignmentManager#setOffline in step 1.2.
Therefore, region A will be assigned twice.

Actually, one server being killed and started twice will also easily cause a region 
to be assigned twice.
Besides the above reason, there is another possibility: when 
ServerShutdownHandler#process() executes MetaReader.getServerUserRegions, a region 
that is currently in RIT is included in the result. But by the time 
MetaReader.getServerUserRegions completes, the region has already been opened on 
another server and is no longer in RIT.

In our testing environment, where balancing, moving and killing are executed 
periodically, assigning a region twice happens often, and it is annoying because it 
affects other test cases.





[jira] [Created] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss

2011-11-27 Thread chunhui shen (Created) (JIRA)
Region is on service before completing openRegionHanlder, may cause data loss
-

 Key: HBASE-4880
 URL: https://issues.apache.org/jira/browse/HBASE-4880
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen


OpenRegionHandler in the regionserver goes through the following steps:
{code}
1. openregion() (through it, closed = false, closing = false)
2. addToOnlineRegions(region)
3. update the .META. table
4. update the ZK node state to RS_ZK_REGION_OPENED
{code}
We can see that the region is in service before step 4.
That means a client can put data to this region after step 3.
What will happen if step 4 fails?
OpenRegionHandler#cleanupFailedOpen will be executed, which closes the region, and the 
master assigns this region to another regionserver.
If closing the region fails, the data put between step 3 and step 4 may be lost, 
because the region has already been opened on another regionserver and has received 
new data. Therefore, the data may not be recoverable through replayRecoveredEdit(), 
because those edits' LogSeqIds are smaller than the current region SeqId.





[jira] [Created] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

2011-11-27 Thread chunhui shen (Created) (JIRA)
Unhealthy region is on service caused by rollback of region splitting
-

 Key: HBASE-4881
 URL: https://issues.apache.org/jira/browse/HBASE-4881
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen


If region splitting fails in the JournalEntry.CLOSED_PARENT_REGION state, it is rolled 
back through the following steps:
{code}
1. case CLOSED_PARENT_REGION:
     this.parent.initialize();
     break;
2. case CREATE_SPLIT_DIR:
     this.parent.writestate.writesEnabled = true;
     cleanupSplitDir(fs, this.splitdir);
     break;
3. case SET_SPLITTING_IN_ZK:
     if (server != null && server.getZooKeeper() != null) {
       cleanZK(server, this.parent.getRegionInfo());
     }
     break;
{code}
If this.parent.initialize() throws an IOException in step 1, and the filesystem check 
is OK, nothing further is done.
However, the parent region is in service now.





[jira] [Created] (HBASE-4878) Master crash when spliting hlog may cause data loss

2011-11-26 Thread chunhui shen (Created) (JIRA)
Master crash when spliting hlog may cause data loss
---

 Key: HBASE-4878
 URL: https://issues.apache.org/jira/browse/HBASE-4878
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen


Let's see the code of HlogSplitter#splitLog(final FileStatus[] logfiles)
{code}
private List<Path> splitLog(final FileStatus[] logfiles) throws IOException {
  try {
    for (FileStatus log : logfiles) {
      parseHLog(in, logPath, entryBuffers, fs, conf, skipErrors);
    }
    archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
  } finally {
    status.setStatus("Finishing writing output logs and closing down.");
    splits = outputSink.finishWritingAndClose();
  }
}
{code}


If the master is killed after archiveLogs(srcDir, corruptedLogs, processedLogs, 
oldLogDir, fs, conf) has finished, but before splits = outputSink.finishWritingAndClose() 
has finished, log data would be lost!
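A hedged sketch of the safer ordering (splitOneLog(), finishWritingAndClose() and archiveLogs() are placeholders for the real HLogSplitter methods): only archive the source hlogs after the recovered-edits outputs have been fully written and closed, so a master crash between the two steps cannot lose edits.

{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class SplitLogOrderingSketch {
  static List<String> splitLog(List<String> logfiles) throws IOException {
    List<String> splits;
    for (String log : logfiles) {
      splitOneLog(log);                     // parse entries into the output sink
    }
    splits = finishWritingAndClose();       // 1) flush + close recovered.edits first
    archiveLogs(logfiles);                  // 2) only now move the hlogs away
    return splits;
  }

  static void splitOneLog(String log) { System.out.println("parsing " + log); }

  static List<String> finishWritingAndClose() {
    System.out.println("closing recovered.edits outputs");
    return Arrays.asList("recovered.edits/0000001");
  }

  static void archiveLogs(List<String> logs) {
    System.out.println("archiving " + logs);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(splitLog(Arrays.asList("hlog.1", "hlog.2")));
  }
}
{code}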






[jira] [Created] (HBASE-4862) Split hlog and open region currently happend may cause data loss

2011-11-23 Thread chunhui shen (Created) (JIRA)
Split hlog and open region currently happend may cause data loss


 Key: HBASE-4862
 URL: https://issues.apache.org/jira/browse/HBASE-4862
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: chunhui shen


Case description:
1. The split-hlog thread creates a writer for the file <region A>/recovered.edits/123456 
and is appending log entries.
2. The regionserver is opening region A now, and in replayRecoveredEditsIfAny() it 
deletes the file <region A>/recovered.edits/123456.
3. The split-hlog thread catches the IOException and stops parsing this log file; if 
skipError = true, it adds the file to the corrupt logs. However, data of other regions 
in this log file will be lost.
4. Or, if skipError = false, it checks the filesystem. Of course, the filesystem is OK, 
so it only prints an error log and continues assigning regions. Therefore, data in 
other log files will also be lost!

This case may happen in the following sequence:
1. Move a region from server A to server B.
2. Kill server A and server B.
3. Restart server A and server B.

We could prevent this by forbidding deletion of a recovered.edits file that the 
split-hlog thread is still appending to.
